rajarshisarkar edited a comment on issue #2991:
URL: https://github.com/apache/iceberg/issues/2991#issuecomment-926014364
@alex-shchetkov @fcvr1010 @jackye1995 I was able to reproduce the issue
(configuration: EMR 6.2.0, Iceberg 0.11.1, Spark 3.0.1, 1 master node, 10 core
nodes). When
`--conf spark.driver.extraJavaOptions=-Djava.io.tmpdir=/tmp/driver` is not
passed, the container's `tmp` folder looks something like this:
```
/mnt1/yarn/usercache/hadoop/appcache/application_1632368478936_0025/container_1632368478936_0025_02_000021/tmp
/mnt1/yarn/usercache/hadoop/appcache/application_1632368478936_0025/container_1632368478936_0025_02_000021/tmp/liblz4-java-4965470198489888535.so.lck
/mnt1/yarn/usercache/hadoop/appcache/application_1632368478936_0025/container_1632368478936_0025_02_000021/tmp/liblz4-java-4965470198489888535.so
```
However, I still hit the issue even though the folder and the temp files were
there:
```
System.getProperty("java.io.tmpdir"): /mnt1/yarn/usercache/hadoop/appcache/application_1632368478936_0025/container_1632368478936_0025_02_000021/tmp
    at org.apache.iceberg.aws.s3.S3OutputFile.createOrOverwrite(S3OutputFile.java:61)
    at org.apache.iceberg.parquet.ParquetIO$ParquetOutputFile.createOrOverwrite(ParquetIO.java:153)
    at org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:293)
    at org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:259)
    at org.apache.iceberg.parquet.ParquetWriter.<init>(ParquetWriter.java:101)
    at org.apache.iceberg.parquet.Parquet$WriteBuilder.build(Parquet.java:250)
    at org.apache.iceberg.spark.source.SparkAppenderFactory.newAppender(SparkAppenderFactory.java:110)
    at org.apache.iceberg.spark.source.SparkAppenderFactory.newDataWriter(SparkAppenderFactory.java:139)
    at org.apache.iceberg.io.BaseTaskWriter$RollingFileWriter.newWriter(BaseTaskWriter.java:310)
    at org.apache.iceberg.io.BaseTaskWriter$RollingFileWriter.newWriter(BaseTaskWriter.java:303)
    at org.apache.iceberg.io.BaseTaskWriter$BaseRollingWriter.openCurrent(BaseTaskWriter.java:271)
    at org.apache.iceberg.io.BaseTaskWriter$BaseRollingWriter.<init>(BaseTaskWriter.java:233)
    at org.apache.iceberg.io.BaseTaskWriter$BaseRollingWriter.<init>(BaseTaskWriter.java:223)
    at org.apache.iceberg.io.BaseTaskWriter$RollingFileWriter.<init>(BaseTaskWriter.java:305)
    at org.apache.iceberg.io.PartitionedWriter.write(PartitionedWriter.java:73)
    at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$7(WriteToDataSourceV2Exec.scala:441)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411)
    at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:477)
    at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:385)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:127)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: No such file or directory
    at java.io.UnixFileSystem.createFileExclusively(Native Method)
    at java.io.File.createTempFile(File.java:2026)
    at org.apache.iceberg.aws.s3.S3OutputStream.newStream(S3OutputStream.java:181)
    at org.apache.iceberg.aws.s3.S3OutputStream.<init>(S3OutputStream.java:115)
    at org.apache.iceberg.aws.s3.S3OutputFile.createOrOverwrite(S3OutputFile.java:58)
    ... 26 more
```
Also, executor tasks running on the same node as the driver do not fail. I
will continue the analysis.
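
For anyone who wants to check a node by hand, here is a minimal sketch of a
probe that attempts the same kind of `File.createTempFile` call that fails at
the bottom of the trace. The class name `TmpDirProbe` and the method
`canCreateTempFile` are hypothetical and are not part of Iceberg or Spark.
Which directory `S3OutputStream` actually passes to `createTempFile` is not
shown in the trace, so the sketch only assumes that it may differ from the
executor's own `java.io.tmpdir`.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical diagnostic helper; not part of Iceberg or Spark.
class TmpDirProbe {

    // Tries to create (and then delete) a temp file in the given directory.
    // Returns false when creation fails with an IOException, for example
    // "No such file or directory" when the directory does not exist.
    static boolean canCreateTempFile(String dir) {
        try {
            File probe = File.createTempFile("probe-", ".tmp", new File(dir));
            return probe.delete();
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // With no argument, probe this JVM's own java.io.tmpdir.
        String dir = args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir");
        File f = new File(dir);
        System.out.println("directory      = " + dir);
        System.out.println("exists         = " + f.exists());
        System.out.println("writable       = " + f.canWrite());
        System.out.println("createTempFile = " + canCreateTempFile(dir));
    }
}
```

Running it once with no argument and once with the driver's temp directory as
the argument would show whether the failing path is one that simply does not
exist on that particular node.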