[ https://issues.apache.org/jira/browse/MAPREDUCE-7378?focusedWorklogId=771117&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-771117 ]
ASF GitHub Bot logged work on MAPREDUCE-7378:
---------------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/May/22 02:10
            Start Date: 17/May/22 02:10
    Worklog Time Spent: 10m
      Work Description: klaus-xiong commented on PR #4303:
URL: https://github.com/apache/hadoop/pull/4303#issuecomment-1128325122

> -1 to this, or to any other changes to FileOutputCommitter which aren't critical bugs in its correct functioning against HDFS for a job with exclusive access to the table. Sorry.
>
> We aren't going to make changes to FileOutputCommitter; consider it stable and too critical to change. You will be able to make changes to the Manifest committer of #4075, which should be safer, and the change there would be "only delete the job attempt dir (with the job unique id), not all of _temporary".
>
> See the JIRA I am linking your JIRA to for past discussion and future options.

Thank you for your reply. I will look for another way to handle this problem. Once I find one, I will cc you. Thanks again ^_^

Issue Time Tracking
-------------------

    Worklog Id:     (was: 771117)
        Time Spent: 1h 40m  (was: 1.5h)
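The change suggested above for the Manifest committer of #4075 ("only delete the job attempt dir (with the job unique id), not all of _temporary") could look roughly like the sketch below. This is only an illustration of that cleanup rule, not the Manifest committer's actual code: the class name PerJobCleanup, the method cleanupJobAttempt, and the jobUniqueId parameter are hypothetical, and the final empty-directory check is itself racy and is shown only to make the intent explicit.

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PerJobCleanup {

  /**
   * Hypothetical cleanup for a committer sharing an output path with other jobs:
   * remove only this job's attempt directory (named with its unique id),
   * never the whole _temporary tree.
   */
  static void cleanupJobAttempt(Configuration conf, Path outputPath, String jobUniqueId)
      throws IOException {
    FileSystem fs = outputPath.getFileSystem(conf);
    Path temporary = new Path(outputPath, "_temporary");
    Path jobAttemptDir = new Path(temporary, jobUniqueId);

    // Delete just this job's staging data; concurrent jobs writing to the same
    // output path keep their own directories under _temporary.
    fs.delete(jobAttemptDir, true);

    // Best-effort removal of _temporary itself once it is empty. This
    // check-then-delete is itself racy; it is here only to show the intent.
    if (fs.exists(temporary) && fs.listStatus(temporary).length == 0) {
      fs.delete(temporary, false);
    }
  }
}
{code}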
> An error occurred while concurrently writing to a path
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-7378
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7378
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: jingpan xiong
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When we use FileOutputCommitter as the base class of the job committer for other components, we may hit a concurrent-writing problem.
> For example, with HadoopMapReduceCommitProtocol in Spark, when multiple applications write data to the same path, they all commit their jobs and tasks under the "_temporary" dir. Once one job finishes, it deletes the "_temporary" dir, which makes the other jobs fail.
>
> Error message:
> {code:java}
> 21/11/04 19:01:21 ERROR Utils: Aborting task
> ExitCodeException exitCode=1: chmod: cannot access '/data/spark-examples/spark-warehouse/test/_temporary/0/_temporary/attempt_202111041901182933014038999149736_0001_m_000001_4/dt=2021-11-03/hour=10/.part-00001-95895b03-45d2-4ac6-806b-b76fd1dfa3dc.c000.snappy.parquet.crc': No such file or directory
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008)
>   at org.apache.hadoop.util.Shell.run(Shell.java:901)
>   at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:1307)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:1289)
>   at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:978)
>   at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:324)
>   at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:294)
>   at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:439)
>   at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:428)
>   at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:459)
>   at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:437)
>   at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:521)
>   at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:500)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1175)
>   at org.apache.parquet.hadoop.util.HadoopOutputFile.create(HadoopOutputFile.java:74)
>   at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:329)
>   at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:482)
>   at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:420)
>   at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:409)
>   at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:36)
>   at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:150)
>   at org.apache.spark.sql.execution.datasources.BaseDynamicPartitionDataWriter.renewCurrentWriter(FileFormatDataWriter.scala:290)
>   at org.apache.spark.sql.execution.datasources.DynamicPartitionDataSingleWriter.write(FileFormatDataWriter.scala:357)
>   at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithMetrics(FileFormatDataWriter.scala:85)
>   at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:92)
>   at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:304)
>   at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1496)
>   at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:311)
>   at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$16(FileFormatWriter.scala:229)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:131)
>   at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 21/11/04 19:01:21 WARN FileOutputCommitter: Could not delete file:/data/spark-examples/spark-warehouse/test/_temporary/0/_temporary/attempt_202111041901182933014038999149736_0001_m_000001_4
> {code}
>
> The Spark issue is [SPARK-37210](https://issues.apache.org/jira/browse/SPARK-37210)
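The failure mode in the description can be demonstrated without Hadoop at all. The following is a minimal standalone sketch (plain java.nio.file, not FileOutputCommitter code, with made-up paths and task names): two "jobs" stage files under a shared <out>/_temporary directory; the first job's cleanup removes the whole _temporary tree, so the second job's staged output is already gone when it tries to commit, which is the same pattern behind the chmod "No such file or directory" and "Could not delete" messages above.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class SharedTemporaryDirRace {
    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("table");
        Path temporary = out.resolve("_temporary");

        // Both jobs stage their task output under the same _temporary dir.
        Path jobA = Files.createDirectories(temporary.resolve("0/task_A_000001"));
        Path jobB = Files.createDirectories(temporary.resolve("0/task_B_000001"));
        Files.writeString(jobA.resolve("part-00000"), "rows from job A");
        Files.writeString(jobB.resolve("part-00000"), "rows from job B");

        // Job A commits: promote its file, then clean up ALL of _temporary,
        // roughly what FileOutputCommitter's job cleanup does to the shared dir.
        Files.move(jobA.resolve("part-00000"), out.resolve("part-A-00000"));
        try (Stream<Path> walk = Files.walk(temporary)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }

        // Job B now tries to commit, but its staged output has vanished.
        System.out.println("job B staged file still exists? "
                + Files.exists(jobB.resolve("part-00000")));   // prints false
    }
}
{code}

Running it prints false for job B's staged file, mirroring how the second Spark application loses its task attempt directory once the first application's committer cleans up _temporary.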