Ryan Luo created SPARK-32742:
--------------------------------

             Summary: FileOutputCommitter warns "No Output found for attempt"
                 Key: SPARK-32742
                 URL: https://issues.apache.org/jira/browse/SPARK-32742
             Project: Spark
          Issue Type: Bug
          Components: Spark Submit
    Affects Versions: 2.4.0
         Environment: Hadoop 2.6.0-cdh5.16.2, YARN (MR2 included)
            Reporter: Ryan Luo


Hi team,

This is my first time reporting an issue here.

We submitted and ran a Spark job on the cluster.

We found that one of the Parquet output partitions is missing from the output
directory. The Spark job log shows that all tasks completed successfully, and
the output record count matches the expected number.
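To identify which task's file is missing, the part-file names in the output directory can be compared against the expected number of tasks. The Python sketch below assumes Spark's default `part-<5-digit partition id>-...` file naming, a non-partitioned write (no `partitionBy` subdirectories), and that every task writes at least one file. `missing_task_ids` and the sample listing are hypothetical; a real listing could come from `hdfs dfs -ls` or `os.listdir`.

```python
import re

# Assumes Spark's default task output naming: "part-<5-digit partition id>-...".
PART_RE = re.compile(r"^part-(\d{5})-")


def missing_task_ids(file_names, num_tasks):
    """Return the partition ids in [0, num_tasks) that have no part file."""
    seen = set()
    for name in file_names:
        match = PART_RE.match(name)
        if match:
            seen.add(int(match.group(1)))
    return sorted(set(range(num_tasks)) - seen)


if __name__ == "__main__":
    # Hypothetical listing in which the file for partition 2 is absent.
    listing = [
        "_SUCCESS",
        "part-00000-1a2b-c000.snappy.parquet",
        "part-00001-1a2b-c000.snappy.parquet",
        "part-00003-1a2b-c000.snappy.parquet",
    ]
    print(missing_task_ids(listing, 4))  # [2]
```

Files that do not match the pattern, such as `_SUCCESS`, are ignored.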

However, in the container log we found the warning *No Output found for
attempt_20200819094307_0003_m_000002_11*, indicating that the output of that
attempt was never moved from taskAttemptPath to the output directory. As a
result, some of the output rows are missing.
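For context, our reading of Hadoop 2.x `FileOutputCommitter.commitTask` is that, with algorithm version 2 (as in the log below), the task attempt directory is merged straight into the output directory, and if that directory does not exist the committer only logs `No Output found for <attempt>` and returns normally. The Python model below illustrates that branch on a local filesystem. `commit_task_v2` is a hypothetical, simplified stand-in (flat directories only), not the Hadoop code.

```python
import logging
import os
import shutil
import tempfile

log = logging.getLogger("FileOutputCommitter")


def commit_task_v2(task_attempt_path, output_path, attempt_id):
    """Simplified model of the algorithm-version-2 commitTask branch.

    Moves each entry of task_attempt_path into output_path and returns how
    many entries were moved. Assumes a flat attempt directory.
    """
    if not os.path.isdir(task_attempt_path):
        # No attempt directory: only a warning is logged and no error is
        # raised, so the caller goes on to report the attempt as committed.
        log.warning("No Output found for %s", attempt_id)
        return 0
    moved = 0
    for name in os.listdir(task_attempt_path):
        shutil.move(os.path.join(task_attempt_path, name),
                    os.path.join(output_path, name))
        moved += 1
    return moved


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    root = tempfile.mkdtemp()
    out = os.path.join(root, "out")
    attempts = os.path.join(out, "_temporary", "0", "_temporary")
    ok = os.path.join(attempts, "attempt_ok")
    os.makedirs(ok)
    open(os.path.join(ok, "part-00001.parquet"), "w").close()
    print(commit_task_v2(ok, out, "attempt_ok"))  # 1
    # The directory for this attempt was never created.
    gone = os.path.join(attempts, "attempt_gone")
    print(commit_task_v2(gone, out, "attempt_gone"))  # 0, plus a warning
    print(sorted(os.listdir(out)))  # ['_temporary', 'part-00001.parquet']
    shutil.rmtree(root)
```

Because the missing-directory case returns normally, the caller cannot distinguish it from a task that legitimately had nothing to commit, which is consistent with the `Committed` line that follows the warning in the log.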

Re-running the job resolved the issue; however, this report is critical for
us. We would appreciate any advice on the cause of the issue.

 

Below are the container logs:

20/08/19 09:44:57 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false

20/08/19 09:44:57 INFO datasources.SQLHadoopMapReduceCommitProtocol: Using user defined output committer class parquet.hadoop.ParquetOutputCommitter

20/08/19 09:44:57 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2

20/08/19 09:44:57 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false

20/08/19 09:44:57 INFO datasources.SQLHadoopMapReduceCommitProtocol: Using output committer class parquet.hadoop.ParquetOutputCommitter

20/08/19 09:44:57 INFO codegen.CodeGenerator: Code generated in 12.370642 ms

20/08/19 09:44:57 INFO codegen.CodeGenerator: Code generated in 6.927118 ms

20/08/19 09:44:57 INFO codegen.CodeGenerator: Code generated in 12.004204 ms

20/08/19 09:44:57 INFO parquet.ParquetWriteSupport: Initialized Parquet WriteSupport with Catalyst schema:

..... (skipped)

{color:#FF0000}20/08/19 09:44:57 WARN output.FileOutputCommitter: No Output found for attempt_20200819094307_0003_m_000002_11{color}

{color:#FF0000}20/08/19 09:44:57 INFO mapred.SparkHadoopMapRedUtil: attempt_20200819094307_0003_m_000002_11: Committed{color}
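To check other runs for the same symptom, the container logs can be scanned for this warning. The Python sketch below assumes each log record sits on a single line, as in the raw container log file. The regular expressions and the helper names `attempts_without_output` and `task_index` are hypothetical, and the task index is recovered from Hadoop's `attempt_<cluster>_<job>_<m|r>_<task>_<attempt>` ID layout.

```python
import re

# Matches the WARN line shown in the container log above.
WARN_RE = re.compile(
    r"WARN output\.FileOutputCommitter: No Output found for (attempt_\w+)"
)
# Hadoop attempt IDs look like attempt_<cluster>_<job>_<m|r>_<task>_<attempt>.
ATTEMPT_RE = re.compile(r"^attempt_\d+_\d+_[mr]_(\d+)_\d+$")


def attempts_without_output(log_text):
    """Return every attempt ID that logged "No Output found"."""
    return WARN_RE.findall(log_text)


def task_index(attempt_id):
    """Return the task index encoded in a Hadoop attempt ID, or None."""
    match = ATTEMPT_RE.match(attempt_id)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # The two highlighted lines from the container log above.
    sample = (
        "20/08/19 09:44:57 WARN output.FileOutputCommitter: No Output found for "
        "attempt_20200819094307_0003_m_000002_11\n"
        "20/08/19 09:44:57 INFO mapred.SparkHadoopMapRedUtil: "
        "attempt_20200819094307_0003_m_000002_11: Committed\n"
    )
    for attempt in attempts_without_output(sample):
        # prints: attempt_20200819094307_0003_m_000002_11 2
        print(attempt, task_index(attempt))
```

The `Committed` line is deliberately not matched, since it appears for every attempt whether or not output was found.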



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
