[jira] [Commented] (SPARK-11328) Correctly propagate error message in the case of failures when writing parquet
[ https://issues.apache.org/jira/browse/SPARK-11328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034847#comment-15034847 ]

Apache Spark commented on SPARK-11328:
--------------------------------------
User 'nongli' has created a pull request for this issue:
https://github.com/apache/spark/pull/10080

> Correctly propagate error message in the case of failures when writing parquet
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-11328
>                 URL: https://issues.apache.org/jira/browse/SPARK-11328
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Yin Huai
>            Assignee: Nong Li
>            Priority: Critical
>
> When saving data to S3 (e.g. saving to parquet), if there is an error during
> the query execution, the partial file generated by the failed task will be
> uploaded to S3, and the retries of this task will throw a "file already
> exists" error. This is very confusing to users because they may think that
> the "file already exists" error is the error causing the job failure. They
> can only find the real error in the Spark UI (on the stage page).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11328) Correctly propagate error message in the case of failures when writing parquet
[ https://issues.apache.org/jira/browse/SPARK-11328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975361#comment-14975361 ]

Yin Huai commented on SPARK-11328:
----------------------------------
The "file already exists" error is thrown from [this line | https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriterContainer.scala#L237] when we try to create a record writer.
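
The masking effect described in this issue can be reproduced outside Spark. The following is a minimal Python sketch, not Spark code: the local filesystem stands in for S3, and exclusive file creation stands in for the existence check made when the record writer is created. The names `TaskFailed`, `write_partition`, and `run_with_retries` are hypothetical. Chaining the first failure as the cause is only one possible way to keep the real error visible, and is not necessarily what the pull request for this issue does.

```python
import os
import tempfile


class TaskFailed(Exception):
    """Hypothetical job-level error that carries the first task failure."""


def write_partition(path, rows):
    # Mode "x" is exclusive creation: it raises FileExistsError if an
    # earlier attempt left a partial file behind at the same path.
    with open(path, "x") as out:
        for row in rows:
            if row is None:
                raise ValueError("bad record")  # the real error
            out.write(f"{row}\n")


def run_with_retries(path, rows, max_attempts=3):
    first_error = None
    last_error = None
    for _ in range(max_attempts):
        try:
            write_partition(path, rows)
            return
        except Exception as exc:
            if first_error is None:
                first_error = exc
            last_error = exc
    # Chain the first failure as the cause so later failures cannot hide it.
    raise TaskFailed(
        f"last attempt failed with {type(last_error).__name__}"
    ) from first_error


with tempfile.TemporaryDirectory() as tmp:
    try:
        run_with_retries(os.path.join(tmp, "part-00000"), ["a", None, "b"])
    except TaskFailed as failure:
        last_message = str(failure)
        root_cause = failure.__cause__

print(last_message)
print(repr(root_cause))
```

Without the chaining step, only the error from the last attempt would be reported, which is the confusing behavior the issue describes.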
[jira] [Commented] (SPARK-11328) Correctly propagate error message in the case of failures when writing parquet
[ https://issues.apache.org/jira/browse/SPARK-11328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975556#comment-14975556 ]

Yin Huai commented on SPARK-11328:
----------------------------------
[~nongli] It looks like this issue is also related to DirectParquetOutputCommitter. Right now, its abortTask method is a no-op.
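
To show why a no-op abort matters, here is a minimal Python sketch under stated assumptions. It does not use the Hadoop or Spark committer APIs: `NoOpCommitter`, `CleaningCommitter`, and `run_task` are hypothetical stand-ins, and a local temporary directory plays the role of the output location. It compares the errors seen across retries when the abort step leaves the partial file in place with those seen when it deletes the file.

```python
import os
import tempfile


class NoOpCommitter:
    """Hypothetical committer whose abort_task leaves partial output behind."""

    def abort_task(self, path):
        pass


class CleaningCommitter:
    """Hypothetical committer whose abort_task deletes partial output."""

    def abort_task(self, path):
        if os.path.exists(path):
            os.remove(path)


def run_task(committer, path, rows, max_attempts=3):
    """Run the write with retries; return the error type names seen, in order."""
    seen = []
    for _ in range(max_attempts):
        try:
            # Exclusive creation fails if an earlier attempt left a file behind.
            with open(path, "x") as out:
                for row in rows:
                    if row is None:
                        raise ValueError("bad record")  # the real error
                    out.write(f"{row}\n")
            return seen
        except Exception as exc:
            seen.append(type(exc).__name__)
            committer.abort_task(path)
    return seen


with tempfile.TemporaryDirectory() as tmp:
    rows = ["a", None, "b"]
    no_op = run_task(NoOpCommitter(), os.path.join(tmp, "noop-part"), rows)
    cleaned = run_task(CleaningCommitter(), os.path.join(tmp, "clean-part"), rows)

print(no_op)
print(cleaned)
```

In this sketch, the cleaning variant makes every retry fail with the original error, while the no-op variant replaces it with a file-exists error from the second attempt onward.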