[ 
https://issues.apache.org/jira/browse/SPARK-16234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Chambers updated SPARK-16234:
----------------------------------
    Description: 
With spark.speculation set to true, I'm running a large Spark job that writes 
Parquet with SaveMode.Overwrite.

Spark will speculatively launch a duplicate task to deal with a straggler. 
However, this comes with risk: EVEN THOUGH SaveMode.Overwrite is selected, the 
job fails regardless of whether the speculative attempt or the original task 
finishes first, because the output file already exists.

java.io.IOException: 
/...some-file.../part-r-00049-401da178-3343-43a4-9c8d-277cc0173bf9.gz.parquet 
already exists

  was:
With spark.speculation set to true, I'm running a large Spark job that writes 
Parquet with SaveMode.Overwrite.

Spark will speculatively launch a duplicate task to deal with this straggler. 
However, this comes with risk: EVEN THOUGH SaveMode.Overwrite is selected, the 
job fails regardless of whether the speculative attempt or the original task 
finishes first, because the output file already exists.

java.io.IOException: 
/...some-file.../part-r-00049-401da178-3343-43a4-9c8d-277cc0173bf9.gz.parquet 
already exists


> Speculative Task may not be able to overwrite file
> --------------------------------------------------
>
>                 Key: SPARK-16234
>                 URL: https://issues.apache.org/jira/browse/SPARK-16234
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.0
>            Reporter: Bill Chambers
>
> With spark.speculation set to true, I'm running a large Spark job that 
> writes Parquet with SaveMode.Overwrite.
> Spark will speculatively launch a duplicate task to deal with a straggler. 
> However, this comes with risk: EVEN THOUGH SaveMode.Overwrite is selected, 
> the job fails regardless of whether the speculative attempt or the original 
> task finishes first, because the output file already exists.
> java.io.IOException: 
> /...some-file.../part-r-00049-401da178-3343-43a4-9c8d-277cc0173bf9.gz.parquet 
> already exists
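
To make the reported collision concrete, here is a minimal sketch in plain Python. It is not Spark code and does not reflect Spark's actual write path. It assumes only what the report states: two attempts of the same task target the same output file name, and the write refuses to replace a file that already exists. The names write_attempt and run_with_speculation and the file name part-r-00000.parquet are hypothetical.

```python
import os
import tempfile


def write_attempt(path: str, payload: bytes) -> str:
    """Simulate one task attempt writing its output file.

    Mode "xb" creates the file exclusively and raises FileExistsError when
    the file is already present, standing in for the reported IOException.
    """
    try:
        with open(path, "xb") as f:
            f.write(payload)
        return "committed"
    except FileExistsError:
        return "failed: already exists"


def run_with_speculation(out_dir: str, part_name: str = "part-r-00000.parquet"):
    """Run an original attempt and a speculative copy against the same path.

    The attempts run one after the other here; the order does not matter,
    because whichever attempt writes second hits the existing file.
    """
    path = os.path.join(out_dir, part_name)
    original = write_attempt(path, b"original attempt")
    speculative = write_attempt(path, b"speculative attempt")
    return original, speculative


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        print(run_with_speculation(out_dir))
```

The point of the sketch is that overwrite semantics applied once to the output directory at job start do not help two attempts that both try to create the same part file later on.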


