turboFei edited a comment on issue #26086: [SPARK-29302] Make the file name of 
a task for dynamic partition overwrite be unique
URL: https://github.com/apache/spark/pull/26086#issuecomment-546703904
 
 
   @srowen  Thanks for your reply.
   I think the risks are:
   - For dynamicPartitionOverwrite, before this PR, a task's filename would
     conflict with that of its speculative attempt.
   - For the non-dynamicPartitionOverwrite, non-FileOutputCommitter case, if a
     task's filename is not the same as that of its retry/speculative attempt,
     and a task aborts without cleaning up its output gracefully, the result
     would contain duplicates. So, in this PR, I name a task file with taskId
     and attemptId only for dynamicPartitionOverwrite.
   
   But in the non-dynamicPartitionOverwrite, non-FileOutputCommitter case
   above, a task's filename would also conflict with that of its speculative
   task.
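
   To make the two risks concrete, here is a minimal sketch in Python rather than Spark's Scala. The function names and the exact name layouts are assumptions for illustration only, not the code in HadoopMapReduceCommitProtocol.

```python
def shared_name(split: int, job_id: str, ext: str = ".parquet") -> str:
    # Assumed layout: the name depends only on the task (split) index and the
    # job id, so every attempt of the same task produces the same name.
    return f"part-{split:05d}-{job_id}{ext}"


def per_attempt_name(split: int, attempt: int, job_id: str,
                     ext: str = ".parquet") -> str:
    # Hypothetical layout: the attempt number is part of the name, so each
    # attempt of the same task produces a distinct name.
    return f"part-{split:05d}-{attempt}-{job_id}{ext}"


job_id = "job-0001"
# (task index, attempt number): the original attempt and its speculative copy.
attempts = [(3, 0), (3, 1)]

shared = [shared_name(split, job_id) for split, _ in attempts]
unique = [per_attempt_name(split, attempt, job_id) for split, attempt in attempts]

# Risk 1: with a shared name, both attempts target the same file.
names_conflict = len(set(shared)) < len(shared)

# Risk 2: with per-attempt names, an attempt that aborts without cleaning up
# leaves its own file next to the winner's, i.e. duplicate rows on read.
files_left_without_cleanup = len(set(unique))
```

   Here `names_conflict` is true and `files_left_without_cleanup` is 2, which is why the per-attempt name is only safe where the output of a failed or losing attempt is guaranteed to be discarded.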
   
   As the code linked below shows, before this PR there were risks for both
   dynamicPartitionOverwrite and non-FileOutputCommitter.
   This PR fixes the issue for the dynamicPartitionOverwrite case.
   In practice, the non-FileOutputCommitter case is rare.
   
https://github.com/apache/spark/blob/077fb99a26a9e92104503fade25c0a095fec5e5d/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L104-L125

