zhouyifan279 commented on PR #41628:
URL: https://github.com/apache/spark/pull/41628#issuecomment-2100365610

   To eliminate the data inconsistency issue, we should handle custom 
partitions in `HadoopMapReduceCommitProtocol.commitJob` instead of writing to 
the final output path and then moving each partition directory to its custom 
location:
   1. Get all `partitionPaths` from `TaskCommitMessage.obj._2` 
(`TaskCommitMessage.obj._1` is empty because we do not have 
`customPartitionLocations` at this step).
   2. Use `partitionPaths` to get the `matchingPartitions`, then get 
`customPartitionLocations` the same way this PR does.
   3. Move each of the `partitionPaths` to its final location according to 
`customPartitionLocations`.
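
The three steps above can be sketched roughly as follows. This is a language-neutral illustration in Python, not Spark's actual Scala code: `parse_partition_path`, `plan_partition_moves`, the staging directory layout, and the fallback to the default location under the output path are all assumptions made for the example. It only plans the moves and does not touch a file system, and it ignores the escaping Spark applies to partition values.

```python
import posixpath


def parse_partition_path(partition_path):
    """Turn 'year=2024/month=05' into (('year', '2024'), ('month', '05')).

    Hypothetical helper: stands in for resolving a partition path to a
    partition spec (step 2).
    """
    segments = partition_path.strip("/").split("/")
    return tuple(tuple(segment.split("=", 1)) for segment in segments)


def plan_partition_moves(staging_dir, output_path, partition_paths, custom_locations):
    """Return (source, destination) pairs for every written partition.

    partition_paths:  relative partition paths collected from the task commit
                      messages (step 1).
    custom_locations: mapping from partition spec to an absolute custom
                      location (step 2). Assumed shape, for illustration only.
    """
    moves = []
    for rel_path in sorted(partition_paths):
        spec = parse_partition_path(rel_path)
        source = posixpath.join(staging_dir, rel_path)
        # Step 3: prefer the custom location; otherwise assume the default
        # location under the table's output path.
        default_destination = posixpath.join(output_path, rel_path)
        destination = custom_locations.get(spec, default_destination)
        moves.append((source, destination))
    return moves


if __name__ == "__main__":
    planned = plan_partition_moves(
        staging_dir="/warehouse/tbl/.staging-job1",
        output_path="/warehouse/tbl",
        partition_paths={"p=1", "p=2"},
        custom_locations={(("p", "2"),): "/custom/location/p2"},
    )
    for source, destination in planned:
        print(source, "->", destination)
```

Because each partition directory is moved exactly once, from staging straight to its final location, there is no window in which data sits under the default output path before being relocated.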
   
   @jeanlyn @bowenliang123 @attilapiros what do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

