cloud-fan commented on a change in pull request #32401:
URL: https://github.com/apache/spark/pull/32401#discussion_r652345736



##########
File path: 
core/src/main/java/org/apache/spark/shuffle/api/ShuffleMapOutputWriter.java
##########
@@ -68,8 +72,11 @@
    *    for that partition id.
    * <p>
    * 2) An optional metadata blob that can be used by shuffle readers.
+   *
+   * @param checksums The checksum values for each partition if shuffle 
checksum enabled.
+   *                  Otherwise, it's empty.
    */
-  MapOutputCommitMessage commitAllPartitions() throws IOException;
+  MapOutputCommitMessage commitAllPartitions(long[] checksums) throws 
IOException;

Review comment:
       TBH I don't think the current shuffle API provides enough abstraction to 
do checksum. I'm OK with this change as the shuffle API is still private, but 
we should revisit the shuffle API later, so that checksum can be done at the 
shuffle implementation side.
   
   The current issue I see is, Spark writes local spill files and then asks the 
shuffle implementation to "transfer" the spill files. Then Spark has to do 
checksum by itself during spill file writing, to reduce the perf overhead.
   
   We can discuss it later.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to