[GitHub] [spark] cloud-fan commented on a change in pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

GitBox Tue, 15 Jun 2021 21:44:43 -0700


cloud-fan commented on a change in pull request #32401:
URL: https://github.com/apache/spark/pull/32401#discussion_r652345736




##########
File path: 
core/src/main/java/org/apache/spark/shuffle/api/ShuffleMapOutputWriter.java
##########
@@ -68,8 +72,11 @@
    *    for that partition id.
    * <p>
    * 2) An optional metadata blob that can be used by shuffle readers.
+   *
+   * @param checksums The checksum values for each partition if shuffle 
checksum enabled.
+   *                  Otherwise, it's empty.
    */
-  MapOutputCommitMessage commitAllPartitions() throws IOException;
+  MapOutputCommitMessage commitAllPartitions(long[] checksums) throws 
IOException;

Review comment:
       TBH I don't think the current shuffle API provides enough abstraction to 
do checksum. I'm OK with this change as the shuffle API is still private, but 
we should revisit the shuffle API later, so that checksum can be done at the 
shuffle implementation side.
   
   The current issue I see is, Spark writes local spill files and then asks the 
shuffle implementation to "transfer" the spill files. Then Spark has to do 
checksum by itself during spill file writing, to reduce the perf overhead.
   
   We can discuss it later.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

Reply via email to