cloud-fan commented on a change in pull request #32401: URL: https://github.com/apache/spark/pull/32401#discussion_r652345736
########## File path: core/src/main/java/org/apache/spark/shuffle/api/ShuffleMapOutputWriter.java ########## @@ -68,8 +72,11 @@ * for that partition id. * <p> * 2) An optional metadata blob that can be used by shuffle readers. + * + * @param checksums The checksum values for each partition if shuffle checksum enabled. + * Otherwise, it's empty. */ - MapOutputCommitMessage commitAllPartitions() throws IOException; + MapOutputCommitMessage commitAllPartitions(long[] checksums) throws IOException; Review comment: TBH I don't think the current shuffle API provides enough abstraction to do checksum. I'm OK with this change as the shuffle API is still private, but we should revisit the shuffle API later, so that checksum can be done at the shuffle implementation side. The current issue I see is, Spark writes local spill files and then asks the shuffle implementation to "transfer" the spill files. Then Spark has to do checksum by itself during spill file writing, to reduce the perf overhead. We can discuss it later. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org