Ngone51 commented on a change in pull request #33034: URL: https://github.com/apache/spark/pull/33034#discussion_r661965448
########## File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java ########## @@ -222,7 +223,7 @@ public void sendMergedBlockMetaReq( handler.addRpcRequest(requestId, callback); RpcChannelListener listener = new RpcChannelListener(requestId, callback); channel.writeAndFlush( - new MergedBlockMetaRequest(requestId, appId, shuffleId, reduceId)).addListener(listener); + new MergedBlockMetaRequest(requestId, appId, shuffleId, shuffleSequenceId, reduceId)).addListener(listener); Review comment: > Since the metadata about what's the latest shuffleSeqID for a previously merge finalized shuffle is already cleaned due to merge finalization, we would have the following options.... I prefer option 3. And we can clean up merged shuffle files at the time of https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala#L173-L181 whether clean it at executor side directly or send a message to ESS to ask clean. (I currently not sure which way is better.) And I agree to add the seq id at the fetching side as well (although an older fetch should never happen), which seems more consistent with the pushing side to me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org