Ngone51 commented on a change in pull request #33034:
URL: https://github.com/apache/spark/pull/33034#discussion_r661965448



##########
File path: 
common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java
##########
@@ -222,7 +223,7 @@ public void sendMergedBlockMetaReq(
     handler.addRpcRequest(requestId, callback);
     RpcChannelListener listener = new RpcChannelListener(requestId, callback);
     channel.writeAndFlush(
-      new MergedBlockMetaRequest(requestId, appId, shuffleId, 
reduceId)).addListener(listener);
+      new MergedBlockMetaRequest(requestId, appId, shuffleId, 
shuffleSequenceId, reduceId)).addListener(listener);

Review comment:
       > Since the metadata about what's the latest shuffleSeqID for a 
previously merge finalized shuffle is already cleaned due to merge 
finalization, we would have the following options....
   
   I prefer option 3. And we can clean up merged shuffle files at the time of
   
   
https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala#L173-L181
 
   
   whether clean it at executor side directly or send a message to ESS to ask 
clean. (I currently not sure which way is better.)
   
   And I agree to add the seq id at the fetching side as well (although an 
older fetch should never happen), which seems more consistent with the pushing 
side to me.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to