pltbkd commented on code in PR #3637:
URL: https://github.com/apache/celeborn/pull/3637#discussion_r3015520002


##########
worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/MapPartitionDataReader.java:
##########
@@ -138,8 +138,34 @@ public void open(FileChannel dataFileChannel, FileChannel indexFileChannel, long
       this.dataFileChannel = dataFileChannel;
       this.dataFileChannelSize = dataFileChannel.size();
       this.indexFileChannel = indexFileChannel;
+
+      int numSubpartitions = mapFileMeta.getNumSubpartitions();
+      // If numSubpartitions is 0, it means pushDataHandShake was never successfully called.
+      // This can happen when the first handshake failed before any data was written.
+      // In this case, check if data file is empty, and if so, treat this as an empty partition.
+      if (numSubpartitions == 0) {
+        if (dataFileChannelSize == 0) {
+          logger.warn(
+              "Partition {} has numSubpartitions=0 and empty data file, this indicates a failed "
+                  + "handshake before any data was written. Treating as empty partition.",
+              fileInfo.getFilePath());
+          isOpen = true;
+          // Mark as finished and notify consumer with backlog=1 so they can complete normally
+          closeReader();
+          notifyBacklog(1);

Review Comment:
   notifyBacklog(1) is indeed not a standard operation, but I think it is
acceptable as a fallback: this is a rare case, and without the fix the result
is endless failover. We need the client to send a credit, which triggers data
sending, so that the reader can be recycled normally. notifyBacklog(0) won't
make the client send a credit. I also tried recycling the reader directly
here, but that left the client stuck because it could not properly finish
reading. As far as I can tell, this is the only way to handle the issue
without client-side changes.
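
To illustrate the credit argument above, here is a toy model of the exchange. It is only a sketch: `Client`, `EmptyPartitionReader`, and `simulate` are hypothetical names, not Celeborn classes. It assumes, as the comment states, that the client sends a credit only when the announced backlog is positive, and that the server side can deliver the end-of-partition signal and recycle the reader only once it holds a credit.

```java
public class BacklogCreditSketch {

  /** Hypothetical client: grants one credit per announced backlog entry. */
  static final class Client {
    int creditsSent = 0;
    boolean finished = false;

    /** Returns the number of credits sent in response to a backlog announcement. */
    int onBacklog(int backlog) {
      int credits = Math.max(backlog, 0);
      creditsSent += credits;
      return credits;
    }

    void onEndOfPartition() {
      finished = true;
    }
  }

  /** Hypothetical reader for an empty partition that has already been closed. */
  static final class EmptyPartitionReader {
    boolean recycled = false;

    /** Without a credit nothing is sent, so the reader is never recycled. */
    void onCredit(int credits, Client client) {
      if (credits > 0 && !recycled) {
        client.onEndOfPartition();
        recycled = true;
      }
    }
  }

  /** Returns true if both sides finish cleanly for the announced backlog. */
  static boolean simulate(int announcedBacklog) {
    Client client = new Client();
    EmptyPartitionReader reader = new EmptyPartitionReader();
    int credits = client.onBacklog(announcedBacklog);
    reader.onCredit(credits, client);
    return client.finished && reader.recycled;
  }

  public static void main(String[] args) {
    System.out.println("backlog=0 completes: " + simulate(0));
    System.out.println("backlog=1 completes: " + simulate(1));
  }
}
```

In this model, announcing a backlog of 0 leaves both sides waiting, while a backlog of 1 yields the single credit needed to finish and recycle the reader.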


