saurabhd336 commented on code in PR #3132:
URL: https://github.com/apache/celeborn/pull/3132#discussion_r1986120691
##########
client/src/main/java/org/apache/celeborn/client/read/WorkerPartitionReader.java:
##########
@@ -210,14 +224,34 @@ public PartitionLocation getLocation() {
return location;
}
+ @Override
+ public WorkerPartitionReaderCheckpointMetadata getPartitionReaderCheckpointMetadata() {
+ return isCheckpointEnabled
+ ? new WorkerPartitionReaderCheckpointMetadata(chunkIdsAlreadyReturned)
+ : null;
+ }
+
+ @Override
+ public void updateCheckpointMetadata(WorkerPartitionReaderCheckpointMetadata checkpointMetadata) {
+ chunkIdsAlreadyReturned = checkpointMetadata.getReturnedChunks();
+ }
+
private void fetchChunks() throws IOException, InterruptedException {
final int inFlight = chunkIndex - startChunkIndex - returnedChunks;
if (inFlight < fetchMaxReqsInFlight) {
- final int toFetch =
- Math.min(fetchMaxReqsInFlight - inFlight + 1, endChunkIndex + 1 - chunkIndex);
- for (int i = 0; i < toFetch; i++) {
- if (testFetch && fetchChunkRetryCnt < fetchChunkMaxRetry - 1 && chunkIndex == 3) {
+ int toFetch = Math.min(fetchMaxReqsInFlight - inFlight + 1, endChunkIndex + 1 - chunkIndex);
+
+ while (toFetch > 0 && chunkIndex <= endChunkIndex) {
+ if (chunkIdsAlreadyReturned.contains(chunkIndex)) {
+ logger.info(
+ "Skipping chunk {} as it has already been returned,"
+ + " likely by a previous reader for the same partition.",
+ chunkIndex);
+ chunkIndex++;
+ returnedChunks++;
Review Comment:
Actually, come to think of it, incrementing `toFetch` here would be wrong: it
would cause an infinite wait. I've added a code comment explaining why.
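To make the skip logic in the hunk concrete, here is a small, self-contained model of the loop. It is a sketch, not Celeborn code: the class `SkipAwareFetchSketch`, its constructor, the accessors, and the `fetched` list are hypothetical, and the real fetch request is replaced by recording the chunk index. It also assumes that the skip branch ends with `continue` and that the non-skip branch decrements `toFetch` after issuing a fetch; neither is visible in the hunk. Under those assumptions a skipped chunk advances both `chunkIndex` and `returnedChunks`, so `inFlight` is unchanged, and `toFetch` is left alone, so only real fetches draw down the budget.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical, simplified model of WorkerPartitionReader#fetchChunks with checkpoint-based
 * skipping. Not Celeborn code: issuing a fetch is replaced by recording the chunk index.
 */
public class SkipAwareFetchSketch {
  private final int startChunkIndex;
  private final int endChunkIndex;
  private final int fetchMaxReqsInFlight;
  // Chunk ids that a previous reader already handed to the consumer (the checkpoint contents).
  private final Set<Integer> chunkIdsAlreadyReturned;
  private final List<Integer> fetched = new ArrayList<>();
  private int chunkIndex;
  private int returnedChunks;

  public SkipAwareFetchSketch(
      int startChunkIndex, int endChunkIndex, int fetchMaxReqsInFlight, Set<Integer> returned) {
    this.startChunkIndex = startChunkIndex;
    this.endChunkIndex = endChunkIndex;
    this.fetchMaxReqsInFlight = fetchMaxReqsInFlight;
    this.chunkIdsAlreadyReturned = new HashSet<>(returned);
    this.chunkIndex = startChunkIndex;
  }

  public void fetchChunks() {
    final int inFlight = chunkIndex - startChunkIndex - returnedChunks;
    if (inFlight < fetchMaxReqsInFlight) {
      // Budget formula mirrored from the diff, including its "+ 1".
      int toFetch = Math.min(fetchMaxReqsInFlight - inFlight + 1, endChunkIndex + 1 - chunkIndex);
      while (toFetch > 0 && chunkIndex <= endChunkIndex) {
        if (chunkIdsAlreadyReturned.contains(chunkIndex)) {
          // Count the skipped chunk as returned so inFlight stays unchanged.
          // In this sketch toFetch is left alone: skipping consumes no fetch budget.
          chunkIndex++;
          returnedChunks++;
          continue;
        }
        fetched.add(chunkIndex); // stand-in for issuing a real fetch request
        chunkIndex++;
        toFetch--; // only real fetches draw down the budget
      }
    }
  }

  public List<Integer> fetched() {
    return Collections.unmodifiableList(fetched);
  }

  public int chunkIndex() {
    return chunkIndex;
  }

  public int returnedChunks() {
    return returnedChunks;
  }

  public static void main(String[] args) {
    // A reader resuming from a checkpoint in which chunks 0-2 were already returned.
    SkipAwareFetchSketch reader = new SkipAwareFetchSketch(0, 9, 2, Set.of(0, 1, 2));
    reader.fetchChunks();
    System.out.println(reader.fetched()); // [3, 4, 5]
  }
}
```

With `fetchMaxReqsInFlight = 2` and nothing in flight, the budget is min(2 - 0 + 1, 10) = 3, so the resumed reader skips chunks 0-2 and then fetches 3, 4, and 5. A reader without a checkpoint would fetch 0, 1, and 2 with the same budget.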