saurabhd336 commented on code in PR #3132:
URL: https://github.com/apache/celeborn/pull/3132#discussion_r1984898298
##########
client/src/main/java/org/apache/celeborn/client/read/WorkerPartitionReader.java:
##########
@@ -210,14 +224,34 @@ public PartitionLocation getLocation() {
     return location;
   }
+
+  @Override
+  public WorkerPartitionReaderCheckpointMetadata getPartitionReaderCheckpointMetadata() {
+    return isCheckpointEnabled
+        ? new WorkerPartitionReaderCheckpointMetadata(chunkIdsAlreadyReturned)
+        : null;
+  }
+
+  @Override
+  public void updateCheckpointMetadata(WorkerPartitionReaderCheckpointMetadata checkpointMetadata) {
+    chunkIdsAlreadyReturned = checkpointMetadata.getReturnedChunks();
+  }
+
   private void fetchChunks() throws IOException, InterruptedException {
     final int inFlight = chunkIndex - startChunkIndex - returnedChunks;
     if (inFlight < fetchMaxReqsInFlight) {
-      final int toFetch =
-          Math.min(fetchMaxReqsInFlight - inFlight + 1, endChunkIndex + 1 - chunkIndex);
-      for (int i = 0; i < toFetch; i++) {
-        if (testFetch && fetchChunkRetryCnt < fetchChunkMaxRetry - 1 && chunkIndex == 3) {
+      int toFetch = Math.min(fetchMaxReqsInFlight - inFlight + 1, endChunkIndex + 1 - chunkIndex);
+
+      while (toFetch > 0 && chunkIndex <= endChunkIndex) {
+        if (chunkIdsAlreadyReturned.contains(chunkIndex)) {
+          logger.info(
+              "Skipping chunk {} as it has already been returned,"
+                  + " likely by a previous reader for the same partition.",
+              chunkIndex);
+          chunkIndex++;
+          returnedChunks++;
Review Comment:
The way I read `toFetch`, it seems to ensure that no more than `fetchMaxReqsInFlight` requests are in flight at once, while still fetching as many chunks as possible per call. My thought is that when we skip certain chunks, we could fetch other chunks in the range instead, so the skips don't eat into the fetch budget. WDYT?
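To make the suggestion concrete, here is a minimal, self-contained sketch of the budget accounting the comment describes: already-returned chunks are skipped without decrementing `toFetch`, so the budget is spent only on requests that would actually be issued. The class and method names are illustrative, not the actual Celeborn code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FetchBudgetSketch {
  // Returns the chunk ids that would actually be fetched, given a fetch
  // budget (`toFetch`) and the set of chunks a previous reader already
  // returned. Skipped chunks advance the index but keep the budget intact.
  static List<Integer> planFetches(
      int chunkIndex, int endChunkIndex, int toFetch, Set<Integer> alreadyReturned) {
    List<Integer> toIssue = new ArrayList<>();
    while (toFetch > 0 && chunkIndex <= endChunkIndex) {
      if (alreadyReturned.contains(chunkIndex)) {
        // Skip without decrementing toFetch: the budget is spent only on
        // requests that actually go over the wire.
        chunkIndex++;
        continue;
      }
      toIssue.add(chunkIndex);
      chunkIndex++;
      toFetch--;
    }
    return toIssue;
  }

  public static void main(String[] args) {
    // Chunks 0..9, budget of 3, chunks 1 and 2 already returned:
    // the plan is [0, 3, 4], i.e. the two skips do not consume the budget.
    Set<Integer> returned = new HashSet<>(List.of(1, 2));
    System.out.println(planFetches(0, 9, 3, returned)); // prints [0, 3, 4]
  }
}
```

Under the PR's current loop, the same scenario would issue only one real fetch, since the two skips each burn a unit of `toFetch`.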
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]