nfsantos commented on PR #2604:
URL: https://github.com/apache/jackrabbit-oak/pull/2604#issuecomment-3510021387

   I'm just starting to explore the segment store, so I may be missing the full 
context, but while looking at this PR and at the recovery process, I think the 
logic can be improved further. IIUC, the current logic processes a batch of 
blobs, waits until the whole batch is done, then advances to the next one. This 
causes a burst of quick requests, followed by a slowdown until the last 
elements of the current batch are processed. Any straggler in a batch delays 
the start of the next batch, which increases the total time taken.
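
   To make the straggler cost concrete, here is a minimal sketch that models only the timing of the batched approach. It assumes every request in a batch starts at the same moment, so a batch takes as long as its slowest request. The class name `BatchTiming`, the method `batchedTotal`, and the durations are hypothetical and not taken from the PR.

```java
import java.util.Arrays;

final class BatchTiming {
    /** Total time when each batch must finish completely before the next one starts. */
    static long batchedTotal(long[] durations, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        long total = 0;
        for (int start = 0; start < durations.length; start += batchSize) {
            int end = Math.min(start + batchSize, durations.length);
            // Requests in a batch run concurrently, so the batch lasts as long as its slowest request.
            total += Arrays.stream(durations, start, end).max().orElse(0);
        }
        return total;
    }

    public static void main(String[] args) {
        // Two batches of four; each batch contains one straggler taking 9 time units.
        long[] durations = {9, 1, 1, 1, 9, 1, 1, 1};
        System.out.println(batchedTotal(durations, 4)); // prints 18
    }
}
```

   In this made-up example, three of the four slots sit idle for 8 time units in each batch while the straggler finishes.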
   
   One possible alternative to batching is to keep a sliding window of in-flight 
requests. We could keep a count of the ongoing requests and launch new ones 
until the count reaches a configured limit. When we reach this limit, we wait 
until the earliest request completes and then launch the next one, repeating 
until all blobs are processed. This way, as soon as the earliest request is 
processed, we can launch a new one. 
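
   The sliding window described above could look roughly like the following. This is a minimal sketch, not the PR's code: `SlidingWindow`, `processAll`, and the `launch` function are hypothetical names, and a plain `java.util.concurrent.Future` stands in for whatever handle the real request returns. The window is FIFO, so the loop only ever blocks on the oldest outstanding request.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.function.Function;

final class SlidingWindow {
    /** Launches one request per blob, keeping at most maxInFlight outstanding; results keep input order. */
    static <B, R> List<R> processAll(List<B> blobs, int maxInFlight, Function<B, Future<R>> launch)
            throws InterruptedException, ExecutionException {
        if (maxInFlight < 1) {
            throw new IllegalArgumentException("maxInFlight must be >= 1");
        }
        Deque<Future<R>> inFlight = new ArrayDeque<>();
        List<R> results = new ArrayList<>();
        for (B blob : blobs) {
            if (inFlight.size() == maxInFlight) {
                // Window is full: wait for the earliest request, which frees one slot.
                results.add(inFlight.removeFirst().get());
            }
            inFlight.addLast(launch.apply(blob));
        }
        // Drain the requests still outstanding after the last launch.
        while (!inFlight.isEmpty()) {
            results.add(inFlight.removeFirst().get());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // supplyAsync is only a stand-in for starting a real remote request.
        List<String> out = processAll(Arrays.asList("a", "b", "c", "d", "e"), 2,
                b -> CompletableFuture.supplyAsync(() -> b.toUpperCase()));
        System.out.println(out);
    }
}
```

   A slow request at the head of the window still holds back new launches, but the other slots stay busy up to that point instead of idling until the whole batch ends.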
   
   Ideally, we should wait on all sync pollers at the same time, so that a new 
request is sent as soon as any one of them completes, but I'm not sure it is 
possible to do this without launching a thread for each sync poller, which would 
add a lot of complexity for potentially small gains. Or does Azure support 
callbacks when a request is completed?
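
   If completion callbacks were available, the window could free a slot on any completion rather than only on the oldest one. The sketch below does not answer whether Azure offers such callbacks. It assumes a hypothetical `launch` function that returns a `CompletableFuture`, and uses a `Semaphore` whose permits are released from the completion callback. `AnyCompletionWindow` and `processAll` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

final class AnyCompletionWindow {
    /**
     * Launches one request per blob with at most maxInFlight outstanding.
     * Returns the futures once all have completed; failures are left for the caller to inspect.
     */
    static <B, R> List<CompletableFuture<R>> processAll(
            List<B> blobs, int maxInFlight, Function<B, CompletableFuture<R>> launch)
            throws InterruptedException {
        Semaphore slots = new Semaphore(maxInFlight);
        List<CompletableFuture<R>> all = new ArrayList<>();
        for (B blob : blobs) {
            slots.acquire(); // blocks only while every slot is taken
            CompletableFuture<R> request;
            try {
                request = launch.apply(blob);
            } catch (RuntimeException e) {
                slots.release(); // the launch itself failed, so give the slot back
                throw e;
            }
            // Free the slot when this request completes, whether it succeeded or failed.
            request.whenComplete((result, error) -> slots.release());
            all.add(request);
        }
        // Taking every permit means every callback has run, so all requests are done.
        slots.acquire(maxInFlight);
        slots.release(maxInFlight);
        return all;
    }
}
```

   No thread per request is needed here: the only blocking call is `acquire` on the launching thread, and the callbacks run on whichever thread completes each future.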
   
   Anyway, the code looks good as is, but I just wanted to share this idea. 
Maybe it is worth considering in the future. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

