nfsantos commented on PR #2604: URL: https://github.com/apache/jackrabbit-oak/pull/2604#issuecomment-3510021387
I'm just starting to explore the segment store, so I may be missing the full context, but while looking at this PR and at the recovery process, I think the logic can be improved further.

IIUC, the current logic processes a batch of blobs, waits until the batch is fully processed, and then advances to the next one. This causes a burst of quick requests, followed by a slowdown until the last elements of the current batch are processed. Any straggler in a batch increases the total time taken by that batch.

One possible alternative is to replace batching with a sliding window of in-flight requests. We could keep a count of ongoing requests and launch new requests until that count reaches a fixed limit. When we reach the limit, we wait until one request completes and then launch the next one, repeating until all blobs are processed. This way, as soon as the earliest request completes, we can launch a new one.

Ideally, we should wait on all sync pollers at the same time, so that a new request is sent as soon as the fastest one completes, but I'm not sure this is possible without launching a thread for each sync poller, which would add a lot of complexity for potentially small gains. Or does Azure support callbacks when a request is completed?

Anyway, the code looks good as is; I just wanted to share this idea. Maybe it is worth considering in the future.
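To make the sliding-window idea concrete, here is a minimal sketch, not the PR's code. It keeps a FIFO queue of pending operations and, once the limit is reached, waits on the oldest one before launching the next. The class name `SlidingWindowSketch`, the method `processAll`, and the `launch` / `awaitCompletion` parameters are all hypothetical names chosen for illustration; the handle type `H` stands in for whatever the real code gets back when it starts a copy.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;
import java.util.function.Function;

public class SlidingWindowSketch {

    /**
     * Starts one operation per item while keeping at most maxInFlight pending.
     *
     * @param launch          starts an operation and returns a handle to it (hypothetical hook)
     * @param awaitCompletion blocks until the given handle has completed (hypothetical hook)
     * @return the largest number of operations that were pending at once
     */
    public static <T, H> int processAll(List<T> items,
                                        int maxInFlight,
                                        Function<T, H> launch,
                                        Consumer<H> awaitCompletion) {
        if (maxInFlight < 1) {
            throw new IllegalArgumentException("maxInFlight must be >= 1");
        }
        Queue<H> inFlight = new ArrayDeque<>();
        int peak = 0;
        for (T item : items) {
            // Window is full: wait for the oldest pending operation to free a slot.
            if (inFlight.size() == maxInFlight) {
                awaitCompletion.accept(inFlight.remove());
            }
            inFlight.add(launch.apply(item));
            peak = Math.max(peak, inFlight.size());
        }
        // Drain whatever is still pending after the last launch.
        while (!inFlight.isEmpty()) {
            awaitCompletion.accept(inFlight.remove());
        }
        return peak;
    }
}
```

This version needs no extra threads, but it only reacts to the oldest request finishing, not the fastest one, so a slow request at the head of the queue still delays new launches. With the Azure SDK, the handle would presumably be a `SyncPoller` and `awaitCompletion` would call its `waitForCompletion()`, but that wiring is an assumption and is not taken from the PR.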
