rishabhdaim commented on PR #1795: URL: https://github.com/apache/jackrabbit-oak/pull/1795#issuecomment-2416552037
> I'm wondering if we shouldn't also (or primarily) make PROGRESS_BATCH_SIZE configurable. Currently it's hardcoded to 10000 and we'd only be able to fiddle with the batch size, not the overall one. The overall one is however the one which controls how long fullGC occupies the scheduler thread. And in extremis we could set it to a ridiculously low value (say 10, then batch size also 10) and thus have near zero realistic performance impact on an otherwise very busy system. (Not suggesting 10 is a good value - but currently we wouldn't be able to configure that at all). yes, let us add a config for PROGRESS_BATCH_SIZE as well. >The default for classic delayFactor is 0 - which I think is not ideal - zero means it does not delay at all. In production system that should or is rather set to 0.5 or 1 etc. I'm not suggesting to change the default for classic GC. But for the new full GC one, we could perhaps indeed use a different default. Perhaps even as high as 5 (which would mean max 20% time usage)? I don't have any strong opinion here. Using default value 0 allows us to run it fast locally while testing. If we use a different default value, even local testing would be slower. But for local, we can always change the code. So I am fine with other default values. >Another aspect is that delayOnModifications (where the magic happens) is only invoked when deletion happens. But deletion isn't the only load on DocumentStore. There's also the reading and for fullGC the embedded verification that does many reads. For classic GC that probably applies as well to some extend - but the difference is : classic GC only ever works on deltas (as it existed since early oak days), it hardly has to scan a large amount. But full GC will do a one-off scan of large repositories thus do loads of reads. I'd suggest we introduce something like delayOnBatch - in addition to or perhaps actually in replacement of delayOnModification use. Yes, we should add `dealyOnModifications` or `delayOnBatch` (or any other name) while collecting garbage as well. Regarding `embeddedVerifiacation`, it is called before removing garbage, thus is already covered in the current code. > for collectGarbageOnDocument (which is used by [oak-run](https://issues.apache.org/jira/browse/OAK-run)) it doesn't seem useful for it to sleep, as it only ever runs on 1 document For oak run, we could set the default value to 0. It would allow us to reuse the existing code. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
