rishabhdaim commented on PR #1795:
URL: https://github.com/apache/jackrabbit-oak/pull/1795#issuecomment-2416552037

   > I'm wondering if we shouldn't also (or primarily) make PROGRESS_BATCH_SIZE 
configurable. Currently it's hardcoded to 10000 and we'd only be able to fiddle 
with the batch size, not the overall one. The overall one is however the one 
which controls how long fullGC occupies the scheduler thread. And in extremis 
we could set it to a ridiculously low value (say 10, then batch size also 10) 
and thus have near zero realistic performance impact on an otherwise very busy 
system. (Not suggesting 10 is a good value - but currently we wouldn't be able 
to configure that at all).
   
   yes, let us add a config for PROGRESS_BATCH_SIZE as well.
   
   >The default for classic delayFactor is 0 - which I think is not ideal - 
zero means it does not delay at all. In production system that should or is 
rather set to 0.5 or 1 etc. I'm not suggesting to change the default for 
classic GC. But for the new full GC one, we could perhaps indeed use a 
different default. Perhaps even as high as 5 (which would mean max 20% time 
usage)?
   
   I don't have any strong opinion here. Using default value 0 allows us to run 
it fast locally while testing. 
   If we use a different default value, even local testing would be slower. But 
for local, we can always change the code.
   So I am fine with other default values.
   
   >Another aspect is that delayOnModifications (where the magic happens) is 
only invoked when deletion happens. But deletion isn't the only load on 
DocumentStore. There's also the reading and for fullGC the embedded 
verification that does many reads. For classic GC that probably applies as well 
to some extend - but the difference is : classic GC only ever works on deltas 
(as it existed since early oak days), it hardly has to scan a large amount. But 
full GC will do a one-off scan of large repositories thus do loads of reads. 
I'd suggest we introduce something like delayOnBatch - in addition to or 
perhaps actually in replacement of delayOnModification use.
   
   Yes, we should add `dealyOnModifications` or `delayOnBatch` (or any other 
name) while collecting garbage as well.
   Regarding `embeddedVerifiacation`, it is called before removing garbage, 
thus is already covered in the current code.
   
   > for collectGarbageOnDocument (which is used by 
[oak-run](https://issues.apache.org/jira/browse/OAK-run)) it doesn't seem 
useful for it to sleep, as it only ever runs on 1 document
   
   For oak run, we could set the default value to 0. It would allow us to reuse 
the existing code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to