[jira] [Commented] (LUCENE-8162) Make it possible to throttle (Tiered)MergePolicy when commit rate is high
[ https://issues.apache.org/jira/browse/LUCENE-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16492469#comment-16492469 ]

Tommaso Teofili commented on LUCENE-8162:
-----------------------------------------

{quote}but many users index at full speed for a long time and suppressing merges in that case is dangerous{quote}
Yes, that might make search degrade. To mitigate that, the proposed MP caps the number of segments at which throttling is allowed. For example, if throttling pushes the number of segments beyond a configurable threshold (e.g. 20), throttling does not kick in for subsequent merges until the number of segments drops back below the threshold.

I have been trying to use [https://github.com/mikemccand/luceneutil] to run some benchmarks. However, it seems the tool only creates one index per benchmark.

> Make it possible to throttle (Tiered)MergePolicy when commit rate is high
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-8162
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8162
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Tommaso Teofili
>            Priority: Major
>             Fix For: trunk
>
>         Attachments: LUCENE-8162.0.patch
>
>
> As discussed in a recent mailing list thread [1] and observed in a project
> using Lucene (see OAK-5192 and OAK-6710), it is sometimes helpful to throttle
> the aggressiveness of (Tiered)MergePolicy when commit rate is high.
> In the case of Apache Jackrabbit Oak a dedicated {{MergePolicy}} was
> implemented [2].
> That MP doesn't merge in case the number of segments is below a certain
> threshold (e.g. 30) and commit rate (docs per sec and MB per sec) is high
> (e.g. above 1000 doc / sec , 5MB / sec).
> In such impl, the commit rate thresholds adapt to average commit rate by
> means of single exponential smoothing.
> The results in that specific case looked encouraging as it brought a 5% perf
> improvement in querying and ~10% reduced IO. However Oak has some specifics
> which might not fit in other scenarios. Anyway it could be interesting to see
> how this behaves in plain Lucene scenario.
> [1] : [http://markmail.org/message/re3ifmq2664bqfjk]
> [2] :
> [https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/writer/CommitMitigatingTieredMergePolicy.java]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
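
Editorial sketch: the throttling rule described in the message above (skip merges while the segment count is below a cap and the indexing rate is high, with rate limits that adapt via single exponential smoothing) could look roughly like the following. This is not the Oak or Lucene code. The class name, the constructor parameters, the choice to treat the rate as "high" when either docs/sec or MB/sec exceeds its limit, and the choice to update the limits on every call are all assumptions made for illustration.

```java
/**
 * Hypothetical sketch of a merge-throttling decision.
 * NOT the Oak CommitMitigatingTieredMergePolicy; names and details are assumed.
 */
final class MergeThrottleSketch {
    private final int maxSegments; // at or above this segment count, never throttle
    private final double alpha;    // smoothing factor, 0 < alpha <= 1
    private double docsLimit;      // adaptive docs/sec limit
    private double mbLimit;        // adaptive MB/sec limit

    MergeThrottleSketch(int maxSegments, double alpha,
                        double initialDocsLimit, double initialMbLimit) {
        if (alpha <= 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.maxSegments = maxSegments;
        this.alpha = alpha;
        this.docsLimit = initialDocsLimit;
        this.mbLimit = initialMbLimit;
    }

    /** Single exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1). */
    static double smooth(double previous, double observed, double alpha) {
        return alpha * observed + (1.0 - alpha) * previous;
    }

    /** Returns true if merges should be skipped for this round. */
    boolean shouldThrottle(int segmentCount, double docsPerSec, double mbPerSec) {
        // Assumption: exceeding either limit counts as a "high" rate.
        boolean rateIsHigh = docsPerSec > docsLimit || mbPerSec > mbLimit;
        // The segment cap is the safety valve: once reached, merging always proceeds.
        boolean throttle = segmentCount < maxSegments && rateIsHigh;
        // Let the limits follow the observed average rate.
        docsLimit = smooth(docsLimit, docsPerSec, alpha);
        mbLimit = smooth(mbLimit, mbPerSec, alpha);
        return throttle;
    }
}
```

With a cap of 20 segments, a call such as `shouldThrottle(25, ...)` returns false regardless of the rate, which is the mitigation for sustained full-speed indexing mentioned in the comment.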
[jira] [Commented] (LUCENE-8162) Make it possible to throttle (Tiered)MergePolicy when commit rate is high
[ https://issues.apache.org/jira/browse/LUCENE-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468743#comment-16468743 ]

Tommaso Teofili commented on LUCENE-8162:
-----------------------------------------

[~mikemccand] any suggestions on how to make "reliable" tests with different merge policies? Even though this merge policy was designed for a specific use case, I would still be curious to run some experiments on how it behaves in a more common case (e.g. benchmarking indexing / queries on Wikipedia).
[jira] [Commented] (LUCENE-8162) Make it possible to throttle (Tiered)MergePolicy when commit rate is high
[ https://issues.apache.org/jira/browse/LUCENE-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355374#comment-16355374 ]

Tommaso Teofili commented on LUCENE-8162:
-----------------------------------------

The class in Oak is a fork of TMP, but the one in Lucene would extend TMP (see [https://gist.github.com/tteofili/f60bd633557b93be106dc8e806d2b8fa]).

The logic uses docs/sec and MB/sec, so you're right that the number of _commits_ is not measured.
{quote}So if I index at a high rate but don't commit, the throttling logic can still kick in?{quote}
Yes.
[jira] [Commented] (LUCENE-8162) Make it possible to throttle (Tiered)MergePolicy when commit rate is high
[ https://issues.apache.org/jira/browse/LUCENE-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355352#comment-16355352 ]

Michael McCandless commented on LUCENE-8162:
--------------------------------------------

The class looks like a fork of TMP, but it looks like it could instead be done as a subclass, i.e. calling super.findMerges and then applying its own logic to return null if it wants to throttle? That would make it easier to see what logic it's changing.

It seems to use docs/sec, not commit rate, right? So if I index at a high rate but don't commit, the throttling logic can still kick in?

I think the logic is dangerous for general usage: it seems to throttle merges when the indexing rate is high? This may work well for Oak usage, as long as the indexing rate sometimes falls to a slow rate, but many users index at full speed for a long time, and suppressing merges in that case is dangerous.
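
Editorial sketch: the subclassing approach suggested in the comment above (delegate to the parent's findMerges, then return null when throttling) can be illustrated with stand-in classes. The types below are hypothetical and are not Lucene's TieredMergePolicy or MergeSpecification; the real findMerges takes different arguments, which are not reproduced here. The point is only that the subclass contains nothing but the throttling decision, so the changed logic is easy to see.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Stand-in for a base merge policy. NOT Lucene's TieredMergePolicy. */
class BasePolicySketch {
    /** Proposes merging all segments when there are at least two; null means "no merges". */
    List<String> findMerges(List<String> segments) {
        return segments.size() > 1 ? new ArrayList<>(segments) : null;
    }
}

/** Adds throttling on top of the base selection logic without copying it. */
class ThrottledPolicySketch extends BasePolicySketch {
    private final BooleanSupplier shouldThrottle; // hypothetical hook for the rate check

    ThrottledPolicySketch(BooleanSupplier shouldThrottle) {
        this.shouldThrottle = shouldThrottle;
    }

    @Override
    List<String> findMerges(List<String> segments) {
        // Let the parent pick merges as usual.
        List<String> proposed = super.findMerges(segments);
        // Suppress them only while throttling is active.
        return shouldThrottle.getAsBoolean() ? null : proposed;
    }
}
```

For example, `new ThrottledPolicySketch(() -> true).findMerges(segments)` returns null for any input, while passing `() -> false` yields exactly what the base class proposes.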