[jira] [Commented] (LUCENE-8162) Make it possible to throttle (Tiered)MergePolicy when commit rate is high

2018-05-28 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16492469#comment-16492469
 ] 

Tommaso Teofili commented on LUCENE-8162:
-

{quote}but many users index at full speed for a long time and suppressing 
merges in that case is dangerous
{quote}
Yes, that might degrade search performance. To mitigate that, the proposed MP 
has a maximum number of segments up to which throttling is allowed. For 
example, if throttling pushes the number of segments beyond a configurable 
threshold (e.g. 20), throttling is disabled for the next merge and stays 
disabled until the number of segments falls back below the threshold.
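
As a rough illustration of the cap described above (a sketch only, not the code from the attached patch; the class name, method name and parameters are hypothetical), the guard could look like this:

```java
/** Hypothetical helper, not part of Lucene or Oak: decides whether merges may be suppressed. */
final class ThrottleGuard {

    // Upper bound on the segment count for which throttling is still allowed (e.g. 20).
    private final int maxSegmentsForThrottling;

    ThrottleGuard(int maxSegmentsForThrottling) {
        this.maxSegmentsForThrottling = maxSegmentsForThrottling;
    }

    /**
     * @param segmentCount current number of segments in the index
     * @param rateIsHigh   whether the measured indexing rate is above its threshold
     * @return true if merges may be suppressed for this round
     */
    boolean shouldThrottle(int segmentCount, boolean rateIsHigh) {
        if (segmentCount > maxSegmentsForThrottling) {
            // Too many segments already: let merges run regardless of the rate.
            return false;
        }
        return rateIsHigh;
    }

    public static void main(String[] args) {
        ThrottleGuard guard = new ThrottleGuard(20);
        System.out.println(guard.shouldThrottle(12, true));  // below the cap, high rate
        System.out.println(guard.shouldThrottle(25, true));  // above the cap, high rate
        System.out.println(guard.shouldThrottle(12, false)); // below the cap, low rate
    }
}
```

Because the cap is checked first, a sustained high indexing rate cannot keep merges suppressed once the segment count has grown past the threshold.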

I have been trying to use [https://github.com/mikemccand/luceneutil] to run 
some benchmarks. However, it seems the tool only creates one index per 
benchmark. 

> Make it possible to throttle (Tiered)MergePolicy when commit rate is high
> -
>
> Key: LUCENE-8162
> URL: https://issues.apache.org/jira/browse/LUCENE-8162
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Tommaso Teofili
>Priority: Major
> Fix For: trunk
>
> Attachments: LUCENE-8162.0.patch
>
>
> As discussed in a recent mailing list thread [1] and observed in a project 
> using Lucene (see OAK-5192 and OAK-6710), it is sometimes helpful to throttle 
> the aggressiveness of (Tiered)MergePolicy when commit rate is high.
> In the case of Apache Jackrabbit Oak a dedicated {{MergePolicy}} was 
> implemented [2].
> That MP doesn't merge in case the number of segments is below a certain 
> threshold (e.g. 30) and commit rate (docs per sec and MB per sec) is high 
> (e.g. above 1000 doc / sec , 5MB / sec).
> In such impl, the commit rate thresholds adapt to average commit rate by 
> means of single exponential smoothing.
> The results in that specific case looked encouraging as it brought a 5% perf 
> improvement in querying and ~10% reduced IO. However Oak has some specifics 
> which might not fit in other scenarios. Anyway it could be interesting to see 
> how this behaves in plain Lucene scenario.
> [1] : [http://markmail.org/message/re3ifmq2664bqfjk]
> [2] : 
> [https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/writer/CommitMitigatingTieredMergePolicy.java]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8162) Make it possible to throttle (Tiered)MergePolicy when commit rate is high

2018-05-09 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468743#comment-16468743
 ] 

Tommaso Teofili commented on LUCENE-8162:
-

[~mikemccand] any suggestions on how to write "reliable" tests with different 
merge policies? Even though this merge policy was designed for a specific use 
case, I would still be curious to run some experiments on how it behaves in a 
more common case (e.g. benchmarking indexing / queries on Wikipedia).




[jira] [Commented] (LUCENE-8162) Make it possible to throttle (Tiered)MergePolicy when commit rate is high

2018-02-07 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355374#comment-16355374
 ] 

Tommaso Teofili commented on LUCENE-8162:
-

The class in Oak is a fork of TMP, but the one in Lucene would extend TMP (see 
[https://gist.github.com/tteofili/f60bd633557b93be106dc8e806d2b8fa]).

The logic uses doc/sec and MB/sec, so you're right that the number of _commits_ 
is not measured.
{quote}So if I index at a high rate but don't commit, the throttling logic can 
still kick in?
{quote}
yes
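
For readers unfamiliar with the single exponential smoothing mentioned in the issue description, the following is a minimal sketch of how a docs/sec or MB/sec average could be tracked. It is not the Oak or gist code; the class name, the smoothing factor and the way the smoothed value is compared with new observations are assumptions made for illustration.

```java
/** Hypothetical tracker for an indexing rate (docs/sec or MB/sec); not taken from Oak or Lucene. */
final class SmoothedRate {

    private final double alpha;   // smoothing factor in (0, 1]; the value to use is an assumption
    private double smoothed;      // current smoothed average
    private boolean initialized;  // false until the first observation arrives

    SmoothedRate(double alpha) {
        if (!(alpha > 0.0 && alpha <= 1.0)) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
    }

    /** Folds a new observation in: s = alpha * x + (1 - alpha) * s. */
    double update(double observed) {
        if (!initialized) {
            smoothed = observed; // seed the average with the first observation
            initialized = true;
        } else {
            smoothed = alpha * observed + (1.0 - alpha) * smoothed;
        }
        return smoothed;
    }

    /** True when an observation exceeds the current smoothed average. */
    boolean isAbove(double observed) {
        return initialized && observed > smoothed;
    }

    public static void main(String[] args) {
        SmoothedRate docsPerSec = new SmoothedRate(0.5);
        System.out.println(docsPerSec.update(1000.0));
        System.out.println(docsPerSec.update(2000.0));
        System.out.println(docsPerSec.isAbove(3000.0));
    }
}
```

Note that nothing in this sketch looks at commits: it only sees rate observations, which matches the point above that the throttling can kick in without any commit taking place.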




[jira] [Commented] (LUCENE-8162) Make it possible to throttle (Tiered)MergePolicy when commit rate is high

2018-02-07 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355352#comment-16355352
 ] 

Michael McCandless commented on LUCENE-8162:


The class looks like a fork of TMP, but it looks like it could be done instead 
as a subclass, i.e. calling super.findMerges, but then implementing its logic 
to return null if it wants to throttle?  It would make it easier to see what 
logic it's changing.
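
A minimal sketch of that subclass shape follows. To stay self-contained it uses a stand-in base class with a simplified findMerges signature rather than Lucene's real TieredMergePolicy, so every type and method here is hypothetical; only the pattern (delegate to super, return null to throttle) reflects the suggestion above.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: BasePolicy stands in for TMP and does NOT mirror Lucene's API. */
final class ThrottleSubclassSketch {

    /** Stand-in base policy with a deliberately simplified signature. */
    static class BasePolicy {
        /** Returns the segments to merge, or null when no merge is needed. */
        List<String> findMerges(List<String> segments) {
            return segments.size() > 1 ? segments : null;
        }
    }

    /** Hypothetical subclass whose only added logic is the throttling decision. */
    static class ThrottlingPolicy extends BasePolicy {
        private volatile boolean throttle;

        void setThrottle(boolean throttle) {
            this.throttle = throttle;
        }

        @Override
        List<String> findMerges(List<String> segments) {
            if (throttle) {
                return null; // suppress merging for this round
            }
            return super.findMerges(segments); // otherwise defer to the base policy
        }
    }

    public static void main(String[] args) {
        ThrottlingPolicy policy = new ThrottlingPolicy();
        List<String> segments = Arrays.asList("_0", "_1", "_2");
        System.out.println(policy.findMerges(segments)); // base policy decides
        policy.setThrottle(true);
        System.out.println(policy.findMerges(segments)); // throttled
    }
}
```

With this shape the diff against the base policy is limited to the overridden method, which is the readability benefit the subclass approach is after.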

It seems to use docs/sec, not commit rate, right?  So if I index at a high rate 
but don't commit, the throttling logic can still kick in?

I think the logic is dangerous for general usage: it seems to throttle merges 
when indexing rate is high?  This may work well for Oak usage, as long as 
sometimes indexing rate falls to a slow rate, but many users index at full 
speed for a long time and suppressing merges in that case is dangerous.
