[jira] [Commented] (LUCENE-9237) Faster TermsEnum intersect for UniformSplit

Bruno Roustant (Jira) Tue, 25 Feb 2020 02:23:32 -0800


    [ https://issues.apache.org/jira/browse/LUCENE-9237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044309#comment-17044309 ]


Bruno Roustant commented on LUCENE-9237:
----------------------------------------

I measured the term dictionary size on disk (wikimediumall):
 - Lucene84: 30.6 MB of tip files (summed over all segment files).
 - UniformSplit: 19.5 MB of ustd files (summed over all segment files), plus 
6.1 MB of Lucene84 tip files, which I think are for facets.
I suppose I should discount the same 6.1 MB of facet tip files from the 
Lucene84 figure, which gives 30.6 - 6.1 = 24.5 MB.
So in my benchmark UniformSplit has the smaller term dictionary (19.5 MB vs 
24.5 MB, i.e. the expected -20%).

I'll run another benchmark with a block size of 26 terms for UniformSplit 
(instead of 32), which should give the same term dictionary size as Lucene84 
(the relationship is quite linear). I'll also force FST-on-heap for Lucene84.
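
For reference, the arithmetic behind these numbers can be sketched as below. This is only a back-of-the-envelope check, not part of the benchmark: the function names are made up for illustration, and the model that dictionary size is inversely proportional to the block size is my assumption about what "quite linear" means here.

```python
# Figures reported above (MB of term dictionary files, wikimediumall).
LUCENE84_TIP_MB = 30.6       # Lucene84 run: all tip files
FACET_TIP_MB = 6.1           # tip files assumed to belong to facets
UNIFORMSPLIT_USTD_MB = 19.5  # UniformSplit run: ustd files
CURRENT_BLOCK_SIZE = 32      # UniformSplit terms per block in this benchmark


def relative_size_change(candidate_mb, baseline_mb):
    """Signed relative difference of candidate vs baseline (-0.2 means 20% smaller)."""
    return (candidate_mb - baseline_mb) / baseline_mb


def block_size_for_target(current_block_size, current_mb, target_mb):
    """Block size expected to yield target_mb.

    Assumption (hypothetical model): dictionary size is proportional to the
    number of blocks, hence inversely proportional to the terms per block.
    """
    return current_block_size * current_mb / target_mb


# Discount the facet tip files from the Lucene84 figure.
lucene84_mb = LUCENE84_TIP_MB - FACET_TIP_MB

change = relative_size_change(UNIFORMSPLIT_USTD_MB, lucene84_mb)
target_block_size = block_size_for_target(
    CURRENT_BLOCK_SIZE, UNIFORMSPLIT_USTD_MB, lucene84_mb
)

print(f"Lucene84 without facets: {lucene84_mb:.1f} MB")
print(f"UniformSplit vs Lucene84: {change:+.1%}")
print(f"Estimated block size for equal size: {target_block_size:.1f} terms")
```

Under that assumption the estimate lands between 25 and 26 terms per block, which is consistent with picking 26 for the next run.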

> Faster TermsEnum intersect for UniformSplit
> -------------------------------------------
>
>                 Key: LUCENE-9237
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9237
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Bruno Roustant
>            Assignee: Bruno Roustant
>            Priority: Major
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> New version of TermsEnum intersect for UniformSplit. It is 75% more efficient 
> than the previous version for FuzzyQuery.
> Compared to BlockTree IntersectTermsEnum:
>  - It is still slower for FuzzyQuery (-37%) but it is faster than the 
> previous version (which was -65%).
>  - It is slightly slower for WildcardQuery (-5%).
>  - It is slightly faster for PrefixQuery (+5%). Sometimes benchmarks show 
> more improvement (I've seen up to +17% in about a quarter of the runs).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
