[ https://issues.apache.org/jira/browse/LUCENE-9237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044309#comment-17044309 ]
Bruno Roustant commented on LUCENE-9237:
----------------------------------------
I measured the term dictionary size on disk (wikimediumall):
- Lucene84: 30.6 MB of tip files (summed over all segment files).
- UniformSplit: 19.5 MB of ustd files (summed over all segment files),
  plus 6.1 MB of Lucene84 tip files, which are for facets?
I suppose I should discount the 6.1 MB of facet tip files from the Lucene84
total, which gives 30.6 - 6.1 = 24.5 MB.
So in my benchmark UniformSplit has a smaller term dictionary, as expected
(about -20%).
I'll run another benchmark with a block size of 26 terms for UniformSplit
(instead of 32), which should give us about the same term dictionary size
(the relationship is quite linear). I'll also force FST-on-heap for Lucene84.
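
For anyone who wants to check the arithmetic, here is a minimal sketch in
Python. It is not part of the benchmark tooling. The function names are
hypothetical, and the model (dictionary size inversely proportional to the
number of terms per block) is only an assumption standing in for the "quite
linear" observation above.

```python
# Sizes reported in this comment, in MB.
LUCENE84_TIP_MB = 30.6       # Lucene84 tip files, all segments
FACET_TIP_MB = 6.1           # tip files assumed to belong to facets
UNIFORMSPLIT_USTD_MB = 19.5  # UniformSplit ustd files, all segments
UNIFORMSPLIT_BLOCK_SIZE = 32 # terms per block used in the measured run


def relative_difference(candidate_mb, baseline_mb):
    """Relative size of candidate vs. baseline (negative means smaller)."""
    return (candidate_mb - baseline_mb) / baseline_mb


def predicted_size_mb(block_size):
    """Predicted UniformSplit dictionary size for another block size.

    Assumption (not measured): size scales with the number of blocks,
    i.e. inversely with the number of terms per block.
    """
    return UNIFORMSPLIT_USTD_MB * UNIFORMSPLIT_BLOCK_SIZE / block_size


def matching_block_size(target_mb):
    """Block size at which the predicted size equals target_mb."""
    return UNIFORMSPLIT_BLOCK_SIZE * UNIFORMSPLIT_USTD_MB / target_mb


lucene84_adjusted_mb = LUCENE84_TIP_MB - FACET_TIP_MB
diff = relative_difference(UNIFORMSPLIT_USTD_MB, lucene84_adjusted_mb)
block_size = matching_block_size(lucene84_adjusted_mb)

print(f"Lucene84 without facets: {lucene84_adjusted_mb:.1f} MB")   # 24.5 MB
print(f"UniformSplit vs Lucene84: {diff:.1%}")                     # -20.4%
print(f"Block size for equal size: {block_size:.1f} terms")        # 25.5
print(f"Predicted size at 26 terms: {predicted_size_mb(26):.1f} MB")  # 24.0 MB
```

Under that assumption the break-even block size is about 25.5 terms, so 26
terms should land close to the 24.5 MB of Lucene84.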
> Faster TermsEnum intersect for UniformSplit
> -------------------------------------------
>
> Key: LUCENE-9237
> URL: https://issues.apache.org/jira/browse/LUCENE-9237
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Bruno Roustant
> Assignee: Bruno Roustant
> Priority: Major
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> New version of TermsEnum intersect for UniformSplit. It is 75% more efficient
> than the previous version for FuzzyQuery.
> Compared to BlockTree IntersectTermsEnum:
> - It is still slower for FuzzyQuery (-37%) but it is faster than the
> previous version (which was -65%).
> - It is slightly slower for WildcardQuery (-5%).
> - It is slightly faster for PrefixQuery (+5%). Benchmarks sometimes show a
> larger improvement (I've seen up to +17% in about a quarter of the runs).