[
https://issues.apache.org/jira/browse/LUCENE-4769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575802#comment-13575802
]
Michael McCandless commented on LUCENE-4769:
--------------------------------------------
Full (6.6M docs) wikibig index, 7 facet dims:
{noformat}
                Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
             Respell       46.60      (3.4%)       45.82      (4.1%)   -1.7% (  -8% -    6%)
        HighSpanNear        3.49      (1.7%)        3.51      (2.2%)    0.8% (  -3% -    4%)
          HighPhrase       17.13     (10.5%)       17.42     (11.0%)    1.7% ( -17% -   26%)
              Fuzzy2       53.25      (2.8%)       54.19      (3.1%)    1.8% (  -4% -    7%)
          AndHighLow      587.43      (2.3%)      597.84      (2.6%)    1.8% (  -3% -    6%)
     LowSloppyPhrase       20.30      (2.3%)       20.68      (2.3%)    1.9% (  -2% -    6%)
         LowSpanNear        8.24      (2.3%)        8.42      (2.9%)    2.1% (  -3% -    7%)
         AndHighHigh       23.36      (1.3%)       23.95      (0.9%)    2.5% (   0% -    4%)
    HighSloppyPhrase        0.92      (5.1%)        0.94      (6.1%)    2.8% (  -7% -   14%)
           LowPhrase       21.02      (6.2%)       21.63      (6.7%)    2.9% (  -9% -   16%)
         MedSpanNear       28.31      (1.3%)       29.20      (1.5%)    3.1% (   0% -    6%)
     MedSloppyPhrase       25.98      (1.7%)       26.79      (1.7%)    3.1% (   0% -    6%)
             MedTerm       47.54      (1.9%)       49.49      (3.4%)    4.1% (  -1% -    9%)
              Fuzzy1       47.28      (2.2%)       49.27      (2.6%)    4.2% (   0% -    9%)
          AndHighMed      105.55      (0.9%)      112.03      (1.2%)    6.1% (   3% -    8%)
            Wildcard       27.63      (1.2%)       30.03      (1.6%)    8.7% (   5% -   11%)
           MedPhrase      109.43      (5.6%)      122.45      (7.4%)   11.9% (   0% -   26%)
             LowTerm      110.94      (1.9%)      128.73      (1.8%)   16.0% (  12% -   20%)
           OrHighLow       17.11      (2.2%)       22.44      (3.7%)   31.1% (  24% -   37%)
           OrHighMed       16.63      (2.1%)       21.89      (3.8%)   31.6% (  25% -   38%)
            HighTerm       19.17      (1.9%)       26.30      (3.5%)   37.2% (  31% -   43%)
          OrHighHigh        8.77      (2.4%)       12.45      (4.7%)   42.1% (  34% -   50%)
             Prefix3       13.06      (1.8%)       18.66      (2.2%)   42.9% (  38% -   47%)
              IntNRQ        3.59      (1.6%)        6.45      (3.3%)   79.8% (  73% -   86%)
{noformat}
Trunk DVs take 61.4 MB while the int[] cache takes 202.9 MB (3.3X
more). Also, if users enable the int[] cache they must remember to use
a disk-backed DV (and maybe we should check / warn about this
somehow), else it's silly since you'd be double-caching in RAM.
Curiously, these gains are not that much better (except for IntNRQ)
than LUCENE-4764, which was only ~31% larger... which is odd, because
we had previously tested [something close to] LUCENE-4764 against the
int[] cache and the cache was faster.
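To make the RAM tradeoff above concrete, here is a minimal, self-contained sketch (not Lucene's actual CachedInts or DocValues code; class and method names are made up for illustration): facet ordinals stored as vInt-packed bytes are compact on disk/in the DV, while decoding them once into a plain int[] cache makes per-hit counting cheaper at roughly 4 bytes per ordinal, which is where the ~3.3X memory blowup comes from.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical sketch of the vInt-packed vs int[]-cache tradeoff.
public class OrdinalCacheSketch {

  // Pack ordinals as vInts (7 data bits per byte, high bit = continuation),
  // the same variable-length scheme Lucene uses for compact int storage.
  static byte[] encodeVInts(int[] ords) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int v : ords) {
      while ((v & ~0x7F) != 0) {
        out.write((v & 0x7F) | 0x80);
        v >>>= 7;
      }
      out.write(v);
    }
    return out.toByteArray();
  }

  // Decode the packed bytes once into an int[] cache. Scanning an int[]
  // per hit avoids re-decoding vInts, but costs a fixed 4 bytes per
  // ordinal instead of 1-2 bytes for small vInt values.
  static int[] decodeToCache(byte[] packed, int count) {
    int[] cache = new int[count];
    int pos = 0;
    for (int i = 0; i < count; i++) {
      int b = packed[pos++];
      int value = b & 0x7F;
      int shift = 7;
      while ((b & 0x80) != 0) {  // continuation bit set: more bytes follow
        b = packed[pos++];
        value |= (b & 0x7F) << shift;
        shift += 7;
      }
      cache[i] = value;
    }
    return cache;
  }

  public static void main(String[] args) {
    int[] ords = {3, 17, 200, 70000};
    byte[] packed = encodeVInts(ords);
    int[] cached = decodeToCache(packed, ords.length);
    System.out.println(Arrays.equals(ords, cached));  // true: round-trips
    // Packed bytes vs int[] bytes: the cache is larger but faster to scan.
    System.out.println(packed.length + " vs " + cached.length * 4);
  }
}
```

In this toy example the four ordinals pack into 7 bytes but occupy 16 bytes as an int[]; with many small ordinals per document the ratio ends up in the ballpark of the 3.3X measured above.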
> Add a CountingFacetsAggregator which reads ordinals from a cache
> ----------------------------------------------------------------
>
> Key: LUCENE-4769
> URL: https://issues.apache.org/jira/browse/LUCENE-4769
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/facet
> Reporter: Shai Erera
> Assignee: Shai Erera
> Attachments: LUCENE-4769.patch
>
>
> Mike wrote a prototype of a FacetsCollector which reads ordinals from a
> CachedInts structure on LUCENE-4609. I ported it to the new facets API, as a
> FacetsAggregator. I think we should offer users the means to use such a
> cache, even if it consumes more RAM. Mike's tests show that this cache consumed
> 2X more RAM than if the DocValues were loaded into memory in their raw form.
> Also, a PackedInts version of such a cache took almost the same amount of RAM
> as a straight int[], but the gains were minor.
> I will post the patch shortly.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]