[ https://issues.apache.org/jira/browse/LUCENE-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-4602:
---------------------------------------
Attachment: LUCENE-4602.patch
OK, good news! I hacked up a way to index the byte[] into a DocValues
field instead of payloads, and modified the previous
CachingFacetsCollector to use DocValues instead of its own hacked
cache (renamed it to DocValuesFacetsCollector):
{noformat}
                Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
            HighTerm        1.27      (2.9%)        2.29      (2.7%)   80.2% (  72% -   88%)
             MedTerm        4.79      (1.3%)       14.83      (4.0%)  209.5% ( 201% -  217%)
             LowTerm       10.50      (0.8%)       33.84      (1.9%)  222.3% ( 217% -  226%)
{noformat}
This is only a bit slower than my original hacked-up
CachingFacetsCollector results, so net/net DocValues looks to be just
as good.
That was for in-RAM DocValues. Then I tested with DirectSource
(which leaves DocValues on disk, though in this test the file is hot,
i.e. in the OS's IO cache):
{noformat}
                Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
            HighTerm        1.26      (1.1%)        1.43      (1.0%)   13.8% (  11% -   16%)
             MedTerm        4.78      (0.5%)       10.22      (1.7%)  113.9% ( 111% -  116%)
             LowTerm       10.49      (0.4%)       27.95      (1.4%)  166.6% ( 164% -  168%)
{noformat}
Not bad! Only a bit slower than in RAM ... so net/net I think we
should cut over facets to DVs?
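For readers who have not looked at the patch, here is a minimal sketch of the general idea: each document carries one byte[] holding all of its facet ords, and the collector decodes that byte[] for every hit and bumps a per-ord counter. This is not the code from the patch and does not use any Lucene API. The class and method names (FacetOrdsSketch, encode, countOrds) are hypothetical, and the delta-coded variable-length int layout is an assumption, since the message does not say how the byte[] is encoded.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only: not taken from LUCENE-4602.patch.
class FacetOrdsSketch {

    /**
     * Encodes one document's facet ords into a byte[].
     * Assumes the ords are non-negative and sorted ascending, so each
     * delta fits in a non-negative int.
     */
    static byte[] encode(int[] sortedOrds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int ord : sortedOrds) {
            int delta = ord - prev;
            prev = ord;
            // Variable-length int: 7 data bits per byte, high bit set
            // means "more bytes follow".
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
        }
        return out.toByteArray();
    }

    /** Decodes one document's byte[] and increments the count of each ord. */
    static void countOrds(byte[] bytes, int[] counts) {
        int pos = 0;
        int ord = 0;
        while (pos < bytes.length) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            ord += delta; // undo the delta coding
            counts[ord]++;
        }
    }

    public static void main(String[] args) {
        // Stand-in for the per-document values a DocValues field would hold.
        byte[][] perDocBytes = {
            encode(new int[] {1, 4}),
            encode(new int[] {4, 300}),
        };
        int[] counts = new int[512];
        for (byte[] docBytes : perDocBytes) {
            countOrds(docBytes, counts);
        }
        System.out.println(counts[1] + " " + counts[4] + " " + counts[300]);
        // prints "1 2 1"
    }
}
```

In this sketch the per-hit work is a sequential scan over a small byte[], which is the access pattern the DocValues-based collector relies on, whether the bytes come from RAM or from a file in the OS cache.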
> Use DocValues to store per-doc facet ord
> ----------------------------------------
>
> Key: LUCENE-4602
> URL: https://issues.apache.org/jira/browse/LUCENE-4602
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Attachments: LUCENE-4602.patch, LUCENE-4602.patch
>
>
> Spinoff from LUCENE-4600
> DocValues can be used to hold the byte[] encoding all facet ords for
> the document, instead of payloads. I made a hacked up approximation
> of in-RAM DV (see CachedCountingFacetsCollector in the patch) and the
> gains were somewhat surprisingly large:
> {noformat}
>                 Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
>             HighTerm        0.53      (0.9%)        1.00      (2.5%)   87.3% (  83% -   91%)
>              LowTerm        7.59      (0.6%)       26.75     (12.9%)  252.6% ( 237% -  267%)
>              MedTerm        3.35      (0.7%)       12.71      (9.0%)  279.8% ( 268% -  291%)
> {noformat}
> I didn't think payloads were THAT slow; I think it must be the advance
> implementation?
> We need to separately test on-disk DV to make sure it's at least
> on par with payloads (but hopefully faster) and if so ... we should
> cut over facets to using DV.