[ https://issues.apache.org/jira/browse/LUCENE-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-4602:
---------------------------------------

    Attachment: LUCENE-4602.patch

OK good news!  I hacked up a way to index the byte[] into a DocValues
field instead of payloads, and modified the previous
CachingFacetsCollector to use DocValues instead of its own hacked
cache (renamed it to DocValuesFacetsCollector):

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                HighTerm        1.27      (2.9%)        2.29      (2.7%)   80.2% (  72% -   88%)
                 MedTerm        4.79      (1.3%)       14.83      (4.0%)  209.5% ( 201% -  217%)
                 LowTerm       10.50      (0.8%)       33.84      (1.9%)  222.3% ( 217% -  226%)
{noformat}

This is only a bit slower than my original hacked up
CachingFacetsCollector results, so net/net DocValues looks to be just
as good.
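
For anyone following along, here is a minimal sketch of what "a byte[]
holding all facet ords for a document" can look like, and how a
collector might count from it.  It is not the code in the patch: the
class name FacetOrdCodec is hypothetical, and the encoding (sorted ords,
delta-coded as variable-length ints) is an assumption for illustration,
since this message does not say how the bytes are laid out.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical helper for illustration; not part of Lucene or of the attached patch. */
final class FacetOrdCodec {

    /** Encodes ascending ordinals as delta-coded variable-length ints (assumed format). */
    static byte[] encode(int[] sortedOrds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int ord : sortedOrds) {
            int delta = ord - prev;  // gaps stay small when ords are sorted
            prev = ord;
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);  // low 7 bits, continuation bit set
                delta >>>= 7;
            }
            out.write(delta);  // final byte, continuation bit clear
        }
        return out.toByteArray();
    }

    /** Decodes one document's bytes and increments the per-ordinal counters. */
    static void count(byte[] bytes, int[] counts) {
        int ord = 0;
        int pos = 0;
        while (pos < bytes.length) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            ord += delta;  // undo the delta coding
            counts[ord]++;
        }
    }

    public static void main(String[] args) {
        int[] counts = new int[400];
        count(encode(new int[] {3, 7, 300}), counts);  // doc 0
        count(encode(new int[] {7, 300}), counts);     // doc 1
        System.out.println(counts[3] + " " + counts[7] + " " + counts[300]);  // prints "1 2 2"
    }
}
```

In this sketch the per-document work is one sequential pass over a small
byte[], so the cost of fetching that byte[] (payloads vs DocValues) is
what the numbers above are really comparing.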

That was for in-RAM DocValues.  Then I tested with DirectSource, which
leaves DocValues on disk; in this test the file is hot (in the OS's IO
cache):

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                HighTerm        1.26      (1.1%)        1.43      (1.0%)   13.8% (  11% -   16%)
                 MedTerm        4.78      (0.5%)       10.22      (1.7%)  113.9% ( 111% -  116%)
                 LowTerm       10.49      (0.4%)       27.95      (1.4%)  166.6% ( 164% -  168%)
{noformat}

Not bad!  Only a bit slower than in RAM ... so net/net I think we
should cut facets over to DVs?
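
To put numbers on the in-RAM vs on-disk gap, the "QPS comp" columns of
the two tables can be compared directly.  This is a rough sketch: the
class name QpsCompare is hypothetical, the two runs had slightly
different baselines (1.27 vs 1.26 QPS for HighTerm, for example), and
the inputs are the rounded values printed above.  The formula appears
consistent with the Pct diff column, up to rounding.

```java
/** Hypothetical helper for illustration; not part of the benchmark tooling. */
final class QpsCompare {

    /** Percent change going from base QPS to comp QPS. */
    static double pctDiff(double base, double comp) {
        return 100.0 * (comp - base) / base;
    }

    public static void main(String[] args) {
        String[] tasks = {"HighTerm", "MedTerm", "LowTerm"};
        double[] inRam = {2.29, 14.83, 33.84};   // QPS comp, in-RAM DocValues table
        double[] onDisk = {1.43, 10.22, 27.95};  // QPS comp, DirectSource table
        for (int i = 0; i < tasks.length; i++) {
            // Negative values mean the on-disk run was slower than the in-RAM run.
            System.out.printf("%s: %.1f%%%n", tasks[i], pctDiff(inRam[i], onDisk[i]));
        }
    }
}
```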

                
> Use DocValues to store per-doc facet ord
> ----------------------------------------
>
>                 Key: LUCENE-4602
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4602
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: LUCENE-4602.patch, LUCENE-4602.patch
>
>
> Spinoff from LUCENE-4600
> DocValues can be used to hold the byte[] encoding all facet ords for
> the document, instead of payloads.  I made a hacked up approximation
> of in-RAM DV (see CachedCountingFacetsCollector in the patch) and the
> gains were somewhat surprisingly large:
> {noformat}
>                     Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
>                 HighTerm        0.53      (0.9%)        1.00      (2.5%)   87.3% (  83% -   91%)
>                  LowTerm        7.59      (0.6%)       26.75     (12.9%)  252.6% ( 237% -  267%)
>                  MedTerm        3.35      (0.7%)       12.71      (9.0%)  279.8% ( 268% -  291%)
> {noformat}
> I didn't think payloads were THAT slow; I think it must be the advance
> implementation?
> We need to separately test on-disk DV to make sure it's at least
> on-par with payloads (but hopefully faster) and if so ... we should
> cutover facets to using DV.
