[jira] [Updated] (LUCENE-5015) Unexpected performance difference between SamplingAccumulator and StandardFacetAccumulator

Gilad Barkai (JIRA) Sun, 26 May 2013 00:42:23 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Gilad Barkai updated LUCENE-5015:
---------------------------------

    Attachment: LUCENE-5015.patch

True, looking at overSampleFactor is enough, but it's not obvious that 
TakmiFixer should be used with overSampleFactor > 1 to improve the chances of 
the resulting top-k being accurate.
I'll add some documentation on this; I hope that will do.

The new patch defaults to {{NoopSampleFixer}}, which does not touch the results 
at all. If only the top-k is needed and the counts do not matter, this is the 
least expensive fixer. 
Also, if a percentage should be displayed instead of counts (how much of the 
results match this category), the sampled value out of the sample size would 
yield the same result as the amortized fixed count out of the actual result 
set size. That might render the amortized fixer moot.
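
To illustrate the percentage point, here is a minimal sketch in plain Java. It 
uses no Lucene classes; the class and method names are made up for the example, 
and it assumes the amortized fix simply scales a sampled count by 
(result set size / sample size).

```java
// Hypothetical illustration only, not part of the patch or the Lucene API.
final class SampledPercentage {

    // Assumed amortization: scale the sampled count up to the full result set.
    static long amortizedCount(long sampledCount, long sampleSize, long resultSize) {
        return sampledCount * resultSize / sampleSize;
    }

    // Share of `total` documents that match the category, in percent.
    static double percentage(long count, long total) {
        return 100.0 * count / total;
    }

    public static void main(String[] args) {
        long resultSize = 1_000_000L;  // documents matching the query
        long sampleSize = 10_000L;     // documents actually sampled
        long sampledCount = 1_200L;    // sampled documents in the category

        long fixedCount = amortizedCount(sampledCount, sampleSize, resultSize);

        // Both ratios describe the same share of the results.
        System.out.printf("sampled:   %.1f%%%n", percentage(sampledCount, sampleSize));
        System.out.printf("amortized: %.1f%%%n", percentage(fixedCount, resultSize));
    }
}
```

Under that assumption the two percentages agree up to integer rounding, so an 
application that only displays percentages could keep {{NoopSampleFixer}} and 
divide by the sample size.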

The new patch also accounts for {{SampleFixer}} being set in {{SamplingParams}}.
                
> Unexpected performance difference between SamplingAccumulator and 
> StandardFacetAccumulator
> ------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-5015
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5015
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/facet
>    Affects Versions: 4.3
>            Reporter: Rob Audenaerde
>            Assignee: Shai Erera
>            Priority: Minor
>         Attachments: LUCENE-5015.patch, LUCENE-5015.patch, LUCENE-5015.patch, 
> LUCENE-5015.patch
>
>
> I have an unexpected performance difference between the SamplingAccumulator 
> and the StandardFacetAccumulator. 
> The case is an index with about 5M documents and each document containing 
> about 10 fields. I created a facet on each of those fields. When searching to 
> retrieve facet-counts (using 1 CountFacetRequest), the SamplingAccumulator is 
> about twice as fast as the StandardFacetAccumulator. This is expected and a 
> nice speed-up. 
> However, when I use more CountFacetRequests to retrieve facet-counts for more 
> than one field, the speed of the SamplingAccumulator decreases, to the point 
> where the StandardFacetAccumulator is faster. 
> {noformat} 
> FacetRequests  Sampling    Standard
>  1               391 ms     1100 ms
>  2               531 ms     1095 ms 
>  3               948 ms     1108 ms
>  4              1400 ms     1110 ms
>  5              1901 ms     1102 ms
> {noformat} 
> Is this behaviour normal? I did not expect it, as the SamplingAccumulator 
> should need to do less work. 
> Some code to show what I do:
> {code}
>       searcher.search( facetsQuery, facetsCollector );
>       final List<FacetResult> collectedFacets = facetsCollector.getFacetResults();
> {code}
> {code}
> final FacetSearchParams facetSearchParams = new FacetSearchParams( facetRequests );
> FacetsCollector facetsCollector;
> if ( isSampled )
> {
>       // Sampled path: count facets over a random sample of the matching documents.
>       facetsCollector = FacetsCollector.create( new SamplingAccumulator(
>               new RandomSampler(), facetSearchParams, searcher.getIndexReader(), taxo ) );
> }
> else
> {
>       // Standard path: count facets over all matching documents.
>       facetsCollector = FacetsCollector.create( FacetsAccumulator.create(
>               facetSearchParams, searcher.getIndexReader(), taxo ) );
> }
> {code}
>                       

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
