[jira] [Commented] (SOLR-10634) Move calculation of some aggregations to collection phase

ASF subversion and git services (JIRA) Tue, 23 May 2017 17:53:48 -0700

    [ https://issues.apache.org/jira/browse/SOLR-10634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16022159#comment-16022159 ]


ASF subversion and git services commented on SOLR-10634:
--------------------------------------------------------

Commit d60c72f34ca9c63ac6075e00dac844c6f052d0a8 in lucene-solr's branch 
refs/heads/master from [[email protected]]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d60c72f ]

SOLR-10634: calc metrics in first phase if limit=-1 and no subfacets


> Move calculation of some aggregations to collection phase
> ---------------------------------------------------------
>
>                 Key: SOLR-10634
>                 URL: https://issues.apache.org/jira/browse/SOLR-10634
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: Facet Module
>            Reporter: Yonik Seeley
>         Attachments: SOLR-10634.patch, SOLR-10634.patch
>
>
> From http://markmail.org/message/pwgnt7iqxkzcnckh
> {quote}
> The current code is more optimized for finding the top K buckets from
> a total of N.
> When one asks to return the top 10 buckets when there are potentially
> millions of buckets, it makes sense to defer calculating other metrics
> for those buckets until we know which ones they are.  After we
> identify the top 10 buckets, we calculate the domain for that bucket
> and use that to calculate the remaining metrics.
> The current method is obviously much slower when one is requesting
> *all* buckets.  We might as well just calculate all metrics in the
> first pass rather than trying to defer them.
> {quote}
> So we should move aggregations from the second pass to the first pass under
> the following conditions:
> - no limit (or a high limit compared to the number of potential buckets?)
> - no sub-facets (or anything else) that will need the domain calculated anyway
> - the aggregation is not memory intensive per slot (moving some calculations
>   from per-bucket in the second phase to all-buckets-in-parallel in the
>   first phase could be really bad for peak memory usage)
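
The three conditions above can be read as a simple predicate over a facet request. The following is only a minimal sketch, not the code from the commit: it assumes the request is a dict shaped like a JSON Facet API terms facet (`limit`, `facet`), treats string values under `facet` as aggregations and nested dicts as sub-facets, and uses a hypothetical `MEMORY_HEAVY_AGGS` list as a stand-in for "memory intensive per slot". The function name `can_compute_in_first_phase` is likewise made up for illustration.

```python
from typing import Any, Mapping

# Hypothetical: aggregation prefixes assumed to be memory-heavy per slot.
# The issue does not say which aggregations fall into this group.
MEMORY_HEAVY_AGGS = ("unique(", "hll(", "percentile(")


def can_compute_in_first_phase(facet: Mapping[str, Any]) -> bool:
    """Return True if all metrics could be calculated during collection."""
    # Condition 1: no limit, i.e. every bucket is returned (limit=-1).
    # A terms facet without an explicit limit defaults to 10.
    if facet.get("limit", 10) != -1:
        return False

    for value in facet.get("facet", {}).values():
        # Condition 2: a nested dict is a sub-facet, which needs the
        # bucket domain calculated anyway.
        if isinstance(value, Mapping):
            return False
        # Condition 3: avoid aggregations assumed to be costly per slot.
        if str(value).startswith(MEMORY_HEAVY_AGGS):
            return False
    return True


request = {
    "type": "terms",
    "field": "cat",
    "limit": -1,
    "facet": {"total": "sum(price)", "avg": "avg(price)"},
}
print(can_compute_in_first_phase(request))
```

Note that the commit message only mentions the first two conditions (limit=-1 and no subfacets); the memory check is included here because the issue description lists it.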



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

