[jira] [Commented] (SOLR-4763) Performance issue when using group.facet=true

Hua Jiang (JIRA) Wed, 04 Jun 2014 04:20:14 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017623#comment-14017623
 ]


Hua Jiang commented on SOLR-4763:
---------------------------------

Hello, Varun. Thanks for your feedback. 

I rebuilt lucene_solr on my laptop, and all tests pass. I made this
patch based on revision 1553089. If you are using a different revision, you may
have to make some modifications yourself. I will explain the patch a little more,
and I hope it helps.

In the unpatched code, groupedFacetHits is a list of GroupedFacetHit
objects, which store the unique combinations of group field and facet field
values found in the previous segments. When a new segment is opened, this list
is traversed first to recalculate the segmentGroupedFacetsIndex for each hit,
because that value may differ from segment to segment. That is what the loop
you mentioned in setNextReader() is doing.
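A minimal Java sketch of the unpatched behavior, under simplifying assumptions: each segment is modelled as two sorted String arrays (standing in for the group and facet terms indexes), Strings stand in for BytesRef, and Arrays.binarySearch stands in for lookupTerm(). The class name ListBasedCollectorSketch, the lookups counter and the segmentIndex() encoding are hypothetical. It is not the actual Lucene code; it only illustrates why every earlier hit has to be looked up again when a segment is opened.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the unpatched collector (not the actual Lucene code).
final class ListBasedCollectorSketch {

    // One unique (group value, facet value) pair; Strings stand in for BytesRef.
    static final class GroupedFacetHit {
        final String groupValue;
        final String facetValue;

        GroupedFacetHit(String groupValue, String facetValue) {
            this.groupValue = groupValue;
            this.facetValue = facetValue;
        }
    }

    // Unique pairs collected so far, across all segments.
    final List<GroupedFacetHit> groupedFacetHits = new ArrayList<>();
    // Segment-local encoded indexes of pairs already seen; only valid for the current segment.
    Set<Integer> segmentGroupedFacetHits = new HashSet<>();
    String[] groupTerms = new String[0]; // sorted unique group values of the current segment
    String[] facetTerms = new String[0]; // sorted unique facet values of the current segment
    long lookups = 0;                    // binary-search lookups done while switching segments

    // Opening a segment re-resolves every hit collected so far: this is the costly loop.
    void setNextReader(String[] segmentGroupTerms, String[] segmentFacetTerms) {
        groupTerms = segmentGroupTerms;
        facetTerms = segmentFacetTerms;
        segmentGroupedFacetHits = new HashSet<>();
        for (GroupedFacetHit hit : groupedFacetHits) {
            int groupOrd = Arrays.binarySearch(groupTerms, hit.groupValue);
            int facetOrd = Arrays.binarySearch(facetTerms, hit.facetValue);
            lookups += 2;
            if (groupOrd < 0 || facetOrd < 0) {
                continue; // this pair cannot occur in the new segment
            }
            segmentGroupedFacetHits.add(segmentIndex(groupOrd, facetOrd));
        }
    }

    // Collects one document, given its group and facet ordinals in the current segment.
    void collect(int groupOrd, int facetOrd) {
        if (segmentGroupedFacetHits.add(segmentIndex(groupOrd, facetOrd))) {
            groupedFacetHits.add(new GroupedFacetHit(groupTerms[groupOrd], facetTerms[facetOrd]));
        }
    }

    // Hypothetical encoding of a (group ord, facet ord) pair into one segment-local int.
    private int segmentIndex(int groupOrd, int facetOrd) {
        return groupOrd * (facetTerms.length + 1) + facetOrd;
    }
}
```

Because the per-segment index is built from segment-local ordinals, it cannot be carried over to the next segment, so in this model the whole list is walked again on every setNextReader() call.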

During the recalculation, the lookupTerm() method is invoked on
facetFieldTermsIndex and groupFieldTermsIndex. This method uses binary search
to look up a value among all the values that appear in the group/facet field in
the current segment, so each call costs O(log n) comparisons for n unique values.
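For reference, a binary-search lookup of this kind might look like the hypothetical TermLookupSketch.lookupTerm() below. It assumes the segment's unique values are available as a sorted String array and returns -(insertionPoint) - 1 for a missing value, the convention java.util.Arrays.binarySearch uses. It is an illustration of the technique, not Lucene's implementation.

```java
// Hypothetical stand-in for lookupTerm(): binary search over a segment's sorted unique values.
final class TermLookupSketch {
    // Counts comparisons so the logarithmic cost can be observed.
    static int comparisons = 0;

    // Returns the ordinal of key if present, otherwise -(insertionPoint) - 1
    // (the same convention as java.util.Arrays.binarySearch).
    static int lookupTerm(String[] sortedTerms, String key) {
        int low = 0;
        int high = sortedTerms.length - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            comparisons++;
            int cmp = sortedTerms[mid].compareTo(key);
            if (cmp < 0) {
                low = mid + 1;
            } else if (cmp > 0) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -(low + 1);
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "banana", "cherry", "date"};
        System.out.println(lookupTerm(terms, "cherry"));    // 2
        System.out.println(lookupTerm(terms, "blueberry")); // -3
    }
}
```

With 1024 unique values, a lookup needs at most 11 comparisons, which is the log term that appears in the cost estimate.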

Let's assume that we have D documents distributed evenly over S segments, so
that each segment has G unique values for the group field and F unique values
for the facet field, and that the length of the groupedFacetHits list after the
nth segment is processed is n*L. Opening segment n+1 then costs about
n*L*(logG + logF), so the total cost of the recalculation is

  (logG + logF) * (L + 2L + ... + (S-1)L) = (logG + logF) * L * S(S-1)/2
                                          ~ O((logG + logF) * L * S^2).

It is proportional to S squared, so as S grows, performance drops rapidly.
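The arithmetic can be checked with a small hypothetical cost model. RecalculationCostModel is an illustrative name; L, S, G and F are the symbols defined above, and the numbers it prints are model outputs, not measurements.

```java
// Hypothetical cost model for the analysis above; figures are illustrative, not measured.
final class RecalculationCostModel {

    // Number of GroupedFacetHit objects re-resolved over S segments,
    // assuming the list holds n*L entries after the n-th segment.
    static long reResolvedHits(long L, int S) {
        long total = 0;
        for (int n = 1; n < S; n++) {
            total += n * L; // opening segment n+1 re-resolves the n*L hits collected so far
        }
        return total; // equals L * S * (S - 1) / 2
    }

    // Approximate comparisons: each re-resolved hit costs about log2(G) + log2(F).
    static double comparisons(long L, int S, long G, long F) {
        return reResolvedHits(L, S) * (log2(G) + log2(F));
    }

    static double log2(long x) {
        return Math.log(x) / Math.log(2);
    }

    public static void main(String[] args) {
        for (int S : new int[] {10, 20, 40}) {
            // prints 45000, 190000 and 780000 re-resolved hits respectively
            System.out.println(S + " segments -> " + reResolvedHits(1000, S) + " re-resolved hits");
        }
    }
}
```

In this model, doubling S from 10 to 20 roughly quadruples the work for L = 1000, which is the S^2 behavior described above.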

In the patched version, I changed groupedFacetHits from a list to a set, so the
recalculation can be avoided. When you get a GroupedFacetHit, you just add it to
the set without worrying whether another GroupedFacetHit with the same group and
facet field values has been added before: the add() method on a set returns
false when an equal value is already present.
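A minimal sketch of the set-based idea, again with Strings standing in for BytesRef and hypothetical names (SetBasedCollectorSketch, collect()); it is not the patch itself. One detail the set relies on: GroupedFacetHit must implement equals() and hashCode() over its group and facet values, otherwise HashSet.add() would treat every new object as distinct.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical, simplified model of the set-based approach (not the actual patch).
final class SetBasedCollectorSketch {

    // Value-based equality is what lets HashSet reject duplicate pairs across segments.
    static final class GroupedFacetHit {
        final String groupValue; // stand-in for BytesRef
        final String facetValue;

        GroupedFacetHit(String groupValue, String facetValue) {
            this.groupValue = groupValue;
            this.facetValue = facetValue;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof GroupedFacetHit)) return false;
            GroupedFacetHit other = (GroupedFacetHit) o;
            return Objects.equals(groupValue, other.groupValue)
                && Objects.equals(facetValue, other.facetValue);
        }

        @Override
        public int hashCode() {
            return Objects.hash(groupValue, facetValue);
        }
    }

    final Set<GroupedFacetHit> groupedFacetHits = new HashSet<>();
    String[] groupTerms = new String[0]; // sorted unique group values of the current segment
    String[] facetTerms = new String[0]; // sorted unique facet values of the current segment

    // No loop over earlier hits: opening a segment only swaps the term dictionaries.
    void setNextReader(String[] segmentGroupTerms, String[] segmentFacetTerms) {
        groupTerms = segmentGroupTerms;
        facetTerms = segmentFacetTerms;
    }

    // Returns true if this (group, facet) pair had not been seen in any earlier segment.
    boolean collect(int groupOrd, int facetOrd) {
        return groupedFacetHits.add(new GroupedFacetHit(groupTerms[groupOrd], facetTerms[facetOrd]));
    }
}
```

In this sketch, setNextReader() no longer touches earlier hits; the trade-off is that every collect() call allocates an object and hashes two values instead of working with segment-local int ordinals.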


> Performance issue when using group.facet=true
> ---------------------------------------------
>
>                 Key: SOLR-4763
>                 URL: https://issues.apache.org/jira/browse/SOLR-4763
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.2
>            Reporter: Alexander Koval
>         Attachments: SOLR-4763.patch, SOLR-4763.patch
>
>
> I do not know whether this is a bug or not, but calculating facets with
> {{group.facet=true}} is too slow.
> I have a query that returns:
> {code}
> "matches": 730597,
> "ngroups": 24024,
> {code}
> 1. All queries with {{group.facet=true}}:
> {code}
> "QTime": 5171
> "facet": {
>     "time": 4716
> {code}
> 2. Without {{group.facet}}:
> * First query:
> {code}
> "QTime": 3284
> "facet": {
>     "time": 3104
> {code}
> * Next queries:
> {code}
> "QTime": 230,
> "facet": {
>     "time": 76
> {code}
> So I think that with {{group.facet=true}} Solr doesn't use the cache to
> calculate facets.
> Is it possible to improve performance of facets when {{group.facet=true}}?



--
This message was sent by Atlassian JIRA
(v6.2#6252)
