[
https://issues.apache.org/jira/browse/SOLR-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017623#comment-14017623
]
Hua Jiang commented on SOLR-4763:
---------------------------------
Hello, Varun. Thanks for your feedback.
I rebuilt lucene_solr on my laptop, and all the tests pass. I made this
patch based on revision 1553089. If you are using a different revision, you may
have to make some modifications yourself. I will explain the patch a little
more, and I hope it helps.
In the unpatched code, groupedFacetHits is a list of GroupedFacetHit objects,
which store the unique combinations of group field and facet field values seen
in the previous segments. When a new segment is opened, this list is traversed
first to recalculate the segmentGroupedFacetsIndex, because that value may
differ from segment to segment. That's what the loop you mentioned in
setNextReader() is doing.
During the recalculation, the lookupTerm() method is invoked on
facetFieldTermsIndex and groupFieldTermsIndex. This method uses binary search
to look up a value among all the values that appear in the group/facet field in
the current segment.
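To illustrate why the index has to be recalculated per segment, here is a
minimal sketch of a binary-search lookup over one segment's sorted terms. The
class SegmentTermsSketch is a hypothetical stand-in, not the Lucene terms
index; it only mimics the convention of returning the ordinal when the term is
found and a negative value otherwise.

```java
import java.util.Arrays;

// Hypothetical stand-in for a per-segment terms index (not the Lucene class).
final class SegmentTermsSketch {
    // Unique terms of one field in one segment, kept sorted.
    private final String[] sortedTerms;

    SegmentTermsSketch(String... terms) {
        this.sortedTerms = Arrays.stream(terms).distinct().sorted().toArray(String[]::new);
    }

    // Binary search: O(log n) comparisons for n unique terms.
    // Returns the ordinal if the term exists in this segment,
    // otherwise a negative value (-(insertionPoint) - 1).
    int lookupTerm(String term) {
        return Arrays.binarySearch(sortedTerms, term);
    }
}
```

The same term can get a different ordinal in each segment, or be missing
entirely, which is why every stored hit has to be looked up again when the
next segment is opened.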
Let's assume that we have D documents distributed over S segments, and that
the documents are distributed evenly, so that we have G and F unique values in
each segment for the group and facet fields, and that the length of the
groupedFacetHits list after the nth segment is processed is n*L. Then the
cost of the recalculation is
(log G + log F) * (L + 2L + ... + (S-1)L) = (log G + log F) * L * S(S-1)/2,
which is O((log G + log F) * L * S^2). It is proportional to S squared, so as
S grows, performance drops rapidly.
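The quadratic growth can be checked with a small counting sketch. It only
counts lookupTerm() calls under the same assumptions as above (the list grows
by L entries per segment, two lookups per existing entry); the class and
method names are made up for illustration.

```java
// Hypothetical counting model of the unpatched recalculation; not Lucene code.
final class RecalculationCostSketch {
    // Number of lookupTerm() calls made while re-resolving the whole list
    // each time a new segment is opened.
    static long lookupsWithList(int segments, int hitsPerSegment) {
        long lookups = 0;
        long listSize = 0; // entries accumulated from the previous segments
        for (int s = 0; s < segments; s++) {
            // One group lookup and one facet lookup per existing entry.
            lookups += 2 * listSize;
            // Assumption from the text: the list grows by L per segment.
            listSize += hitsPerSegment;
        }
        return lookups; // equals L * S * (S - 1)
    }
}
```

With L = 100, going from 10 to 20 segments raises the count from 9000 to 38000
lookups, roughly a fourfold increase for twice the segments.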
In the patched version, I changed groupedFacetHits from a list to a set. The
recalculation can then be avoided: when you get a GroupedFacetHit, you just
add it to the set without worrying about whether some other GroupedFacetHit
with the same group and facet field values has been added before, because it
is a set. The add() method on a set returns false when an equal value has
already been added.
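A minimal sketch of the set-based idea follows. It is not the code from the
patch: HitSketch and GroupedFacetSetSketch are hypothetical names, plain
strings stand in for the field values, and a HashSet is assumed where the
patch may well use a different Set implementation.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class: one (group value, facet value) combination.
final class HitSketch {
    final String groupValue; // may be null for documents without a group value
    final String facetValue;

    HitSketch(String groupValue, String facetValue) {
        this.groupValue = groupValue;
        this.facetValue = facetValue;
    }

    // Equality is defined on the values themselves, not on per-segment
    // ordinals, so a hit stays valid across segments.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof HitSketch)) return false;
        HitSketch other = (HitSketch) o;
        return Objects.equals(groupValue, other.groupValue)
            && Objects.equals(facetValue, other.facetValue);
    }

    @Override
    public int hashCode() {
        return Objects.hash(groupValue, facetValue);
    }
}

// Hypothetical collector fragment: deduplicates hits through Set.add().
final class GroupedFacetSetSketch {
    private final Set<HitSketch> hits = new HashSet<>();

    // Returns true only the first time a combination is seen; a caller would
    // increment the facet count only in that case.
    boolean record(String groupValue, String facetValue) {
        return hits.add(new HitSketch(groupValue, facetValue));
    }

    int size() {
        return hits.size();
    }
}
```

Because duplicates are rejected by add() itself, nothing has to be re-resolved
against the new segment before collecting from it.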
> Performance issue when using group.facet=true
> ---------------------------------------------
>
> Key: SOLR-4763
> URL: https://issues.apache.org/jira/browse/SOLR-4763
> Project: Solr
> Issue Type: Bug
> Affects Versions: 4.2
> Reporter: Alexander Koval
> Attachments: SOLR-4763.patch, SOLR-4763.patch
>
>
> I do not know whether this is a bug or not, but calculating facets with
> {{group.facet=true}} is too slow.
> I have a query that returns:
> {code}
> "matches": 730597,
> "ngroups": 24024,
> {code}
> 1. All queries with {{group.facet=true}}:
> {code}
> "QTime": 5171
> "facet": {
> "time": 4716
> {code}
> 2. Without {{group.facet}}:
> * First query:
> {code}
> "QTime": 3284
> "facet": {
> "time": 3104
> {code}
> * Next queries:
> {code}
> "QTime": 230,
> "facet": {
> "time": 76
> {code}
> So I think with {{group.facet=true}} Solr doesn't use cache to calculate
> facets.
> Is it possible to improve performance of facets when {{group.facet=true}}?
--
This message was sent by Atlassian JIRA
(v6.2#6252)