[ https://issues.apache.org/jira/browse/SOLR-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204065#comment-13204065 ]
Russell Black commented on SOLR-3109:
-------------------------------------

Martijn,

I also noticed that {{TopGroupsShardResponseProcessor}} can't deal with multiple ShardRequests (although it looks like it wouldn't be too hard to add this ability). At any rate, your approach of returning a single ShardRequest containing all relevant shards sounds like the right one. I went one step further and refactored {{TopGroupsShardRequestFactory.java}} because there was significant code duplication in the class's two primary methods.

In my testing I also discovered a closely related problem. The bug is in the data structure used to map search groups to the shards that contain them. {{ResponseBuilder.searchGroupToShard}} assumes that a given search group resides on only one shard. I could not find this assumption documented anywhere, nor can I find a reason such a restriction need be imposed. This structure is populated by {{SearchGroupShardResponseProcessor}}. There is a race condition there, wherein the last shard to report a search group is assumed to be the only shard containing that search group. This data structure is used in {{TopGroupsShardRequestFactory.createRequestForSpecificShards()}} to know which shards to query. This means you can get a different set of shards to query depending on shard query order.

I have changed the structure to allow a search group to be present in multiple shards. Patch to follow.
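
To illustrate the ordering problem described above, here is a minimal, self-contained sketch. It is not the Solr code and not the attached patch: the class name, method names, and the use of plain strings for search groups and shards are all assumptions made for the example. It only contrasts a single-valued map (where the last shard to report a group overwrites earlier ones) with a multi-valued map (where every reporting shard is kept).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; these names do not exist in Solr.
public class SearchGroupShardMapSketch {

    // Single-valued mapping: each report is {group, shard}. A later report
    // for the same group silently replaces the earlier shard.
    static Map<String, String> lastWriterWins(List<String[]> reports) {
        Map<String, String> groupToShard = new HashMap<>();
        for (String[] report : reports) {
            groupToShard.put(report[0], report[1]);
        }
        return groupToShard;
    }

    // Multi-valued mapping: every shard that reports a group is retained,
    // so the result does not depend on the order of the reports.
    static Map<String, Set<String>> multiValued(List<String[]> reports) {
        Map<String, Set<String>> groupToShards = new HashMap<>();
        for (String[] report : reports) {
            groupToShards
                .computeIfAbsent(report[0], k -> new TreeSet<>())
                .add(report[1]);
        }
        return groupToShards;
    }

    public static void main(String[] args) {
        // The same two reports, arriving in two different orders.
        List<String[]> shard1First = Arrays.asList(
            new String[] {"groupA", "shard1"},
            new String[] {"groupA", "shard2"});
        List<String[]> shard2First = Arrays.asList(
            new String[] {"groupA", "shard2"},
            new String[] {"groupA", "shard1"});

        System.out.println("single-valued, shard1 first: " + lastWriterWins(shard1First));
        System.out.println("single-valued, shard2 first: " + lastWriterWins(shard2First));
        System.out.println("multi-valued,  shard1 first: " + multiValued(shard1First));
        System.out.println("multi-valued,  shard2 first: " + multiValued(shard2First));
    }
}
```

With the single-valued map, the shard recorded for groupA differs between the two arrival orders; with the multi-valued map, both orders produce the same set of shards.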

> group=true requests result in numerous redundant shard requests
> ---------------------------------------------------------------
>
>                 Key: SOLR-3109
>                 URL: https://issues.apache.org/jira/browse/SOLR-3109
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>    Affects Versions: 3.5, 4.0
>         Environment: 64-bit Linux, sharded environment
>            Reporter: Russell Black
>            Assignee: Martijn van Groningen
>            Priority: Critical
>              Labels: patch, performance
>         Attachments: SOLR-3109.patch, SOLR-3109.patch
>
>
> During the second phase of a group query, the collator sends a query to each
> of the shards. The purpose of this query is for shards to respond with the
> doc ids that match the set of group ids returned from the first phase. The
> problem is that it sends this second query to each shard multiple times.
> Specifically, in an environment with n shards, each shard will be hit with an
> identical query n times during the second phase of query processing,
> resulting in O(_n_ ^2^) performance where _n_ is the number of shards.
> I have traced this bug down to a single line in
> {{TopGroupsShardRequestFactory.java}}, and I am attaching a patch.