[
https://issues.apache.org/jira/browse/SOLR-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204065#comment-13204065
]
Russell Black edited comment on SOLR-3109 at 2/8/12 10:47 PM:
--------------------------------------------------------------
Martijn, I also noticed that {{TopGroupsShardResponseProcessor}} can't deal
with multiple ShardRequests (although it looks like it wouldn't be too hard to
add this ability). At any rate, your approach of returning a single
ShardRequest containing all relevant shards sounds like the right one. I went
one step further and refactored {{TopGroupsShardRequestFactory.java}} because
there was significant code duplication in the class's two primary methods.

In my testing I also discovered a closely related problem. The bug is in the
data structure used to map search groups to the shards which contain them.
{{ResponseBuilder.searchGroupToShard}} assumes that a given search group
resides on only one shard. I could not find this assumption documented
anywhere, nor can I find a reason such a restriction need be imposed. This
structure is populated by {{SearchGroupShardResponseProcessor}}. There is a
race condition there: each shard that reports a search group overwrites the
previous entry, so the last shard to report a search group is assumed to be
the only shard containing it. This data structure is used in
{{TopGroupsShardRequestFactory.createRequestForSpecificShards()}} to decide
which shards to query. This means the set of shards queried can differ
depending on the order in which the shards respond.

I have changed the structure to allow a search group to be present in multiple
shards.

Patch to follow.
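
To illustrate the overwrite problem described above, here is a minimal sketch.
It is not the Solr code and not the attached patch: the class name, the method
names, and the use of plain strings for shards and search groups are all
assumptions made for illustration. It only contrasts a single-valued map (last
shard to report wins) with a multi-valued map (every reporting shard is kept).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Solr.
final class SearchGroupShardMapSketch {

    // Each report is a {shard, searchGroup} pair, in the order shards responded.
    // Single-valued map: a later put() replaces the earlier shard for the group.
    static Map<String, String> singleValued(List<String[]> reports) {
        Map<String, String> groupToShard = new HashMap<>();
        for (String[] report : reports) {
            groupToShard.put(report[1], report[0]);
        }
        return groupToShard;
    }

    // Multi-valued map: every shard that reports the group is retained.
    static Map<String, Set<String>> multiValued(List<String[]> reports) {
        Map<String, Set<String>> groupToShards = new HashMap<>();
        for (String[] report : reports) {
            groupToShards
                .computeIfAbsent(report[1], key -> new LinkedHashSet<>())
                .add(report[0]);
        }
        return groupToShards;
    }

    public static void main(String[] args) {
        // The same two reports, arriving in opposite orders.
        List<String[]> shard1First = Arrays.asList(
            new String[] {"shard1", "groupA"}, new String[] {"shard2", "groupA"});
        List<String[]> shard2First = Arrays.asList(
            new String[] {"shard2", "groupA"}, new String[] {"shard1", "groupA"});

        // The single-valued result depends on arrival order.
        System.out.println(singleValued(shard1First));
        System.out.println(singleValued(shard2First));

        // The multi-valued result holds the same set of shards either way.
        System.out.println(multiValued(shard1First).equals(multiValued(shard2First)));
    }
}
```

With the single-valued map, the shard recorded for groupA depends on which
shard responded last; with the multi-valued map, both orders yield the same
set of shards.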
> group=true requests result in numerous redundant shard requests
> ---------------------------------------------------------------
>
> Key: SOLR-3109
> URL: https://issues.apache.org/jira/browse/SOLR-3109
> Project: Solr
> Issue Type: Bug
> Components: search
> Affects Versions: 3.5, 4.0
> Environment: 64-bit Linux, sharded environment
> Reporter: Russell Black
> Assignee: Martijn van Groningen
> Priority: Critical
> Labels: patch, performance
> Attachments: SOLR-3109.patch, SOLR-3109.patch, SOLR-3109.patch
>
>
> During the second phase of a group query, the collator sends a query to each
> of the shards. The purpose of this query is for shards to respond with the
> doc ids that match the set of group ids returned from the first phase. The
> problem is that it sends this second query to each shard multiple times.
> Specifically, in an environment with _n_ shards, each shard will be hit with
> an identical query _n_ times during the second phase of query processing, so
> the total number of shard requests grows as O(_n_ ^2^).
> I have traced this bug down to a single line in
> {{TopGroupsShardRequestFactory.java}}, and I am attaching a patch.
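
The request counts in the description above can be sketched as follows. This
is a simplified model, not the code in {{TopGroupsShardRequestFactory.java}};
the faulty line itself is not reproduced here. The {{Request}} class and the
method names are hypothetical. The sketch only compares "one request per
shard, each addressed to all shards" against "a single request addressed to
all shards".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the second-phase fan-out; not Solr's classes.
final class ShardRequestCountSketch {

    // Stand-in for a shard request: only the list of shards it is sent to.
    static final class Request {
        final List<String> targetShards;

        Request(List<String> targetShards) {
            this.targetShards = targetShards;
        }
    }

    // Redundant pattern: one request per shard, each targeting every shard.
    static List<Request> redundantRequests(List<String> shards) {
        List<Request> requests = new ArrayList<>();
        for (int i = 0; i < shards.size(); i++) {
            requests.add(new Request(shards));
        }
        return requests;
    }

    // Single-request pattern: one request targeting every shard once.
    static List<Request> singleRequest(List<String> shards) {
        List<Request> requests = new ArrayList<>();
        requests.add(new Request(shards));
        return requests;
    }

    // Total number of individual shard queries the requests would trigger.
    static int totalShardHits(List<Request> requests) {
        int hits = 0;
        for (Request request : requests) {
            hits += request.targetShards.size();
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> shards = Arrays.asList("shard1", "shard2", "shard3", "shard4");
        // With n = 4 shards: n * n = 16 hits versus n = 4 hits.
        System.out.println(totalShardHits(redundantRequests(shards)));
        System.out.println(totalShardHits(singleRequest(shards)));
    }
}
```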