[jira] [Updated] (SOLR-8297) Allow join query over 2 sharded collections: enhance functionality and exception handling
[ https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shikha Somani updated SOLR-8297: Attachment: SOLR-8297_Latest.patch > Allow join query over 2 sharded collections: enhance functionality and > exception handling > - > > Key: SOLR-8297 > URL: https://issues.apache.org/jira/browse/SOLR-8297 > Project: Solr > Issue Type: Improvement > Components: SolrCloud >Affects Versions: 5.3 >Reporter: Paul Blanchaert > Attachments: SOLR-8297_Latest.patch, SOLR-8297.patch > > > h2. Proposal > h3. General Idea > Approach [~shikhasomani]'s range check algorithm to the most cases > h3. Join behavior depending on router types of joined collections > || to\\from ||CompositeId||Implicit|| > ||CompositeId| shard range check, see table below | allow | > ||Implicit| allow | shard to shard | > h3. CompositeId to CompositeId join behaviour for certain number of shards > > || to\\from ||single||>1|| > ||single| allow (as is) | allow (range check) | > ||>1| allow (as is) | per shard range check | > h3. Rules from the tables above > * joining from/to CompositeId and Implicit is blindly allowed, it pick ups > any collocated replica, because users who do that probably understand what > they do. > * when both sides are Implicit let's join shards by name. ie if request hits > collectionTO_shardY_replica2 at a node, the collocated > collectionFROM_shardY_replica* is expected. > * when both sides are CompositeId > ** from single shard to single shard - nobrainer, just needs collocated > replica; > ** from multiple shards to single shard - all "from" shards (any it's > replicas) are picked for joining > ** from single shard to multiple shards - existing SOLR-4905 functionality > ** from multiple to multiple - generic range check algorithm > ### check that fromField and toField are router.keys in these collections > ### take shard range for the current "to" collection replica (keep in mind > that request is distributed across "to" collection shards) > ### enumerate "from" collection shrads, find their subset which covers "to" > shard range (this allows to handle any number of shards at both sides) > ### pickup collocated replicas of these "from" shard subset > h3. Caveat > this is quite sensitive to shard allocation (and/or replica placement) ie > failed "from" replica cannot be collocated with the required "to" shard. > h2. Initial Description > Enhancement based on SOLR-4905. New Jira issue raised as suggested by Mikhail > Khludnev. > A) exception handling: > The exception "SolrCloud join: multiple shards not yet supported" thrown in > the function findLocalReplicaForFromIndex of JoinQParserPlugin is not > triggered correctly: In my use-case, I've a join on a facet.query and when my > results are only found in 1 shard and the facet.query with the join is > querying the last replica of the last slice, then the exception is not thrown. > I believe it's better to verify the nr of slices when we want to verify the > "multiple shards not yet supported" exception (so exception is thrown when > zkController.getClusterState().getSlices(fromIndex).size()>1). > B) functional enhancement: > I would expect that there is no problem to perform a cross-core join over > sharded collections when the following conditions are met: > 1) both collections are sharded with the same replicationFactor and numShards > 2) router.field of the collections is set to the same "key-field" (collection > of "fromindex" has router.field = "from" field and collection joined to has > router.field = "to" field) > The router.field setup ensures that documents with the same "key-field" are > routed to the same node. > So the combination based on the "key-field" should always be available within > the same node. > From a user perspective, I believe these assumptions seem to be a "normal" > use-case in the cross-core join in SolrCloud. > Hope this helps -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8297) Allow join query over 2 sharded collections: enhance functionality and exception handling
[ https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Khludnev updated SOLR-8297: --- Description: h2. Proposal h3. General Idea Approach [~shikhasomani]'s range check algorithm to the most cases h3. Join behavior depending on router types of joined collections || to\\from ||CompositeId||Implicit|| ||CompositeId| shard range check, see table below | allow | ||Implicit| allow | shard to shard | h3. CompositeId to CompositeId join behaviour for certain number of shards || to\\from ||single||>1|| ||single| allow (as is) | allow (range check) | ||>1| allow (as is) | per shard range check | h3. Rules from the tables above * joining from/to CompositeId and Implicit is blindly allowed, it pick ups any collocated replica, because users who do that probably understand what they do. * when both sides are Implicit let's join shards by name. ie if request hits collectionTO_shardY_replica2 at a node, the collocated collectionFROM_shardY_replica* is expected. * when both sides are CompositeId ** from single shard to single shard - nobrainer, just needs collocated replica; ** from multiple shards to single shard - all "from" shards (any it's replicas) are picked for joining ** from single shard to multiple shards - existing SOLR-4905 functionality ** from multiple to multiple - generic range check algorithm ### check that fromField and toField are router.keys in these collections ### take shard range for the current "to" collection replica (keep in mind that request is distributed across "to" collection shards) ### enumerate "from" collection shrads, find their subset which covers "to" shard range (this allows to handle any number of shards at both sides) ### pickup collocated replicas of these "from" shard subset h3. Caveat this is quite sensitive to shard allocation (and/or replica placement) ie failed "from" replica cannot be collocated with the required "to" shard. h2. Initial Description Enhancement based on SOLR-4905. New Jira issue raised as suggested by Mikhail Khludnev. A) exception handling: The exception "SolrCloud join: multiple shards not yet supported" thrown in the function findLocalReplicaForFromIndex of JoinQParserPlugin is not triggered correctly: In my use-case, I've a join on a facet.query and when my results are only found in 1 shard and the facet.query with the join is querying the last replica of the last slice, then the exception is not thrown. I believe it's better to verify the nr of slices when we want to verify the "multiple shards not yet supported" exception (so exception is thrown when zkController.getClusterState().getSlices(fromIndex).size()>1). B) functional enhancement: I would expect that there is no problem to perform a cross-core join over sharded collections when the following conditions are met: 1) both collections are sharded with the same replicationFactor and numShards 2) router.field of the collections is set to the same "key-field" (collection of "fromindex" has router.field = "from" field and collection joined to has router.field = "to" field) The router.field setup ensures that documents with the same "key-field" are routed to the same node. So the combination based on the "key-field" should always be available within the same node. >From a user perspective, I believe these assumptions seem to be a "normal" >use-case in the cross-core join in SolrCloud. Hope this helps was: Enhancement based on SOLR-4905. New Jira issue raised as suggested by Mikhail Khludnev. A) exception handling: The exception "SolrCloud join: multiple shards not yet supported" thrown in the function findLocalReplicaForFromIndex of JoinQParserPlugin is not triggered correctly: In my use-case, I've a join on a facet.query and when my results are only found in 1 shard and the facet.query with the join is querying the last replica of the last slice, then the exception is not thrown. I believe it's better to verify the nr of slices when we want to verify the "multiple shards not yet supported" exception (so exception is thrown when zkController.getClusterState().getSlices(fromIndex).size()>1). B) functional enhancement: I would expect that there is no problem to perform a cross-core join over sharded collections when the following conditions are met: 1) both collections are sharded with the same replicationFactor and numShards 2) router.field of the collections is set to the same "key-field" (collection of "fromindex" has router.field = "from" field and collection joined to has router.field = "to" field) The router.field setup ensures that documents with the same "key-field" are routed to the same node. So the combination based on the "key-field" should always be available within the same node. >From a user perspective, I believe these assumptions seem to be a "normal" >use-case in the cross-core join in SolrCloud. Hope this helps
[jira] [Updated] (SOLR-8297) Allow join query over 2 sharded collections: enhance functionality and exception handling
[ https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Khludnev updated SOLR-8297: --- Attachment: SOLR-8297.patch squashing pull request to the single patch to simplify review > Allow join query over 2 sharded collections: enhance functionality and > exception handling > - > > Key: SOLR-8297 > URL: https://issues.apache.org/jira/browse/SOLR-8297 > Project: Solr > Issue Type: Improvement > Components: SolrCloud >Affects Versions: 5.3 >Reporter: Paul Blanchaert > Attachments: SOLR-8297.patch > > > Enhancement based on SOLR-4905. New Jira issue raised as suggested by Mikhail > Khludnev. > A) exception handling: > The exception "SolrCloud join: multiple shards not yet supported" thrown in > the function findLocalReplicaForFromIndex of JoinQParserPlugin is not > triggered correctly: In my use-case, I've a join on a facet.query and when my > results are only found in 1 shard and the facet.query with the join is > querying the last replica of the last slice, then the exception is not thrown. > I believe it's better to verify the nr of slices when we want to verify the > "multiple shards not yet supported" exception (so exception is thrown when > zkController.getClusterState().getSlices(fromIndex).size()>1). > B) functional enhancement: > I would expect that there is no problem to perform a cross-core join over > sharded collections when the following conditions are met: > 1) both collections are sharded with the same replicationFactor and numShards > 2) router.field of the collections is set to the same "key-field" (collection > of "fromindex" has router.field = "from" field and collection joined to has > router.field = "to" field) > The router.field setup ensures that documents with the same "key-field" are > routed to the same node. > So the combination based on the "key-field" should always be available within > the same node. > From a user perspective, I believe these assumptions seem to be a "normal" > use-case in the cross-core join in SolrCloud. > Hope this helps -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org