
[jira] [Updated] (SOLR-8297) Allow join query over 2 sharded collections: enhance functionality and exception handling

2017-04-24 Thread Shikha Somani (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikha Somani updated SOLR-8297:

Attachment: SOLR-8297_Latest.patch

> Allow join query over 2 sharded collections: enhance functionality and 
> exception handling
> -
>
> Key: SOLR-8297
> URL: https://issues.apache.org/jira/browse/SOLR-8297
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 5.3
>Reporter: Paul Blanchaert
> Attachments: SOLR-8297_Latest.patch, SOLR-8297.patch
>
>
> h2. Proposal
> h3. General Idea
> Apply [~shikhasomani]'s range check algorithm to most cases
> h3. Join behavior depending on router types of joined collections
> || to\\from ||CompositeId||Implicit||
> ||CompositeId| shard range check, see table below | allow |
> ||Implicit| allow | shard to shard |
> h3. CompositeId to CompositeId join behaviour for certain number of shards
>  
> || to\\from ||single||>1||
> ||single| allow (as is) | allow (range check) |
> ||>1| allow (as is) | per shard range check |
> h3. Rules from the tables above
> * joining from/to CompositeId and Implicit is blindly allowed; it picks up 
> any collocated replica, because users who do that probably understand what 
> they are doing.
> * when both sides are Implicit, let's join shards by name, i.e. if a request hits 
> collectionTO_shardY_replica2 at a node, the collocated 
> collectionFROM_shardY_replica* is expected.
> * when both sides are CompositeId
> ** from single shard to single shard - a no-brainer, just needs a collocated 
> replica;
> ** from multiple shards to single shard - all "from" shards (any of their 
> replicas) are picked for joining
> ** from single shard to multiple shards - existing SOLR-4905 functionality
> ** from multiple to multiple - generic range check algorithm
> ### check that fromField and toField are router.keys in these collections
> ### take shard range for the current "to" collection replica (keep in mind 
> that request is distributed across "to" collection shards)   
> ### enumerate "from" collection shards, find the subset which covers the "to" 
> shard range (this allows handling any number of shards on both sides)
> ### pick up collocated replicas of this "from" shard subset
> h3. Caveat 
> This is quite sensitive to shard allocation (and/or replica placement), i.e. 
> a failed "from" replica cannot be collocated with the required "to" shard.
> h2. Initial Description
> Enhancement based on SOLR-4905. New Jira issue raised as suggested by Mikhail 
> Khludnev.
> A) exception handling:
> The exception "SolrCloud join: multiple shards not yet supported" thrown in 
> the function findLocalReplicaForFromIndex of JoinQParserPlugin is not 
> triggered correctly: In my use-case, I've a join on a facet.query and when my 
> results are only found in 1 shard and the facet.query with the join is 
> querying the last replica of the last slice, then the exception is not thrown.
> I believe it's better to verify the nr of slices when we want to verify the  
> "multiple shards not yet supported" exception (so exception is thrown when 
> zkController.getClusterState().getSlices(fromIndex).size()>1).
> B) functional enhancement:
> I would expect that there is no problem to perform a cross-core join over 
> sharded collections when the following conditions are met:
> 1) both collections are sharded with the same replicationFactor and numShards
> 2) router.field of the collections is set to the same "key-field" (collection 
> of "fromindex" has router.field = "from" field and collection joined to has 
> router.field = "to" field)
> The router.field setup ensures that documents with the same "key-field" are 
> routed to the same node. 
> So the combination based on the "key-field" should always be available within 
> the same node.
> From a user perspective, I believe these assumptions seem to be a "normal" 
> use-case in the cross-core join in SolrCloud.
> Hope this helps



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-8297) Allow join query over 2 sharded collections: enhance functionality and exception handling

2017-04-04 Thread Mikhail Khludnev (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-8297:
---
Description: 
h2. Proposal

h3. General Idea
Apply [~shikhasomani]'s range check algorithm to most cases

h3. Join behavior depending on router types of joined collections
|| to\\from ||CompositeId||Implicit||
||CompositeId| shard range check, see table below | allow |
||Implicit| allow | shard to shard |

h3. CompositeId to CompositeId join behaviour for certain number of shards
 
|| to\\from ||single||>1||
||single| allow (as is) | allow (range check) |
||>1| allow (as is) | per shard range check |
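
The two tables can be read as one dispatch on router type and shard count. The sketch below is only an illustration of that reading: the class, enum, and method names are hypothetical and are not taken from the attached patch or from Solr.

```java
/** Hypothetical sketch of the decision tables; not code from the SOLR-8297 patch. */
class JoinStrategySketch {

    enum Router { COMPOSITE_ID, IMPLICIT }

    enum Strategy {
        ANY_COLLOCATED_REPLICA, // "allow" / "allow (as is)"
        SHARD_BY_NAME,          // Implicit to Implicit: "shard to shard"
        ALL_FROM_SHARDS,        // many "from" shards joined to a single "to" shard
        RANGE_CHECK             // many to many: "per shard range check"
    }

    /** Picks a strategy from the router type and shard count of both collections. */
    static Strategy choose(Router to, int toShards, Router from, int fromShards) {
        if (to == Router.IMPLICIT && from == Router.IMPLICIT) {
            return Strategy.SHARD_BY_NAME;
        }
        if (to != from) {
            // Mixed router types are allowed without further checks.
            return Strategy.ANY_COLLOCATED_REPLICA;
        }
        // Both sides are CompositeId from here on.
        if (fromShards == 1) {
            return Strategy.ANY_COLLOCATED_REPLICA;
        }
        if (toShards == 1) {
            return Strategy.ALL_FROM_SHARDS;
        }
        return Strategy.RANGE_CHECK;
    }
}
```

For example, `choose(Router.COMPOSITE_ID, 4, Router.COMPOSITE_ID, 2)` yields `RANGE_CHECK`, while the same call with a single "from" shard falls back to any collocated replica.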

h3. Rules from the tables above
* joining from/to CompositeId and Implicit is blindly allowed; it picks up any 
collocated replica, because users who do that probably understand what they are doing.
* when both sides are Implicit, let's join shards by name, i.e. if a request hits 
collectionTO_shardY_replica2 at a node, the collocated 
collectionFROM_shardY_replica* is expected.
* when both sides are CompositeId
** from single shard to single shard - a no-brainer, just needs a collocated replica;
** from multiple shards to single shard - all "from" shards (any of their replicas) 
are picked for joining
** from single shard to multiple shards - existing SOLR-4905 functionality
** from multiple to multiple - generic range check algorithm
### check that fromField and toField are router.keys in these collections
### take shard range for the current "to" collection replica (keep in mind that 
request is distributed across "to" collection shards)   
### enumerate "from" collection shards, find the subset which covers the "to" 
shard range (this allows handling any number of shards on both sides)
### pick up collocated replicas of this "from" shard subset
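
The enumeration step of the generic range check could look roughly like the following. This is a minimal sketch under assumptions: `Range` is a stand-in for an inclusive 32-bit hash range, and `coveringShards` and the shard names are hypothetical, not the patch's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: find the "from" shards whose hash ranges cover a "to" shard range. */
class ShardRangeCheck {

    /** Inclusive hash range; a stand-in, not Solr's own range class. */
    static final class Range {
        final int min;
        final int max;

        Range(int min, int max) {
            this.min = min;
            this.max = max;
        }

        boolean overlaps(Range other) {
            return min <= other.max && other.min <= max;
        }
    }

    /**
     * Returns the names of all "from" shards overlapping toRange, in map iteration order.
     * Throws if those shards leave part of toRange uncovered.
     */
    static List<String> coveringShards(Range toRange, Map<String, Range> fromShards) {
        List<String> names = new ArrayList<>();
        List<Range> ranges = new ArrayList<>();
        for (Map.Entry<String, Range> e : fromShards.entrySet()) {
            if (e.getValue().overlaps(toRange)) {
                names.add(e.getKey());
                ranges.add(e.getValue());
            }
        }
        // Sweep the overlapping ranges in ascending order to detect gaps.
        ranges.sort((a, b) -> Integer.compare(a.min, b.min));
        long next = toRange.min; // lowest hash value not yet covered (long avoids overflow)
        for (Range r : ranges) {
            if (r.min > next) {
                break; // gap before this range
            }
            next = Math.max(next, (long) r.max + 1);
        }
        if (next <= toRange.max) {
            throw new IllegalStateException("\"from\" shards do not cover the \"to\" shard range");
        }
        return names;
    }
}
```

With two "from" shards splitting the hash space in half, a "to" shard covering the lowest quarter needs only the first "from" shard, while a "to" shard straddling zero needs both. Finding a collocated replica of each returned shard is a separate step that this sketch does not attempt.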

h3. Caveat 
This is quite sensitive to shard allocation (and/or replica placement), i.e. a 
failed "from" replica cannot be collocated with the required "to" shard.

h2. Initial Description
Enhancement based on SOLR-4905. New Jira issue raised as suggested by Mikhail 
Khludnev.
A) exception handling:
The exception "SolrCloud join: multiple shards not yet supported" thrown in the 
function findLocalReplicaForFromIndex of JoinQParserPlugin is not triggered 
correctly: In my use-case, I've a join on a facet.query and when my results are 
only found in 1 shard and the facet.query with the join is querying the last 
replica of the last slice, then the exception is not thrown.
I believe it's better to verify the nr of slices when we want to verify the  
"multiple shards not yet supported" exception (so exception is thrown when 
zkController.getClusterState().getSlices(fromIndex).size()>1).
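
A minimal sketch of the suggested check, assuming the slice names of the "from" collection are already at hand. `requireSingleSlice` is a hypothetical helper, and `IllegalStateException` stands in for whatever exception type JoinQParserPlugin really throws.

```java
import java.util.Collection;

/** Hypothetical sketch: reject a join as soon as the "from" collection has more than one slice. */
class FromIndexShardGuard {

    /** Throws when fromIndex is sharded, regardless of which replica the request hit. */
    static void requireSingleSlice(String fromIndex, Collection<String> sliceNames) {
        if (sliceNames.size() > 1) {
            throw new IllegalStateException(
                "SolrCloud join: multiple shards not yet supported " + fromIndex);
        }
    }
}
```

Because the decision depends only on the slice count, the outcome no longer varies with the shard or replica that happens to serve the request.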

B) functional enhancement:
I would expect that there is no problem to perform a cross-core join over 
sharded collections when the following conditions are met:
1) both collections are sharded with the same replicationFactor and numShards
2) router.field of the collections is set to the same "key-field" (collection 
of "fromindex" has router.field = "from" field and collection joined to has 
router.field = "to" field)

The router.field setup ensures that documents with the same "key-field" are 
routed to the same node. 
So the combination based on the "key-field" should always be available within 
the same node.
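
The reasoning can be illustrated with a toy router. This is a sketch under stated assumptions: `String.hashCode()` stands in for the hash Solr really uses, the hash space is cut into equal slices, and all names are hypothetical.

```java
/** Hypothetical sketch: equal shard counts send the same key to the same shard index. */
class RouterFieldColocation {

    /** Maps a 32-bit hash to one of numShards equal slices of the hash space. */
    static int shardIndex(int hash, int numShards) {
        long offset = (long) hash - Integer.MIN_VALUE; // shift to 0 .. 2^32-1
        long span = (1L << 32) / numShards;            // width of one slice
        return (int) Math.min(offset / span, numShards - 1);
    }

    /** Routes a key value; String.hashCode() is only a stand-in hash function. */
    static int shardFor(String keyFieldValue, int numShards) {
        return shardIndex(keyFieldValue.hashCode(), numShards);
    }
}
```

A "from" document and a "to" document sharing the same key value get the same shard index as long as both collections use the same numShards; with different shard counts the indices can diverge. The sketch only shows that matching documents share a shard number. Whether replicas of those two shards sit on the same node depends on replica placement.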

From a user perspective, I believe these assumptions seem to be a "normal" 
use-case in the cross-core join in SolrCloud.

Hope this helps

  was:
Enhancement based on SOLR-4905. New Jira issue raised as suggested by Mikhail 
Khludnev.
A) exception handling:
The exception "SolrCloud join: multiple shards not yet supported" thrown in the 
function findLocalReplicaForFromIndex of JoinQParserPlugin is not triggered 
correctly: In my use-case, I've a join on a facet.query and when my results are 
only found in 1 shard and the facet.query with the join is querying the last 
replica of the last slice, then the exception is not thrown.
I believe it's better to verify the nr of slices when we want to verify the  
"multiple shards not yet supported" exception (so exception is thrown when 
zkController.getClusterState().getSlices(fromIndex).size()>1).

B) functional enhancement:
I would expect that there is no problem to perform a cross-core join over 
sharded collections when the following conditions are met:
1) both collections are sharded with the same replicationFactor and numShards
2) router.field of the collections is set to the same "key-field" (collection 
of "fromindex" has router.field = "from" field and collection joined to has 
router.field = "to" field)

The router.field setup ensures that documents with the same "key-field" are 
routed to the same node. 
So the combination based on the "key-field" should always be available within 
the same node.

From a user perspective, I believe these assumptions seem to be a "normal" 
use-case in the cross-core join in SolrCloud.

Hope this helps

[jira] [Updated] (SOLR-8297) Allow join query over 2 sharded collections: enhance functionality and exception handling

2016-08-09 Thread Mikhail Khludnev (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-8297:
---
Attachment: SOLR-8297.patch

squashing pull request to the single patch to simplify review  

> Allow join query over 2 sharded collections: enhance functionality and 
> exception handling
> -
>
> Key: SOLR-8297
> URL: https://issues.apache.org/jira/browse/SOLR-8297
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 5.3
>Reporter: Paul Blanchaert
> Attachments: SOLR-8297.patch
>
>
> Enhancement based on SOLR-4905. New Jira issue raised as suggested by Mikhail 
> Khludnev.
> A) exception handling:
> The exception "SolrCloud join: multiple shards not yet supported" thrown in 
> the function findLocalReplicaForFromIndex of JoinQParserPlugin is not 
> triggered correctly: In my use-case, I've a join on a facet.query and when my 
> results are only found in 1 shard and the facet.query with the join is 
> querying the last replica of the last slice, then the exception is not thrown.
> I believe it's better to verify the nr of slices when we want to verify the  
> "multiple shards not yet supported" exception (so exception is thrown when 
> zkController.getClusterState().getSlices(fromIndex).size()>1).
> B) functional enhancement:
> I would expect that there is no problem to perform a cross-core join over 
> sharded collections when the following conditions are met:
> 1) both collections are sharded with the same replicationFactor and numShards
> 2) router.field of the collections is set to the same "key-field" (collection 
> of "fromindex" has router.field = "from" field and collection joined to has 
> router.field = "to" field)
> The router.field setup ensures that documents with the same "key-field" are 
> routed to the same node. 
> So the combination based on the "key-field" should always be available within 
> the same node.
> From a user perspective, I believe these assumptions seem to be a "normal" 
> use-case in the cross-core join in SolrCloud.
> Hope this helps



