[jira] [Commented] (SOLR-8297) Allow join query over 2 sharded collections: enhance functionality and exception handling

2017-04-24 Thread Shikha Somani (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981797#comment-15981797
 ] 

Shikha Somani commented on SOLR-8297:
-

The proposal is fine except for one point: 'check that fromField and 
toField are router.keys in these collections'. This condition would block cases 
where the routing key differs from toField/fromField, and there are practical 
use cases where these keys differ. It would add an SQL-like restriction that a 
join can be applied only on a foreign key, and that the foreign key must be the 
primary key of the other collection.

Attached is a new, enhanced patch:
* Ready for master branch (Solr 7)
* Adheres to the proposal
* Improved test cases; well tested
* Accepts ‘rangeCheck’ as a raw request parameter as well as a local parameter. 
This makes it possible to join more than 2 collections that mix composite and 
implicit routing
** e.g. toCol (composite), fromComposite (composite), fromImplicit (implicit)
** A join across these 3 collections is now supported with this new patch
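
For illustration, steps 2 to 4 of the generic range check in the proposal quoted below could look roughly like this Python sketch. It is not taken from the patch: `HashRange`, the function names, and the dict-based shard maps are assumptions made for the example, and it deliberately leaves out the router.key check questioned above.

```python
from typing import NamedTuple


class HashRange(NamedTuple):
    lo: int  # inclusive lower bound, signed 32-bit
    hi: int  # inclusive upper bound, signed 32-bit


def to_signed32(value: int) -> int:
    """Interpret an unsigned 32-bit value as a signed 32-bit integer."""
    return value - (1 << 32) if value >= (1 << 31) else value


def parse_range(text: str) -> HashRange:
    """Parse 'min-max' given as two hex words, e.g. '80000000-ffffffff'."""
    lo_hex, hi_hex = text.split("-")
    return HashRange(to_signed32(int(lo_hex, 16)), to_signed32(int(hi_hex, 16)))


def overlaps(a: HashRange, b: HashRange) -> bool:
    return a.lo <= b.hi and b.lo <= a.hi


def covering_from_shards(to_range, from_shards):
    """Names of the 'from' shards overlapping to_range; raises if they leave a gap.

    from_shards is a hypothetical map {shard_name: HashRange}.
    """
    hits = sorted(
        (rng, name) for name, rng in from_shards.items() if overlaps(rng, to_range)
    )
    next_needed = to_range.lo
    for rng, _ in hits:
        if rng.lo > next_needed:
            break  # gap before this shard starts
        next_needed = max(next_needed, rng.hi + 1)
    if next_needed <= to_range.hi:
        raise ValueError("'from' shards do not cover the 'to' shard range")
    return [name for _, name in hits]


def pick_collocated(shard_names, cores_on_node):
    """Map each required shard to a core on the current node, or fail."""
    missing = [s for s in shard_names if s not in cores_on_node]
    if missing:
        raise LookupError(f"no collocated replica for: {missing}")
    return [cores_on_node[s] for s in shard_names]
```

The `LookupError` path corresponds to the caveat in the proposal: the approach only works when a replica of every required "from" shard lives on the node serving the "to" shard.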

> Allow join query over 2 sharded collections: enhance functionality and 
> exception handling
> -
>
> Key: SOLR-8297
> URL: https://issues.apache.org/jira/browse/SOLR-8297
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Affects Versions: 5.3
> Reporter: Paul Blanchaert
> Attachments: SOLR-8297_Latest.patch, SOLR-8297.patch
>
>
> h2. Proposal
> h3. General Idea
> Apply [~shikhasomani]'s range check algorithm to most cases
> h3. Join behavior depending on router types of joined collections
> || to\\from ||CompositeId||Implicit||
> ||CompositeId| shard range check, see table below | allow |
> ||Implicit| allow | shard to shard |
> h3. CompositeId to CompositeId join behaviour for certain number of shards
>  
> || to\\from ||single||>1||
> ||single| allow (as is) | allow (range check) |
> ||>1| allow (as is) | per shard range check |
> h3. Rules from the tables above
> * joining from/to CompositeId and Implicit is blindly allowed; it picks up 
> any collocated replica, because users who do that probably understand what 
> they are doing.
> * when both sides are Implicit, join shards by name, i.e. if a request hits 
> collectionTO_shardY_replica2 at a node, the collocated 
> collectionFROM_shardY_replica* is expected.
> * when both sides are CompositeId
> ** from single shard to single shard - a no-brainer, it just needs a 
> collocated replica;
> ** from multiple shards to single shard - all "from" shards (any of their 
> replicas) are picked for joining
> ** from single shard to multiple shards - existing SOLR-4905 functionality
> ** from multiple to multiple - generic range check algorithm
> ### check that fromField and toField are router.keys in these collections
> ### take the shard range for the current "to" collection replica (keep in 
> mind that the request is distributed across "to" collection shards)
> ### enumerate the "from" collection shards and find the subset that covers 
> the "to" shard range (this handles any number of shards on both sides)
> ### pick up collocated replicas of this "from" shard subset
> h3. Caveat
> this is quite sensitive to shard allocation (and/or replica placement), i.e. 
> a failed "from" replica cannot be collocated with the required "to" shard.
> h2. Initial Description
> Enhancement based on SOLR-4905. New Jira issue raised as suggested by Mikhail 
> Khludnev.
> A) exception handling:
> The exception "SolrCloud join: multiple shards not yet supported" thrown in 
> the function findLocalReplicaForFromIndex of JoinQParserPlugin is not 
> triggered correctly: in my use case, I have a join on a facet.query, and when 
> my results are only found in 1 shard and the facet.query with the join is 
> querying the last replica of the last slice, the exception is not thrown.
> I believe it's better to check the number of slices when deciding whether to 
> throw the "multiple shards not yet supported" exception (so the exception is 
> thrown when zkController.getClusterState().getSlices(fromIndex).size()>1).
> B) functional enhancement:
> I would expect that there is no problem to perform a cross-core join over 
> sharded collections when the following conditions are met:
> 1) both collections are sharded with the same replicationFactor and numShards
> 2) router.field of the collections is set to the same "key-field" (collection 
> of "fromindex" has router.field = "from" field and collection joined to has 
> router.field = "to" field)
> The router.field setup ensures that documents with the same "key-field" are 
> routed to the same node. 
> So the combination based on the "key-field" should always be available within 
> the same node.
> From a user perspective, I believe these assumptions seem to be a "normal" 
> use-case in the cross-core join in SolrCloud.
> Hope this helps
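
The slice-count check suggested in part A of the description can be modeled as below. This is a Python sketch rather than Solr code: the nested-dict cluster state and the exception class are assumptions made for the example; only the function name and the error message come from the description.

```python
class MultipleShardsNotSupported(Exception):
    """Raised when the 'from' collection has more than one slice."""


def find_local_replica_for_from_index(cluster_state, from_index, node_name):
    """Return the core of from_index hosted on node_name.

    cluster_state is a hypothetical stand-in for the ZooKeeper state:
    {collection: {slice_name: {node_name: core_name}}}.
    """
    slices = cluster_state[from_index]
    # Check the slice count up front, before looking at any replica, so the
    # outcome does not depend on which replica the request happens to hit.
    if len(slices) > 1:
        raise MultipleShardsNotSupported(
            "SolrCloud join: multiple shards not yet supported " + from_index
        )
    (replicas,) = slices.values()
    if node_name not in replicas:
        raise LookupError(f"no replica of {from_index} on {node_name}")
    return replicas[node_name]
```

Because the count is checked first, a request that lands on the last replica of the last slice is rejected in the same way as any other request.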

[jira] [Updated] (SOLR-8297) Allow join query over 2 sharded collections: enhance functionality and exception handling

2017-04-24 Thread Shikha Somani (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikha Somani updated SOLR-8297:

Attachment: SOLR-8297_Latest.patch




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8297) Allow join query over 2 sharded collections: enhance functionality and exception handling

2016-11-30 Thread Shikha Somani (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15709882#comment-15709882
 ] 

Shikha Somani commented on SOLR-8297:
-

This patch has also been tested on 6.x and can be applied to it.




[jira] [Commented] (SOLR-8297) Allow join query over 2 sharded collections: enhance functionality and exception handling

2016-07-07 Thread Shikha Somani (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15366512#comment-15366512
 ] 

Shikha Somani commented on SOLR-8297:
-

Please review these changes and let me know your thoughts. This issue is 
blocking our upgrade to Solr 5.x because it directly affects join functionality.

I would appreciate a quick response on this.




[jira] [Commented] (SOLR-8297) Allow join query over 2 sharded collections: enhance functionality and exception handling

2016-06-27 Thread Shikha Somani (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351939#comment-15351939
 ] 

Shikha Somani commented on SOLR-8297:
-

Gentle reminder.

Please review the above changes and merge them if appropriate.




[jira] [Commented] (SOLR-8297) Allow join query over 2 sharded collections: enhance functionality and exception handling

2016-06-24 Thread Shikha Somani (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348901#comment-15348901
 ] 

Shikha Somani commented on SOLR-8297:
-

The changes discussed above are ready and committed. See PR 
[https://github.com/apache/lucene-solr/pull/35]

Summary of changes:
* Two possible ways of selecting the fromCollection shard:
** If fromCollection is singly sharded, get its replica on the given node. 
_Default option_ 
** If fromCollection is sharded (same as toCollection), pick the replica with a 
matching range that is present on the given node.
* Introduced a new boolean parameter indicating whether the range should be 
matched when selecting the fromCollection replica. Its default value is "false"
* Added test cases
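
As a rough illustration of the two selection paths summarized above, here is a Python sketch. It is not the code from the PR: the `range_check` argument name, the dict layout, and the error types are assumptions, and "matching range" is read here as an equal hash range.

```python
def select_from_core(from_shards, to_range, node, range_check=False):
    """Pick the fromCollection core to join against on `node`.

    from_shards: {shard_name: {"range": (lo, hi), "replicas": {node: core}}}
    to_range:    (lo, hi) of the toCollection shard served by this node
    """
    if not range_check:
        # Default: fromCollection must be singly sharded; use its local replica.
        if len(from_shards) != 1:
            raise ValueError(
                "fromCollection must be singly sharded unless range_check is set"
            )
        candidates = list(from_shards.values())
    else:
        # Range mode: only shards whose hash range matches the "to" shard qualify.
        candidates = [s for s in from_shards.values() if s["range"] == to_range]
    for shard in candidates:
        core = shard["replicas"].get(node)
        if core is not None:
            return core
    raise LookupError(f"no suitable fromCollection replica on {node}")
```

Keeping the flag off by default preserves the existing single-shard behavior, so only callers that opt in are subject to the range match.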




[jira] [Commented] (SOLR-8297) Allow join query over 2 sharded collections: enhance functionality and exception handling

2016-06-17 Thread Shikha Somani (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337112#comment-15337112
 ] 

Shikha Somani commented on SOLR-8297:
-

Sure, I will rename *Any* to avoid confusion and start work on the suggested 
solution.




[jira] [Comment Edited] (SOLR-8297) Allow join query over 2 sharded collections: enhance functionality and exception handling

2016-06-16 Thread Shikha Somani (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334957#comment-15334957
 ] 

Shikha Somani edited comment on SOLR-8297 at 6/16/16 11:30 PM:
---

The *Any* option is introduced to support the existing cloud join scenario, i.e. 
where _fromCollection is singly sharded_. If asserting Any’s behavior is the 
only concern, I will write test cases to verify it thoroughly. Below is a 
scenario that resembles real-world usage; I will write a test case based on it.

*Scenario*: 
There are 2 collections in a 2-node cluster:
* product_category: It has values like books, toys, etc. _Singly sharded_
* sale: Holds information about current sales. The sale and product_category 
collections are related: the sale collection contains a ‘product key’. 
_Multi sharded_

*Query*: Find sale information with product information:
{!join from=id to=productKey fromCollection=product_category}

*Cluster information*:

||Node1 replica||Hash range||Node2 replica||Hash range||
|Product_category_shard1_replica1|8000-7fff|Product_category_shard1_replica2|8000-7fff|
|Sale_shard1_replica1|0-7fff|Sale_shard2_replica1|8000-|

In this scenario, a join between Sale and Product_category is possible only 
with the “Any” condition; otherwise the range check fails and prevents the 
join query.





[jira] [Commented] (SOLR-8297) Allow join query over 2 sharded collections: enhance functionality and exception handling

2016-06-16 Thread Shikha Somani (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334957#comment-15334957
 ] 

Shikha Somani commented on SOLR-8297:
-

The *Any* option is introduced to support the existing cloud join scenario, i.e. 
where fromCollection is singly sharded. If asserting Any’s behavior is the 
only concern, I will write test cases to verify it thoroughly. Below is a 
scenario that resembles real-world usage; I will write a test case based on it.

*Scenario*: 
There are 2 collections in a 2-node cluster:
* product_category: It has values like books, toys, etc. _Singly sharded_
* sale: Holds information about current sales. The sale and product_category 
collections are related: the sale collection contains a ‘product key’. 
_Multi sharded_

*Query*: Find sale information with product information:
{!join from=id to=productKey fromCollection=product_category}

*Cluster information*:

||Node1 replica||Hash range||Node2 replica||Hash range||
|Product_category_shard1_replica1|8000-7fff|Product_category_shard1_replica2|8000-7fff|
|Sale_shard1_replica1|0-7fff|Sale_shard2_replica1|8000-|

In this scenario, a join between Sale and Product_category is possible only 
with the “Any” condition; otherwise the range check fails and prevents the 
join query.



[jira] [Commented] (SOLR-8297) Allow join query over 2 sharded collections: enhance functionality and exception handling

2016-06-15 Thread Shikha Somani (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332988#comment-15332988
 ] 

Shikha Somani commented on SOLR-8297:
-

A gentle reminder about the solution proposed above.

Please let me know your thoughts so I can move ahead with it.

> Allow join query over 2 sharded collections: enhance functionality and 
> exception handling
> -
>
> Key: SOLR-8297
> URL: https://issues.apache.org/jira/browse/SOLR-8297
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 5.3
>Reporter: Paul Blanchaert
>
> Enhancement based on SOLR-4905. New Jira issue raised as suggested by Mikhail 
> Khludnev.
> A) exception handling:
> The exception "SolrCloud join: multiple shards not yet supported" thrown in 
> the function findLocalReplicaForFromIndex of JoinQParserPlugin is not 
> triggered correctly: In my use-case, I've a join on a facet.query and when my 
> results are only found in 1 shard and the facet.query with the join is 
> querying the last replica of the last slice, then the exception is not thrown.
> I believe it's better to verify the nr of slices when we want to verify the  
> "multiple shards not yet supported" exception (so exception is thrown when 
> zkController.getClusterState().getSlices(fromIndex).size()>1).
> B) functional enhancement:
> I would expect that there is no problem to perform a cross-core join over 
> sharded collections when the following conditions are met:
> 1) both collections are sharded with the same replicationFactor and numShards
> 2) router.field of the collections is set to the same "key-field" (collection 
> of "fromindex" has router.field = "from" field and collection joined to has 
> router.field = "to" field)
> The router.field setup ensures that documents with the same "key-field" are 
> routed to the same node. 
> So the combination based on the "key-field" should always be available within 
> the same node.
> From a user perspective, I believe these assumptions seem to be a "normal" 
> use-case in the cross-core join in SolrCloud.
> Hope this helps
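
The co-location argument in part B of the description can be illustrated with a minimal sketch. It assumes both collections use the same numShards and route on the join key; `shardFor` is a hypothetical stand-in for Solr's router (CRC32 is used only to keep the example dependency-free, it is not Solr's hash function), and it further assumes that shard i of both collections is hosted on the same node.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class RouterFieldColocation {

    // Hypothetical router: maps a routing-key value to a shard index.
    // Not Solr's actual hashing; any deterministic hash shows the same property.
    static int shardFor(String routingKey, int numShards) {
        CRC32 crc = new CRC32();
        crc.update(routingKey.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % numShards);
    }

    // True when a "from" document and a "to" document that share the same
    // key value are routed to the same shard index in their collections.
    static boolean coLocated(String key, int fromNumShards, int toNumShards) {
        return shardFor(key, fromNumShards) == shardFor(key, toNumShards);
    }

    public static void main(String[] args) {
        // Same numShards on both sides: the key always maps to the same shard index.
        System.out.println(coLocated("customer-42", 4, 4));
        // Different numShards: co-location is no longer guaranteed.
        System.out.println(coLocated("customer-42", 4, 3));
    }
}
```

With equal shard counts the check holds for every key, which is why the description asks for the same numShards on both collections.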



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-8297) Allow join query over 2 sharded collections: enhance functionality and exception handling

2016-06-13 Thread Shikha Somani (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328320#comment-15328320
 ] 

Shikha Somani edited comment on SOLR-8297 at 6/13/16 9:33 PM:
--

Below are two proposed solutions to “Allow join query over 2 sharded 
collections”, i.e. fixing the functionality that broke in Solr 5.x. This is not 
an enhancement to support joins across multiple shards present on the same JVM.

*Proposed solution*: Two possible solutions:
*1. Distributed join with Range*: This allows joins with greater flexibility 
by considering the shard range instead of the shard name when selecting the 
fromCollection replica. The current implementation requires fromCollection to 
be singly sharded; with this solution fromCollection can be singly sharded, 
sharded identically to toCollection, or have a range that overlaps with the 
toCollection range.

* *Solution details*: A new parameter “joinMode” will be introduced. It 
governs how the replica is selected based on range.
Possible values of joinMode:
** *Exact*: The “fromCollection” shard range must exactly match the range of 
the “toCollection” shard present on that node; only then is the join applied 
between the two collections. This is the _default_ value.
** *Overlap*: The shard range of “fromCollection” must overlap with that of 
“toCollection” on the given node.
** *Any*: No range check is performed; any replica of fromCollection present 
on that node is picked and the join is applied.

*2. Non-distributed join*: Works the same way joins did in Solr 4.x. The client 
specifies the exact replica of “fromCollection” to join with. 
“distrib=false” must be passed in the query parameters.

If either solution is acceptable, I will submit a PR for it.
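
The three proposed joinMode values could be sketched as follows. This is only an illustration of the range check described above, not code from the patch: the `Range` class, the `JoinMode` enum, and `replicaQualifies` are hypothetical names, and ranges are modelled as inclusive integer intervals.

```java
public class JoinModeSketch {

    // Hypothetical enum mirroring the proposed "joinMode" parameter values.
    enum JoinMode { EXACT, OVERLAP, ANY }

    // Hypothetical inclusive hash range of a shard.
    static final class Range {
        final int min;
        final int max;

        Range(int min, int max) {
            this.min = min;
            this.max = max;
        }

        boolean sameAs(Range other) {
            return min == other.min && max == other.max;
        }

        boolean overlaps(Range other) {
            return min <= other.max && other.min <= max;
        }
    }

    // Decides whether a local fromCollection replica may be joined with the
    // local toCollection shard under the given mode.
    static boolean replicaQualifies(JoinMode mode, Range from, Range to) {
        switch (mode) {
            case EXACT:
                return from.sameAs(to);
            case OVERLAP:
                return from.overlaps(to);
            case ANY:
                return true; // no range check at all
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        Range to = new Range(0, 99);
        Range fromHalf = new Range(0, 49);
        System.out.println(replicaQualifies(JoinMode.EXACT, fromHalf, to));
        System.out.println(replicaQualifies(JoinMode.OVERLAP, fromHalf, to));
        System.out.println(replicaQualifies(JoinMode.ANY, fromHalf, to));
    }
}
```

Under these assumptions a replica covering half of the toCollection range is rejected by Exact but accepted by Overlap and Any.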


was (Author: shikhasomani):
Below are two proposed solutions to “Allow join query over 2 sharded 
collections”, i.e. fixing the functionality that broke in Solr 5.x. This is not 
an enhancement to support joins across multiple shards present on the same JVM.

*Proposed solution*: Two possible solutions:
# *Distributed join with Range*: This allows joins with greater flexibility 
by considering the shard range instead of the shard name (a rigid criterion) 
when selecting the fromCollection replica. The current implementation requires 
fromCollection to be singly sharded; with this solution fromCollection can be 
singly sharded, sharded identically to toCollection, or have a range that 
overlaps with the toCollection range.

#* *Solution details*: A new parameter “joinMode” will be introduced. It 
governs how the replica is selected based on range.
Possible values of joinMode:
#** *Exact*: The “fromCollection” shard range must exactly match the range of 
the “toCollection” shard present on that node; only then is the join applied 
between the two collections. This is the _default_ value.
#** *Overlap*: The shard range of “fromCollection” must overlap with that of 
“toCollection” on the given node.
#** *Any*: No range check is performed; any replica of fromCollection present 
on that node is picked and the join is applied.
# *Non-distributed join*: Works the same way it did in Solr 4.x. The client 
specifies the exact replica of “fromCollection” to join with. 
“distrib=false” must be passed in the query parameters.

If this solution is acceptable, I will submit a PR for this fix.


[jira] [Commented] (SOLR-8297) Allow join query over 2 sharded collections: enhance functionality and exception handling

2016-06-13 Thread Shikha Somani (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328320#comment-15328320
 ] 

Shikha Somani commented on SOLR-8297:
-

Below are two proposed solutions to “Allow join query over 2 sharded 
collections”, i.e. fixing the functionality that broke in Solr 5.x. This is not 
an enhancement to support joins across multiple shards present on the same JVM.

*Proposed solution*: Two possible solutions:
# *Distributed join with Range*: This allows joins with greater flexibility 
by considering the shard range instead of the shard name (a rigid criterion) 
when selecting the fromCollection replica. The current implementation requires 
fromCollection to be singly sharded; with this solution fromCollection can be 
singly sharded, sharded identically to toCollection, or have a range that 
overlaps with the toCollection range.

#* *Solution details*: A new parameter “joinMode” will be introduced. It 
governs how the replica is selected based on range.
Possible values of joinMode:
#** *Exact*: The “fromCollection” shard range must exactly match the range of 
the “toCollection” shard present on that node; only then is the join applied 
between the two collections. This is the _default_ value.
#** *Overlap*: The shard range of “fromCollection” must overlap with that of 
“toCollection” on the given node.
#** *Any*: No range check is performed; any replica of fromCollection present 
on that node is picked and the join is applied.
# *Non-distributed join*: Works the same way it did in Solr 4.x. The client 
specifies the exact replica of “fromCollection” to join with. 
“distrib=false” must be passed in the query parameters.

If this solution is acceptable, I will submit a PR for this fix.




[jira] [Commented] (SOLR-8297) Allow join query over 2 sharded collections: enhance functionality and exception handling

2016-05-05 Thread Shikha Somani (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273269#comment-15273269
 ] 

Shikha Somani commented on SOLR-8297:
-

No.
In Solr 4.x, HttpSolrClient was the only way to run join queries. Join queries 
were not supported in cloud mode; they threw the exception: 
"Cross-core join: no such core"




[jira] [Commented] (SOLR-8297) Allow join query over 2 sharded collections: enhance functionality and exception handling

2016-05-05 Thread Shikha Somani (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272929#comment-15272929
 ] 

Shikha Somani commented on SOLR-8297:
-

In Solr 4.x join queries, a single shard was specified for both the "from" and 
"to" collections.

The reason is:
In 4.x a join query was performed using HttpSolrClient on a single node at a 
time. Because HttpSolrClient was used, the exact core name had to be specified 
for both the "from" and "to" collections.




[jira] [Commented] (SOLR-8297) Allow join query over 2 sharded collections: enhance functionality and exception handling

2016-05-04 Thread Shikha Somani (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271577#comment-15271577
 ] 

Shikha Somani commented on SOLR-8297:
-

To avoid confusion about this fix, below is a comparison of how joins worked in 
Solr 4.x and Solr 5.x:

||Solr 4.x||Solr 5.x||
|Secondary collection can be well sharded.|Secondary collection should be singly sharded.|
|Secondary collection shard/replica should be present on each node where primary collection shards are.|Secondary collection should be replicated on all nodes where primary is present.|
|Join query should have core name of both the collections.|Join query should have only collection name and not core name. Specifying core name will throw exception.|

Because of the differences above, Solr 5.x lost backward compatibility for 
join queries, making it *nearly impossible* to upgrade from Solr 4.x to Solr 
5.x.

The provided solution adds backward compatibility for join queries under the 
following conditions:
* A single shard of both the primary and the secondary collection is present 
on the same node
* Both primary and secondary collections have the same numShards and 
replicationFactor

This fix provides *backward compatibility* and is *not an enhancement* for the 
above requirements. If required, another defect can be opened for backward 
compatibility support, and this fix can be part of the new defect.
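
The two conditions listed above can be expressed as a simple precondition check. This is a sketch under stated assumptions rather than the patch itself: `CollectionLayout` and `joinSupported` are hypothetical names, and "a single shard of both collections present on the same node" is read here as "every node that hosts the primary hosts exactly one shard of each collection".

```java
import java.util.HashMap;
import java.util.Map;

public class JoinPreconditions {

    // Hypothetical summary of a collection's layout; not a Solr class.
    static final class CollectionLayout {
        final int numShards;
        final int replicationFactor;
        // Node name -> number of distinct shards of this collection on that node.
        final Map<String, Integer> shardsPerNode;

        CollectionLayout(int numShards, int replicationFactor,
                         Map<String, Integer> shardsPerNode) {
            this.numShards = numShards;
            this.replicationFactor = replicationFactor;
            this.shardsPerNode = shardsPerNode;
        }
    }

    // Returns true when both conditions from the comment hold.
    static boolean joinSupported(CollectionLayout primary, CollectionLayout secondary) {
        if (primary.numShards != secondary.numShards
                || primary.replicationFactor != secondary.replicationFactor) {
            return false;
        }
        for (Map.Entry<String, Integer> entry : primary.shardsPerNode.entrySet()) {
            int primaryShards = entry.getValue();
            int secondaryShards = secondary.shardsPerNode.getOrDefault(entry.getKey(), 0);
            if (primaryShards != 1 || secondaryShards != 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Integer> nodes = new HashMap<>();
        nodes.put("node1", 1);
        nodes.put("node2", 1);
        CollectionLayout primary = new CollectionLayout(2, 1, nodes);
        CollectionLayout secondary = new CollectionLayout(2, 1, new HashMap<>(nodes));
        System.out.println(joinSupported(primary, secondary));
    }
}
```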




[jira] [Commented] (SOLR-8297) Allow join query over 2 sharded collections: enhance functionality and exception handling

2016-05-02 Thread Shikha Somani (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267798#comment-15267798
 ] 

Shikha Somani commented on SOLR-8297:
-

Added a test case to verify distributed join when the secondary collection is 
not singly sharded but is sharded identically to the primary. The PR is ready 
for merge.




[jira] [Commented] (SOLR-8297) Allow join query over 2 sharded collections: enhance functionality and exception handling

2016-04-29 Thread Shikha Somani (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264694#comment-15264694
 ] 

Shikha Somani commented on SOLR-8297:
-

I tested this fix with various types of join, such as:
 - simple join between two collections (A -> B)
 - multi-hop join (join between A -> B -> C)
 - multi collection join (A -> B, A -> C) in single query

All existing test cases also passed with the fix.
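
For readers unfamiliar with the three join shapes, the following sketch builds the corresponding query strings using Solr's `{!join from=... to=... fromIndex=...}` local-params syntax. The helper `join`, the field names, and the collection names A, B and C are assumptions made for illustration; they are not the queries used in the tests.

```java
public class JoinQueryExamples {

    // Builds a join query string in Solr's local-params syntax.
    static String join(String fromField, String toField, String fromIndex, String innerQuery) {
        return "{!join from=" + fromField + " to=" + toField
                + " fromIndex=" + fromIndex + "}" + innerQuery;
    }

    public static void main(String[] args) {
        // Simple join, run against A and pulling matches from B (A -> B).
        String simple = join("b_key", "a_key", "B", "type:x");

        // Multi-hop join (A -> B -> C): the inner query is itself a join.
        String multiHop = join("b_key", "a_key", "B",
                join("c_key", "b_ref", "C", "type:y"));

        // Multi collection join (A -> B, A -> C): two separate filter queries
        // sent in the same request.
        String fqB = join("b_key", "a_key", "B", "type:x");
        String fqC = join("c_key", "a_ref", "C", "type:y");

        System.out.println(simple);
        System.out.println(multiHop);
        System.out.println("fq=" + fqB + "&fq=" + fqC);
    }
}
```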




[jira] [Commented] (SOLR-8297) Allow join query over 2 sharded collections: enhance functionality and exception handling

2016-04-27 Thread Shikha Somani (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260994#comment-15260994
 ] 

Shikha Somani commented on SOLR-8297:
-

We have been facing this issue, and point B is our main area of concern. We 
have identified a fix for it. To work on it further, I would like the defect to 
be assigned to me so I can provide a patch for review.


