[jira] [Updated] (SOLR-7090) Cross collection join

2015-10-09 Thread Scott Blum (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Blum updated SOLR-7090:
-
Attachment: (was: SOLR-7090-fulljoin.patch)

> Cross collection join
> -
>
> Key: SOLR-7090
> URL: https://issues.apache.org/jira/browse/SOLR-7090
> Project: Solr
> Issue Type: New Feature
> Reporter: Ishan Chattopadhyaya
> Fix For: 5.2, Trunk
>
> Attachments: SOLR-7090-fulljoin.patch, SOLR-7090.patch
>
>
> Although SOLR-4905 supports joins across collections in Cloud mode, there are
> limitations: (i) the secondary collection must be replicated at each node
> where the primary collection has a replica, and (ii) the secondary collection
> must be singly sharded.
> This issue explores ideas/possibilities of cross-collection joins, even
> across nodes. This will be helpful for users who wish to maintain boosts or
> signals in a secondary, more frequently updated collection, and perform a
> query-time join of these boosts/signals with results from the primary collection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-7090) Cross collection join

2015-10-09 Thread Scott Blum (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Blum updated SOLR-7090:
-
Attachment: SOLR-7090-fulljoin.patch

All tests are passing, I think.

> Cross collection join
> -
>
> Key: SOLR-7090
> URL: https://issues.apache.org/jira/browse/SOLR-7090
> Project: Solr
> Issue Type: New Feature
> Reporter: Ishan Chattopadhyaya
> Fix For: 5.2, Trunk
>
> Attachments: SOLR-7090-fulljoin.patch, SOLR-7090.patch
>
>
> Although SOLR-4905 supports joins across collections in Cloud mode, there are
> limitations: (i) the secondary collection must be replicated at each node
> where the primary collection has a replica, and (ii) the secondary collection
> must be singly sharded.
> This issue explores ideas/possibilities of cross-collection joins, even
> across nodes. This will be helpful for users who wish to maintain boosts or
> signals in a secondary, more frequently updated collection, and perform a
> query-time join of these boosts/signals with results from the primary collection.




[jira] [Updated] (SOLR-7090) Cross collection join

2015-10-06 Thread Scott Blum (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Blum updated SOLR-7090:
-
Attachment: SOLR-7090-fulljoin.patch

Tests passing.  I'm doing something kind of hacky to avoid the auto-warm.

> Cross collection join
> -
>
> Key: SOLR-7090
> URL: https://issues.apache.org/jira/browse/SOLR-7090
> Project: Solr
> Issue Type: New Feature
> Reporter: Ishan Chattopadhyaya
> Fix For: 5.2, Trunk
>
> Attachments: SOLR-7090-fulljoin.patch, SOLR-7090-fulljoin.patch, 
> SOLR-7090.patch
>
>
> Although SOLR-4905 supports joins across collections in Cloud mode, there are
> limitations: (i) the secondary collection must be replicated at each node
> where the primary collection has a replica, and (ii) the secondary collection
> must be singly sharded.
> This issue explores ideas/possibilities of cross-collection joins, even
> across nodes. This will be helpful for users who wish to maintain boosts or
> signals in a secondary, more frequently updated collection, and perform a
> query-time join of these boosts/signals with results from the primary collection.




[jira] [Updated] (SOLR-7090) Cross collection join

2015-10-05 Thread Scott Blum (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Blum updated SOLR-7090:
-
Attachment: SOLR-7090-fulljoin.patch

I have this basically working as a QParser.  Under the hood, it uses a
distributed Facet query to collect the appropriate term list, which it then
applies to the local core.
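A minimal Java sketch of that two-phase idea, under stated assumptions: it does not use Solr at all. Shards and the local core are modelled as lists of field maps, the `fromQuery` predicate stands in for the from-side query, and the class name, field names and sample data are hypothetical. Phase 1 returns roughly what a facet on the join field with `facet.mincount=1` would report once the shard responses are merged; phase 2 keeps only local documents whose join value is in that list. Requires Java 9+ for `List.of`/`Map.of`.

```java
import java.util.*;
import java.util.function.Predicate;

/**
 * Hypothetical two-phase sketch of the fulljoin idea (no Solr classes used).
 * Phase 1 plays the role of the distributed facet query: each shard of the
 * "from" collection reports the distinct join-field terms of its matching
 * documents, and the coordinator merges them into one term list.
 * Phase 2 applies that term list as a filter on the local core.
 */
public class FullJoinSketch {

    // Phase 1: union of per-shard terms of fromField among docs matching fromQuery.
    static Set<String> collectJoinTerms(List<List<Map<String, String>>> fromShards,
                                        String fromField,
                                        Predicate<Map<String, String>> fromQuery) {
        Set<String> terms = new TreeSet<>();
        for (List<Map<String, String>> shard : fromShards) {
            for (Map<String, String> doc : shard) {
                String term = doc.get(fromField);
                if (term != null && fromQuery.test(doc)) {
                    terms.add(term);
                }
            }
        }
        return terms;
    }

    // Phase 2: keep local documents whose toField value appears in the term list.
    static List<Map<String, String>> applyToLocalCore(List<Map<String, String>> localDocs,
                                                      String toField,
                                                      Set<String> terms) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> doc : localDocs) {
            String value = doc.get(toField);
            if (value != null && terms.contains(value)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Two shards of a hypothetical "signals" collection keyed by productId.
        List<List<Map<String, String>>> fromShards = List.of(
            List.of(Map.of("productId", "p1", "flag", "hot")),
            List.of(Map.of("productId", "p3", "flag", "hot"),
                    Map.of("productId", "p2", "flag", "cold")));
        List<Map<String, String>> localCore = List.of(
            Map.of("id", "p1"), Map.of("id", "p2"), Map.of("id", "p3"));

        Set<String> terms = collectJoinTerms(fromShards, "productId",
                                             doc -> "hot".equals(doc.get("flag")));
        System.out.println(terms);                                           // [p1, p3]
        System.out.println(applyToLocalCore(localCore, "id", terms).size()); // 2
    }
}
```

Merging terms before touching the local core means each primary core needs only the term list, not the secondary documents themselves, which is what lets the secondary collection be sharded.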

I can't get all the random tests to work, though, and I'm not sure what I'm 
doing wrong.  I'm getting a different set of failures on trunk than I was 
getting on a similar patch against ~5.2.1.

On trunk, the final result set tends to have too few documents in it (e.g. 10
!= 7), even though the fulljoin is actually recording that it found 10 docs.
I've been digging into this but haven't figured it out yet.

On ~5.2.1, I was getting a different failure related to caching.  After an
index clear + commit, a fulljoin query result would get cached, and subsequent
commits would not invalidate it, so by the time a query was performed, it
would miss all but the first few docs.

Any help would be much appreciated!

> Cross collection join
> -
>
> Key: SOLR-7090
> URL: https://issues.apache.org/jira/browse/SOLR-7090
> Project: Solr
> Issue Type: New Feature
> Reporter: Ishan Chattopadhyaya
> Fix For: 5.2, Trunk
>
> Attachments: SOLR-7090-fulljoin.patch, SOLR-7090.patch
>
>
> Although SOLR-4905 supports joins across collections in Cloud mode, there are
> limitations: (i) the secondary collection must be replicated at each node
> where the primary collection has a replica, and (ii) the secondary collection
> must be singly sharded.
> This issue explores ideas/possibilities of cross-collection joins, even
> across nodes. This will be helpful for users who wish to maintain boosts or
> signals in a secondary, more frequently updated collection, and perform a
> query-time join of these boosts/signals with results from the primary collection.




[jira] [Updated] (SOLR-7090) Cross collection join

2015-02-25 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated SOLR-7090:
---
Issue Type: New Feature  (was: Bug)

 Cross collection join
 -

 Key: SOLR-7090
 URL: https://issues.apache.org/jira/browse/SOLR-7090
 Project: Solr
 Issue Type: New Feature
 Reporter: Ishan Chattopadhyaya
 Fix For: 5.1

 Attachments: SOLR-7090.patch


 Although SOLR-4905 supports joins across collections in Cloud mode, there are
 limitations: (i) the secondary collection must be replicated at each node
 where the primary collection has a replica, and (ii) the secondary collection
 must be singly sharded.
 This issue explores ideas/possibilities of cross-collection joins, even
 across nodes. This will be helpful for users who wish to maintain boosts or
 signals in a secondary, more frequently updated collection, and perform a
 query-time join of these boosts/signals with results from the primary collection.




[jira] [Updated] (SOLR-7090) Cross collection join

2015-02-09 Thread Ishan Chattopadhyaya (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya updated SOLR-7090:
---
Attachment: SOLR-7090.patch

Here's an implementation of this using a value source, backed by a per-core
cache.

Here's how to use it:

Add this to the <query> section of solrconfig.xml:

<cache name="join"
       class="solr.LRUCache"
       size="4096"
       initialSize="1024"
       autowarmCount="1024"
       regenerator="org.apache.solr.util.SolrPluginUtils$IdentityRegenerator"
/>

At query time, the coljoin function can be used:
coljoin(fromCollection,fromKey,fromVal,toKey)

fromCollection: the name of the secondary ("from") collection to join from
fromKey: the name of the foreign-key field in the from collection to be
joined against
fromVal: the name of the field whose value is returned from the from collection
toKey: the name of the key field in the primary collection to be joined against
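A minimal Java sketch of the per-document value that coljoin(fromCollection,fromKey,fromVal,toKey) is described as producing: build a fromKey-to-fromVal table from the secondary collection, then look up each primary document's toKey in it. This is an illustration, not the patch's ValueSource code; the collection and field names (`signals`, `productId`, `popularity`), the 0.0 default for unmatched keys, and the last-value-wins rule for duplicate keys are all assumptions. Requires Java 9+.

```java
import java.util.*;

/**
 * Hypothetical model of what coljoin(fromCollection, fromKey, fromVal, toKey)
 * yields for each primary document. Assumptions (not from the patch):
 * unmatched keys give 0.0, duplicate fromKey values keep the last fromVal,
 * and fromVal must be numeric, as a function-query value would be.
 */
public class ColJoinSketch {

    // Build the fromKey -> fromVal table that the join cache would hold.
    static Map<String, Double> buildLookup(List<Map<String, Object>> fromDocs,
                                           String fromKey, String fromVal) {
        Map<String, Double> lookup = new HashMap<>();
        for (Map<String, Object> doc : fromDocs) {
            Object key = doc.get(fromKey);
            Object val = doc.get(fromVal);
            if (key != null && val instanceof Number) {
                lookup.put(key.toString(), ((Number) val).doubleValue());
            }
        }
        return lookup;
    }

    // Per-document value: join this document's toKey against the table.
    static double colJoin(Map<String, Double> lookup, Map<String, Object> primaryDoc,
                          String toKey) {
        Object key = primaryDoc.get(toKey);
        return key == null ? 0.0 : lookup.getOrDefault(key.toString(), 0.0);
    }

    public static void main(String[] args) {
        // Models coljoin(signals,productId,popularity,id) on hypothetical data.
        List<Map<String, Object>> signals = List.of(
            Map.<String, Object>of("productId", "p1", "popularity", 5),
            Map.<String, Object>of("productId", "p2", "popularity", 2));
        Map<String, Double> lookup = buildLookup(signals, "productId", "popularity");

        System.out.println(colJoin(lookup, Map.<String, Object>of("id", "p1"), "id")); // 5.0
        System.out.println(colJoin(lookup, Map.<String, Object>of("id", "p3"), "id")); // 0.0
    }
}
```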

Implementation details:
All values from the secondary collection are fetched at the primary
collection's cores and cached in an LRU join cache. An executor thread runs
continuously in the background to refresh the cache (by re-fetching values
from the secondary collection) at a specified interval (2000 ms in this
patch).
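The refresh scheme above can be sketched in plain Java as follows, assuming the fetch from the secondary collection is hidden behind a `Supplier` (hypothetical; neither the patch's actual fetch code nor Solr's `LRUCache` is reproduced here). A bounded, access-ordered `LinkedHashMap` provides the LRU eviction, and a single-thread `ScheduledExecutorService` rebuilds it every `intervalMillis` and swaps it in, so readers never see a half-filled cache. The class name and method names are illustrative only.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: a bounded LRU map of fromKey -> fromVal that one
 * background thread rebuilds at a fixed interval by re-fetching everything
 * from the secondary collection. fetchAll stands in for that fetch.
 */
public class RefreshingJoinCache implements AutoCloseable {
    private final Supplier<Map<String, Double>> fetchAll;
    private final int maxEntries;
    private final ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "join-cache-refresh");
            t.setDaemon(true); // do not keep the JVM alive just for refreshing
            return t;
        });
    // Readers always see one complete snapshot; refreshes swap the reference.
    private volatile Map<String, Double> snapshot;

    RefreshingJoinCache(Supplier<Map<String, Double>> fetchAll, int maxEntries,
                        long intervalMillis) {
        this.fetchAll = fetchAll;
        this.maxEntries = maxEntries;
        this.snapshot = newLru(maxEntries);
        refresh(); // initial load on the caller's thread
        executor.scheduleWithFixedDelay(this::refresh, intervalMillis, intervalMillis,
                                        TimeUnit.MILLISECONDS);
    }

    // Access-ordered LinkedHashMap that evicts the least recently used entry.
    private static Map<String, Double> newLru(int maxEntries) {
        return Collections.synchronizedMap(new LinkedHashMap<String, Double>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Double> eldest) {
                return size() > maxEntries;
            }
        });
    }

    private void refresh() {
        try {
            Map<String, Double> fresh = newLru(maxEntries);
            fresh.putAll(fetchAll.get());
            snapshot = fresh;
        } catch (RuntimeException e) {
            // Keep serving the old snapshot: an exception escaping a task
            // scheduled with scheduleWithFixedDelay cancels all later runs.
        }
    }

    Double get(String key) {
        return snapshot.get(key);
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }

    public static void main(String[] args) {
        AtomicInteger fetches = new AtomicInteger();
        Supplier<Map<String, Double>> fetch = () -> {
            fetches.incrementAndGet();
            return Map.of("p1", 5.0, "p2", 2.0);
        };
        // 4096 entries and a 2000 ms interval mirror the numbers in the comment.
        try (RefreshingJoinCache cache = new RefreshingJoinCache(fetch, 4096, 2000)) {
            System.out.println(cache.get("p1") + " after " + fetches.get() + " fetch(es)");
        }
    }
}
```

Swapping in a whole new map per refresh keeps reads simple, at the cost of briefly holding two copies. Note too that with a full reload on every tick, a secondary collection larger than the cache size can never be held in full: entries beyond the bound are evicted during the load itself.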

 Cross collection join
 -

 Key: SOLR-7090
 URL: https://issues.apache.org/jira/browse/SOLR-7090
 Project: Solr
 Issue Type: Bug
 Reporter: Ishan Chattopadhyaya
 Fix For: 5.1

 Attachments: SOLR-7090.patch


 Although SOLR-4905 supports joins across collections in Cloud mode, there are
 limitations: (i) the secondary collection must be replicated at each node
 where the primary collection has a replica, and (ii) the secondary collection
 must be singly sharded.
 This issue explores ideas/possibilities of cross-collection joins, even
 across nodes. This will be helpful for users who wish to maintain boosts or
 signals in a secondary, more frequently updated collection, and perform a
 query-time join of these boosts/signals with results from the primary collection.


