[jira] [Updated] (SOLR-7090) Cross collection join
[ https://issues.apache.org/jira/browse/SOLR-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Blum updated SOLR-7090:
-----------------------------
    Attachment:     (was: SOLR-7090-fulljoin.patch)

> Cross collection join
> ---------------------
>
>                 Key: SOLR-7090
>                 URL: https://issues.apache.org/jira/browse/SOLR-7090
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Ishan Chattopadhyaya
>             Fix For: 5.2, Trunk
>
>         Attachments: SOLR-7090-fulljoin.patch, SOLR-7090.patch
>
>
> Although SOLR-4905 supports joins across collections in Cloud mode, there are
> limitations, (i) the secondary collection must be replicated at each node
> where the primary collection has a replica, (ii) the secondary collection
> must be singly sharded.
> This issue explores ideas/possibilities of cross collection joins, even
> across nodes. This will be helpful for users who wish to maintain boosts or
> signals in a secondary, more frequently updated collection, and perform query
> time join of these boosts/signals with results from the primary collection.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7090) Cross collection join
[ https://issues.apache.org/jira/browse/SOLR-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Blum updated SOLR-7090:
-----------------------------
    Attachment: SOLR-7090-fulljoin.patch

All tests passing I think.
[jira] [Updated] (SOLR-7090) Cross collection join
[ https://issues.apache.org/jira/browse/SOLR-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Blum updated SOLR-7090:
-----------------------------
    Attachment: SOLR-7090-fulljoin.patch

Tests passing. I'm doing something kind of hacky to avoid the auto-warm.

>         Attachments: SOLR-7090-fulljoin.patch, SOLR-7090-fulljoin.patch, SOLR-7090.patch
[jira] [Updated] (SOLR-7090) Cross collection join
[ https://issues.apache.org/jira/browse/SOLR-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Blum updated SOLR-7090:
-----------------------------
    Attachment: SOLR-7090-fulljoin.patch

I have this basically working as a QParser. Under the hood, it uses a distributed Facet query to collect the appropriate term list, which it then applies to the local core.

I can't get all the random tests to work, though, and I'm not sure what I'm doing wrong. I'm getting a different set of failures on trunk than I was getting on a similar patch against ~5.2.1.

On trunk, the final result set tends to have too few documents in it (e.g. 10 != 7), even though the fulljoin is actually recording that it found 10 docs. I've been digging on this but haven't figured it out yet.

On ~5.2.1, I was getting a different failure related to caching. On index clear + commit, a fulljoin query result would get cached, and subsequent commits would not invalidate the result, so by the time a query was performed, it would miss all but the first few docs.

Any help would be much appreciated!
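The two-step approach described in this message (gather a term list from the other collection, then apply it locally) can be sketched in plain Java as below. This is only an illustration of the idea, not code from SOLR-7090-fulljoin.patch: the class name `TermListJoin`, its methods, and the in-memory maps standing in for shards and the local core are all hypothetical, and no Solr API is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of a term-list join; not taken from the attached patch. */
class TermListJoin {

    /**
     * Step 1: merge the per-shard term lists (what a distributed facet over the
     * from-field would return) into a single de-duplicated set of join keys.
     */
    static Set<String> collectTerms(List<List<String>> perShardTerms) {
        Set<String> terms = new TreeSet<>();
        for (List<String> shardTerms : perShardTerms) {
            terms.addAll(shardTerms);
        }
        return terms;
    }

    /**
     * Step 2: apply the term list to the local core, keeping the ids of local
     * documents whose to-field value is one of the collected terms.
     * localDocs maps document id to that document's to-field value.
     */
    static List<String> filterLocal(Map<String, String> localDocs, Set<String> terms) {
        List<String> matchingIds = new ArrayList<>();
        for (Map.Entry<String, String> doc : localDocs.entrySet()) {
            if (terms.contains(doc.getValue())) {
                matchingIds.add(doc.getKey());
            }
        }
        return matchingIds;
    }
}
```

Because the filter result depends on the state of the other collection, a cached result can go stale when that collection changes, which is presumably related to the caching failure mentioned above.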
[jira] [Updated] (SOLR-7090) Cross collection join
[ https://issues.apache.org/jira/browse/SOLR-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated SOLR-7090:
-----------------------------------
    Issue Type: New Feature  (was: Bug)

Cross collection join
---------------------
    Fix For: 5.1
    Attachments: SOLR-7090.patch
[jira] [Updated] (SOLR-7090) Cross collection join
[ https://issues.apache.org/jira/browse/SOLR-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ishan Chattopadhyaya updated SOLR-7090:
---------------------------------------
    Attachment: SOLR-7090.patch

Here's an implementation for this using a value source, backed by a per-core cache.

Here's how to use it. Add this to the query section of solrconfig.xml:

    <cache name="join" class="solr.LRUCache" size="4096" initialSize="1024" autowarmCount="1024" regenerator="org.apache.solr.util.SolrPluginUtils$IdentityRegenerator" />

At query time, the coljoin function can be used:

    coljoin(fromCollection,fromKey,fromVal,toKey)

    fromCollection: the name of the secondary/from collection to be joined from
    fromKey: the field name of the foreign key in the from collection to be joined against
    fromVal: the field name of the value to be returned from the from collection
    toKey: the field name of the key in the primary collection to be joined against

Implementation details:
All values from the secondary collection are fetched at the primary collection's cores and cached into an LRU join cache. An executor thread runs continuously in the background to update the cache (by fetching the values again from the secondary collection) at specified intervals (2000ms in this patch).

Cross collection join
---------------------
    Issue Type: Bug
    Fix For: 5.1
    Attachments: SOLR-7090.patch
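The background-refresh scheme under "Implementation details" could look roughly like the sketch below. It is not the code in SOLR-7090.patch: the class `RefreshingJoinCache`, the `Supplier` standing in for the fetch from the secondary collection, and the `Float` value type are assumptions made for illustration, and a plain map snapshot is used in place of Solr's LRU cache.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical sketch of a periodically refreshed join cache; not the patch's class. */
class RefreshingJoinCache implements AutoCloseable {
    // Latest fromKey -> fromVal snapshot; swapped wholesale on every refresh.
    private volatile Map<String, Float> snapshot = Collections.emptyMap();
    // Stand-in for "fetch all values from the secondary collection".
    private final Supplier<Map<String, Float>> fetcher;
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "join-cache-refresher");
                thread.setDaemon(true); // do not keep the JVM alive for refreshes
                return thread;
            });

    RefreshingJoinCache(Supplier<Map<String, Float>> fetcher, long intervalMs) {
        this.fetcher = fetcher;
        refresh(); // load once up front so lookups work immediately
        executor.scheduleWithFixedDelay(
                this::refreshQuietly, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    }

    /** Re-fetch all values and publish them as the new snapshot. */
    void refresh() {
        snapshot = Collections.unmodifiableMap(new HashMap<>(fetcher.get()));
    }

    // An uncaught exception would cancel the scheduled task, so a failed fetch
    // is swallowed here and the previous snapshot stays in place.
    private void refreshQuietly() {
        try {
            refresh();
        } catch (RuntimeException e) {
            // keep serving the previous snapshot
        }
    }

    /** Look up the joined value for a primary document's toKey value. */
    float valueFor(String toKeyValue, float defaultValue) {
        Float value = snapshot.get(toKeyValue);
        return value == null ? defaultValue : value;
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

With an interval of 2000 (the value mentioned above), lookups may be up to about one interval behind the secondary collection, which is the trade-off of refreshing in the background instead of fetching per query.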