[
https://issues.apache.org/jira/browse/SOLR-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205740#comment-15205740
]
Erick Erickson commented on SOLR-7090:
--------------------------------------
Disclaimer: I only skimmed this patch and the patch for SOLR-7341, so take this
with a grain of salt.
Both of these seem, from my limited review, to form a query against the "from"
collection, return some kind of representation of the matched docs, then apply
those to the "to" query. What I'm wondering is whether this is really the right
way to go for these kinds of operations or whether the Streaming Aggregation
process is better.
My concern is mostly that there's a fair bit of complexity here, and I'm very
suspicious of the performance across large Solr collections, especially for the
"from" collections.
I'd be reluctant to see this functionality go into Solr without some
performance numbers. Since we're now regularly seeing Solr used with very large
corpora, I have to ask whether this is complexity we want to add (and then
support). I'd at least like to see what kinds of use-cases are solved by this
functionality that aren't handled by Streaming Aggregation and/or whether we
could implement this functionality with Streaming Aggregation instead.
The discussion changes if there are use-cases this functionality supports that
we can't implement with a Streaming Aggregation solution; I'd just like to see
them enumerated before we jump in with both feet.
> Cross collection join
> ---------------------
>
> Key: SOLR-7090
> URL: https://issues.apache.org/jira/browse/SOLR-7090
> Project: Solr
> Issue Type: New Feature
> Reporter: Ishan Chattopadhyaya
> Fix For: 5.2, master
>
> Attachments: SOLR-7090-fulljoin.patch, SOLR-7090.patch
>
>
> Although SOLR-4905 supports joins across collections in Cloud mode, there are
> limitations, (i) the secondary collection must be replicated at each node
> where the primary collection has a replica, (ii) the secondary collection
> must be singly sharded.
> This issue explores ideas/possibilities of cross collection joins, even
> across nodes. This will be helpful for users who wish to maintain boosts or
> signals in a secondary, more frequently updated collection, and perform query
> time join of these boosts/signals with results from the primary collection.