Re: Joins with SolrCloud
Mugeesh,

You can use Streaming Aggregation to provide various types of cross-collection joins. This is currently available in trunk and will be a part of Solr 6.

To follow with your example, let's assume the following setup:

Restaurants: available on machine1:8983 with 3 shards, zk at zk1:2345
Users: available on machine2:8983 with 2 shards, zk at zk2:2345
Reviews: available on machine1:8983 with 10 shards, zk at zk1:2345

You could send a streaming query to Solr that would return all reviews for restaurants in NYC and include the user's hometown:

hashJoin(
  innerJoin(
    search(users, q="*:*", fl="userId, full_name, hometown", sort="userId asc", zkHost="zk2:2345", qt="/export"),
    search(reviews, q="*:*", fl="userId, restaurantId, review, score", sort="userId asc", zkHost="zk1:2345", qt="/export"),
    on="userId"
  ),
  hashed=search(restaurants, q="city:nyc", fl="restaurantId, restaurantName", sort="restaurantId asc", zkHost="zk1:2345", qt="/export"),
  on="restaurantId"
)

Note that the number of shards doesn't matter and doesn't need to be considered as a part of your query. Were you to send this off to a URL for a result, it'd look like this:

http://machine1:8983/solr/users/stream?stream=[the expression above]

Additional information about the Streaming API, Streaming Aggregation, and Streaming Expressions can be found at https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions, though this is currently incomplete as a lot of the new features have yet to be added to the documentation.

For those interested, joins were added under tickets https://issues.apache.org/jira/browse/SOLR-7584 and https://issues.apache.org/jira/browse/SOLR-8188.

- Dennis

On Mon, Dec 7, 2015 at 7:42 AM, Mugeesh Husain <muge...@gmail.com> wrote:
> I have created 3 cores on the same machine using SolrCloud.
> Cores: Restaurant, User, Review
> Each core has only 1 shard and 2 replicas.
>
> Questions:
> 1.) Is it possible to use a join among the 3 cores on the same machine (or
> on different machines)?
> 2.) I am struggling with how to use a join among the 3 cores in SolrCloud mode.
>
> The client is not interested in de-normalizing the data.
>
> Please give some suggestions on how to solve this problem.
>
> Thanks
> Mugeesh
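When the expression is sent as the stream parameter in the URL form Dennis describes, it has to be URL-encoded, since it contains quotes, spaces, and parentheses. A minimal Python sketch of that step follows. The helper name build_stream_url and the shortened expression are illustrative assumptions, not part of Solr or of the original message.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_stream_url(base_url: str, collection: str, expression: str) -> str:
    """Return a /stream request URL with the expression URL-encoded.

    Hypothetical helper: it only builds the URL and sends nothing.
    """
    query = urlencode({"stream": expression})
    return f"{base_url}/solr/{collection}/stream?{query}"


# A shortened expression, used only to keep the example readable.
expr = 'search(users, q="*:*", fl="userId", sort="userId asc", qt="/export")'
url = build_stream_url("http://machine1:8983", "users", expr)

# Decoding the query string gives back the original expression.
decoded = parse_qs(urlsplit(url).query)["stream"][0]
print(url)
print(decoded == expr)
```

The same helper would take the full hashJoin expression unchanged. The encoding just keeps characters such as " and * from being misread as URL syntax.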
Re: Joins with SolrCloud
Something I forgot to mention - the collection shards can live on any number of machines, anywhere in the world. As long as the clusterstate in zk knows where each shard can be found (i.e., the basis of SolrCloud), everything will work. The example I gave had the shards living on the same machine, but that is not a requirement.

On Fri, Dec 11, 2015 at 11:00 AM, Dennis Gove <dpg...@gmail.com> wrote:
> Mugeesh,
>
> You can use Streaming Aggregation to provide various types of
> cross-collection joins. This is currently available in trunk and will be a
> part of Solr 6.
Re: Joins with SolrCloud
You can also do the innerJoin in parallel across worker nodes using the parallel function:

hashJoin(
  parallel(workerCollection,
    innerJoin(
      search(users, q="*:*", fl="userId, full_name, hometown", sort="userId asc", zkHost="zk2:2345", qt="/export", partitionKeys="userId"),
      search(reviews, q="*:*", fl="userId, restaurantId, review, score", sort="userId asc", zkHost="zk1:2345", qt="/export", partitionKeys="userId"),
      on="userId"
    ),
    workers="20",
    zkHost="zk1:2345",
    sort="userId asc"
  ),
  hashed=search(restaurants, q="city:nyc", fl="restaurantId, restaurantName", sort="restaurantId asc", zkHost="zk1:2345", qt="/export"),
  on="restaurantId"
)

The parallel function will return the tuples from the innerJoin, which is performed on 20 workers in this example. The worker nodes will be selected from "workerCollection", which can be any SolrCloud collection with enough nodes. The "partitionKeys" parameter has been added to the searches so that results with the same userId are shuffled to the same worker node.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Dec 11, 2015 at 11:00 AM, Dennis Gove <dpg...@gmail.com> wrote:
> Mugeesh,
>
> You can use Streaming Aggregation to provide various types of
> cross-collection joins. This is currently available in trunk and will be a
> part of Solr 6.
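The reason "partitionKeys" matters can be shown with a small Python sketch of hash partitioning. This is only an illustration of the idea Joel describes: zlib.crc32 is an assumed stand-in for whatever hash Solr really uses, and worker_for, partition, and the sample tuples are made up for the example.

```python
import zlib
from collections import defaultdict


def worker_for(key: str, num_workers: int) -> int:
    # crc32 is an assumption standing in for Solr's real partitioning hash.
    # Any deterministic hash gives the property being illustrated.
    return zlib.crc32(key.encode("utf-8")) % num_workers


def partition(tuples, key_field, num_workers):
    """Group tuples by the worker their partition key hashes to."""
    buckets = defaultdict(list)
    for t in tuples:
        buckets[worker_for(t[key_field], num_workers)].append(t)
    return buckets


users = [
    {"userId": "u1", "hometown": "Boston"},
    {"userId": "u2", "hometown": "Austin"},
]
reviews = [
    {"userId": "u1", "review": "great"},
    {"userId": "u2", "review": "ok"},
    {"userId": "u1", "review": "meh"},
]

user_parts = partition(users, "userId", 20)
review_parts = partition(reviews, "userId", 20)

# Each worker that holds reviews also holds the users those reviews join to.
for worker, rows in review_parts.items():
    local_users = {u["userId"] for u in user_parts.get(worker, [])}
    print(worker, all(r["userId"] in local_users for r in rows))
```

Because both streams are partitioned on the same key, each worker can run its share of the innerJoin without seeing tuples held by any other worker.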
Joins with SolrCloud
I have created 3 cores on the same machine using SolrCloud.
Cores: Restaurant, User, Review
Each core has only 1 shard and 2 replicas.

Questions:
1.) Is it possible to use a join among the 3 cores on the same machine (or on different machines)?
2.) I am struggling with how to use a join among the 3 cores in SolrCloud mode.

The client is not interested in de-normalizing the data.

Please give some suggestions on how to solve this problem.

Thanks
Mugeesh

--
View this message in context: http://lucene.472066.n3.nabble.com/Joins-with-SolrCloud-tp4243957.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Joins with SolrCloud
Hi Yonik,

Thanks for the reply. It was very helpful.

This may be a newbie question, but will this work on individual rows of a query, or do all the query's results need to be on the same shard?

For example, if the main query would return

- user15 (shard 1)
- user16 (shard 2)
- user17 (shard 3)

is it acceptable to have

- doc1 (shard 1)
- whatever (shard 2)
- yeah (shard 3)

for a join of

- user15, doc1
- user16, whatever
- user17, yeah

or do all the results of the main query need to reside on the same shard as all the results of the join?

Hopefully that's an understandable question.

Thanks,
slevytam

--
View this message in context: http://lucene.472066.n3.nabble.com/Joins-with-SolrCloud-tp4073199p4075408.html
Re: Joins with SolrCloud
Yes, joins support distributed search fine, provided that the individual documents that are joined reside on the same shard.

For example, if you are modeling blogs and posts (one blog object has many posts):

shard1 -- joe!blog_info joe!post1
shard2 -- mary!blog_info mary!post1

So now you can search for post bodies and join to the main blog via

{!join from=blog_pointer to=blog_id}post_body:hello

If both mary and joe have a post with hello, they will both be found and joined to their main blog info docs with a single distributed search across the collection.

-Yonik
http://lucidworks.com

On Thu, Jul 4, 2013 at 3:37 AM, slevytam developm...@the10thfloor.com wrote:
> Hi Yonik,
>
> Thanks for the reply. It was very helpful.
>
> This may be a newb question but will this work on a individual rows of a
> query or do all the queries' results need to be on the same shard.
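The per-shard behaviour Yonik describes can be emulated with a short Python sketch. It is a toy model, not Solr code: the shard layout mirrors the joe/mary example, while join_on_shard, distributed_join, and the sample field values are assumptions made for illustration.

```python
def join_on_shard(docs, from_field, to_field, matches):
    """Emulate {!join from=... to=...} within one shard.

    Collect the from_field values of the docs that match the inner query,
    then return the docs on this same shard whose to_field is in that set.
    """
    keys = {d[from_field] for d in docs if from_field in d and matches(d)}
    return [d for d in docs if d.get(to_field) in keys]


def distributed_join(shards, from_field, to_field, matches):
    """Run the join on each shard separately and concatenate the results."""
    results = []
    for docs in shards.values():
        results.extend(join_on_shard(docs, from_field, to_field, matches))
    return results


def says_hello(doc):
    return "hello" in doc.get("post_body", "")


shards = {
    "shard1": [
        {"id": "joe!blog_info", "blog_id": "joe"},
        {"id": "joe!post1", "blog_pointer": "joe", "post_body": "hello world"},
    ],
    "shard2": [
        {"id": "mary!blog_info", "blog_id": "mary"},
        {"id": "mary!post1", "blog_pointer": "mary", "post_body": "hello again"},
    ],
}

hits = distributed_join(shards, "blog_pointer", "blog_id", says_hello)
print([d["id"] for d in hits])  # ['joe!blog_info', 'mary!blog_info']
```

If mary!post1 were stored on shard1 instead, the join on shard2 would find no matching post and mary!blog_info would drop out of the results, which is why the joined documents must be co-located.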
Re: Joins with SolrCloud
Hi Yonik,

Can you offer any insight as to how one might ensure that documents reside on the same shard as the document you'd like them to join?

For example: I'd like to do a simple join of user actions to a specific document. So, I would query for a list of documents and have the user actions on each document (e.g., liked) attached for the current user.

Because I haven't figured out a reasonable way to accomplish this in Solr, I've been looking elsewhere, e.g., parent/child in Elasticsearch. However, I would have no problem forcing the appropriate useractions document onto the same shard as the parent document, if I knew how.

Thanks,
slevytam

--
View this message in context: http://lucene.472066.n3.nabble.com/Joins-with-SolrCloud-tp4073199p4075310.html
Re: Joins with SolrCloud
On Wed, Jul 3, 2013 at 5:40 PM, slevytam developm...@the10thfloor.com wrote:
> Hi Yonik,
>
> Can you offer any insight as to how one might ensure that documents reside
> on the same shard as the document you'd like them to join.
>
> For example: I'd like to do a simple join of user actions to a specific
> document. So, i would query for a list of documents and have the user
> actions on that document be attached for the current user. ie. liked etc

The easiest way, using the default document routing in SolrCloud (which is compositeId hashing), is to just use the same prefix on the ID field for all related documents. This will route them all to the same shard.

For example, the following unique IDs will all be routed to the same shard:

user15!doc1
user15!whatever
user15!yeah

http://searchhub.org/2013/06/13/solr-cloud-document-routing/

Or you can use custom sharding and directly assign documents to shards... but if the default compositeId solution works for you, use that.

-Yonik
http://lucidworks.com
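The effect of a shared ID prefix can be sketched in a few lines of Python. This is a deliberately simplified model: Solr's actual compositeId router uses a different hash function and maps hash ranges to shards, so zlib.crc32, the modulo step, and the name shard_for are all assumptions made to show only that the prefix decides the shard.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard from the part of the ID before '!' (simplified model)."""
    prefix = doc_id.split("!", 1)[0]
    # crc32 and modulo are stand-ins for Solr's real hashing scheme.
    return zlib.crc32(prefix.encode("utf-8")) % num_shards


ids = ["user15!doc1", "user15!whatever", "user15!yeah"]
assigned = {doc_id: shard_for(doc_id, 3) for doc_id in ids}

# All three IDs share the prefix "user15", so they land on one shard.
print(assigned)
print(len(set(assigned.values())) == 1)
```

Only the prefix is hashed here, so the part after the "!" can be anything without changing which shard the document lands on.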
Re: Joins with SolrCloud
Thanks, confirmed by trying with 4.3.1 that the join works with the outer collection distributed/sharded so long as the inner collection is not distributed/sharded.

Chris

On Tue, Jun 25, 2013 at 4:55 PM, Upayavira u...@odoko.co.uk wrote:
> I have never heard mention that joins support distributed search, so you
> cannot do a join against a sharded core.
>
> However, if from your example, innerCollection was replicated across all
> nodes, I would think that should work, because all that comes back from
> each server when a distributed search happens is the best 'n' matches,
> so exactly how those 'n' matches were located doesn't matter particularly.
>
> Simpler answer: try it!
>
> Upayavira
Re: Joins with SolrCloud
On Tue, Jun 25, 2013 at 7:55 PM, Upayavira u...@odoko.co.uk wrote:
> However, if from your example, innerCollection was replicated across all
> nodes, I would think that should work, because all that comes back from
> each server when a distributed search happens is the best 'n' matches,
> so exactly how those 'n' matches were located doesn't matter particularly.

Yes, joins would only join documents residing on the same shard. Distributed search with joins should work fine provided that you have co-located the documents you want to join.

-Yonik
http://lucidworks.com
Joins with SolrCloud
What are the restrictions/limitations w.r.t. joins when using SolrCloud?

Say I have a 3-node cluster and both my outer and inner collections are sharded 3 ways across the cluster. Could I do a query such as the following?

select?q={!join+from=inner_id+fromIndex=innerCollection+to=outer_id}xx:foo&collection=outerCollection

Or, if the above isn't supported, would it be if the inner collection was not sharded and was replicated across all 3 nodes, so that it existed in its entirety on each node?

thx,
Chris
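For reference, the query string in Chris's question can be assembled and encoded programmatically. The Python sketch below only builds the request parameters. The helper name cross_collection_join is an assumption, and the sketch says nothing about whether a given cluster layout will accept the join.

```python
from urllib.parse import parse_qs, urlencode


def cross_collection_join(from_field, from_index, to_field, inner_query):
    """Build a {!join ...} query using fromIndex (hypothetical helper)."""
    local_params = f"from={from_field} fromIndex={from_index} to={to_field}"
    return f"{{!join {local_params}}}{inner_query}"


q = cross_collection_join("inner_id", "innerCollection", "outer_id", "xx:foo")
query_string = urlencode({"q": q, "collection": "outerCollection"})

print(q)
print("select?" + query_string)

# Decoding the query string recovers both parameters unchanged.
print(parse_qs(query_string))
```

Using urlencode keeps the braces, the "!" and the spaces in the local-params block from being misinterpreted, and it places the "&" between the q and collection parameters.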
Re: Joins with SolrCloud
I have never heard mention that joins support distributed search, so you cannot do a join against a sharded core.

However, if from your example, innerCollection was replicated across all nodes, I would think that should work, because all that comes back from each server when a distributed search happens is the best 'n' matches, so exactly how those 'n' matches were located doesn't matter particularly.

Simpler answer: try it!

Upayavira

On Tue, Jun 25, 2013, at 11:25 PM, Chris Toomey wrote:
> What are the restrictions/limitations w.r.t. joins when using SolrCloud?
>
> Say I have a 3-node cluster and both my outer and inner collections are
> sharded 3 ways across the cluster. Could I do a query such as
> select?q={!join+from=inner_id+fromIndex=innerCollection+to=outer_id}xx:foo&collection=outerCollection?
>
> Or if the above isn't supported, would it be if the inner collection was
> not sharded and was replicated across all 3 nodes, so that it existed in
> its entirety on each node?
>
> thx,
> Chris
RE: Joins with SolrCloud
My understanding is the same: {!join...} does not work in SolrCloud (aka distributed search), based on:

1. https://issues.apache.org/jira/browse/LUCENE-3759
2. http://wiki.apache.org/solr/DistributedSearch (see the Limitations section, which refers to the JIRA above)

-- James

-----Original Message-----
From: Upayavira [mailto:u...@odoko.co.uk]
Sent: Tuesday, June 25, 2013 7:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Joins with SolrCloud

I have never heard mention that joins support distributed search, so you cannot do a join against a sharded core.

However, if from your example, innerCollection was replicated across all nodes, I would think that should work, because all that comes back from each server when a distributed search happens is the best 'n' matches, so exactly how those 'n' matches were located doesn't matter particularly.

Simpler answer: try it!

Upayavira