Re: Joins with SolrCloud

2015-12-11 Thread Dennis Gove
Mugeesh,

You can use Streaming Aggregation to provide various types of
cross-collection joins. This is currently available in trunk and will be a
part of Solr 6.

To follow with your example, let's assume the following setup:
Restaurants: avail on machine1:8983 with 3 shards, zk at zk1:2345
Users: avail on machine2:8983 with 2 shards, zk at zk2:2345
Reviews: avail on machine1:8983 with 10 shards, zk at zk1:2345

You could send a streaming query to Solr that would return all reviews for
restaurants in NYC and include each user's hometown:

hashJoin(
  innerJoin(
    search(users, q="*:*", fl="userId, full_name, hometown",
           sort="userId asc", zkHost="zk2:2345", qt="/export"),
    search(reviews, q="*:*", fl="userId, restaurantId, review, score",
           sort="userId asc", zkHost="zk1:2345", qt="/export"),
    on="userId"
  ),
  hashed=search(restaurants, q="city:nyc", fl="restaurantId, restaurantName",
                sort="restaurantId asc", zkHost="zk1:2345", qt="/export"),
  on="restaurantId"
)
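
To make the two join styles concrete, here is a rough Python sketch of what they do to the tuples. It is only an illustration, not Solr's implementation: innerJoin merges two streams that are already sorted on the join key (hence the matching sort="userId asc" on both searches), while hashJoin reads the hashed stream fully into memory and probes it for each incoming tuple. The function names and sample records are made up, and the sketch assumes each review tuple carries a restaurantId.

```python
from itertools import groupby
from operator import itemgetter


def inner_join_sorted(left, right, key):
    """Merge join over two streams already sorted ascending by `key`."""
    right_groups = groupby(right, key=itemgetter(key))
    right_key, rows = next(right_groups, (None, []))
    right_rows = list(rows)
    for left_row in left:
        # Advance the right side until its key catches up with the left key.
        while right_key is not None and right_key < left_row[key]:
            right_key, rows = next(right_groups, (None, []))
            right_rows = list(rows)
        if right_key == left_row[key]:
            for right_row in right_rows:
                yield {**left_row, **right_row}


def hash_join(stream, hashed, key):
    """Hash join: load `hashed` into memory, then probe it per tuple."""
    table = {}
    for row in hashed:
        table.setdefault(row[key], []).append(row)
    for row in stream:
        for match in table.get(row[key], []):
            yield {**row, **match}


# Hypothetical sample data, sorted by userId like the search() calls above.
users = [
    {"userId": 1, "hometown": "Boston"},
    {"userId": 2, "hometown": "Austin"},
]
reviews = [
    {"userId": 1, "restaurantId": 10, "score": 5},
    {"userId": 2, "restaurantId": 11, "score": 3},
    {"userId": 2, "restaurantId": 10, "score": 4},
]
restaurants = [{"restaurantId": 10, "restaurantName": "Example Diner"}]

joined = list(
    hash_join(inner_join_sorted(users, reviews, "userId"),
              restaurants, "restaurantId")
)
for row in joined:
    print(row)
```

The merge join never holds more than one group of right-hand tuples at a time, which is why it suits the two large streams; the hash join suits the restaurants side because the city filter keeps it small enough to hold in memory.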

Note that the number of shards doesn't matter and doesn't need to be
considered as part of your query. Were you to send this off to a URL for
results, it'd look like this:

http://machine1:8983/solr/users/stream?stream=[the expression above]
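
Since the expression contains quotes, spaces, and parentheses, it needs to be URL-encoded before it goes into the stream parameter. A minimal Python sketch of building that URL follows; the helper name stream_url is hypothetical, and the expression is shortened to a single search() for readability.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def stream_url(host, collection, expression):
    """Build a /stream request URL with the expression percent-encoded."""
    query = urlencode({"stream": expression})
    return f"http://{host}/solr/{collection}/stream?{query}"


# A single search() from the expression above, kept short for readability.
expression = (
    'search(users, q="*:*", fl="userId, full_name, hometown", '
    'sort="userId asc", zkHost="zk2:2345", qt="/export")'
)
url = stream_url("machine1:8983", "users", expression)

# Decoding the query string recovers the original expression unchanged.
decoded = parse_qs(urlsplit(url).query)["stream"][0]
print(url)
print(decoded == expression)
```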

Additional information about Streaming API, Streaming Aggregation, and
Streaming Expressions can be found at
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions,
though this is currently incomplete as a lot of the new features have yet
to be added to the documentation.

For those interested, joins were added under tickets
https://issues.apache.org/jira/browse/SOLR-7584 and
https://issues.apache.org/jira/browse/SOLR-8188.

- Dennis




Re: Joins with SolrCloud

2015-12-11 Thread Dennis Gove
Something I forgot to mention - the collection shards can live on any
number of machines, anywhere in the world. As long as the clusterstate in
zk knows where the shard can be found (ie, a basis of SolrCloud) then
everything will work. The example I gave had the shards living on the same
machine but that is not a requirement.



Re: Joins with SolrCloud

2015-12-11 Thread Joel Bernstein
You can also do the innerJoin in parallel across worker nodes using the
parallel function:

hashJoin(
  parallel(workerCollection,
    innerJoin(
      search(users, q="*:*", fl="userId, full_name, hometown",
             sort="userId asc", zkHost="zk2:2345", qt="/export",
             partitionKeys="userId"),
      search(reviews, q="*:*", fl="userId, restaurantId, review, score",
             sort="userId asc", zkHost="zk1:2345", qt="/export",
             partitionKeys="userId"),
      on="userId"
    ),
    workers="20",
    zkHost="zk1:2345",
    sort="userId asc"
  ),
  hashed=search(restaurants, q="city:nyc", fl="restaurantId, restaurantName",
                sort="restaurantId asc", zkHost="zk1:2345", qt="/export"),
  on="restaurantId"
)

The parallel function will return the tuples from the innerJoin which is
performed on 20 workers in this example. The worker nodes will be selected
from "workerCollection" which can be any SolrCloud collection with enough
nodes. The "partitionKeys" parameter has been added to searches so that
results with the same userId are shuffled to the same worker node.
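
The idea behind partitionKeys can be sketched in a few lines of Python: hash the key and take it modulo the number of workers, so every tuple with a given userId lands on the same worker no matter which search produced it. This is only an illustration; crc32 is a stand-in hash picked for determinism, Solr's own hashing differs, and the function name and sample tuples are made up.

```python
import zlib


def worker_for(row, partition_key, workers):
    """Pick a worker index from the hash of the partition key value."""
    # crc32 is a stand-in here; it is not the hash Solr uses.
    value = str(row[partition_key]).encode("utf-8")
    return zlib.crc32(value) % workers


# Hypothetical tuples from the two searches.
users = [{"userId": "u1"}, {"userId": "u2"}, {"userId": "u3"}]
reviews = [
    {"userId": "u1", "review": "great"},
    {"userId": "u3", "review": "ok"},
    {"userId": "u1", "review": "went back"},
]

# Record every worker index that each userId is sent to.
assignments = {}
for row in users + reviews:
    assignments.setdefault(row["userId"], set()).add(
        worker_for(row, "userId", 20)
    )

for user_id, worker_ids in sorted(assignments.items()):
    print(user_id, sorted(worker_ids))
```

Each userId maps to exactly one worker index, which is what lets each worker run its share of the innerJoin without seeing the other workers' tuples.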



Joel Bernstein
http://joelsolr.blogspot.com/



Joins with SolrCloud

2015-12-07 Thread Mugeesh Husain
I have created 3 cores on the same machine using SolrCloud.
Cores: Restaurant, User, Review
Each core has only 1 shard and 2 replicas.

Questions
1.) Is it possible to join across the 3 cores on the same machine (or on
different machines)?
2.) I am struggling with how to join across the 3 cores in SolrCloud mode.

The client is not interested in denormalizing the data.

Please suggest how to solve this problem.

Thanks
Mugeesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Joins-with-SolrCloud-tp4243957.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Joins with SolrCloud

2013-07-04 Thread slevytam
Hi Yonik,

Thanks for the reply.  It was very helpful.

This may be a newbie question, but will this work on the individual rows of a
query, or do all of the query's results need to be on the same shard?

ex.

if the main query would return 
- user15 (shard 1)
- user16 (shard 2)
- user17 (shard 3)

is it acceptable to have 
doc1 (shard 1)
whatever (shard 2)
yeah (shard 3)

for a join of 
- user15, doc1
- user16, whatever
- user17, yeah

or do all the results of the main query need to reside on the same shard as
all the results of the join?

Hopefully that's an understandable question.

Thanks,

slevytam



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Joins-with-SolrCloud-tp4073199p4075408.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Joins with SolrCloud

2013-07-04 Thread Yonik Seeley
Yes, joins support distributed search fine,
provided that the individual documents that are joined reside on the same shard.

For example, if you are modeling blogs and posts (one blog object has many posts):

shard1
--
joe!blog_info
joe!post1

shard2
--
mary!blog_info
mary!post1


So now you can search for post bodies and join to the main blog via
{!join from=blog_pointer to=blog_id}post_body:hello

If both mary and joe have a post with hello, they will both be found
and joined to their main blog info docs with a single distributed
search across the collection.
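
A small Python simulation may help show why co-location matters: each shard evaluates the join against its own documents only, and the per-shard results are then combined. This is a sketch, not how Solr executes the join internally. The helper names and the field values (blog_id and blog_pointer holding "joe" or "mary") are assumptions made up for the illustration.

```python
def join_on_shard(docs, from_field, to_field, matches):
    """Emulate {!join from=... to=...} using one shard's documents only."""
    keys = {
        doc[from_field]
        for doc in docs
        if from_field in doc and matches(doc)
    }
    return [doc for doc in docs if doc.get(to_field) in keys]


def says_hello(doc):
    """Stand-in for the post_body:hello query."""
    return "hello" in doc.get("post_body", "").split()


def distributed_join(shards):
    """Run the join per shard and concatenate the matching ids."""
    return [
        doc["id"]
        for docs in shards.values()
        for doc in join_on_shard(docs, "blog_pointer", "blog_id", says_hello)
    ]


shards = {
    "shard1": [
        {"id": "joe!blog_info", "blog_id": "joe"},
        {"id": "joe!post1", "blog_pointer": "joe", "post_body": "hello world"},
    ],
    "shard2": [
        {"id": "mary!blog_info", "blog_id": "mary"},
        {"id": "mary!post1", "blog_pointer": "mary", "post_body": "hello again"},
    ],
}
colocated = distributed_join(shards)

# Same documents, but mary's post now sits on a different shard than her blog.
misplaced = {
    "shard1": shards["shard1"] + [shards["shard2"][1]],
    "shard2": [shards["shard2"][0]],
}
split_up = distributed_join(misplaced)

print(colocated)
print(split_up)
```

With the documents co-located, both blog_info documents come back. Once mary's post is moved away from her blog_info document, her blog drops out of the result because no single shard holds both sides of the join.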

-Yonik
http://lucidworks.com




Re: Joins with SolrCloud

2013-07-03 Thread slevytam
Hi Yonik,

Can you offer any insight as to how one might ensure that documents reside
on the same shard as the document you'd like them to join?

For example:
I'd like to do a simple join of user actions to a specific document.  So, I
would query for a list of documents and have the current user's actions on
each document (e.g. liked) attached to it.

Because I haven't figured out a reasonable way to accomplish this in Solr,
I've been looking elsewhere.  ie parent/child in elasticsearch.  However, I
have no problem forcing the appropriate useractions document to the same
shard as the parent document, if I knew how.

Thanks,

slevytam



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Joins-with-SolrCloud-tp4073199p4075310.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Joins with SolrCloud

2013-07-03 Thread Yonik Seeley
On Wed, Jul 3, 2013 at 5:40 PM, slevytam <developm...@the10thfloor.com> wrote:
> Hi Yonik,
>
> Can you offer any insight as to how one might ensure that documents reside
> on the same shard as the document you'd like them to join.
>
> For example:
> I'd like to do a simple join of user actions to a specific document.  So, i
> would query for a list of documents and have the user actions on that
> document be attached for the current user.  ie. liked etc

The easiest way using the default document routing in solr cloud
(which is compositeId hashing), is to just use the same prefix on the
ID field for all related documents.  This will route them all to the
same shard.

for example, the following unique IDs will all be routed to the same shard:

user15!doc1
user15!whatever
user15!yeah

http://searchhub.org/2013/06/13/solr-cloud-document-routing/

Or you can use custom sharding and directly assign documents to
shards... but if the default compositeId solution works for you, use
that.
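
The prefix idea can be sketched in Python as follows. This is a simplification for illustration only: it hashes the part of the ID before the "!" with crc32 and takes a modulo, whereas Solr's compositeId router uses MurmurHash3 and per-shard hash ranges. The function name shard_for and the shard count are assumptions.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Route on the ID prefix so related documents share a shard."""
    # Simplified stand-in: Solr's router uses MurmurHash3 and hash ranges.
    prefix = doc_id.split("!", 1)[0]
    return zlib.crc32(prefix.encode("utf-8")) % num_shards


ids = ["user15!doc1", "user15!whatever", "user15!yeah"]
placements = {doc_id: shard_for(doc_id, 3) for doc_id in ids}

for doc_id, shard in placements.items():
    print(doc_id, "-> shard", shard)
```

Because only the prefix feeds the routing decision in this sketch, every ID that starts with user15! gets the same shard index, whatever follows the "!".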

-Yonik
http://lucidworks.com


Re: Joins with SolrCloud

2013-06-28 Thread Chris Toomey
Thanks, confirmed by trying w/ 4.3.1 that the join works with the outer
collection distributed/sharded so long as the inner collection is not
distributed/sharded.
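
For anyone wanting to repeat the test, a request of that shape can be put together roughly like this in Python. It is a sketch under assumptions: the host localhost:8983, the /solr/outerCollection/select path, and the helper name join_query are illustrative, while the field and collection names come from the original question.

```python
from urllib.parse import parse_qs, urlencode


def join_query(from_field, to_field, from_index, inner_query):
    """Assemble a {!join} local-params query string."""
    return (
        f"{{!join from={from_field} fromIndex={from_index} "
        f"to={to_field}}}{inner_query}"
    )


q = join_query("inner_id", "outer_id", "innerCollection", "xx:foo")

# Assumed host and path; urlencode escapes the braces, "!" and spaces.
url = "http://localhost:8983/solr/outerCollection/select?" + urlencode({"q": q})

# Decoding the query string gives back the unescaped join query.
round_trip = parse_qs(url.split("?", 1)[1])["q"][0]
print(q)
print(url)
```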

Chris





Re: Joins with SolrCloud

2013-06-28 Thread Yonik Seeley
On Tue, Jun 25, 2013 at 7:55 PM, Upayavira <u...@odoko.co.uk> wrote:
> However, if from your example, innerCollection was replicated across all
> nodes, I would think that should work, because all that comes back from
> each server when a distributed search happens is the best 'n' matches,
> so exactly how those 'n' matches were located doesn't matter
> particularly.

Yes, joins would only join documents residing on the same shard.
Distributed search with joins should work fine provided that you have
co-located documents you want to join.

-Yonik
http://lucidworks.com


Joins with SolrCloud

2013-06-25 Thread Chris Toomey
What are the restrictions/limitations w.r.t. joins when using SolrCloud?

Say I have a 3-node cluster and both my outer and inner collections are
sharded 3 ways across the cluster.  Could I do a query such as the following?

select?q={!join+from=inner_id+fromIndex=innerCollection+to=outer_id}xx:foo&collection=outerCollection

Or if the above isn't supported, would it be if the inner collection was
not sharded and was replicated across all 3 nodes, so that it existed in
its entirety on each node?

thx,
Chris


Re: Joins with SolrCloud

2013-06-25 Thread Upayavira
I have never heard mention that joins support distributed search, so you
cannot do a join against a sharded core.

However, if from your example, innerCollection was replicated across all
nodes, I would think that should work, because all that comes back from
each server when a distributed search happens is the best 'n' matches,
so exactly how those 'n' matches were located doesn't matter
particularly.

Simpler answer: try it!

Upayavira



RE: Joins with SolrCloud

2013-06-25 Thread James Thomas
My understanding is the same: {!join...} does not work in SolrCloud (i.e.
distributed search), based on:
1. https://issues.apache.org/jira/browse/LUCENE-3759
2. http://wiki.apache.org/solr/DistributedSearch
   (see the Limitations section, which refers to the JIRA above)


-- James
