How often do the group characteristics change? Because you might be better off flattening this at index time: Users->characteristics rather than Users->Groups->characteristics, and then updating the affected users whenever a group's characteristics change. And if the characteristics are not stored but only indexed, or (better yet) kept as docValues, you will not pay much for it in space either.
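Roughly, propagating a group change onto its members could look something like the SolrJ sketch below. This is just an illustration: the field names, ids, and core URL are made up, atomic "set" updates assume your other user fields are stored or docValues-backed (otherwise you would reindex the whole user doc), and the Builder is the newer SolrJ style (on 5.x it would be new HttpSolrClient(url)).

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class FlattenGroupCharacteristics {
    public static void main(String[] args) throws Exception {
        // Hypothetical "users" core; the flattened field could be
        // indexed="true" stored="false" docValues="true" in the schema.
        SolrClient users =
                new HttpSolrClient.Builder("http://localhost:8983/solr/users").build();

        // Pretend one group just gained the "billing" permission; the member
        // ids would really come from wherever group membership lives.
        List<String> memberIds = Arrays.asList("user-1", "user-2");
        List<String> newPermissions = Arrays.asList("admin", "billing");

        for (String userId : memberIds) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", userId);
            // Atomic "set": replaces the flattened multi-valued field on the
            // user doc without resending the rest of the document.
            doc.addField("GroupPermissions",
                    Collections.singletonMap("set", newPermissions));
            users.add(doc);
        }
        users.commit();
        users.close();
    }
}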
Regards,
   Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/


On 25 September 2015 at 15:30, Scott Blum <[email protected]> wrote:
> Hi Erick,
>
> Thanks for the thoughtful reply!
>
> The context is essentially that I have Groups and Users, and a User can
> belong to multiple groups. So if I need to do a query like "Find all Users
> who are members of a Group, for which the Group has certain
> characteristics", then I need to do something like {!join from=GroupId
> to=UserGroupIds}GroupPermission:admin. We've already sharded our corpus
> such that any given user and that user's associated data have to be on the
> same core, but we can't shard the groups that way, since a user could
> belong to multiple groups.
>
> Thanks for the pointer to SOLR-4905; that would probably work for us, as
> we could put all the group docs into a separate collection, replicate it
> everywhere, and do local cross-collection joins. My main worry there is
> that having to shard our data in such a way to support this one case would
> be a lot of extra operational work over time, and lock us into a pretty
> prescriptive data architecture just to solve this one issue.
>
> SOLR-7090 is closer to what I was hoping for. Perhaps I could do something
> to help that effort. I didn't realize that existed; I'd been looking at
> LUCENE-3759 and wondering how to make that go.
>
>> In essence, This Is A Hard Problem in the Solr world to
>> make performant. You'd have to get all of the data from the "from"
>> core across the wire to the "to" node; potentially this would
>> be the entire corpus.
>
> Hopefully it wouldn't be that bad? My understanding of how queries are
> really processed is pretty naive, but I'm imagining that if you have a
> top-level query containing a collection-wide join, you'd make one
> distributed request (to all shards) to resolve the join into a term query,
> then a second one to process the top-level request, sending the term list
> out to each shard. I get that there's a pathological case there where the
> number of terms explodes, but in theory this wouldn't be too different
> from something you do from a client:
>
> 1) Run the join query as a facet query. Instead of retrieving any docs,
> just facet the "from" field to get a term list.
> 2) Run a normal query with the resulting term list.
>
>> You might look at some of the Streaming Aggregation stuff, that
>> has some capabilities here too.
>
> That's on my radar too. I did start reading about it, but it looked like
> joins were still a work in progress (SOLR-7584), and at any rate the
> streaming stuff seems so bleeding edge to me (the only doc I've been able
> to find on it is from Heliosearch) that I was daunted.
>
> Thanks!
> Scott
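P.S. In case it helps, here is a rough SolrJ sketch of the two-step facet-then-filter workaround from your steps 1) and 2) above. The collection URLs, field names, and the {!terms} filter are illustrative assumptions only, and the unbounded facet is exactly where the term explosion you mention would hurt.

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TwoStepJoin {
    public static void main(String[] args) throws Exception {
        SolrClient groups =
                new HttpSolrClient.Builder("http://localhost:8983/solr/groups").build();
        SolrClient users =
                new HttpSolrClient.Builder("http://localhost:8983/solr/users").build();

        // Step 1: run the "join" side as a facet-only query; the facet values
        // on GroupId are the terms the join would otherwise resolve to.
        SolrQuery groupQuery = new SolrQuery("GroupPermission:admin");
        groupQuery.setRows(0);
        groupQuery.setFacet(true);
        groupQuery.addFacetField("GroupId");
        groupQuery.setFacetLimit(-1); // unbounded: the pathological term-count case
        QueryResponse groupResponse = groups.query(groupQuery);

        List<String> groupIds = new ArrayList<>();
        for (FacetField.Count bucket : groupResponse.getFacetField("GroupId").getValues()) {
            if (bucket.getCount() > 0) {
                groupIds.add(bucket.getName());
            }
        }

        // Step 2: normal user query, filtered by the collected term list via
        // the terms query parser.
        SolrQuery userQuery = new SolrQuery("*:*");
        userQuery.addFilterQuery("{!terms f=UserGroupIds}" + String.join(",", groupIds));
        QueryResponse userResponse = users.query(userQuery);
        System.out.println("Matching users: " + userResponse.getResults().getNumFound());

        groups.close();
        users.close();
    }
}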
