Re: Cross-node joins

Erick Erickson Fri, 25 Sep 2015 15:49:33 -0700

Yeah, the streaming stuff is pretty bleeding-edge but pretty cool.

Your understanding is accurate; the pathological case is the reason
it hasn't been implemented in core Solr. I suppose you could do exactly
what you outlined, just with two queries.
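
As a rough sketch of that two-query approach (not a tested recipe), the
Python below only builds the request parameters and parses a facet response;
it does not talk to Solr. The helper names are made up for illustration, the
field names come from the join quoted further down, and it assumes the
default flat JSON facet layout and term values that contain no commas.

```python
from typing import Dict, List


def facet_params(group_query: str, from_field: str) -> Dict[str, str]:
    """Query 1: match the group docs, fetch no rows, facet the 'from' field."""
    return {
        "q": group_query,
        "rows": "0",            # we only want the facet terms, not documents
        "facet": "true",
        "facet.field": from_field,
        "facet.limit": "-1",    # return every term instead of the default cap
        "facet.mincount": "1",  # drop terms that match no documents
        "wt": "json",
    }


def terms_from_facet(response: dict, from_field: str) -> List[str]:
    """Pull the terms out of Solr's flat [term, count, term, count, ...] list."""
    flat = response["facet_counts"]["facet_fields"][from_field]
    return [str(term) for term in flat[0::2]]


def terms_filter(to_field: str, terms: List[str]) -> str:
    """Query 2: filter the main query with the {!terms} query parser."""
    return "{!terms f=%s}%s" % (to_field, ",".join(terms))
```

The string from terms_filter("UserGroupIds", ...) would go into an fq on the
second request. The pathological case is still there: the term list is only
as small as the number of matching groups.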


For SOLR-4905, why would this affect sharding for your main collection?
The groups collection is just a separate collection, so I don't see why it
would affect sharding of the main collection. That probably just means I
don't understand your problem...
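
For reference, a minimal sketch of what the cross-collection join string
might look like using the join parser's fromIndex parameter. The collection
name "groups" and the helper name are hypothetical, and this assumes the
groups collection is single-sharded with a replica on every node that hosts
the main collection.

```python
def cross_collection_join(from_index: str, from_field: str,
                          to_field: str, inner_query: str) -> str:
    """Build a {!join} query that reads its 'from' side from another collection."""
    return "{!join from=%s to=%s fromIndex=%s}%s" % (
        from_field, to_field, from_index, inner_query)


# Hypothetical usage: "groups" is an assumed collection name.
join_q = cross_collection_join(
    "groups", "GroupId", "UserGroupIds", "GroupPermission:admin")
```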

Best,
Erick

On Fri, Sep 25, 2015 at 12:42 PM, Scott Blum <dragonsi...@gmail.com> wrote:

> Yep, we looked at that, but unfortunately the frequency of group updates
> and number of users would make it infeasible to reindex all group members
> any time a group changes.
>
> On Fri, Sep 25, 2015 at 3:36 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>
>> How often do the group characteristics change? Because you might be
>> better off flattening this at index time, as in
>> Users->characteristics, rather than Users->Groups->characteristics,
>> and updating the users when the group characteristics change. And if
>> the characteristics are non-stored but only indexed or, better yet,
>> docvalues, you will not pay much for it in space either.
>>
>> Regards,
>>    Alex.
>> ----
>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> http://www.solr-start.com/
>>
>>
>> On 25 September 2015 at 15:30, Scott Blum <dragonsi...@gmail.com> wrote:
>> > Hi Erick,
>> >
>> > Thanks for the thoughtful reply!
>> >
>> > The context is essentially that I have Groups and Users, and a User can
>> > belong to multiple groups.  So if I need to do a query like "Find all
>> > Users who are members of a Group, for which the Group has certain
>> > characteristics", then I need to do something like {!join from=GroupId
>> > to=UserGroupIds}GroupPermission:admin.  We've already sharded our corpus
>> > such that any given user and that user's associated data have to be on
>> > the same core, but we can't shard the groups that way, since a user
>> > could belong to multiple groups.
>> >
>> > Thanks for the pointer to SOLR-4905, that would probably work for us, as
>> > we could put all the group docs into a separate collection, replicate it
>> > everywhere, and do local cross-collection joins.  My main worry there is
>> > that having to shard our data in such a way to support this one case
>> > would be a lot of extra operational work over time, and lock us into a
>> > pretty prescriptive data architecture just to solve this one issue.
>> >
>> > SOLR-7090 is closer to what I was hoping for.  Perhaps I could do
>> > something to help that effort.  I didn't realize that existed, I've been
>> > looking at LUCENE-3759 and wondering how to make that go.
>> >
>> >> In essence, This Is A Hard Problem in the Solr world to
>> >> make performant. You'd have to get all of the data from the "from"
>> >> core across the wire to the "to" node, potentially this would
>> >> be the entire corpus.
>> >
>> >
>> > Hopefully it wouldn't be that bad?  My understanding of how queries are
>> > really processed is pretty naive, but I'm imagining that if you have a
>> > top level query containing a collection-wide join, you'd make one
>> > distributed request (to all shards) to resolve the join into a term
>> > query, then a second one to process the top level request, sending the
>> > term list out of each shard.  I get that there's a pathological case
>> > there where the number of terms explodes, but in theory this wouldn't be
>> > too different from something you do from a client:
>> >
>> > 1) Run the join query as a facet query.  Instead of retrieving any docs,
>> > just facet the "from" field to get a term list.
>> > 2) Run a normal query with the resulting term list.
>> >
>> >>
>> >> You might look at some of the Streaming Aggregation stuff, that
>> >> has some capabilities here too.
>> >
>> >
>> > That's on my radar too.  I did start reading about it, but it looked
>> > like joins were still Work-In-Progress (SOLR-7584), and at any rate the
>> > streaming stuff seems so bleeding edge to me (the only doc I've been able
>> > to find on it is from heliosearch) that I was daunted.
>> >
>> > Thanks!
>> > Scott
>> >
>>
>
