Yeah, the streaming stuff is pretty bleeding-edge but pretty cool. Your understanding is accurate; the pathological case is the reason it hasn't been implemented in core Solr. I suppose you could do exactly what you outlined, just with two queries.

For SOLR-4905, why would this affect sharding for your main collection? The groups collection is just a separate collection, so I don't see why you think it would affect sharding of the main collection. That probably just means I don't understand your problem...

Best,
Erick

On Fri, Sep 25, 2015 at 12:42 PM, Scott Blum <dragonsi...@gmail.com> wrote:

> Yep, we looked at that, but unfortunately the frequency of group
> updates and number of users would make it infeasible to reindex all
> group members any time a group changes.
>
> On Fri, Sep 25, 2015 at 3:36 PM, Alexandre Rafalovitch
> <arafa...@gmail.com> wrote:
>
>> How often do the group characteristics change? Because you might be
>> better off flattening this at the index time. As in:
>> Users->characteristics, rather than Users->Groups->characteristics.
>> And update the users when the group characteristics change. And if
>> characteristics are non-stored but only indexed or - better-yet? -
>> docvalues, you will not pay much for it with space either.
>>
>> Regards,
>>    Alex.
>> ----
>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> http://www.solr-start.com/
>>
>>
>> On 25 September 2015 at 15:30, Scott Blum <dragonsi...@gmail.com> wrote:
>> > Hi Erick,
>> >
>> > Thanks for the thoughtful reply!
>> >
>> > The context is essentially that I have Groups and Users, and a User
>> > can belong to multiple groups. So if I need to do a query like
>> > "Find all Users who are members of a Group, for which the Group has
>> > certain characteristics", then I need to do something like
>> > {!join from=GroupId to=UserGroupIds}GroupPermission:admin
>> > We've already sharded our corpus such that any given user and that
>> > user's associated data have to be on the same core, but we can't
>> > shard the groups that way, since a user could belong to multiple
>> > groups.
>> >
>> > Thanks for the pointer to SOLR-4905, that would probably work for
>> > us, as we could put all the group docs into a separate collection,
>> > replicate it everywhere, and do local cross-collection joins. My
>> > main worry there is that having to shard our data in such a way to
>> > support this one case would be a lot of extra operational work over
>> > time, and lock us into a pretty proscriptive data architecture just
>> > to solve this one issue.
>> >
>> > SOLR-7090 is closer to what I was hoping for. Perhaps I could do
>> > something to help that effort. I didn't realize that existed; I've
>> > been looking at LUCENE-3759 and wondering how to make that go.
>> >
>> >> In essence, This Is A Hard Problem in the Solr world to
>> >> make performant. You'd have to get all of the data from the "from"
>> >> core across the wire to the "to" node, potentially this would
>> >> be the entire corpus.
>> >
>> > Hopefully it wouldn't be that bad? My understanding of how queries
>> > are really processed is pretty naive, but I'm imagining that if you
>> > have a top level query containing a collection-wide join, you'd
>> > make one distributed request (to all shards) to resolve the join
>> > into a term query, then a second one to process the top level
>> > request, sending the term list out to each shard. I get that
>> > there's a pathological case there where the number of terms
>> > explodes, but in theory this wouldn't be too different from
>> > something you do from a client:
>> >
>> > 1) Run the join query as a facet query. Instead of retrieving any
>> >    docs, just facet the "from" field to get a term list.
>> > 2) Run a normal query with the resulting term list.
>> >
>> >> You might look at some of the Streaming Aggregation stuff, that
>> >> has some capabilities here too.
>> >
>> > That's on my radar too. I did start reading about it, but it looked
>> > like joins were still Work-In-Progress (SOLR-7584), and at any rate
>> > the streaming stuff seems so bleeding edge to me (the only doc I've
>> > been able to find on it is from heliosearch) that I was daunted.
>> >
>> > Thanks!
>> > Scott
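
For reference, the two-query client-side approach described in the thread (facet the "from" field to get a term list, then run a normal query with that list) might look roughly like the sketch below. It is only an illustration under some assumptions: the field names GroupId and UserGroupIds and the GroupPermission:admin clause come from the join example in the thread, the helper function names are made up here, the facet response is assumed to be in Solr's default flat JSON layout, and the group IDs are assumed to contain no commas. It builds request parameters only and sends nothing to a server.

```python
from urllib.parse import urlencode


def facet_params(from_query, from_field):
    """Step 1: match the "from" side and facet on the join key, fetching no docs."""
    return {
        "q": from_query,
        "rows": 0,             # we only want the facet terms, not documents
        "wt": "json",
        "facet": "true",
        "facet.field": from_field,
        "facet.limit": -1,     # return every term, not just the top N
        "facet.mincount": 1,   # skip terms that match no documents
    }


def terms_from_facet(response, from_field):
    """Extract terms from a flat facet list: [term, count, term, count, ...]."""
    flat = response["facet_counts"]["facet_fields"][from_field]
    return [term for term, count in zip(flat[0::2], flat[1::2]) if count > 0]


def main_query_params(main_query, to_field, terms):
    """Step 2: filter the main query by the term list using the terms query parser.

    Returns None when the term list is empty, since nothing can match and the
    second request can be skipped.
    """
    if not terms:
        return None
    return {
        "q": main_query,
        "fq": "{!terms f=%s}%s" % (to_field, ",".join(terms)),
        "wt": "json",
    }


# Canned step-1 response (hypothetical data) standing in for a real Solr reply.
sample = {"facet_counts": {"facet_fields": {"GroupId": ["g1", 3, "g7", 1]}}}

step1 = facet_params("GroupPermission:admin", "GroupId")
terms = terms_from_facet(sample, "GroupId")
params = main_query_params("*:*", "UserGroupIds", terms)

print(urlencode(step1))
print(urlencode(params))
```

The pathological case mentioned in the thread shows up here as the size of the term list: with facet.limit=-1 the first response, and the fq in the second request, grow with the number of matching groups, so the second request would likely need to be sent as a POST once the list gets long.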