How often do the group characteristics change? Because you might be better off flattening this at index time: Users->characteristics rather than Users->Groups->characteristics, and then updating the affected users whenever a group's characteristics change. And if the characteristics are not stored but only indexed, or (better yet) kept as docValues, you will not pay much for it in space either.
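Roughly, propagating a group change onto its members could look something like the SolrJ sketch below. This is just an illustration: the field names, ids, and core URL are made up, atomic "set" updates assume your other user fields are stored or docValues-backed (otherwise you would reindex the whole user doc), and the Builder is the newer SolrJ style (on 5.x it would be new HttpSolrClient(url)).

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class FlattenGroupCharacteristics {
    public static void main(String[] args) throws Exception {
        // Hypothetical "users" core; the flattened field could be
        // indexed="true" stored="false" docValues="true" in the schema.
        SolrClient users =
                new HttpSolrClient.Builder("http://localhost:8983/solr/users").build();

        // Pretend one group just gained the "billing" permission; the member
        // ids would really come from wherever group membership lives.
        List<String> memberIds = Arrays.asList("user-1", "user-2");
        List<String> newPermissions = Arrays.asList("admin", "billing");

        for (String userId : memberIds) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", userId);
            // Atomic "set": replaces the flattened multi-valued field on the
            // user doc without resending the rest of the document.
            doc.addField("GroupPermissions",
                    Collections.singletonMap("set", newPermissions));
            users.add(doc);
        }
        users.commit();
        users.close();
    }
}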
Regards,
   Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/


On 25 September 2015 at 15:30, Scott Blum <[email protected]> wrote:
> Hi Erick,
>
> Thanks for the thoughtful reply!
>
> The context is essentially that I have Groups and Users, and a User can
> belong to multiple groups. So if I need to do a query like "Find all Users
> who are members of a Group, for which the Group has certain
> characteristics", then I need to do something like {!join from=GroupId
> to=UserGroupIds}GroupPermission:admin. We've already sharded our corpus
> such that any given user and that user's associated data have to be on the
> same core, but we can't shard the groups that way, since a user could
> belong to multiple groups.
>
> Thanks for the pointer to SOLR-4905; that would probably work for us, as
> we could put all the group docs into a separate collection, replicate it
> everywhere, and do local cross-collection joins. My main worry there is
> that having to shard our data in such a way to support this one case would
> be a lot of extra operational work over time, and lock us into a pretty
> prescriptive data architecture just to solve this one issue.
>
> SOLR-7090 is closer to what I was hoping for. Perhaps I could do something
> to help that effort. I didn't realize that existed; I'd been looking at
> LUCENE-3759 and wondering how to make that go.
>
>> In essence, This Is A Hard Problem in the Solr world to
>> make performant. You'd have to get all of the data from the "from"
>> core across the wire to the "to" node; potentially this would
>> be the entire corpus.
>
> Hopefully it wouldn't be that bad? My understanding of how queries are
> really processed is pretty naive, but I'm imagining that if you have a
> top-level query containing a collection-wide join, you'd make one
> distributed request (to all shards) to resolve the join into a term query,
> then a second one to process the top-level request, sending the term list
> out to each shard. I get that there's a pathological case there where the
> number of terms explodes, but in theory this wouldn't be too different
> from something you do from a client:
>
> 1) Run the join query as a facet query. Instead of retrieving any docs,
> just facet the "from" field to get a term list.
> 2) Run a normal query with the resulting term list.
>
>> You might look at some of the Streaming Aggregation stuff, that
>> has some capabilities here too.
>
> That's on my radar too. I did start reading about it, but it looked like
> joins were still a work in progress (SOLR-7584), and at any rate the
> streaming stuff seems so bleeding edge to me (the only doc I've been able
> to find on it is from Heliosearch) that I was daunted.
>
> Thanks!
> Scott
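P.S. In case it helps, here is a rough SolrJ sketch of the two-step facet-then-filter workaround from your steps 1) and 2) above. The collection URLs, field names, and the {!terms} filter are illustrative assumptions only, and the unbounded facet is exactly where the term explosion you mention would hurt.

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TwoStepJoin {
    public static void main(String[] args) throws Exception {
        SolrClient groups =
                new HttpSolrClient.Builder("http://localhost:8983/solr/groups").build();
        SolrClient users =
                new HttpSolrClient.Builder("http://localhost:8983/solr/users").build();

        // Step 1: run the "join" side as a facet-only query; the facet values
        // on GroupId are the terms the join would otherwise resolve to.
        SolrQuery groupQuery = new SolrQuery("GroupPermission:admin");
        groupQuery.setRows(0);
        groupQuery.setFacet(true);
        groupQuery.addFacetField("GroupId");
        groupQuery.setFacetLimit(-1); // unbounded: the pathological term-count case
        QueryResponse groupResponse = groups.query(groupQuery);

        List<String> groupIds = new ArrayList<>();
        for (FacetField.Count bucket : groupResponse.getFacetField("GroupId").getValues()) {
            if (bucket.getCount() > 0) {
                groupIds.add(bucket.getName());
            }
        }

        // Step 2: normal user query, filtered by the collected term list via
        // the terms query parser.
        SolrQuery userQuery = new SolrQuery("*:*");
        userQuery.addFilterQuery("{!terms f=UserGroupIds}" + String.join(",", groupIds));
        QueryResponse userResponse = users.query(userQuery);
        System.out.println("Matching users: " + userResponse.getResults().getNumFound());

        groups.close();
        users.close();
    }
}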
