SOLR grouped query sorting on numFound
We ran into one snag during development with Solr, and I thought I'd run it by the list to see if anyone has a slick way to solve it.

Basically, we're performing a Solr query with grouping and want to be able to sort by the number of documents found within each group.

Our query response from Solr looks something like this:

    {
      "responseHeader":{
        "status":0,
        "QTime":17,
        "params":{
          "indent":"true",
          "q":"*:*",
          "group.limit":"0",
          "group.field":"rfp_stub",
          "group":"true",
          "wt":"json",
          "rows":"1000"}},
      "grouped":{
        "rfp_stub":{
          "matches":18470,
          "groups":[{
              "groupValue":"java.util.UUID:a1871c9e-cd7f-4e87-971d-d8a44effc33e",
              "doclist":{"numFound":3,"start":0,"docs":[]
              }},
            {
              "groupValue":"java.util.UUID:0c2f1045-a32d-4a4d-9143-e09db45a20ce",
              "doclist":{"numFound":5,"start":0,"docs":[]
              }},
            {
              "groupValue":"java.util.UUID:a3e1d56b-4172-4594-87c2-8895c5e5f131",
              "doclist":{"numFound":6,"start":0,"docs":[]
              }},
            …

The numFound value shows the number of documents within that group. Is there any way to perform a sort on numFound in Solr? I don't believe this is supported, but I wondered whether anyone here has come across this and whether there are any suggested workarounds, given that the dataset is really too large to hold in memory on our app servers.
Re: SOLR grouped query sorting on numFound
Hmmm, just specifying &sort= is _almost_ what you want, except that it sorts by the values of fields in the docs, not by numFound.

This shouldn't be hard to do on the client, though, but you'd have to return all the groups...

FWIW,
Erick

On Tue, Sep 24, 2013 at 1:11 PM, Brent Ryan wrote:
> Basically, we're performing a Solr query with grouping and want to be able
> to sort by the number of documents found within each group.
> [...]
> Is there any way to perform a sort on numFound in Solr?
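As a rough illustration of the client-side sort suggested in this reply, here is a minimal Python sketch. It assumes the JSON response has already been parsed into a dict with the shape shown in the original message, and that every group fits in client memory (which the original poster says is not the case for their data). The function name `groups_by_num_found` and the `sample` dict are made up for the example.

```python
def groups_by_num_found(response, field, descending=True):
    """Return (groupValue, numFound) pairs sorted by numFound.

    `response` is a parsed Solr grouped response; `field` is the
    group.field used in the query (e.g. "rfp_stub").
    """
    groups = response["grouped"][field]["groups"]
    counts = [(g["groupValue"], g["doclist"]["numFound"]) for g in groups]
    return sorted(counts, key=lambda pair: pair[1], reverse=descending)


# Trimmed copy of the response shape from the original message.
sample = {
    "grouped": {
        "rfp_stub": {
            "groups": [
                {"groupValue": "java.util.UUID:a1871c9e-cd7f-4e87-971d-d8a44effc33e",
                 "doclist": {"numFound": 3, "start": 0, "docs": []}},
                {"groupValue": "java.util.UUID:0c2f1045-a32d-4a4d-9143-e09db45a20ce",
                 "doclist": {"numFound": 5, "start": 0, "docs": []}},
                {"groupValue": "java.util.UUID:a3e1d56b-4172-4594-87c2-8895c5e5f131",
                 "doclist": {"numFound": 6, "start": 0, "docs": []}},
            ]
        }
    }
}

for value, count in groups_by_num_found(sample, "rfp_stub"):
    print(count, value)
```

With group.limit=0, as in the original query, each group carries only its groupValue and count, so the payload per group is small, but the client still has to receive every group before it can sort.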
Re: SOLR grouped query sorting on numFound
Ya, that's the problem... You can't sort by "numFound", and it's not feasible to do the sort on the client because the grouped result set is too large.

Brent

On Wed, Sep 25, 2013 at 6:09 AM, Erick Erickson wrote:
> Hmmm, just specifying &sort= is _almost_ what you want,
> except it sorts by the value of fields in the doc not numFound.
>
> this shouldn't be hard to do on the client though, but you'd
> have to return all the groups...
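If only the largest few groups are needed, one possible way around the client memory limit is to read the groups page by page and keep a bounded top-k, so that only k entries are held at a time. This is a sketch under assumptions, not something proposed in the thread: it assumes the client can page through the groups (for example with start and rows) and that a top-k list is enough for the use case. The names `iter_group_counts` and `top_k_groups` are hypothetical, and `pages` stands for any iterable of parsed responses.

```python
import heapq


def iter_group_counts(pages, field):
    """Yield (groupValue, numFound) from an iterable of parsed responses."""
    for page in pages:
        for g in page["grouped"][field]["groups"]:
            yield g["groupValue"], g["doclist"]["numFound"]


def top_k_groups(pages, field, k):
    """Return the k groups with the highest numFound, largest first.

    heapq.nlargest consumes the iterator lazily and keeps at most k
    items, so memory use does not grow with the number of groups.
    """
    return heapq.nlargest(k, iter_group_counts(pages, field),
                          key=lambda pair: pair[1])
```

If `pages` is a generator that fetches one page per request, only one page plus k entries is in memory at any time. The cost is that every group still has to be transferred once, and a full ordering of all groups is not produced.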
Re: SOLR grouped query sorting on numFound
But if it's too large on the client, wouldn't it also be too large on the server? After all, you have to hold the entire set of groups in memory, or at least the counts of them all, since you can't know ahead of time which will be the largest.

I suppose you could do some two-pass process where you return 1 doc/group with absolutely minimal data (like score and ID) and then issue a second query that gets the data to display, if (and only if) that suits your use case.

Otherwise I'm afraid you're into custom Solr code.

Best,
Erick

On Wed, Sep 25, 2013 at 6:40 AM, Brent Ryan wrote:
> ya, that's the problem... you can't sort by "numFound" and it's not
> feasible to do the sort on the client because the grouped result set is too
> large.
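The two-pass idea in this reply could be sketched roughly as follows. The sketch only builds request parameters and does not talk to a server. The grouping parameters (group, group.field, group.limit, fl, fq, rows, wt) are standard Solr request parameters, but the unique key name `id`, the display field `title`, and all function names are assumptions made for illustration.

```python
from urllib.parse import urlencode


def first_pass_params(group_field, rows):
    """Pass 1: one document per group, with minimal stored data."""
    return {
        "q": "*:*",
        "group": "true",
        "group.field": group_field,
        "group.limit": 1,        # 1 doc/group, as suggested above
        "fl": "id,score",        # assumes the unique key field is named "id"
        "rows": rows,
        "wt": "json",
    }


def quote_term(value):
    """Quote a value as a phrase, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def second_pass_params(group_field, group_values, fields):
    """Pass 2: fetch display fields for the groups the client selected."""
    clause = " OR ".join(quote_term(v) for v in group_values)
    return {
        "q": "*:*",
        "fq": "%s:(%s)" % (group_field, clause),
        "fl": ",".join(fields),
        "wt": "json",
    }


def to_query_string(params):
    """Encode a parameter dict for use in a request URL."""
    return urlencode(params)
```

For example, after ranking the groups from the first pass on the client, `second_pass_params("rfp_stub", chosen_values, ["id", "title"])` would restrict the second query to the chosen groups. A filter with a very large number of OR clauses may run into server-side limits, so this probably only suits cases where a small set of groups is displayed at a time.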