Re: Simulate facet.exists for json query facets
> If all of those facet queries are _known_ to be a performance hit, you might be able to do something custom. That would require custom code though and I wouldn’t go there unless you can demonstrate need.

Yeah ... indeed, if those facet queries are relatively static (and thus cacheable, even if there are a lot of them), an appropriately sized filterCache would let them be cached to good effect, and the performance hit should then be negligible. Knowing what the queries are up front, you could even add them to your warming queries.

It'd also be unusual (though possible, sure) to run these kinds of facet queries with no intention of ever conditionally following up in a way that needs the actual results/DocSet, even if the initial (more common) query only cares about boolean existence.

The cases in which this type of functionality really might be indicated are those where you:
1. only care about the boolean result (obvious, ok)
2. have dynamic (i.e., not particularly cacheable) queries
3. never intend to follow up with a request that calls for full results

If both of the first two conditions hold, and especially if the third also holds, there would in principle definitely be efficiency to be gained by early termination (and by avoiding the creation of a DocSet, which at the moment happens unconditionally for every facet query).

I'm also thinking about this through the lens of bringing the JSON Facet API to parity with the legacy facet API, fwiw ...

On Fri, Oct 30, 2020 at 9:02 AM Erick Erickson wrote:
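For the case Michael Gibney describes, where the facet queries are known up front, the warming idea can be sketched as a small shell helper that prints a `newSearcher` listener for solrconfig.xml (it belongs inside the `<query>` section). This is a minimal sketch, not something from the thread: the function name `warming_listener_xml` is hypothetical, each known facet query is replayed as an `fq` so its DocSet lands in the filterCache, and the query strings are assumed to contain no XML special characters (`&`, `<`, `>`). Whether a later `query` facet actually hits those cache entries depends on its `q` parsing to the same Lucene query as the `fq`; that should hold for identical strings under the default lucene parser, but it is worth confirming in the filterCache statistics.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a solrconfig.xml newSearcher listener that
# replays each known facet query as a filter query (fq), one warming
# request per query, so the DocSets are cached in the filterCache
# before the new searcher serves traffic.
# Assumes the queries contain no XML special characters (&, <, >).
warming_listener_xml() {
  local query
  printf '<listener event="newSearcher" class="solr.QuerySenderListener">\n'
  printf '  <arr name="queries">\n'
  for query in "$@"; do
    # rows=0: only the cached filter DocSet matters, not the result list
    printf '    <lst><str name="q">*:*</str><str name="fq">%s</str><str name="rows">0</str></lst>\n' "$query"
  done
  printf '  </arr>\n'
  printf '</listener>\n'
}

# Example: two of the category queries from this thread.
warming_listener_xml '+categoryId:(21450 21453)' '+categoryId:21515'
```

A `firstSearcher` listener with the same entries would cover the very first searcher after startup as well.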
Re: Simulate facet.exists for json query facets
I don’t think there’s anything to do what you’re asking out of the box (OOB).

If all of those facet queries are _known_ to be a performance hit, you might be able to do something custom. That would require custom code though, and I wouldn’t go there unless you can demonstrate need.

If you issue the request with debug=timing, you’ll see the time each component takes, and there’s a separate entry for faceting, so that’ll give you a clue whether it’s worth the effort.

Best,
Erick

> On Oct 30, 2020, at 8:10 AM, Michael Gibney wrote:
Re: Simulate facet.exists for json query facets
Michael, sorry for the confusion; I was positing a *hypothetical* "exists()" function that doesn't currently exist, that *is* an aggregate function, and that *does* stop early. I didn't account for the fact that there's already an "exists()" function *query* that behaves very differently. So yes, definitely confusing :-). I guess choosing a different name for the proposed aggregate function would make sense. I was suggesting it mostly as an alternative to extending the syntax of the JSON Facet "query" facet type, and to say that I think implementing such an aggregate function would be pretty straightforward.

On Fri, Oct 30, 2020 at 3:44 AM michael dürr wrote:
Re: Simulate facet.exists for json query facets
@Erick

Sorry! I chose a simple example to reduce complexity. In detail:
* We have distinct contents like tours, offers, events, etc., which themselves may be categorized: a tour may be a hiking tour, a mountaineering tour, ...
* We have hundreds of customers who want to facet their searches on those content types, but often with distinct combinations of categories, i.e. customer A wants his facet "tours" to count only hiking tours, customer B only mountaineering tours, customer C a combination of both, etc.
* We use "query" facets because each facet request is built dynamically (it is not feasible to aggregate certain categories and add them as an additional Solr schema field, as we have hundreds of different combinations).
* Anyway, our UI only needs to show a toggle to filter for (for example) "tours" if the facet has a match. We do not care about the number of tours.
* As we have millions of contents and dozens of content types (and dozens of categories per content type), such queries may take a very long time.
A complex example may look like this:

q=*:*&json.facet={
  tour:{ type : query, q: "+categoryId:(21450 21453)" },
  guide:{ type : query, q: "+categoryId:(21105 21401 21301 21302 21303 21304 21305 21403 21404)" },
  story:{ type : query, q: "+categoryId:21515" },
  condition:{ type : query, q: "+categoryId:21514" },
  hut:{ type : query, q: "+categoryId:8510" },
  skiresort:{ type : query, q: "+categoryId:21493" },
  offer:{ type : query, q: "+categoryId:21462" },
  lodging:{ type : query, q: "+categoryId:6061" },
  event:{ type : query, q: "+categoryId:21465" },
  poi:{ type : query, q: "+(+categoryId:6000 -categoryId:(6061 21493 8510))" },
  authors:{ type : query, q: "+categoryId:(21205 21206)" },
  partners:{ type : query, q: "+categoryId:21200" },
  list:{ type : query, q: "+categoryId:21481" }
}&rows=0

@Michael

Thanks for your suggestion, but this does not work because:
* the facet module expects an aggregate function (which I simply added by wrapping your call in sum(...)),
* and (please correct me if I am wrong) the exists() function does not stop at the first match, but counts the number of documents the query matches.
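Since these facets are assembled per customer, a request like the one above can be generated rather than hand-written. The following is a minimal sketch under stated assumptions: `build_query_facets` is a hypothetical helper, each argument is a `name=query` pair, and queries must not contain double quotes or backslashes (the helper does no JSON escaping). It emits the same relaxed JSON Facet syntax used in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a json.facet value consisting of "query"
# facets from name=query pairs (e.g. one set of pairs per customer).
# Assumes no query contains a double quote or a backslash, because
# no JSON escaping is done here.
build_query_facets() {
  local sep="" pair name query out="{"
  for pair in "$@"; do
    name=${pair%%=*}   # facet name: text before the first '='
    query=${pair#*=}   # facet query: text after the first '='
    out+="${sep}${name}:{type:query,q:\"${query}\"}"
    sep=","
  done
  printf '%s}\n' "$out"
}

facets=$(build_query_facets \
  'tour=+categoryId:(21450 21453)' \
  'poi=+(+categoryId:6000 -categoryId:(6061 21493 8510))')
echo "$facets"

# Sending it (requires a running Solr; core name taken from the thread):
# curl http://localhost:8983/solr/portal/select \
#   --data-urlencode 'q=*:*' \
#   --data-urlencode "json.facet=$facets" \
#   --data-urlencode 'rows=0'
```

Using `--data-urlencode` instead of a plain `-d` body also keeps the leading `+` of each query from being decoded as a space by form-urlencoded parsing.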
Re: Simulate facet.exists for json query facets
Separately, and in parallel to Erick's question: indeed I'm not aware of any way to do this currently, but I *can* imagine cases where this would be useful. I have a sense this could be cleanly implemented as a stat facet function (https://lucene.apache.org/solr/guide/8_6/json-facet-api.html#stat-facet-functions), e.g.:

curl http://localhost:8983/solr/portal/select -d \
"q=*:*\
&json.facet={
  tour: \"exists(+categoryId:6000 -categoryId:(6061 21493 8510))\"
}\
&rows=0"

The return value of the `exists` function could be boolean, which would be semantically clearer than capping the count to 1, as I gather `facet.exists` does. For the same reason, implementing this as a function would probably be better than adding this functionality to the `query` facet type, which carries certain useful assumptions (the meaning of the "count" attribute in the response, the ability to nest stats and subfacets, etc.) ... just thinking out loud at the moment ...

On Wed, Oct 28, 2020 at 9:17 AM Erick Erickson wrote:
Re: Simulate facet.exists for json query facets
This really sounds like an XY problem. The whole point of facets is to count the number of documents that have a value in some number of buckets. So trying to stop your facet query as soon as it matches a hit for the first time seems like an odd thing to do.

So what’s the “X”? In other words, what is the problem you’re trying to solve at a high level? Perhaps there’s a better way to figure this out.

Best,
Erick

> On Oct 28, 2020, at 3:48 AM, michael dürr wrote:
Simulate facet.exists for json query facets
Hi,

I use JSON facets of type 'query'. As these queries are pretty slow and I'm only interested in whether there is a match or not, I'd like to restrict the query execution similar to standard faceting (like with the facet.exists parameter). My simplified query looks something like this (in reality, *:* may be replaced by a complex edismax query, and multiple subfacets similar to "tour" occur):

curl http://localhost:8983/solr/portal/select -d \
"q=*:*\
&json.facet={
  tour:{
    type : query,
    q: \"+(+categoryId:6000 -categoryId:(6061 21493 8510))\"
  }
}\
&rows=0"

Is there any way to modify my request to ensure that the facet query stops as soon as it matches a hit for the first time?

Thanks!
Michael