Re: Why does faceting add an entry to the filter cache for the q parameter?
> I do think that the docs filterCache refguide [1] could be updated to
> be more explicit about all the possible cases that use the
> filterCache, and I'd be happy to help with that.

Thanks, I appreciate it. If you would, please make yourself a watcher on the ticket I started for the task: https://issues.apache.org/jira/browse/SOLR-16554

> But I think
> super-high-level I'd start by trying to size filterCache according to
> what it _actually_ uses, rather than size it according to what I think
> it _should_ use and then adjust the underlying implementation to
> conform to that.

Absolutely. I'm not trying to change anything, assuming the behavior was intentional. I just wanted to 1) make sure it was expected behavior, and 2) understand so that I can 3) update the docs to make these things clearer. That's why I titled this "Why does..." Although I guess these days on the internet "Why does this happen" can be read as "I don't like that this happens and I think it is wrong."

Thanks,
Andy

-
To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
For additional commands, e-mail: dev-h...@solr.apache.org
Re: Why does faceting add an entry to the filter cache for the q parameter?
> On Nov 17, 2022, at 12:56 PM, Mikhail Khludnev wrote:
>
> Overall, FacetComponent explicitly requires docset
>
> I don't recommend changing anything in this flow.

And if that's how it should be, that's fine. We just need to add notes about this in the docs for filterCache. I'll be doing that once Shawn's cache dumper works.
Re: Why does faceting add an entry to the filter cache for the q parameter?
Given that the filters being cached are useful across all these contexts, and that facet values are often subsequently applied as filters, I don't think it'd make sense to separate these two -- especially not without some way to have both aware of each other to avoid duplicating effort.

It might be good to have a more specific sense of the problems involved -- workload, cache size, etc. Because of the potential downstream consequences (for performance, cache usage patterns, etc.) here, if this can be solved by simply increasing the size of the cache, I'd go that route first for sure.

I do think that the docs filterCache refguide [1] could be updated to be more explicit about all the possible cases that use the filterCache, and I'd be happy to help with that. But I think super-high-level I'd start by trying to size filterCache according to what it _actually_ uses, rather than size it according to what I think it _should_ use and then adjust the underlying implementation to conform to that.

I've also idly considered a "named cache" variant of the `filter(query)` context -- something like `namedFilter(cacheName, query)` -- that would allow more fine-grained control over where queries hit. I also have some thoughts about possible clean ways to have multiple coordinated caches of compatible type -- but that'd be farther out.

Short-term for your use case, it occurs to me that it'd be relatively straightforward to write a QParserPlugin that consults a specific named cache; then you could replace all `fq=query` with `fq={!myFilterQuery cacheName=name v=$query}` and I think get the behavior you're seeking. You would want to make the outer query return false for `getCache()`, unless you want the query stored in both filterCache _and_ your named cache!

Michael

[1] https://solr.apache.org/guide/solr/latest/configuration-guide/caches-warming.html#filter-cache

On Thu, Nov 17, 2022 at 1:57 PM Mikhail Khludnev wrote:

> > whether DocList can even be used for facets,
>
> No way. For sure.
>
> Overall, FacetComponent explicitly requires docset
> https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java#L82
> and then
> https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L1564
> https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L971
> I don't recommend changing anything in this flow.
>
> On Thu, Nov 17, 2022 at 9:05 PM Shawn Heisey wrote:
>
> > On 11/17/22 08:45, Andy Lester wrote:
> >
> > > Short of that, wouldn't it make sense for facets to put the q in the
> > > queryResultCache, not the filterCache?
> >
> > The queryResultCache is defined as ... very
> > different from what filterCache uses. I have no idea
> > whether DocList can even be used for facets, but it is likely that
> > DocSet is faster and more directly applicable. Because filterCache was
> > already available, a choice was probably made to just use that rather
> > than introduce a whole new cache.
> >
> > That decision is a bad one for your use case. Having a separate
> > facetCache seems like a very good thing for your use case and probably
> > would be generally helpful for overall performance in a variety of use
> > cases.
> >
> > That part of Solr code is very unfamiliar to me, I wouldn't have any
> > idea where to begin or what to do for implementing facetCache. I would
> > tackle it if I knew how.
> >
> > Thanks,
> > Shawn
>
> --
> Sincerely yours
> Mikhail Khludnev
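Michael's "named cache" idea can be modeled outside Solr to show the intended routing. The sketch below is plain Python, not Solr code: `NamedCaches`, `get_or_compute`, the cache name `myFilters`, and the placeholder doc ids are all hypothetical names invented for illustration. It only shows that a query routed to a named cache is computed once and never lands in any other cache.

```python
class NamedCaches:
    """Registry of independent caches keyed by name (hypothetical, not a Solr API)."""

    def __init__(self):
        self.caches = {}

    def get_or_compute(self, cache_name, query, compute):
        """Look up query in the named cache only; compute and store on a miss."""
        cache = self.caches.setdefault(cache_name, {})
        if query not in cache:
            cache[query] = compute(query)
        return cache[query]


computed = []  # records each time the (made-up) expensive computation runs


def compute_doc_ids(query):
    computed.append(query)
    return frozenset({1, 2, 3})  # placeholder doc ids, not real results


registry = NamedCaches()
first = registry.get_or_compute("myFilters", "category:books", compute_doc_ids)
second = registry.get_or_compute("myFilters", "category:books", compute_doc_ids)
```

The second lookup is served from the `myFilters` cache, and no entry is created under any other cache name. That is the effect Michael describes wanting from a custom parser whose outer query is not also stored in filterCache.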
Re: Why does faceting add an entry to the filter cache for the q parameter?
> whether DocList can even be used for facets,

No way. For sure.

Overall, FacetComponent explicitly requires docset
https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java#L82
and then
https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L1564
https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L971
I don't recommend changing anything in this flow.

On Thu, Nov 17, 2022 at 9:05 PM Shawn Heisey wrote:

> On 11/17/22 08:45, Andy Lester wrote:
>
> > Short of that, wouldn't it make sense for facets to put the q in the
> > queryResultCache, not the filterCache?
>
> The queryResultCache is defined as ... very
> different from what filterCache uses. I have no idea
> whether DocList can even be used for facets, but it is likely that
> DocSet is faster and more directly applicable. Because filterCache was
> already available, a choice was probably made to just use that rather
> than introduce a whole new cache.
>
> That decision is a bad one for your use case. Having a separate
> facetCache seems like a very good thing for your use case and probably
> would be generally helpful for overall performance in a variety of use
> cases.
>
> That part of Solr code is very unfamiliar to me, I wouldn't have any
> idea where to begin or what to do for implementing facetCache. I would
> tackle it if I knew how.
>
> Thanks,
> Shawn

--
Sincerely yours
Mikhail Khludnev
Re: Why does faceting add an entry to the filter cache for the q parameter?
On 11/17/22 08:45, Andy Lester wrote:

> Short of that, wouldn't it make sense for facets to put the q in the
> queryResultCache, not the filterCache?

The queryResultCache is defined as ... very different from what filterCache uses. I have no idea whether DocList can even be used for facets, but it is likely that DocSet is faster and more directly applicable. Because filterCache was already available, a choice was probably made to just use that rather than introduce a whole new cache.

That decision is a bad one for your use case. Having a separate facetCache seems like a very good thing for your use case and probably would be generally helpful for overall performance in a variety of use cases.

That part of the Solr code is very unfamiliar to me; I wouldn't have any idea where to begin or what to do for implementing facetCache. I would tackle it if I knew how.

Thanks,
Shawn
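The DocList versus DocSet question above can be illustrated with a toy model. This is a minimal sketch in plain Python, not Solr code. It assumes a DocSet stands for every document matching the query, and a DocList for only the ranked window of hits returned to the client. The `state` field values and doc ids are made up.

```python
def facet_counts(matching_docs, field_values):
    """Count field values over the given doc ids."""
    counts = {}
    for doc_id in matching_docs:
        value = field_values[doc_id]
        counts[value] = counts.get(value, 0) + 1
    return counts


# Toy index: doc id -> value of a "state" field (made-up data).
field_values = {0: "AL", 1: "AK", 2: "AL", 3: "WY", 4: "AL", 5: "AK"}

doc_set = {0, 2, 3, 4, 5}  # every doc matching q: the role a DocSet plays
doc_list = [4, 0]          # only the top two ranked hits: the role a DocList plays

full_counts = facet_counts(doc_set, field_values)   # counts over all matches
page_counts = facet_counts(doc_list, field_values)  # counts over one page only
```

Counting over the page of results misses every match outside that window, which is one way to see why facet counting wants the full set of matching documents.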
Re: Why does faceting add an entry to the filter cache for the q parameter?
> I think it likely means that the intended benefit of filterCache, which is
> the caching of fqs, is not happening because those cache entries will be
> evicted almost as soon as they are created.

Yes, that's what I suspect is happening.

Another problem with having q in the filterCache is that if your newSearcher is doing autowarming on some number of filterCache entries, it will be doing warming on q entries. This could be quite a bit of unnecessary work.

> I wonder if maybe it would be a good idea to have a facetCache in addition to
> filterCache. Same K,V as filterCache in the code, but entirely separate so
> it does not interfere with caching of filters and can have a different
> definition.

That would make a lot of sense to me.

Short of that, wouldn't it make sense for facets to put the q in the queryResultCache, not the filterCache?

Whatever comes of this, the docs could use some updates and clarifications, and especially discussion of tuning filterCache. I've started a ticket for that task, both as a placeholder for the work and as a place to aggregate notes and thoughts for when it gets done.

Andy
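The autowarming concern can be made concrete with a small simulation. This is a sketch under stated assumptions, not Solr's implementation: it models the cache as a plain LRU, assumes each search touches one of a fixed set of fqs plus a unique q, and assumes the warmer replays the most recently used keys. `build_cache` and `autowarm_keys` are hypothetical helper names.

```python
from collections import OrderedDict


def build_cache(searches, distinct_fqs, max_size):
    """Replay searches against an LRU; each search touches one fq and one unique q."""
    cache = OrderedDict()
    for i in range(searches):
        for key in (("fq", i % distinct_fqs), ("q", i)):
            if key in cache:
                cache.move_to_end(key)  # hit: mark as most recently used
            else:
                cache[key] = True
                if len(cache) > max_size:
                    cache.popitem(last=False)  # evict the least recently used key
    return cache


def autowarm_keys(cache, autowarm_count):
    """Keys a warmer would replay: the most recently used ones, newest first."""
    return list(reversed(cache))[:autowarm_count]


cache = build_cache(searches=1000, distinct_fqs=50, max_size=100)
warmed = autowarm_keys(cache, autowarm_count=20)
q_entries = sum(1 for kind, _ in warmed if kind == "q")
```

In this model half of the warmed keys are one-off q entries, so half of the warming work regenerates results that are unlikely to be asked for again.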
Re: Why does faceting add an entry to the filter cache for the q parameter?
On 11/16/22 13:21, Mikhail Khludnev wrote:

> Why do you think that thousands of evictions is a bad thing?

I think it likely means that the intended benefit of filterCache, which is the caching of fqs, is not happening because those cache entries will be evicted almost as soon as they are created. In my experience an uncached fq is significantly slower than a cached fq, so caching them can be very important.

In the setup I used to manage, facets were not used by the application. I used them for manual data mining and some daily cronjobs that would report statistics from the index during hours of low usage. So I did not run into this.

I wonder if maybe it would be a good idea to have a facetCache in addition to filterCache. Same K,V as filterCache in the code, but entirely separate so it does not interfere with caching of filters and can have a different definition.

Thanks,
Shawn
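The effect Shawn describes, and the separate-cache idea, can be compared in a toy simulation. This is a sketch only. It assumes a plain LRU policy, 50 fqs used in rotation, one unique q per search, and a cache of 60 entries. The second cache stands in for the proposed facetCache, which does not exist in Solr.

```python
from collections import OrderedDict


def lru_access(cache, key, max_size):
    """Touch key in an OrderedDict-based LRU; return True on a hit."""
    if key in cache:
        cache.move_to_end(key)
        return True
    cache[key] = True
    if len(cache) > max_size:
        cache.popitem(last=False)  # evict the least recently used key
    return False


def fq_hit_rate(searches, distinct_fqs, cache_size, separate_facet_cache):
    """Fraction of fq lookups served from cache over the whole run."""
    filter_cache = OrderedDict()
    facet_cache = OrderedDict()  # hypothetical "facetCache" from the proposal
    fq_hits = 0
    for i in range(searches):
        if lru_access(filter_cache, ("fq", i % distinct_fqs), cache_size):
            fq_hits += 1
        # The q entry goes to the shared cache unless a separate one is modeled.
        target = facet_cache if separate_facet_cache else filter_cache
        lru_access(target, ("q", i), cache_size)
    return fq_hits / searches


shared = fq_hit_rate(10000, 50, 60, separate_facet_cache=False)
separate = fq_hit_rate(10000, 50, 60, separate_facet_cache=True)
```

With these particular numbers the shared cache never serves an fq from cache, because each fq is pushed out by q entries before it is reused, while the separate-cache run misses only on the first use of each fq. Real workloads and Solr's actual eviction policy will behave less starkly.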
Re: Why does faceting add an entry to the filter cache for the q parameter?
> It seems reasonable to me. To count facets Solr needs docsets of results
> for in-order processing.
> I suppose, SimpleFacets code can be amended so it keeps the result docset
> out of filter cache, but would it get significant gain?

Mostly I think this is about expectations, and perhaps this is just about things that need to be explained more in the docs.

If I see "filter cache" I expect that to be FQs. I don't expect to see my q=whatever in the filter cache. When I'm planning my filter cache, I would expect that the size of the filter cache should be the number of possible FQs that I'm going to call, if possible. If my app only has about 50 FQs possible, I'd think I only need to allocate a filter cache of maybe 100. But that's not the case. Since every search a user does in the app (1000s/hr) uses faceting, every search throws a new entry into the filter cache.

The docs for facet.method=enum make it clear that "faceting on a field with U.S. States such as Alabama, Alaska, … Wyoming would lead to fifty cached filters which would be used over and over again. The filterCache should be large enough to hold all the cached filters." OK, that's good and makes sense, but then I only need to allocate 50 more slots in the filter cache for that. With the caching of the q=whatever, the planning is entirely different.

I just realized that the filterCache docs say "Solr also uses this cache for faceting when the configuration parameter facet.method is set to fc." from which I infer that facet.method=enum does not use the filter cache. But it does.

> Why do you think that thousands of evictions is a bad thing?

It's not bad, it's surprising. It's something I need to account for when planning and tuning.

I'm imagining pulling together a section in the filter cache docs explaining "Here's everything that can go into the filter cache". If I knew enough about the Solr code, I'd do it all myself, but I don't. Anyone want to help on this endeavor?

Andy
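The sizing surprise described above can be reproduced with a toy model. This is a minimal sketch, assuming a plain LRU policy (a simplification, not Solr's actual cache implementation), 50 possible fqs used in rotation, a cache of 100 entries, and a unique q on every search. `LRUCache` and `simulate` are hypothetical names.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache that counts evictions; a stand-in for filterCache, not Solr code."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.entries = OrderedDict()
        self.evictions = 0

    def access(self, key):
        """Return True on a hit; on a miss, insert the key and evict the oldest if full."""
        if key in self.entries:
            self.entries.move_to_end(key)
            return True
        self.entries[key] = True
        if len(self.entries) > self.max_size:
            self.entries.popitem(last=False)
            self.evictions += 1
        return False


def simulate(cache_size, searches, distinct_fqs, cache_q):
    """Run the searches and return how many evictions the cache recorded."""
    cache = LRUCache(cache_size)
    for i in range(searches):
        cache.access(("fq", i % distinct_fqs))
        if cache_q:
            cache.access(("q", i))  # every search has a unique q
    return cache.evictions


fq_only = simulate(cache_size=100, searches=10000, distinct_fqs=50, cache_q=False)
with_q = simulate(cache_size=100, searches=10000, distinct_fqs=50, cache_q=True)
```

With only fqs the 100-entry cache never evicts anything, which matches the "50 FQs, cache of 100" expectation. Once each search also inserts its q, nearly every search causes an eviction, so the cache would have to be sized around the number of distinct queries rather than the number of filters.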
Re: Why does faceting add an entry to the filter cache for the q parameter?
Hi, Andy.

It seems reasonable to me. To count facets Solr needs docsets of results for in-order processing. I suppose, SimpleFacets code can be amended so it keeps the result docset out of filter cache, but would it get significant gain?

Why do you think that thousands of evictions is a bad thing?

On Wed, Nov 16, 2022 at 8:26 PM Andy Lester wrote:

> I've been working on tuning my filter cache, and have discovered that
> doing a search with facet=on results in an entry in the filter cache for
> the q parameter. This makes no sense to me. Is this intentional?
>
> In my app, there are only a couple of dozen possible FQs that can get
> passed to Solr. There are no restrictions on the Q. The result is that I
> have many added keys to the filter cache, and many evictions. I should be
> able to get by with a filterCache size of 100, but even with a filterCache
> size of 5000 I get thousands of evictions per hour because of all the
> unique Q arguments getting added.
>
> If it is intentional, then we need to update the docs for both faceting
> and the filter cache to explain the behavior. This is something that users
> will need to take into account when planning and tuning caches.
>
> If this behavior is not intentional and is a bug, it seems like fixing it
> would be an easy performance win.
>
> I've made a ticket for this at
> https://issues.apache.org/jira/browse/SOLR-16546 It illustrates the
> specific steps I've taken to demonstrate the behavior. The demonstration is
> on Solr 9.0.0 and uses Shawn Heisey's filter cache dumper for illustration,
> but it also happens on a stock 8.11.2 install.
>
> Thanks,
> Andy

--
Sincerely yours
Mikhail Khludnev