Re: Why does faceting add an entry to the filter cache for the q parameter?

2022-11-18 Thread Andy Lester
> I do think that the docs filterCache refguide [1] could be updated to
> be more explicit about all the possible cases that use the
> filterCache, and I'd be happy to help with that.

Thanks, I appreciate it. If you would, please make yourself a watcher on the 
ticket I started for the task: https://issues.apache.org/jira/browse/SOLR-16554


> But I think
> super-high-level I'd start by trying to size filterCache according to
> what it _actually_ uses, rather than size it according to what I think
> it _should_ use and then adjust the underlying implementation to
> conform to that.

Absolutely. I'm not trying to change anything, assuming the behavior was 
intentional. I just wanted to 1) make sure it was expected behavior, and 2) 
understand so that I can 3) update the docs to make these things clearer.

That's why I titled this "Why does...", although I guess these days on the 
internet "Why does this happen?" can be read as "I don't like that this happens 
and I think it is wrong."

Thanks,
Andy
-
To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
For additional commands, e-mail: dev-h...@solr.apache.org



Re: Why does faceting add an entry to the filter cache for the q parameter?

2022-11-18 Thread Andy Lester



> On Nov 17, 2022, at 12:56 PM, Mikhail Khludnev wrote:
> 
> Overall, FacetComponent explicitly requires docset
> 
> I don't recommend changing anything in this flow.

And if that's how it should be, that's fine.

We just need to add notes about this in the docs for filterCache. I'll be doing 
that once Shawn's cache dumper works.





Re: Why does faceting add an entry to the filter cache for the q parameter?

2022-11-18 Thread Michael Gibney
Given that the filters being cached are useful across all these
contexts, and that facet values are often subsequently applied as
filters, I don't think it'd make sense to separate these two --
especially not without some way to have both aware of each other to
avoid duplicating effort.
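Here is a toy Java model of the reuse described above. It rests on loud assumptions: this is not Solr code, the "cache" is a plain `HashMap` from a query string to a `BitSet` standing in for a DocSet, and the `state` field and the fake index are hypothetical. A docset computed while faceting is hit again when the user applies that facet value as an fq.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Toy model (not Solr code): one cache shared by faceting and fq filtering. */
class SharedFilterCacheDemo {
    final Map<String, BitSet> cache = new HashMap<>();
    int hits = 0;
    int misses = 0;

    /** Returns the cached docset for a query, computing and caching it on a miss. */
    BitSet getDocSet(String query, Function<String, BitSet> compute) {
        BitSet cached = cache.get(query);
        if (cached != null) {
            hits++;
            return cached;
        }
        misses++;
        BitSet computed = compute.apply(query);
        cache.put(query, computed);
        return computed;
    }

    /** Hypothetical index: each query matches ten consecutive doc ids derived from its hash. */
    static BitSet fakeIndex(String query) {
        BitSet bits = new BitSet();
        int start = Math.abs(query.hashCode() % 100);
        bits.set(start, start + 10);
        return bits;
    }

    public static void main(String[] args) {
        SharedFilterCacheDemo demo = new SharedFilterCacheDemo();
        // Request 1: enum-style faceting computes one docset per facet value.
        for (String s : new String[] {"state:Alabama", "state:Alaska", "state:Wyoming"}) {
            demo.getDocSet(s, SharedFilterCacheDemo::fakeIndex);
        }
        // Request 2: the user clicks "Alaska", which comes back as fq=state:Alaska.
        demo.getDocSet("state:Alaska", SharedFilterCacheDemo::fakeIndex);
        System.out.println("hits=" + demo.hits + " misses=" + demo.misses); // hits=1 misses=3
    }
}
```

With two independent caches, the second request would miss unless both caches were consulted, which is the duplicated effort mentioned above.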

It might be good to have a more specific sense of the problems
involved -- workload, cache size, etc. Because of the potential
downstream consequences (for performance, cache usage patterns, etc.)
here, if this can be solved by simply increasing the size of the
cache, I'd go that route first for sure.

I do think that the docs filterCache refguide [1] could be updated to
be more explicit about all the possible cases that use the
filterCache, and I'd be happy to help with that. But I think
super-high-level I'd start by trying to size filterCache according to
what it _actually_ uses, rather than size it according to what I think
it _should_ use and then adjust the underlying implementation to
conform to that.

I've also idly considered a "named cache" variant of the
`filter(query)` context -- something like `namedFilter(cacheName,
query)` -- that would allow more fine-grained control over where
queries hit. I also have some thoughts about possible clean ways to
have multiple coordinated caches of compatible type -- but that'd be
farther out.

Short-term for your use case, it occurs to me that it'd be relatively
straightforward to write a QParserPlugin that consults a specific
named cache; then you could replace all `fq=query` with
`fq={!myFilterQuery cacheName=name v=$query}` and I think get the
behavior you're seeking. Would want to make the outer query return
false for `getCache()`, unless you want the query stored in both
filterCache _and_ your named cache!
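The `fq={!myFilterQuery cacheName=name v=$query}` form uses Solr's local-params syntax, where `v=$query` dereferences another request parameter named `query`. The Java sketch below is a deliberately simplified parser written for illustration only (no quoting, escaping, or nested braces; it is not Solr's own parser). It shows the values a hypothetical `myFilterQuery` plugin would end up with. Storing the parser name under the key `type` is a choice made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified local-params parser for "{!type k=v ...}" with whitespace-separated pairs. */
class LocalParamsSketch {
    /** Parses the string and resolves values of the form $name against requestParams. */
    static Map<String, String> parse(String fq, Map<String, String> requestParams) {
        if (!fq.startsWith("{!") || !fq.endsWith("}")) {
            throw new IllegalArgumentException("not a local-params string: " + fq);
        }
        String[] tokens = fq.substring(2, fq.length() - 1).trim().split("\\s+");
        Map<String, String> out = new HashMap<>();
        out.put("type", tokens[0]); // first token names the query parser plugin
        for (int i = 1; i < tokens.length; i++) {
            int eq = tokens[i].indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("bad key=value pair: " + tokens[i]);
            }
            String key = tokens[i].substring(0, eq);
            String value = tokens[i].substring(eq + 1);
            if (value.startsWith("$")) {
                // Parameter dereferencing: $query means "the value of request param 'query'".
                // A missing parameter yields null in this sketch.
                value = requestParams.get(value.substring(1));
            }
            out.put(key, value);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> request = Map.of("query", "category:books");
        System.out.println(parse("{!myFilterQuery cacheName=name v=$query}", request));
    }
}
```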

Michael

[1] 
https://solr.apache.org/guide/solr/latest/configuration-guide/caches-warming.html#filter-cache




Re: Why does faceting add an entry to the filter cache for the q parameter?

2022-11-17 Thread Mikhail Khludnev
>  whether DocList can even be used for facets,
No way. For sure.

Overall, FacetComponent explicitly requires docset
https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java#L82
and then
https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L1564
https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L971
I don't recommend changing anything in this flow.



-- 
Sincerely yours
Mikhail Khludnev


Re: Why does faceting add an entry to the filter cache for the q parameter?

2022-11-17 Thread Shawn Heisey

On 11/17/22 08:45, Andy Lester wrote:

> Short of that, wouldn't it make sense for facets to put the q in the
> queryResultCache, not the filterCache?


The queryResultCache is defined as <QueryResultKey,DocList> ... very 
different from the <Query,DocSet> that filterCache uses.  I have no idea 
whether DocList can even be used for facets, but it is likely that 
DocSet is faster and more directly applicable.  Because filterCache was 
already available, a choice was probably made to just use that rather 
than introduce a whole new cache.


That decision is a bad one for your use case.  Having a separate 
facetCache seems like a very good thing for your use case and probably 
would be generally helpful for overall performance in a variety of use 
cases.


That part of Solr code is very unfamiliar to me, I wouldn't have any 
idea where to begin or what to do for implementing facetCache.  I would 
tackle it if I knew how.


Thanks,
Shawn





Re: Why does faceting add an entry to the filter cache for the q parameter?

2022-11-17 Thread Andy Lester
> I think it likely means that the intended benefit of filterCache, which is 
> the caching of fqs, is not happening because those cache entries will be 
> evicted almost as soon as they are created.

Yes, that's what I suspect is happening.

Another problem with having q in the filterCache is that if your newSearcher is 
doing autowarming on some number of filterCache entries, it will be doing 
warming on q entries. This could be quite a bit of unnecessary work.
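A rough Java model of the autowarming point, assuming a strict LRU cache whose autowarm step regenerates the `autowarmCount` most recently used entries. That is a simplification: the actual eviction and warming policy depends on the configured cache class. Each simulated request touches one of a fixed set of fq keys and inserts one unique q key; the method reports how many warmed entries are q docsets. The workload numbers are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model: how many autowarmed entries would be q docsets rather than fqs? */
class AutowarmSketch {
    /** Returns how many of the autowarmCount most recently used keys came from q. */
    static long qEntriesWarmed(int requests, int distinctFqs, int cacheSize, int autowarmCount) {
        // Access-ordered LinkedHashMap: iteration runs from least to most recently used.
        Map<String, Boolean> cache = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > cacheSize;
            }
        };
        for (int r = 0; r < requests; r++) {
            cache.put("fq" + (r % distinctFqs), true); // put also refreshes recency
            cache.put("q" + r, true);                   // each faceted search adds a unique q
        }
        List<String> keys = new ArrayList<>(cache.keySet());
        List<String> warmed = keys.subList(Math.max(0, keys.size() - autowarmCount), keys.size());
        return warmed.stream().filter(k -> k.startsWith("q")).count();
    }

    public static void main(String[] args) {
        // Hypothetical: 1000 searches, 50 fqs, cache size 100, autowarmCount 20.
        System.out.println(qEntriesWarmed(1000, 50, 100, 20)); // 10
    }
}
```

With these numbers, half of the warming work goes to regenerating q docsets that, in this model, are never requested again.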


> I wonder if maybe it would be a good idea to have a facetCache in addition to 
> filterCache.  Same K,V as filterCache in the code, but entirely separate so 
> it does not interfere with caching of filters and can have a different 
> definition.

That would make a lot of sense to me. 

Short of that, wouldn't it make sense for facets to put the q in the 
queryResultCache, not the filterCache?

Whatever comes of this, the docs could use some updates and clarifications, and 
especially discussion of tuning filterCache. I've started a ticket for that 
task, both as placeholder for the work, and as a place to aggregate notes and 
thoughts for when it gets done.

Andy



Re: Why does faceting add an entry to the filter cache for the q parameter?

2022-11-17 Thread Shawn Heisey

On 11/16/22 13:21, Mikhail Khludnev wrote:

> Why do you think that thousands of evictions is a bad thing?


I think it likely means that the intended benefit of filterCache, which 
is the caching of fqs, is not happening because those cache entries will 
be evicted almost as soon as they are created.  In my experience an 
uncached fq is significantly slower than a cached fq, so caching them 
can be very important.


In the setup I used to manage, facets were not used by the application.  
I used them for manual data mining and some daily cronjobs that would 
report statistics from the index during hours of low usage.  So I did 
not run into this.


I wonder if maybe it would be a good idea to have a facetCache in 
addition to filterCache.  Same K,V as filterCache in the code, but 
entirely separate so it does not interfere with caching of filters and 
can have a different definition.
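A toy Java model of both paragraphs above. It assumes strict LRU eviction (a simplification of whatever the configured cache class actually does) and a hypothetical workload in which each search uses one of 80 fqs in rotation while faceting caches the docset of a never-repeated q. It compares the fq hit ratio when q docsets share filterCache against the ratio when they go to a separate cache, here standing in for the hypothetical facetCache proposed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy LRU model: does caching unique q docsets push fqs out of filterCache? */
class EvictionSketch {
    /** Minimal LRU cache that counts hits and misses. */
    static final class Lru {
        private final Map<String, Boolean> map;
        int hits = 0;
        int misses = 0;

        Lru(int maxSize) {
            map = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                    return size() > maxSize;
                }
            };
        }

        /** Looks the key up (refreshing its recency) and inserts it on a miss. */
        void access(String key) {
            if (map.get(key) != null) {
                hits++;
            } else {
                misses++;
                map.put(key, Boolean.TRUE);
            }
        }
    }

    /** fq hit ratio over `requests` searches; q keys never repeat, so every hit is an fq hit. */
    static double fqHitRatio(int requests, int distinctFqs, int size, boolean separateFacetCache) {
        Lru filterCache = new Lru(size);
        Lru facetCache = separateFacetCache ? new Lru(size) : filterCache; // hypothetical facetCache
        for (int r = 0; r < requests; r++) {
            filterCache.access("fq:" + (r % distinctFqs));
            facetCache.access("q:" + r);
        }
        return (double) filterCache.hits / requests;
    }

    public static void main(String[] args) {
        System.out.println("shared: " + fqHitRatio(1000, 80, 100, false)); // shared: 0.0
        System.out.println("split:  " + fqHitRatio(1000, 80, 100, true));  // split:  0.92
    }
}
```

In the shared case, 159 other distinct keys are touched between two uses of the same fq, so a 100-entry LRU always evicts it first; in the split case the 80 fqs fit and only their first uses miss.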


Thanks,
Shawn





Re: Why does faceting add an entry to the filter cache for the q parameter?

2022-11-16 Thread Andy Lester


> It seems reasonable to me. To count facets Solr needs docsets of results
> for in-order processing.
> I suppose, SimpleFacets code can be amended so it keeps the result docset
> out of filter cache, but would it get significant gain?

Mostly I think this is about expectations, and perhaps this is just about 
things that need to be explained more in the docs.

If I see "filter cache" I expect that to be FQs. I don't expect to see my 
q=whatever in the filter cache.

When I'm planning my filter cache, I would expect that, ideally, the size of the 
filter cache would be the number of possible FQs that I'm going to use. If my 
app only has about 50 possible FQs, I'd think I only need to allocate a filter 
cache of maybe 100. But that's not the case. Every search a user does in the 
app (1000s/hr) uses faceting, so every search throws a new entry into the 
filter cache.
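A back-of-envelope Java helper for this sizing question, assuming strict LRU eviction and that every faceted q is unique (both simplifications). Under those assumptions, an fq reused every `reuseMinutes` stays cached only if the cache can hold all the fqs plus every q docset inserted in that window. The request rate and reuse intervals in `main` are hypothetical.

```java
/** Back-of-envelope filterCache sizing when each faceted search also caches its q docset. */
class FilterCacheSizing {
    /**
     * Rough LRU estimate of the size needed so that an fq reused every reuseMinutes is not
     * evicted: room for all distinct fqs plus the unique q docsets inserted in that window.
     */
    static long minSizeToKeepFqs(int distinctFqs, long uniqueQueriesPerHour, int reuseMinutes) {
        long qInsertsPerWindow = uniqueQueriesPerHour * reuseMinutes / 60;
        return distinctFqs + qInsertsPerWindow;
    }

    public static void main(String[] args) {
        // Hypothetical workload: 50 possible fqs, 3000 unique faceted searches per hour.
        System.out.println(minSizeToKeepFqs(50, 3000, 10)); // 550
        System.out.println(minSizeToKeepFqs(50, 3000, 60)); // 3050
    }
}
```

Under the fqs-only expectation the answer would simply be 50 regardless of traffic; once q docsets are cached too, the required size grows with query volume.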

The docs for facet.method=enum make it clear that "faceting on a field with 
U.S. States such as Alabama, Alaska, … Wyoming would lead to fifty cached 
filters which would be used over and over again. The filterCache should be 
large enough to hold all the cached filters."  OK, that's good and makes sense, 
but then I only need to allocate 50 more slots in the filter cache for that. 
With q=whatever also being cached, the planning is entirely different.

I just realized that the filterCache docs say "Solr also uses this cache for 
faceting when the configuration parameter facet.method is set to fc." from 
which I infer that facet.method=enum does not use the filter cache. But it does.
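A toy Java model (not Solr's implementation) of enum-style facet counting that matches the behavior reported here. Each facet value's docset is intersected with the base docset for q. Because both lookups go through the same cache-through helper, every distinct q adds one entry on top of the per-value entries. `BitSet` stands in for DocSet; the index contents and field names are hypothetical.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of enum-style facet counting over a shared filter cache (not Solr code). */
class EnumFacetSketch {
    final Map<String, BitSet> filterCache = new HashMap<>();
    private final Map<String, BitSet> index; // hypothetical: query string -> matching doc ids

    EnumFacetSketch(Map<String, BitSet> index) {
        this.index = index;
    }

    static BitSet bits(int... docIds) {
        BitSet b = new BitSet();
        for (int id : docIds) b.set(id);
        return b;
    }

    /** Cache-through lookup: every docset requested here becomes a cache entry. */
    private BitSet docSet(String query) {
        return filterCache.computeIfAbsent(query, k -> index.getOrDefault(k, new BitSet()));
    }

    /** Counts each facet value by intersecting its docset with the base docset of q. */
    Map<String, Integer> facetCounts(String q, String field, String... values) {
        BitSet base = docSet(q); // the q docset is cached as well
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String v : values) {
            BitSet matches = (BitSet) docSet(field + ":" + v).clone(); // don't mutate cached set
            matches.and(base);
            counts.put(v, matches.cardinality());
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, BitSet> index = new HashMap<>();
        index.put("title:solr", bits(0, 1, 2));
        index.put("state:Alaska", bits(1, 2, 5));
        index.put("state:Wyoming", bits(0, 7));
        EnumFacetSketch s = new EnumFacetSketch(index);
        System.out.println(s.facetCounts("title:solr", "state", "Alaska", "Wyoming")); // {Alaska=2, Wyoming=1}
        System.out.println(s.filterCache.size()); // 3: one q docset + two facet-value docsets
        s.facetCounts("title:lucene", "state", "Alaska", "Wyoming");
        System.out.println(s.filterCache.size()); // 4: the new q adds exactly one entry
    }
}
```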

> Why do you think that thousands of evictions is a bad thing?

It's not bad, it's surprising. It's something I need to account for when 
planning and tuning.

I'm imagining pulling together a section in the filter cache docs explaining 
"Here's everything that can go into the filter cache". If I knew enough about 
the Solr code, I'd do it all myself, but I don't.

Anyone want to help on this endeavor?

Andy



Re: Why does faceting add an entry to the filter cache for the q parameter?

2022-11-16 Thread Mikhail Khludnev
Hi, Andy.
It seems reasonable to me. To count facets Solr needs docsets of results
for in-order processing.
I suppose the SimpleFacets code could be amended so it keeps the result docset
out of the filter cache, but would that bring a significant gain?
Why do you think that thousands of evictions is a bad thing?

On Wed, Nov 16, 2022 at 8:26 PM Andy Lester wrote:

> I've been working on tuning my filter cache, and have discovered that
> doing a search with facet=on results in an entry in the filter cache for
> the q parameter. This makes no sense to me. Is this intentional?
>
> In my app, there are only a couple of dozen possible FQs that can get
> passed to Solr. There are no restrictions on the Q. The result is that I
> have many added keys to the filter cache, and many evictions. I should be
> able to get by with a filterCache size of 100, but even with a filterCache
> size of 5000 I get thousands of evictions per hour because of all the
> unique Q arguments getting added.
>
> If it is intentional, then we need to update the docs for both faceting
> and the filter cache to explain the behavior. This is something that users
> will need to take into account when planning and tuning caches.
>
> If this behavior is not intentional and is a bug, it seems like fixing it
> would be an easy performance win.
>
> I've made a ticket for this at
> https://issues.apache.org/jira/browse/SOLR-16546  It illustrates the
> specific steps I've taken to demonstrate the behavior. The demonstration is
> on Solr 9.0.0 and uses Shawn Heisey's filter cache dumper for illustration,
> but it also happens on a stock 8.11.2 install.
>
> Thanks,
> Andy

-- 
Sincerely yours
Mikhail Khludnev