Re: post filtering for boolean filter queries
bq. How slow is "around commit points really slow"? You could at least lessen the pain here by committing less often if you can stand the latency

They are shamelessly slow, around 60-70 seconds, while normal searches are within the 1-3 second range. And yes, your idea is right and it is what we are pursuing: fewer commits. However, we do have shards that are hot because we need to keep them that hot, i.e. we commit as often as data arrives. This is where the slow searches pop up.

bq. Often users are more disturbed by getting (numbers from thin air) 2 second responses occasionally spiking to 20 seconds with an average of 3 seconds than getting all responses between 4 and 6 seconds with an average of 5.

Yes, I believe so too. So at the moment, the choice between post-filtering and caching is more or less for the business folks to make.

We have been looking into other things, like making our shards as small as possible. This is a parallel route to making our cache efficient.

Thanks,
Dmitry

On Thu, Dec 5, 2013 at 3:59 PM, Erick Erickson wrote:
> bq: To be sure we are using cost 101 and no cache
>
> The guy who wrote the code is really good, but I'm paranoid too so I use
> 101. Based on the number of off-by-one errors I've coded :)...
>
> How slow is "around commit points really slow"? You could at least lessen
> the pain here by committing less often if you can stand the latency.
>
> But otherwise you've pretty much nailed your options. One approach is to
> give users _predictable_ responses, not necessarily the best average. Often
> users are more disturbed by getting (numbers from thin air) 2 second
> responses occasionally spiking to 20 seconds with an average of 3 seconds
> than getting all responses between 4 and 6 seconds with an average of 5.
>
> FWIW,
> Erick

--
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan
Re: post filtering for boolean filter queries
On Thu, Dec 5, 2013 at 4:49 PM, Yonik Seeley wrote:
> On Thu, Dec 5, 2013 at 7:39 AM, Dmitry Kan wrote:
> > To rephrase, we have 2 options:
> >
> > 1. Make all searches somewhat slower for all users and avoid really slow
> > searches around commit points: post-filtering option
> >
> > OR
> >
> > 2. Make the majority of searches really fast, but around commit points
> > really slow: normal with cache option
>
> OR
>
> 3. Use warming queries or auto-warming of caches to make all searches fast
> but the commits themselves slow.

Thanks Yonik. This is indeed what we tried originally. But, as I briefly described at the Dublin Stump the Chump, auto-warming takes far too long and does not complete even within an hour. So the next commit kicks in, and so on. So we opted for external automatic warming.

> -Yonik
> http://heliosearch.com -- making solr shine

--
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan
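The thread does not say how the external warming is implemented, so the following is only a rough sketch of one possible shape: after a commit, replay a recorded set of queries against the shard so the caches are populated before users arrive. The base URL, the recorded queries and the `warmup_urls` helper are all hypothetical, and the sketch only builds the request URLs rather than sending them.

```python
from urllib.parse import urlencode


def warmup_urls(base_url, recorded_queries):
    """Build one /select URL per recorded (q, fqs) pair.

    rows=0 is an assumption: the point is to populate caches,
    not to fetch documents.
    """
    urls = []
    for q, fqs in recorded_queries:
        params = [("q", q)] + [("fq", fq) for fq in fqs] + [("rows", "0")]
        urls.append(f"{base_url}/select?{urlencode(params)}")
    return urls


# Hypothetical core URL and recorded queries, for illustration only.
recorded = [
    ("*:*", ["type:doc"]),
    ("title:solr", ["type:doc", "lang:en"]),
]
for url in warmup_urls("http://localhost:8983/solr/collection1", recorded):
    print(url)
```

A real warmer would issue these requests from a separate process right after each commit and would need some cap on how long it runs, since the complaint above is precisely that warming can outlast the commit interval.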
Re: post filtering for boolean filter queries
On Thu, Dec 5, 2013 at 7:39 AM, Dmitry Kan wrote:
> Thanks Erick!
> To be sure, we are using cost 101 and no cache. It seems to affect
> searches as we expected.
>
> Basically, with the cache on we see more "fat" spikes around commit points,
> as the cache is getting flushed (we don't rerun too many entries from the
> old cache). But when post-filtering is involved, those spikes are thinner,
> while the rest of the queries take about 2 seconds longer (our queries are
> pretty heavy-duty stuff).
>
> So post-filtering gives us the option of trading off query times for all
> users during normal execution against query times during commits.
> To rephrase, we have 2 options:
>
> 1. Make all searches somewhat slower for all users and avoid really slow
> searches around commit points: post-filtering option
>
> OR
>
> 2. Make the majority of searches really fast, but around commit points
> really slow: normal with cache option

OR

3. Use warming queries or auto-warming of caches to make all searches fast
but the commits themselves slow.

-Yonik
http://heliosearch.com -- making solr shine
Re: post filtering for boolean filter queries
bq: To be sure we are using cost 101 and no cache

The guy who wrote the code is really good, but I'm paranoid too so I use 101. Based on the number of off-by-one errors I've coded :)...

How slow is "around commit points really slow"? You could at least lessen the pain here by committing less often if you can stand the latency.

But otherwise you've pretty much nailed your options. One approach is to give users _predictable_ responses, not necessarily the best average. Often users are more disturbed by getting (numbers from thin air) 2 second responses occasionally spiking to 20 seconds with an average of 3 seconds than getting all responses between 4 and 6 seconds with an average of 5.

FWIW,
Erick

On Thu, Dec 5, 2013 at 7:39 AM, Dmitry Kan wrote:
> Thanks Erick!
> To be sure, we are using cost 101 and no cache. It seems to affect
> searches as we expected.
>
> Basically, with the cache on we see more "fat" spikes around commit points,
> as the cache is getting flushed (we don't rerun too many entries from the
> old cache). But when post-filtering is involved, those spikes are thinner,
> while the rest of the queries take about 2 seconds longer (our queries are
> pretty heavy-duty stuff).
>
> So post-filtering gives us the option of trading off query times for all
> users during normal execution against query times during commits.
> To rephrase, we have 2 options:
>
> 1. Make all searches somewhat slower for all users and avoid really slow
> searches around commit points: post-filtering option
>
> OR
>
> 2. Make the majority of searches really fast, but around commit points
> really slow: normal with cache option
>
> Dmitry
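To make the "predictable versus best average" point concrete, here is a small worked example. The sample lists are invented purely to reproduce the figures quoted in the message (which were themselves "numbers from thin air"): a 2 second baseline with a 20 second spike averages 3 seconds when one request in 18 spikes, since 2 + 18p = 3 gives p = 1/18.

```python
from statistics import mean


def summarize(latencies):
    """Return (mean, worst case) for a list of response times in seconds."""
    return mean(latencies), max(latencies)


# Hypothetical samples chosen only to reproduce the quoted figures.
spiky = [2.0] * 17 + [20.0]      # 17 fast responses and 1 spike
steady = [4.0, 5.0, 6.0] * 6     # always between 4 and 6 seconds

print("spiky :", summarize(spiky))
print("steady:", summarize(steady))
```

The spiky profile wins on the mean but its worst case is more than three times that of the steady one, which is the trade-off being described.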
Re: post filtering for boolean filter queries
Thanks Erick!

To be sure, we are using cost 101 and no cache. It seems to affect searches as we expected.

Basically, with the cache on we see more "fat" spikes around commit points, as the cache is getting flushed (we don't rerun too many entries from the old cache). But when post-filtering is involved, those spikes are thinner, while the rest of the queries take about 2 seconds longer (our queries are pretty heavy-duty stuff).

So post-filtering gives us the option of trading off query times for all users during normal execution against query times during commits. To rephrase, we have 2 options:

1. Make all searches somewhat slower for all users and avoid really slow searches around commit points: post-filtering option

OR

2. Make the majority of searches really fast, but around commit points really slow: normal with cache option

Dmitry

On Wed, Dec 4, 2013 at 3:34 PM, Erick Erickson wrote:
> OK, so cache=false and cost=100 should do it, see:
> http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/
>
> Best,
> Erick

--
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan
Re: post filtering for boolean filter queries
OK, so cache=false and cost=100 should do it, see:
http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/

Best,
Erick

On Wed, Dec 4, 2013 at 5:56 AM, Dmitry Kan wrote:
> Thanks Yonik.
>
> For our use case, we would like to skip caching for only one particular
> filter query, yet assign it a high cost to make sure it executes last of
> all filter queries.
>
> So this means the rest of the fqs will execute and cache as usual.
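A small sketch of how the two local params interact, as I understand the Solr documentation: a filter is evaluated as a true post filter only when it is not cached, its cost is 100 or more, and its query type implements the PostFilter interface. A plain boolean query such as UserId:(...) does not implement that interface as far as I know, so with cache=false and a high cost it would simply run as a non-cached filter, ordered after the cheaper ones. The `filter_mode` helper and its labels are hypothetical and only encode that rule.

```python
def filter_mode(cache=True, cost=0, implements_post_filter=False):
    """Classify how a filter query is expected to be evaluated.

    The threshold of 100 is the documented one; the flag stands in for
    "the query type implements PostFilter", which this sketch cannot detect.
    """
    if cache:
        return "cached"
    if cost >= 100 and implements_post_filter:
        return "post-filter"
    return "non-cached, ordered by cost"


print(filter_mode())                                   # ordinary fq
print(filter_mode(cache=False, cost=101))              # boolean fq from this thread
print(filter_mode(cache=False, cost=101, implements_post_filter=True))
```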
Re: post filtering for boolean filter queries
Thanks Yonik.

For our use case, we would like to skip caching for only one particular filter query, yet assign it a high cost to make sure it executes last of all filter queries.

So this means the rest of the fqs will execute and cache as usual.

On Tue, Dec 3, 2013 at 9:58 PM, Yonik Seeley wrote:
> On Tue, Dec 3, 2013 at 4:45 AM, Dmitry Kan wrote:
> > OK, we were able to confirm the behavior regarding not caching the filter
> > query. It works as expected. It does not cache with {!cache=false}.
> >
> > We are still looking into clarifying the cost assignment: i.e. whether it
> > works as expected for long boolean filter queries.
>
> Yes, filters should be ordered by cost (cheapest first) whenever you
> use {!cache=false}
>
> -Yonik
> http://heliosearch.com -- making solr shine

--
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan
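For illustration, a request of the kind described here could be put together as follows: several fq parameters, of which only the user filter carries cache=false and a high cost. The main query and the field names type and lang are made up for the example; only UserId and the local params come from the thread.

```python
from urllib.parse import urlencode, parse_qs

params = [
    ("q", "title:solr"),        # hypothetical main query
    ("fq", "type:article"),     # cached as usual (hypothetical field)
    ("fq", "lang:en"),          # cached as usual (hypothetical field)
    # Not cached, and given a high cost so it is applied after the others.
    ("fq", "{!cache=false cost=101}UserId:(user1 OR user2)"),
]

# A list of tuples keeps the repeated fq keys; a dict would collapse them.
query_string = urlencode(params)
print(query_string)
print(parse_qs(query_string)["fq"])
```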
Re: post filtering for boolean filter queries
On 12/03/2013 01:55 AM, Dmitry Kan wrote:
> Hello!
>
> We have been experimenting with post filtering lately. Our setup is a
> filter with a long boolean query; drawing the example from the Dublin
> Stump the Chump:
>
> fq=UserId:(user1 OR user2 OR...OR user1000)
>
> The underlying issue impacting performance is that the combination of
> user ids in the query above is unique for each user in the system, and on
> top of that the combination changes every day.
>
> Our idea was to stop caching the filter query with {!cache=false}. Since
> there is no way to introspect the contents of the filter cache to our
> knowledge (jmx?), we can't be sure those are not cached. This is because
> the initial query for each combination takes substantially more time (as if
> it was *not* cached) than the second and subsequent queries with the same
> fq (as if it *was* cached).
>
> Question is: does post filtering support boolean queries in fq params?
>
> Another thing we have been trying is assigning a cost to the fq that is
> relatively higher than for other filter queries. Does this feature support
> boolean queries in fq params as well?

Dmitry - I went to a talk at LR where this problem was discussed, and the solution of implementing a custom filter cache only for logged-in users was discussed -- sounds interesting, but maybe some tricky parts to it

-Mike
Re: post filtering for boolean filter queries
On Tue, Dec 3, 2013 at 4:45 AM, Dmitry Kan wrote:
> ok, we were able to confirm the behavior regarding not caching the filter
> query. It works as expected. It does not cache with {!cache=false}.
>
> We are still looking into clarifying the cost assignment: i.e. whether it
> works as expected for long boolean filter queries.

Yes, filters should be ordered by cost (cheapest first) whenever you
use {!cache=false}

-Yonik
http://heliosearch.com -- making solr shine
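The ordering rule can be mimicked outside Solr to sanity-check what a given set of fq strings should do. This is only a sketch: the regex handles simple key=value local params, nothing more, and treating a missing cost as 0 is an assumption about the default. Both helper functions are hypothetical and are not part of Solr.

```python
import re

LOCAL_PARAMS = re.compile(r"^\{!([^}]*)\}")


def parse_local_params(fq):
    """Split '{!k=v ...}body' into ({'k': 'v'}, 'body')."""
    match = LOCAL_PARAMS.match(fq)
    if not match:
        return {}, fq
    pairs = (p.split("=", 1) for p in match.group(1).split() if "=" in p)
    return dict(pairs), fq[match.end():]


def non_cached_in_cost_order(fqs):
    """Return the bodies of cache=false filters, cheapest first."""
    uncached = []
    for fq in fqs:
        params, body = parse_local_params(fq)
        if params.get("cache") == "false":
            uncached.append((int(params.get("cost", 0)), body))
    # sorted() is stable, so equal costs keep their request order.
    return [body for _, body in sorted(uncached, key=lambda item: item[0])]


fqs = [
    "{!cache=false cost=101}UserId:(user1 OR user2)",
    "type:doc",                       # cached, so not part of the ordering
    "{!cache=false cost=5}lang:en",   # hypothetical cheap filter
]
print(non_cached_in_cost_order(fqs))
```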
Re: post filtering for boolean filter queries
OK, we were able to confirm the behavior regarding not caching the filter query. It works as expected. It does not cache with {!cache=false}.

We are still looking into clarifying the cost assignment: i.e. whether it works as expected for long boolean filter queries.

On Tue, Dec 3, 2013 at 8:55 AM, Dmitry Kan wrote:
> Hello!
>
> We have been experimenting with post filtering lately. Our setup is a
> filter with a long boolean query; drawing the example from the Dublin
> Stump the Chump:
>
> fq=UserId:(user1 OR user2 OR...OR user1000)
>
> The underlying issue impacting performance is that the combination of
> user ids in the query above is unique for each user in the system, and on
> top of that the combination changes every day.
>
> Our idea was to stop caching the filter query with {!cache=false}. Since
> there is no way to introspect the contents of the filter cache to our
> knowledge (jmx?), we can't be sure those are not cached. This is because
> the initial query for each combination takes substantially more time (as if
> it was *not* cached) than the second and subsequent queries with the same
> fq (as if it *was* cached).
>
> Question is: does post filtering support boolean queries in fq params?
>
> Another thing we have been trying is assigning a cost to the fq that is
> relatively higher than for other filter queries. Does this feature support
> boolean queries in fq params as well?

--
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan
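As a concrete version of the filter in the question, here is a minimal sketch of assembling the fq value from a per-user list of ids. The `build_user_filter` helper is hypothetical, and it assumes the ids are plain tokens that need no query-syntax escaping.

```python
from urllib.parse import urlencode


def build_user_filter(user_ids, cache=False, cost=None):
    """Return an fq value such as '{!cache=false}UserId:(user1 OR user2)'."""
    if not user_ids:
        raise ValueError("at least one user id is required")
    local_params = []
    if not cache:
        local_params.append("cache=false")
    if cost is not None:
        local_params.append(f"cost={cost}")
    prefix = "{!" + " ".join(local_params) + "}" if local_params else ""
    return f"{prefix}UserId:({' OR '.join(user_ids)})"


fq = build_user_filter(["user1", "user2", "user3"])
print(fq)
# URL-encode before sending, since the value contains braces and spaces.
print(urlencode({"q": "*:*", "fq": fq}))
```

Passing `cost=101` as well yields the `{!cache=false cost=101}` form that the later replies settle on.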