Re: post filtering for boolean filter queries

2013-12-07 Thread Dmitry Kan
On Thu, Dec 5, 2013 at 4:49 PM, Yonik Seeley yo...@heliosearch.com wrote:

 3. Use warming queries or auto-warming of caches to make all searches fast
 but the commits themselves slow.


Thanks, Yonik. This is indeed what we tried originally. But, as I briefly
described at Stump the Chump in Dublin, auto-warming takes far too long: it
does not complete even within an hour, so the next commit kicks in before it
finishes, and so on. So we opted for external automatic warming.
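The thread does not say how the external warmer works. A minimal sketch of
one possible approach, assuming a hypothetical core URL and a log of recently
seen fq strings: after each commit, replay the most frequent recent filters
so the new searcher's filterCache is populated before users hit it. Only
cached filters benefit; an fq sent with {!cache=false} is never stored, so
warming it does not help.

```python
from collections import Counter
from urllib.parse import urlencode

# Hypothetical core URL; adjust to the real host and core name.
BASE_URL = "http://localhost:8983/solr/collection1"


def warming_urls(base_url, recent_fqs, top_n=10):
    """Build /select URLs that replay the top_n most frequent recent filter queries.

    rows=0 because only the side effect (filling the filterCache) matters.
    """
    ranked = [fq for fq, _count in Counter(recent_fqs).most_common(top_n)]
    return [
        f"{base_url}/select?" + urlencode({"q": "*:*", "fq": fq, "rows": 0})
        for fq in ranked
    ]


# A warmer would then request each URL once the new searcher is open, e.g.
# with urllib.request.urlopen(url).read(); that network step is left out here.
```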







-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan


Re: post filtering for boolean filter queries

2013-12-07 Thread Dmitry Kan
bq. How slow is "really slow" around commit points? You could at least lessen
the pain here by committing less often, if you can stand the latency.

They are shamelessly slow: 60-70 seconds, while normal searches are in the
1-3 second range. And yes, your idea is right and it's what we are pursuing:
fewer commits. However, we do have shards that are hot because we need to
keep them that hot, i.e. we commit as often as data arrives. This is where
the slow searches pop up.

bq. Often users are more disturbed by getting (numbers from thin air) 2 second
responses occasionally spiking to 20 seconds with an average of 3 seconds
than getting all responses between 4 and 6 seconds with an average of 5.

Yes, I believe so too. So at the moment, the choice between post-filtering
and caching is more or less for the business folks to make. We have also been
looking into other things, like making our shards as small as possible. This
is a parallel route to making our cache efficient.

Thanks,
Dmitry


-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan


Re: post filtering for boolean filter queries

2013-12-05 Thread Dmitry Kan
Thanks Erick!
To be sure, we are using cost 101 and no cache. It seems to affect searches
as we expected.

Basically, with the cache on we see fatter spikes around commit points, as
the cache gets flushed (we don't re-run many entries from the old cache).
But when post-filtering is involved, those spikes are thinner, while the
rest of the queries take about 2 seconds longer (our queries are pretty
heavy-duty stuff).

So post-filtering lets us trade off query times for all users during normal
execution against query times during commits. To rephrase, we have 2 options:

1. Make all searches somewhat slower for all users and avoid really slow
searches around commit points: the post-filtering option

OR

2. Make the majority of searches really fast, but really slow around commit
points: the normal, cached option

Dmitry


On Wed, Dec 4, 2013 at 3:34 PM, Erick Erickson erickerick...@gmail.com wrote:

 OK, so cache=false and cost=100 should do it, see:
 http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/

-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan


Re: post filtering for boolean filter queries

2013-12-05 Thread Erick Erickson
bq: To be sure we are using cost 101 and no cache

The guy who wrote the code is really good, but I'm paranoid too, so I use
101, based on the number of off-by-one errors I've coded :)...

How slow is "really slow" around commit points? You could at least lessen
the pain here by committing less often, if you can stand the latency.

But otherwise you've pretty much nailed your options. One approach is to
give users _predictable_ responses, not necessarily the best average. Often
users are more disturbed by getting (numbers from thin air) 2 second
responses occasionally spiking to 20 seconds with an average of 3 seconds
than getting all responses between 4 and 6 seconds with an average of 5.
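The point here is that predictability shows up in the tail, not in the mean.
A small worked example with these made-up numbers; the sample counts are
hypothetical, chosen only so the averages land near the 3 and 5 seconds
mentioned above:

```python
from statistics import mean

# Hypothetical latency samples in seconds, shaped after the "numbers from thin air".
spiky = [2.0] * 18 + [20.0]           # mostly 2 s, with one 20 s spike
steady = [4.0, 5.0, 6.0] * 6 + [5.0]  # everything between 4 and 6 s


def summarize(samples):
    """Return (mean rounded to 2 decimals, worst case) for a list of latencies."""
    return round(mean(samples), 2), max(samples)


print("spiky: ", summarize(spiky))   # (2.95, 20.0)
print("steady:", summarize(steady))  # (5.0, 6.0)
```

The spiky profile wins on average, but its worst case is more than three
times that of the steady one.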

FWIW,
Erick



Re: post filtering for boolean filter queries

2013-12-05 Thread Yonik Seeley
On Thu, Dec 5, 2013 at 7:39 AM, Dmitry Kan solrexp...@gmail.com wrote:
 Thanks Erick!
 To be sure we are using cost 101 and no cache. It seems to affect on
 searches as we expected.

 Basically with cache on we see more fat spikes around commit points, as
 cache is getting flushed (we don't rerun too many entries from old cache).
 But when the post-filtering is involved, those spikes are thinner, but the
 rest of the queries take about 2 seconds longer (our queries are pretty
 heavy duty stuff).

 So the post-filtering gives an option of making trade-offs between query
 times for all users during normal execution and query times during commits.
 To rephrase we have 2 options:

 1. Make all searches somewhat slower for all users and avoid really slow
 searches around commit points: post-filtering option

 OR

 2. Make majority of searches really fast, but around commit points really
 slow: normal with cache option

OR

3. Use warming queries or auto-warming of caches to make all searches fast
but the commits themselves slow.

-Yonik
http://heliosearch.com -- making solr shine


Re: post filtering for boolean filter queries

2013-12-04 Thread Dmitry Kan
Thanks Yonik.

For our use case, we would like to skip caching for only one particular
filter query, yet assign it a high cost to make sure it executes last of all
the filter queries.

This means the rest of the fqs will execute and be cached as usual.
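A minimal sketch of the request this describes, using Python's standard
urllib for encoding; the field names and filter values are placeholders.
Each fq is a separate parameter, so only the per-user filter carries the
{!cache=false cost=...} local params, and the other filters are cached as
usual.

```python
from urllib.parse import urlencode


def select_params(q, cached_fqs, user_fq, cost=101):
    """Return (name, value) pairs for a Solr /select request.

    Ordinary filters get no local params, so Solr caches them in the filterCache.
    The per-user filter skips the cache and gets a high cost so it runs last.
    """
    params = [("q", q)]
    params += [("fq", fq) for fq in cached_fqs]
    params.append(("fq", f"{{!cache=false cost={cost}}}{user_fq}"))
    return params


params = select_params("*:*", ["type:doc", "lang:en"], "UserId:(user1 OR user2)")
print(urlencode(params))
```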




On Tue, Dec 3, 2013 at 9:58 PM, Yonik Seeley yo...@heliosearch.com wrote:

 On Tue, Dec 3, 2013 at 4:45 AM, Dmitry Kan solrexp...@gmail.com wrote:
  ok, we were able to confirm the behavior regarding not caching the filter
  query. It works as expected. It does not cache with {!cache=false}.
 
  We are still looking into clarifying the cost assignment: i.e. whether it
  works as expected for long boolean filter queries.

 Yes, filters should be ordered by cost (cheapest first) whenever you
 use {!cache=false}

 -Yonik
 http://heliosearch.com -- making solr shine




-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan


Re: post filtering for boolean filter queries

2013-12-04 Thread Erick Erickson
OK, so cache=false and cost=100 should do it, see:
http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/
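For reference, the Solr Reference Guide states the rule roughly as: a filter
with cache=false and cost >= 100 is run as a post filter only if its query
implements the PostFilter interface (frange is one query type that does);
other non-cached filters simply run uncached, ordered by cost. So 100 and 101
behave the same here. A toy model of that documented rule, not Solr's actual
code:

```python
from dataclasses import dataclass


@dataclass
class Filter:
    query: str
    cache: bool = True
    cost: int = 0
    # True only for query types that implement Solr's PostFilter interface
    # (for example frange); a plain boolean query like UserId:(...) does not.
    implements_post_filter: bool = False


def runs_as_post_filter(f: Filter) -> bool:
    """Toy version of the documented rule: cache=false, cost >= 100, PostFilter support."""
    return not f.cache and f.cost >= 100 and f.implements_post_filter
```

If the documented rule applies as stated, the UserId boolean filter from the
original question would run as a non-cached filter (last, if its cost is the
highest) rather than as a true post filter.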

Best,
Erick


On Wed, Dec 4, 2013 at 5:56 AM, Dmitry Kan solrexp...@gmail.com wrote:

 Thanks Yonik.

 For our use case, we would like to skip caching only one particular filter
 cache, yet apply a high cost for it to make sure it executes last of all
 filter queries.

 So this means, the rest of the fqs will execute and cache as usual.







Re: post filtering for boolean filter queries

2013-12-03 Thread Dmitry Kan
OK, we were able to confirm the behavior regarding not caching the filter
query: it works as expected, and the filter is not cached with
{!cache=false}.

We are still looking into clarifying the cost assignment, i.e. whether it
works as expected for long boolean filter queries.


-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan


Re: post filtering for boolean filter queries

2013-12-03 Thread Yonik Seeley
On Tue, Dec 3, 2013 at 4:45 AM, Dmitry Kan solrexp...@gmail.com wrote:
 ok, we were able to confirm the behavior regarding not caching the filter
 query. It works as expected. It does not cache with {!cache=false}.

 We are still looking into clarifying the cost assignment: i.e. whether it
 works as expected for long boolean filter queries.

Yes, filters should be ordered by cost (cheapest first) whenever you
use {!cache=false}
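A toy model of this ordering, assuming each filter is a (query, cache, cost)
triple: cached filters go through the filterCache and cost only orders the
non-cached ones, which are applied cheapest first. Python's sort is stable,
so filters with equal cost keep their request order; whether Solr breaks
ties the same way is not stated in the thread.

```python
from typing import List, NamedTuple, Tuple


class Fq(NamedTuple):
    query: str
    cache: bool
    cost: int


def filter_plan(fqs: List[Fq]) -> Tuple[List[str], List[str]]:
    """Split filters into cached ones and non-cached ones ordered by cost (cheapest first)."""
    cached = [f.query for f in fqs if f.cache]
    uncached = sorted((f for f in fqs if not f.cache), key=lambda f: f.cost)
    return cached, [f.query for f in uncached]


cached, uncached = filter_plan([
    Fq("UserId:(user1 OR user2)", cache=False, cost=101),
    Fq("type:doc", cache=True, cost=0),
    Fq("price:[0 TO 10]", cache=False, cost=5),
])
print(cached)    # ['type:doc']
print(uncached)  # ['price:[0 TO 10]', 'UserId:(user1 OR user2)']
```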

-Yonik
http://heliosearch.com -- making solr shine


Re: post filtering for boolean filter queries

2013-12-03 Thread Michael Sokolov


Dmitry - I went to a talk at LR where this problem was discussed, along with
a solution: implementing a custom filter cache only for logged-in users. It
sounds interesting, but there may be some tricky parts to it.
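Purely as an illustration of the idea, not the design presented in that
talk: a hypothetical per-user LRU cache that stores each logged-in user's
computed filter (here a set of doc ids), keyed by user id instead of by the
long fq string. Anything keyed on internal doc ids would have to be dropped
when a commit opens a new searcher.

```python
from collections import OrderedDict
from typing import Callable, Hashable, Set


class PerUserFilterCache:
    """Hypothetical LRU cache of per-user filters (sets of doc ids), keyed by user id."""

    def __init__(self, max_users: int):
        self.max_users = max_users
        self._entries: "OrderedDict[Hashable, Set[int]]" = OrderedDict()

    def get(self, user_id: Hashable, compute: Callable[[Hashable], Set[int]]) -> Set[int]:
        if user_id in self._entries:
            self._entries.move_to_end(user_id)  # mark as most recently used
            return self._entries[user_id]
        docs = compute(user_id)
        self._entries[user_id] = docs
        if len(self._entries) > self.max_users:
            self._entries.popitem(last=False)  # evict the least recently used user
        return docs

    def clear(self) -> None:
        """Drop everything, e.g. when a commit opens a new searcher and doc ids change."""
        self._entries.clear()
```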


-Mike


post filtering for boolean filter queries

2013-12-02 Thread Dmitry Kan
Hello!

We have been experimenting with post filtering lately. Our setup is a
filter with a long boolean query; to borrow the example from Stump the Chump
in Dublin:

fq=UserId:(user1 OR user2 OR...OR user1000)

The underlying issue impacting performance is that the combination of user
ids in the query above is unique for each user in the system, and on top of
that the combination changes every day.

Our idea was to stop caching the filter query with {!cache=false}. But since,
to our knowledge, there is no way to introspect the contents of the filter
cache (JMX?), we can't be sure these filters are really not cached. Our doubt
comes from the fact that the initial query for each combination takes
substantially more time (as if it was *not* cached) than the second and
subsequent queries with the same fq (as if it *was* cached).
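One way to check this, sketched under the assumption of a Solr 4.x core
whose MBeans handler is reachable at /admin/mbeans (the base URL below is a
placeholder): the filterCache statistics include an inserts counter, so
comparing it before and after a query sent with {!cache=false} shows whether
a new entry was added. Any other cacheable fqs in the same request, or
concurrent traffic, will also bump the counter, so an idle core and a request
with only the uncached fq give the cleanest reading.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder core URL; not taken from the thread.
BASE_URL = "http://localhost:8983/solr/collection1"


def fetch_mbeans(base_url):
    """Fetch CACHE-category statistics from the core's MBeans handler (network call)."""
    query = urlencode({"stats": "true", "cat": "CACHE", "wt": "json"})
    with urlopen(f"{base_url}/admin/mbeans?{query}") as resp:
        return json.load(resp)


def filter_cache_stats(mbeans):
    """Pull the filterCache stats out of an mbeans JSON response.

    Assumes the Solr 4.x layout, where "solr-mbeans" is a flat list that
    alternates category names and dicts of beans.
    """
    entries = mbeans["solr-mbeans"]
    for category, beans in zip(entries[::2], entries[1::2]):
        if category == "CACHE" and "filterCache" in beans:
            return beans["filterCache"]["stats"]
    raise KeyError("filterCache not found in mbeans response")


def new_filter_cache_entries(before, after):
    """Number of filterCache inserts between two snapshots."""
    return filter_cache_stats(after)["inserts"] - filter_cache_stats(before)["inserts"]


# Usage against a live core:
#   before = fetch_mbeans(BASE_URL)
#   ... send the query with fq={!cache=false}UserId:(...) ...
#   after = fetch_mbeans(BASE_URL)
#   print(new_filter_cache_entries(before, after))
```

If the count does not change, the speedup on repeated queries presumably
comes from something other than the filterCache.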

The question is: does post filtering support boolean queries in fq params?

Another thing we have been trying is assigning the fq a higher cost than the
other filter queries. Does this feature support boolean queries in fq params
as well?
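For completeness, a small helper (hypothetical name) that builds the long
boolean filter from a list of ids, escaping Lucene query-syntax special
characters. Sorting and de-duplicating the ids means the same set always
produces the same fq string, which matters if such a filter is ever cached,
since two equivalent filters written in a different order may not be treated
as the same cache entry.

```python
# Characters with special meaning in Lucene query syntax, plus whitespace.
_SPECIAL = set('\\+-!():^[]"{}~*?|&;/') | {" ", "\t"}


def escape_term(term: str) -> str:
    """Backslash-escape query-syntax characters in a single term."""
    return "".join("\\" + ch if ch in _SPECIAL else ch for ch in term)


def boolean_user_filter(field: str, user_ids) -> str:
    """Build field:(id1 OR id2 OR ...) from ids, de-duplicated and sorted."""
    ids = sorted(set(user_ids))
    if not ids:
        raise ValueError("at least one user id is required")
    return f"{field}:(" + " OR ".join(escape_term(i) for i in ids) + ")"


print(boolean_user_filter("UserId", ["user2", "user1", "user2"]))
# UserId:(user1 OR user2)
```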

-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan