Re: Query vs Filter Query Usage

2011-08-25 Thread Erick Erickson

The pitfall of filter queries is also their strength. The results will be
cached and re-used if possible. This will take some memory,
of course. Depending upon how big your index is, this could
be quite a lot.

Yet another time/space tradeoff. But yeah, use filter queries
until you have OOMs, then get more memory <G>...

Best
Erick

Re: Query vs Filter Query Usage

2011-08-25 Thread Joshua Harness

Erick -

Thanks for the insight. The filter cache just caches the internal
document IDs of the result set (as opposed to the documents themselves),
correct? If so, am I correct in the following math:

10,000,000 document index
Internal Document id is 32 bit unsigned int
Max Memory Used by a single cache slot in the filter cache = 32 bits x
10,000,000 docs = 320,000,000 bits or 38 MB

Of course, I realize there is some additional overhead if we're dealing with
Integer objects as opposed to primitives -- and I'm way off if the internal
document id is implemented as a long.

Also, does SOLR fail gracefully when an OOM occurs (e.g. the cache fails but
the query still succeeds)?

Thanks!

Josh

RE: Query vs Filter Query Usage

2011-08-25 Thread Michael Ryan

 10,000,000 document index
 Internal Document id is 32 bit unsigned int
 Max Memory Used by a single cache slot in the filter cache = 32 bits x
 10,000,000 docs = 320,000,000 bits or 38 MB

I think it depends on where exactly the result set was generated. I believe the 
result set will usually be represented by a BitDocSet, which requires 1 bit per 
doc in your index (result set size doesn't matter), so in your case it would be 
about 1.2MB.

-Michael

Re: Query vs Filter Query Usage

2011-08-25 Thread Yonik Seeley

On Thu, Aug 25, 2011 at 5:19 PM, Michael Ryan mr...@moreover.com wrote:
 10,000,000 document index
 Internal Document id is 32 bit unsigned int
 Max Memory Used by a single cache slot in the filter cache = 32 bits x
 10,000,000 docs = 320,000,000 bits or 38 MB

 I think it depends on where exactly the result set was generated. I believe 
 the result set will usually be represented by a BitDocSet, which requires 1 
 bit per doc in your index (result set size doesn't matter), so in your case 
 it would be about 1.2MB.

Right - and Solr switches between implementations depending on set
size... so if the number of documents in the set were 100, then it
would only take up 400 bytes.

-Yonik
http://www.lucidimagination.com
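
The figures in this exchange can be checked with a little arithmetic. The sketch below is only an illustration, not Solr's code. It assumes, as described above, that a bitset costs one bit per document in the index and that a list of document IDs costs four bytes per matching document, and it ignores any object overhead. The function names are made up for this example.

```python
def bitset_bytes(max_doc: int) -> int:
    """Bytes for a bitset holding one bit per document in the index."""
    return (max_doc + 7) // 8


def int_list_bytes(num_matches: int) -> int:
    """Bytes for a list of 32-bit document IDs, one per matching document."""
    return num_matches * 4


max_doc = 10_000_000

print(bitset_bytes(max_doc))       # 1250000 bytes, about 1.2 MB
print(int_list_bytes(100))         # 400 bytes
print(int_list_bytes(max_doc))     # 40000000 bytes, about 38 MB

# Size break-even under these assumptions: with fewer matches than this,
# the ID list is smaller than the bitset.
print(bitset_bytes(max_doc) // 4)  # 312500
```

The last number is only where the two sizes cross under these assumptions. It is not necessarily the point at which Solr itself switches representation.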

Re: Query vs Filter Query Usage

2011-08-25 Thread Lance Norskog

The point of filter queries is that they are applied very early in the
searching algorithm, and thus cut the amount of work later on. Some
complex queries take a lot of time and so this pre-trimming helps a
lot.

-- 
Lance Norskog
goks...@gmail.com

Re: Query vs Filter Query Usage

2011-08-24 Thread Shawn Heisey


On 8/24/2011 2:02 PM, Joshua Harness wrote:

  I've done some basic query performance testing on my SOLR instance,
which allows users to search via a faceted search interface. As such,
document relevancy is less important to me since I am performing exact match
searching. Comparing using filter queries with a plain query has yielded
remarkable performance.  However, I'm suspicious of statements like 'always
use filter queries since they are so much faster'. In my experience, things
are never so straightforward. Can anybody provide any further guidance? What
are the pitfalls of relying heavily on filter queries? When would one want
to use plain vanilla SOLR queries as opposed to filter queries?


Completely separate from any performance consideration, the key to their 
usage lies in their name:  They are filters.  They are particularly 
useful in a faceted situation, because you can have more than one of 
them, and the overall result is the intersection (AND) of them all.


When someone tells the interface to restrict their search by a facet, 
you can simply add a filter query with the field:value relating to that 
facet and reissue the query.  If they decide to remove that restriction, 
you just have to remove the filter query.  You don't have to try and 
combine the various pieces in the query, which means you'll have much 
less hassle with parentheses.


If you need a union (OR) operation with your filters, you'll have to use 
more complex construction within a single filter query, or not use them 
at all.


Thanks,
Shawn
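
As a rough sketch of the approach described above, a client could keep the active facet restrictions in a dictionary and emit one fq parameter per entry. Nothing here is a Solr client API: `build_query` and the field names are hypothetical, and values are assumed to need no escaping of query-syntax characters.

```python
from urllib.parse import urlencode


def build_query(q, filters):
    """Return a query string with one fq parameter per active filter.

    `filters` maps a field name to a single value, or to a list of
    values that are OR-ed together inside one fq.
    """
    params = [("q", q)]
    for field, value in filters.items():
        if isinstance(value, (list, tuple)):
            clause = f"{field}:({' OR '.join(value)})"
        else:
            clause = f"{field}:{value}"
        params.append(("fq", clause))
    # Keep ':', '*' and parentheses readable instead of percent-encoding them.
    return urlencode(params, safe=":*()")


filters = {"someField": "someValue", "anotherField": "anotherValue"}
print(build_query("*:*", filters))

# The user removes one restriction: drop the entry and reissue the query.
del filters["anotherField"]
print(build_query("*:*", filters))

# A union (OR) has to live inside a single filter query.
print(build_query("*:*", {"someField": ["valueA", "valueB"]}))
```

Because each restriction is its own fq, adding or removing one never requires rewriting the main query or balancing parentheses. Only the OR case needs a compound clause.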

Re: Query vs Filter Query Usage

2011-08-24 Thread Joshua Harness

Shawn -

 Thanks for your reply. Given that my application is mainly used as
faceted search, would the following types of queries make sense or are there
other pitfalls to consider?

q=*:*&fq=someField:someValue&fq=anotherField:anotherValue

Thanks!

Josh

Re: Query vs Filter Query Usage

2011-08-24 Thread Shawn Heisey


On 8/24/2011 6:07 PM, Joshua Harness wrote:

Shawn -

  Thanks for your reply. Given that my application is mainly used as
faceted search, would the following types of queries make sense or are there
other pitfalls to consider?

q=*:*&fq=someField:someValue&fq=anotherField:anotherValue


I'm no expert, but that looks like the perfect thing to do with filter 
queries.  One thing that you might want to think about and experiment 
with is removing someField and anotherField from the faceting when you 
issue a query like that.  It would likely work fine if you left them in, 
but there's not really a need to facet on a field that you've limited to 
a single value.


Thanks,
Shawn
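
One way to try the suggestion above is to request facet counts only for fields that are not already restricted by a filter. This is a minimal sketch, assuming each filter pins its field to a single value. `faceted_query` and the field names are hypothetical. `facet` and `facet.field` are the standard Solr faceting parameters.

```python
from urllib.parse import urlencode


def faceted_query(q, filters, all_facet_fields):
    """Build a query string that facets only on fields not already filtered."""
    params = [("q", q)]
    params += [("fq", f"{field}:{value}") for field, value in filters.items()]
    params.append(("facet", "true"))
    # Skip fields the user has already limited to one value.
    params += [("facet.field", f) for f in all_facet_fields if f not in filters]
    return urlencode(params, safe=":*")


print(faceted_query(
    "*:*",
    {"someField": "someValue"},
    ["someField", "anotherField", "thirdField"],
))
```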
