Re: Query vs Filter Query Usage
The pitfall of filter queries is also their strength: the results are cached and reused where possible. That takes memory, of course, and depending on how big your index is, it could be quite a lot. Yet another time/space tradeoff.

But yeah, use filter queries until you have OOMs, then get more memory G...

Best
Erick

On Wed, Aug 24, 2011 at 8:07 PM, Joshua Harness jkharnes...@gmail.com wrote:
> Shawn - Thanks for your reply. Given that my application is mainly used as faceted search, would the following types of queries make sense or are there other pitfalls to consider?
>
> q=*:*&fq=someField:someValue&fq=anotherField:anotherValue
>
> Thanks!
> Josh
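To make the time/space tradeoff concrete, here is a minimal Python sketch of a bounded cache of filter results. It is an illustration only and not how Solr implements its filter cache: the class name `FilterCache`, the `max_entries` limit, the LRU eviction policy, the toy `DOCS` index and the `match` helper are all assumptions made for this example.

```python
from collections import OrderedDict


class FilterCache:
    """Toy LRU cache mapping a filter string to its set of matching doc ids."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, fq, compute):
        # Cache hit: reuse the stored result and mark it as recently used.
        if fq in self.entries:
            self.hits += 1
            self.entries.move_to_end(fq)
            return self.entries[fq]
        # Cache miss: pay the time cost once, then pay the space cost to keep it.
        self.misses += 1
        result = frozenset(compute(fq))
        self.entries[fq] = result
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return result


# Hypothetical three-document index: doc id -> field values.
DOCS = {0: {"color": "red"}, 1: {"color": "blue"}, 2: {"color": "red"}}


def match(fq):
    """Evaluate a 'field:value' filter by scanning the toy index."""
    field, value = fq.split(":", 1)
    return [doc_id for doc_id, fields in DOCS.items() if fields.get(field) == value]


cache = FilterCache(max_entries=2)
first = cache.get("color:red", match)   # computed and stored
second = cache.get("color:red", match)  # served from the cache
```

The second lookup does no matching work at all, which is where the speedup comes from; the memory cost is whatever each stored result occupies, multiplied by the number of entries the cache is allowed to hold.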
Re: Query vs Filter Query Usage
Erick - Thanks for the insight. The filter cache just caches the internal document ids of the result set (as opposed to the documents themselves), correct? If so, is the following math right?

  10,000,000 document index
  Internal document id is a 32-bit unsigned int
  Max memory used by a single cache slot in the filter cache = 32 bits x 10,000,000 docs = 320,000,000 bits, or about 38 MB

Of course, I realize there is some additional overhead if we're dealing with Integer objects as opposed to primitives, and I'm way off if the internal document id is implemented as a long.

Also, does SOLR fail gracefully when an OOM occurs (e.g. the cache fails but the query still succeeds)?

Thanks!
Josh

On Thu, Aug 25, 2011 at 2:55 PM, Erick Erickson erickerick...@gmail.com wrote:
> The pitfall of filter queries is also their strength: the results are cached and reused where possible. That takes memory, of course, and depending on how big your index is, it could be quite a lot. Yet another time/space tradeoff.
>
> But yeah, use filter queries until you have OOMs, then get more memory G...
>
> Best
> Erick
RE: Query vs Filter Query Usage
> 10,000,000 document index
> Internal Document id is 32 bit unsigned int
> Max Memory Used by a single cache slot in the filter cache = 32 bits x 10,000,000 docs = 320,000,000 bits or 38 MB

I think it depends on where exactly the result set was generated. I believe the result set will usually be represented by a BitDocSet, which requires 1 bit per doc in your index (result set size doesn't matter), so in your case it would be about 1.2 MB.

-Michael
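The two estimates in this message can be checked with a few lines of Python. This is only back-of-the-envelope arithmetic under stated assumptions: the helper names `int_array_bytes` and `bitset_bytes` are made up for the example, and JVM object headers and word alignment are ignored.

```python
MAX_DOC = 10_000_000   # documents in the index, from the question
MIB = 1024 * 1024


def int_array_bytes(num_ids):
    """Bytes needed to store num_ids doc ids as 32-bit integers."""
    return num_ids * 4


def bitset_bytes(max_doc):
    """Bytes needed for one bit per document in the index, rounded up."""
    return (max_doc + 7) // 8


# Worst case for the id-list estimate: every document matches the filter.
id_list = int_array_bytes(MAX_DOC)   # 40,000,000 bytes
bitset = bitset_bytes(MAX_DOC)       # 1,250,000 bytes

print(f"id list: {id_list / MIB:.1f} MiB")
print(f"bitset:  {bitset / MIB:.2f} MiB")
```

The id-list figure is 32 times the bitset figure when every document matches, because it spends 32 bits per document where the bitset spends one.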
Re: Query vs Filter Query Usage
On Thu, Aug 25, 2011 at 5:19 PM, Michael Ryan mr...@moreover.com wrote:
> I think it depends on where exactly the result set was generated. I believe the result set will usually be represented by a BitDocSet, which requires 1 bit per doc in your index (result set size doesn't matter), so in your case it would be about 1.2MB.

Right - and Solr switches between the implementations depending on set size... so if the number of documents in the set were 100, then it would only take up 400 bytes.

-Yonik
http://www.lucidimagination.com
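The 400-byte figure follows from storing 100 doc ids at 4 bytes each. A small Python sketch of the size-dependent choice is below. The rule used here (pick whichever representation is smaller) is an assumption for illustration; the thread does not state the threshold Solr actually applies, and the function name `doc_set_bytes` is hypothetical.

```python
def doc_set_bytes(set_size, max_doc):
    """Return (representation, bytes) for a result set of set_size documents.

    Assumed rule: use a list of 32-bit doc ids when it is smaller than a
    bitset with one bit per document in the index, otherwise use the bitset.
    """
    int_list = set_size * 4
    bitset = (max_doc + 7) // 8
    if int_list < bitset:
        return "int list", int_list
    return "bitset", bitset


MAX_DOC = 10_000_000

small = doc_set_bytes(100, MAX_DOC)        # ("int list", 400)
large = doc_set_bytes(5_000_000, MAX_DOC)  # ("bitset", 1250000)
```

Under this assumed rule the break-even point is a result set of max_doc / 32 documents, which is 312,500 for the 10,000,000 document index in the question.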
Re: Query vs Filter Query Usage
The point of filter queries is that they are applied very early in the search algorithm, and thus cut down the amount of work later on. Some complex queries take a lot of time, so this pre-trimming helps a lot.

On Thu, Aug 25, 2011 at 2:37 PM, Yonik Seeley yo...@lucidimagination.com wrote:
> Right - and Solr switches between the implementations depending on set size... so if the number of documents in the set were 100, then it would only take up 400 bytes.
>
> -Yonik
> http://www.lucidimagination.com

--
Lance Norskog
goks...@gmail.com
Re: Query vs Filter Query Usage
On 8/24/2011 2:02 PM, Joshua Harness wrote:
> I've done some basic query performance testing on my SOLR instance, which allows users to search via a faceted search interface. As such, document relevancy is less important to me since I am performing exact match searching. Comparing using filter queries with a plain query has yielded remarkable performance. However, I'm suspicious of statements like 'always use filter queries since they are so much faster'. In my experience, things are never so straightforward. Can anybody provide any further guidance? What are the pitfalls of relying heavily on filter queries? When would one want to use plain vanilla SOLR queries as opposed to filter queries?

Completely separate from any performance consideration, the key to their usage lies in their name: They are filters. They are particularly useful in a faceted situation, because you can have more than one of them, and the overall result is the intersection (AND) of them all.

When someone tells the interface to restrict their search by a facet, you can simply add a filter query with the field:value relating to that facet and reissue the query. If they decide to remove that restriction, you just have to remove the filter query. You don't have to try and combine the various pieces in the query, which means you'll have much less hassle with parentheses.

If you need a union (OR) operation with your filters, you'll have to use more complex construction within a single filter query, or not use them at all.

Thanks,
Shawn
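The intersection behaviour described in this message can be modelled with plain Python sets. This is a sketch of the semantics only, not of how Solr evaluates filters; the toy `DOCS` index and the helpers `matches` and `apply_filters` are assumptions made for the example.

```python
# Hypothetical index: doc id -> field values.
DOCS = {
    1: {"color": "red", "size": "L"},
    2: {"color": "red", "size": "M"},
    3: {"color": "blue", "size": "L"},
}


def matches(field, value):
    """Doc ids whose field equals value (one 'field:value' filter)."""
    return {doc_id for doc_id, fields in DOCS.items() if fields.get(field) == value}


def apply_filters(filters):
    """Intersect (AND) the result sets of all filters."""
    result = set(DOCS)
    for field, value in filters:
        result &= matches(field, value)
    return result


# Two separate filters: only documents matching both survive.
both = apply_filters([("color", "red"), ("size", "L")])   # {1}

# Removing a facet restriction is just dropping one filter from the list.
color_only = apply_filters([("color", "red")])            # {1, 2}

# A union (OR) cannot be expressed by adding another filter; it has to be
# computed inside a single filter.
either_color = matches("color", "red") | matches("color", "blue")  # {1, 2, 3}
```

In Solr terms, the last case corresponds to putting the OR inside one filter query, for example `fq=color:(red OR blue)`, rather than sending two `fq` parameters.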
Re: Query vs Filter Query Usage
Shawn - Thanks for your reply. Given that my application is mainly used as faceted search, would the following types of queries make sense or are there other pitfalls to consider?

q=*:*&fq=someField:someValue&fq=anotherField:anotherValue

Thanks!
Josh

On Wed, Aug 24, 2011 at 4:48 PM, Shawn Heisey s...@elyograg.org wrote:
> Completely separate from any performance consideration, the key to their usage lies in their name: They are filters. They are particularly useful in a faceted situation, because you can have more than one of them, and the overall result is the intersection (AND) of them all.
>
> When someone tells the interface to restrict their search by a facet, you can simply add a filter query with the field:value relating to that facet and reissue the query. If they decide to remove that restriction, you just have to remove the filter query. You don't have to try and combine the various pieces in the query, which means you'll have much less hassle with parentheses.
>
> If you need a union (OR) operation with your filters, you'll have to use more complex construction within a single filter query, or not use them at all.
>
> Thanks,
> Shawn
Re: Query vs Filter Query Usage
On 8/24/2011 6:07 PM, Joshua Harness wrote:
> Shawn - Thanks for your reply. Given that my application is mainly used as faceted search, would the following types of queries make sense or are there other pitfalls to consider?
>
> q=*:*&fq=someField:someValue&fq=anotherField:anotherValue

I'm no expert, but that looks like the perfect thing to do with filter queries. One thing that you might want to think about and experiment with is removing someField and anotherField from the faceting when you issue a query like that. It would likely work fine if you left them in, but there's not really a need to facet on a field that you've limited to a single value.

Thanks,
Shawn
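For reference, a query string of this shape can be assembled from a list of selected facets with Python's standard library. This is a minimal sketch: the helper name `build_query` is an assumption, and it does not escape characters that are special in Solr's query syntax, so it is only safe for simple field and value names like the ones in the question.

```python
from urllib.parse import parse_qs, urlencode


def build_query(filters, q="*:*"):
    """Build a query string with one fq parameter per (field, value) pair."""
    params = [("q", q)]
    params += [("fq", f"{field}:{value}") for field, value in filters]
    # urlencode percent-encodes '*' and ':'; the server decodes them again.
    return urlencode(params)


selected = [("someField", "someValue"), ("anotherField", "anotherValue")]
query_string = build_query(selected)

# The user clears one facet: drop that filter and reissue the query.
selected.remove(("anotherField", "anotherValue"))
narrower_query_string = build_query(selected)

decoded = parse_qs(query_string)
```

Keeping the selected facets as a list and rebuilding the string each time means adding or removing a restriction never requires editing the main `q` parameter.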