Re: fq vs q parameter
Yes, definitely: fq parameters don't affect scoring, and their results can be cached.

Michael Della Bitta
Applications Developer, appinions inc.

On Wed, Jun 19, 2013 at 4:27 PM, Learner bbar...@gmail.com wrote:

Hi, I am currently using the configuration below in one of my handlers, and I was thinking of removing the values from the q parameter and including them as part of the fq parameter. Can someone let me know whether there is any performance improvement when using the fq parameter compared to q?

<str name="q">( _query_:{!dismax qf=person_name_lname_i v=$fps_lname}^8.3 OR )</str>
</lst>
<lst name="appends">
<str name="fq">{!switch case='*:*' default=$fq_bbox v=$fps_latlong}</str>
</lst>
<lst name="invariants">
<str name="fq_bbox">_query_:{!bbox pt=$fps_latlong sfield=geo d=$fps_dist}^0.2</str>
</lst>

-- View this message in context: http://lucene.472066.n3.nabble.com/fq-vs-q-parameter-tp4071748.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: fq vs q parameter
I see that your query has a boost value, which means you need Solr to score each matching document. One of the key differences between q and fq is that fq has no impact on the score, whereas anything in q is scored for each document based on the Similarity.
Re: fq vs q parameter
+1. Both q and fq results can be cached.
Re: fq vs. q
Fergus McMenemie wrote:

While q= and fq= affect the results portion of a search response, facet.query only affects the facets portion of a response. facet.query parameters are only used where you want a facet summary of your query based on some kind of complex expression rather than the terms within a single field. I added the comment because I think that a wiki page discussing fq vs q should also mention facet.query.

It now does: http://wiki.apache.org/solr/FilterQueryGuidance

Michael Ludwig
Re: fq vs. q
Fergus McMenemie wrote:

The article could explain the difference between fq= and facet.query= and when you should use one in preference to the other.

My understanding is that while these query modifiers rely on the same implementation (cached filters) to boost performance, they simply and obviously differ in that fq limits the result set to your filter criterion, whereas facet.query does not restrict the result but instead enhances it with statistical information gained from intersecting the result set with the facet query filters. It looks like facet.query is just a more flexible means of defining a filter than is possible using a mere facet.field. Would that be approximately correct?

Yes. While q= and fq= affect the results portion of a search response, facet.query only affects the facets portion of a response. facet.query parameters are only used where you want a facet summary of your query based on some kind of complex expression rather than the terms within a single field. I added the comment because I think that a wiki page discussing fq vs q should also mention facet.query.

It appears to me that each facet.query invariably leads to one boolean filter, so if you wanted to do range faceting for a given field and obtain, say, results reduced from their actual continuum of values to three ranges {A,B,C}, you'd have to define three facet.query parameters accordingly. A mere facet.field, on the other hand, creates as many filters as there are unique values in the field. Is that correct?

Yes. A single facet.query on its own is probably useless; you would need many of them. And as they have to be recalculated after each query, I would imagine they are expensive. Also, given that facets are used to help drive GUI options, which in turn drive the contents of subsequent fq= filters, I am wondering: fq= queries are not analyzed before the search is made, but I get the impression that facet.query parameters are! This could be a big pitfall.

Michael Ludwig

Fergus.
-- === Fergus McMenemie Email:fer...@twig.me.uk Techmore Ltd Phone:(UK) 07721 376021 Unix/Mac/Intranets Analyst Programmer ===
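As a rough illustration of the three-range case discussed above, here is a minimal Python sketch that builds one facet.query parameter per range. The field name `price`, the range bounds and the helper `range_facet_params` are made up for the example; only the `q`, `facet` and `facet.query` parameter names are Solr's.

```python
from urllib.parse import urlencode


def range_facet_params(field, ranges):
    """Build request parameters with one facet.query per (low, high) range."""
    params = [("q", "*:*"), ("facet", "true")]
    for low, high in ranges:
        # Each range becomes its own facet.query and so gets its own count.
        params.append(("facet.query", f"{field}:[{low} TO {high}]"))
    return params


# Hypothetical field and bounds standing in for the three ranges A, B, C.
params = range_facet_params("price", [(0, 99), (100, 199), (200, 299)])
query_string = urlencode(params)
print(query_string)
```

The repeated `facet.query` key is the point: three ranges mean three parameters, whereas a single `facet.field` would produce one count per unique value in the field.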
Re: fq vs. q
Ensdorf Ken wrote:

I ran into this very issue recently, as we are using a freshness filter for our data that can be 6/12/18 months etc. I discovered that even though we were only indexing with day-level granularity, we were specifying the query by computing a date down to the second, and thus virtually every filter was unique. It's amazing how something this simple could bring Solr to its knees on a large data set.

I want to retrieve documents (TV programs) by a particular date and decided to convert the date to an integer, so I have:

* 20090615
* 20090616
* 20090617

etc. I lose all date logic (timezones) for that date field, but it works for this particular use case, as the date is merely a tag, and not a real date I need to perform more logic on than an integer allows. Also, an integer looks about as efficient as it gets, so I thought it preferable to a date for this use case. YMMV.

I think if you truncate dates to incomplete dates, you effectively also lose all the date logic. You may still apply it, but what would you take the result to mean? You can't regain precision you've decided to drop.

The actual points in time where my TV programs start and end are encoded as UNIX timestamps, precise down to the second and also stored as integers, as I don't need sub-second precision. This makes sense for my client, which is not Java but PHP, so it uses the C library strftime and friends, which need UNIX timestamps.

Bottom line, I think it may make perfect sense to store dates and times in integers, depending on your use case and your client.

Michael Ludwig
Re: fq vs. q
On Mon, Jun 15, 2009 at 4:39 PM, Michael Ludwig m...@as-guides.com wrote: I want to retrieve documents (TV programs) by a particular date and decided to convert the date to an integer, so I have: * 20090615 * 20090616 * 20090617 etc. I lose all date logic (timezones) for that date field, but it works for this particular use case, as the date is merely a tag, and not a real date I need to perform more logic on than an integer allows. Also, an integer looks about as efficient as it gets, so I thought it preferable to a date for this use case. YMMV. I think if you truncate dates to incomplete dates, you effectively also lose all the date logic. You may still apply it, but what would you take the result to mean? You can't regain precision you've decided to drop. Note that with Trie search coming in (see example schema.xml in the nightly builds), this rounding may not be necessary any more. -- Regards, Shalin Shekhar Mangar.
Re: fq vs. q
Fergus McMenemie wrote:

The article could explain the difference between fq= and facet.query= and when you should use one in preference to the other.

My understanding is that while these query modifiers rely on the same implementation (cached filters) to boost performance, they simply and obviously differ in that fq limits the result set to your filter criterion, whereas facet.query does not restrict the result but instead enhances it with statistical information gained from intersecting the result set with the facet query filters. It looks like facet.query is just a more flexible means of defining a filter than is possible using a mere facet.field. Would that be approximately correct?

A question of mine: It appears to me that each facet.query invariably leads to one boolean filter, so if you wanted to do range faceting for a given field and obtain, say, results reduced from their actual continuum of values to three ranges {A,B,C}, you'd have to define three facet.query parameters accordingly. A mere facet.field, on the other hand, creates as many filters as there are unique values in the field. Is that correct?

Michael Ludwig
Re: fq vs. q
Shalin Shekhar Mangar wrote:

On Mon, Jun 15, 2009 at 4:39 PM, Michael Ludwig m...@as-guides.com wrote:

I think if you truncate dates to incomplete dates, you effectively also lose all the date logic. You may still apply it, but what would you take the result to mean? You can't regain precision you've decided to drop.

Note that with Trie search coming in (see example schema.xml in the nightly builds), this rounding may not be necessary any more.

http://svn.apache.org/repos/asf/lucene/solr/trunk/example/solr/conf/schema.xml

Not sure I understand correctly, but this sounds as if, given an integer field and a @precisionStep of 3, the original value is stored along with three copies that omit (1) the last bit, (2) the last two bits, (3) the last three bits. So a given range query might be optimized to an equality query. But I'm not sure I'm on the right track here.

Michael Ludwig
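On the precisionStep question, a small sketch may help. As far as I understand Lucene's trie encoding, a value is indexed once at full precision and then again with precisionStep, 2 x precisionStep, 3 x precisionStep, ... of its lowest bits dropped, rather than with one, two and three bits dropped. The Python below only illustrates that idea: the function name `trie_terms` is made up, it assumes non-negative 32-bit values, and it ignores the sign handling and term prefixes the real implementation uses.

```python
def trie_terms(value, precision_step, bits=32):
    """Return (shift, prefix) pairs: the value with its lowest `shift` bits dropped.

    Simplified illustration only; assumes 0 <= value < 2**bits.
    """
    return [(shift, value >> shift) for shift in range(0, bits, precision_step)]


# The integer-coded date from this thread, with an assumed precisionStep of 8.
terms = trie_terms(20090615, 8)
for shift, prefix in terms:
    print(f"shift={shift:2d} prefix={prefix}")

# With precisionStep=3, a 32-bit value yields one term per 3-bit step.
steps_for_three = len(trie_terms(20090615, 3))
```

A smaller precisionStep produces more terms per value, which is the trade-off between index size and how few terms a range query needs to visit.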
Re: fq vs. q
Michael Ludwig wrote:

Martin Davidsson wrote:

I've tried to read up on how to decide, when writing a query, what criteria go in the q parameter and what goes in the fq parameter, to achieve optimal performance. Is there [...] some kind of rule of thumb to help me decide how to split things up when querying against one or more fields?

This is a good question. I don't know if there is any such rule. I'm going to sum up my understanding of filter queries, hoping that the pros will point out any flaws in my assumptions.

I've summarized what I've learnt about filter queries on this page: http://wiki.apache.org/solr/FilterQueryGuidance

Michael Ludwig
Re: fq vs. q
On Fri, Jun 12, 2009 at 7:09 PM, Michael Ludwig m...@as-guides.com wrote: I've summarized what I've learnt about filter queries on this page: http://wiki.apache.org/solr/FilterQueryGuidance Wow! This is great! Thanks for taking the time to write this up Michael. I've added a section on analysis, scoring and faceting aspects. -- Regards, Shalin Shekhar Mangar.
Re: fq vs. q
On Fri, Jun 12, 2009 at 7:09 PM, Michael Ludwig m...@as-guides.com wrote: I've summarized what I've learnt about filter queries on this page: http://wiki.apache.org/solr/FilterQueryGuidance Wow! This is great! Thanks for taking the time to write this up Michael. I've added a section on analysis, scoring and faceting aspects. -- Regards, Shalin Shekhar Mangar. A very useful article. If I could chip in with another stupid but related issue. The article could explain the difference between fq= and facet.query= and when you should use one in preference to the other. Regards Fergus. -- === Fergus McMenemie Email:fer...@twig.me.uk Techmore Ltd Phone:(UK) 07721 376021 Unix/Mac/Intranets Analyst Programmer ===
RE: fq vs. q
-Original Message- From: Fergus McMenemie [mailto:fer...@twig.me.uk] Sent: Friday, June 12, 2009 3:41 PM To: solr-user@lucene.apache.org Subject: Re: fq vs. q

On Fri, Jun 12, 2009 at 7:09 PM, Michael Ludwig m...@as-guides.com wrote:

I've summarized what I've learnt about filter queries on this page: http://wiki.apache.org/solr/FilterQueryGuidance

Wow! This is great! Thanks for taking the time to write this up, Michael. I've added a section on analysis, scoring and faceting aspects.

+1, definitely a great article.

I ran into this very issue recently, as we are using a freshness filter for our data that can be 6/12/18 months etc. I discovered that even though we were only indexing with day-level granularity, we were specifying the query by computing a date down to the second, and thus virtually every filter was unique. It's amazing how something this simple could bring Solr to its knees on a large data set. By simply changing the filter to date:[NOW-18MONTHS TO NOW] or equivalent, the problem vanishes.

It does bring up an interesting question though: how is NOW treated with respect to the cache key? Does Solr translate it to a date first? If so, how does it determine the granularity? If not, is there any mechanism to flush the cache when the corresponding result set changes?

-Ken
Re: fq vs. q
On Sat, Jun 13, 2009 at 1:36 AM, Ensdorf Ken ensd...@zoominfo.com wrote:

I ran into this very issue recently, as we are using a freshness filter for our data that can be 6/12/18 months etc. I discovered that even though we were only indexing with day-level granularity, we were specifying the query by computing a date down to the second, and thus virtually every filter was unique. It's amazing how something this simple could bring Solr to its knees on a large data set. By simply changing the filter to date:[NOW-18MONTHS TO NOW] or equivalent, the problem vanishes.

Since you are indexing with day-level granularity, you should query with the same granularity too. For example, date:[NOW/DAY-18MONTHS TO NOW/DAY]. The '/' operator is used for rounding in DateMath syntax ( http://lucene.apache.org/solr/api/org/apache/solr/util/DateMathParser.html). Perhaps this is something we should document more clearly; we recently had high CPU issues with one of our webapps due to the same issue.

It does bring up an interesting question though: how is NOW treated with respect to the cache key? Does Solr translate it to a date first? If so, how does it determine the granularity? If not, is there any mechanism to flush the cache when the corresponding result set changes?

The date math syntax is translated to a date before a search is performed. NOW is always granular up to seconds (maybe milliseconds, not sure).

-- Regards, Shalin Shekhar Mangar.
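To make the effect of rounding concrete, here is a small Python simulation of the filter strings that end up as cache keys once the date math has been resolved to concrete dates. It is only a sketch: the helper `freshness_filter` is invented, 18 months is approximated as 548 days, and the rounding is done client-side here, whereas with NOW/DAY Solr does it for you.

```python
from datetime import datetime, timedelta, timezone


def freshness_filter(now, days, round_to_day):
    """Return the concrete range filter a request made at `now` would produce."""
    if round_to_day:
        # Equivalent in spirit to NOW/DAY: drop everything below day granularity.
        now = now.replace(hour=0, minute=0, second=0, microsecond=0)
    start = now - timedelta(days=days)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return f"date:[{start.strftime(fmt)} TO {now.strftime(fmt)}]"


# 1000 requests spread over a few hours of the same day.
base = datetime(2009, 6, 13, 9, 0, 0, tzinfo=timezone.utc)
request_times = [base + timedelta(seconds=17 * i) for i in range(1000)]

unrounded = {freshness_filter(t, 548, False) for t in request_times}
rounded = {freshness_filter(t, 548, True) for t in request_times}

print(len(unrounded), "distinct filters without rounding")
print(len(rounded), "distinct filter with day rounding")
```

Every distinct filter string is a separate cache entry, so the unrounded variant fills the cache with entries that are never reused, while the rounded one is computed once per day.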
Re: fq vs. q
On Tue, Jun 9, 2009 at 7:25 PM, Michael Ludwig m...@as-guides.com wrote:

A filter query is cached, which means that it is more useful the more often it is repeated. We know how often certain queries arise, or at least have the means to collect that data - so we know what might be candidates for filtering.

Sorry, but I can't make any sense of the above. Could you have another go at explaining it?

The result of a filter query is cached and then used to filter a primary query result using set intersection. If my filter query result comprises more than 50 % of the entire document collection, its selectivity is poor. I might need it despite this fact, but it might also be worthwhile thinking about how to reframe the requirement, allowing for more efficient filters.

So, just to be explicit, if I have a query containing:

fq=EventType:fair&fq=EventType:film&fq=LAT:[50 TO 60]&fq=LONG:[-1 TO 1]

The first time this is encountered, it is going to cause four queries of the entire index and cause four sets of document IDs to be cached. Subsequent queries will reuse the various cached entries as appropriate. Is that correct?

I guess in the above case, where my GEO search window will keep changing, I should ideally arrange for the lat and long elements to be added to the q parameter to stop my cache being cluttered.

Also, what happens when the filter cache is full? Is there any accounting of which cache entries are getting the most or most recent hits?

Regards Fergus
Re: fq vs. q
Fergus McMenemie wrote:

On Tue, Jun 9, 2009 at 7:25 PM, Michael Ludwig m...@as-guides.com wrote:

A filter query is cached, which means that it is more useful the more often it is repeated. We know how often certain queries arise, or at least have the means to collect that data - so we know what might be candidates for filtering.

Sorry, but I can't make any sense of the above. Could you have another go at explaining it?

Filtering a given query result R on bla:eins, bla:zwei, bla:drei or bla:vier is very common in my application. So while I could include this criterion in my main query (q) and hope for the queryResultCache to kick in, this would be unlikely to be efficient, as my primary query, which gave me R, likely varies a lot, resulting in a high number of distinct queries, with a relatively low probability for a given query to occur frequently. So each of these query result sets would enter the queryResultCache as a distinct set, hence high contention, high eviction rate, poor cache efficiency.

Now I'm going to factor out those bla:{eins,zwei,drei,vier} filters from my primary query (q) and put them in the filter query (fq). The benefit is twofold:

(1) Solr has a dedicated cache space for filters, the usage of which I control by my usage of the filter query (fq). I can set things up so that the usage of the primary query (q) is under the user's control while the usage of the filter query (fq) is under my application's control. I control this cache, I ensure its efficiency.

(2) Factoring out the filter query bla:{eins,zwei,drei,vier} from the primary query also reduces variation in the primary query, thus making the queryResultCache more efficient.

So instead of having, say, 1 distinct primary queries, no usage of the filterCache, and poor usage of the queryResultCache, I may have only, say, 3000 distinct primary queries, four cached filters in the filterCache (bla:{eins,zwei,drei,vier}), and a somewhat better usage of the queryResultCache.
I wrote that we know how often certain queries arise, or at least have the means to collect that data, because we know the application we're writing. So we either know the frequency of a given search pattern based on the usage our application makes of Solr and on the restrictions it imposes on the user by, say, using Dismax; or - if we give the user fine-grained control over the query language - we may somehow collect and analyze the actual queries in order to empirically determine actual search engine usage and optimize accordingly.

The result of a filter query is cached and then used to filter a primary query result using set intersection. If my filter query result comprises more than 50 % of the entire document collection, its selectivity is poor. I might need it despite this fact, but it might also be worthwhile thinking about how to reframe the requirement, allowing for more efficient filters.

So, just to be explicit, if I have a query containing:

fq=EventType:fair&fq=EventType:film&fq=LAT:[50 TO 60]&fq=LONG:[-1 TO 1]

The first time this is encountered, it is going to cause four queries of the entire index and cause four sets of document IDs to be cached. Subsequent queries will reuse the various cached entries as appropriate. Is that correct?

I do think so.

I guess in the above case, where my GEO search window will keep changing, I should ideally arrange for the lat and long elements to be added to the q parameter to stop my cache being cluttered.

My understanding is that what varies heavily should *not* go into the filterCache. Your GEO search window might vary quite a bit (probably much more than EventType), so to me it looks like a candidate for the main query.

Also, what happens when the filter cache is full? Is there any accounting of which cache entries are getting the most or most recent hits?

Good question!

Michael Ludwig
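The caching behaviour described in this exchange can be sketched in a few lines of Python. This is a toy model under loud assumptions, not Solr's implementation: the `FilterCache` class, the `search` function and the three-document index are invented for illustration, filters are plain field:value matches, and eviction is left out entirely.

```python
class FilterCache:
    """Toy stand-in for the filterCache: maps an fq string to a set of doc IDs."""

    def __init__(self, index):
        self.index = index  # list of dicts; list position plays the internal doc ID
        self.entries = {}
        self.misses = 0

    def doc_set(self, fq):
        if fq not in self.entries:
            self.misses += 1
            field, value = fq.split(":", 1)
            # A cache miss evaluates the filter against the entire index.
            self.entries[fq] = frozenset(
                doc_id for doc_id, doc in enumerate(self.index)
                if doc.get(field) == value
            )
        return self.entries[fq]


def search(index, cache, matches_q, fqs):
    """Evaluate q on the whole index, then intersect with each cached filter."""
    result = {doc_id for doc_id, doc in enumerate(index) if matches_q(doc)}
    for fq in fqs:
        result &= cache.doc_set(fq)
    return sorted(result)


index = [
    {"EventType": "fair", "title": "spring fair"},
    {"EventType": "film", "title": "spring film night"},
    {"EventType": "fair", "title": "autumn fair"},
]
cache = FilterCache(index)

# Two different primary queries share the same filter.
first = search(index, cache, lambda d: "spring" in d["title"], ["EventType:fair"])
second = search(index, cache, lambda d: "autumn" in d["title"], ["EventType:fair"])
print(first, second, "filter computed", cache.misses, "time(s)")
```

The second search reuses the cached EventType:fair set even though its primary query is different, which is the benefit of keeping stable criteria in fq and varying ones in q.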
Re: fq vs. q
Martin Davidsson wrote:

I've tried to read up on how to decide, when writing a query, what criteria go in the q parameter and what goes in the fq parameter, to achieve optimal performance. Is there [...] some kind of rule of thumb to help me decide how to split things up when querying against one or more fields?

This is a good question. I don't know if there is any such rule. I'm going to sum up my understanding of filter queries, hoping that the pros will point out any flaws in my assumptions.

http://wiki.apache.org/solr/SolrCaching - filterCache

A filter query is cached, which means that it is more useful the more often it is repeated. We know how often certain queries arise, or at least have the means to collect that data - so we know what might be candidates for filtering.

The result of a filter query is cached and then used to filter a primary query result using set intersection. If my filter query result comprises more than 50 % of the entire document collection, its selectivity is poor. I might need it despite this fact, but it might also be worthwhile thinking about how to reframe the requirement, allowing for more efficient filters.

Memory consumption is probably not a great concern here, as the cache stores only document IDs. (And if those are integers, it's just 4 bytes each.) So with 100 filters containing 100,000 items on average, the memory consumption increase should be around 40 MB.

By the way, are these document IDs (used in filterCache, documentCache, queryResultCache) the ones I configure in schema.xml, or does Solr map my IDs to integers in order to ensure efficiency?

A filter query should probably be orthogonal to the primary query, which means in plain English: unrelated to the primary query. To give an example, I have a field category, which is a required field.
In the class of searches where I use a filter on that field, the primary search is for something entirely different, so in most cases, it will not, or not necessarily, bias the primary result to any particular distribution of the category values. I then allow the application to apply filtering by category, incidentally, using faceting, which is a typical usage pattern, I guess. Michael Ludwig
Re: fq vs. q
On Tue, Jun 9, 2009 at 7:25 PM, Michael Ludwig m...@as-guides.com wrote:

http://wiki.apache.org/solr/SolrCaching - filterCache

A filter query is cached, which means that it is more useful the more often it is repeated. We know how often certain queries arise, or at least have the means to collect that data - so we know what might be candidates for filtering.

Correct.

The result of a filter query is cached and then used to filter a primary query result using set intersection. If my filter query result comprises more than 50 % of the entire document collection, its selectivity is poor. I might need it despite this fact, but it might also be worthwhile thinking about how to reframe the requirement, allowing for more efficient filters.

Correct.

Memory consumption is probably not a great concern here, as the cache stores only document IDs. (And if those are integers, it's just 4 bytes each.) So with 100 filters containing 100,000 items on average, the memory consumption increase should be around 40 MB.

Often it is stored as a bitset, so the memory requirements may be even lower.

By the way, are these document IDs (used in filterCache, documentCache, queryResultCache) the ones I configure in schema.xml, or does Solr map my IDs to integers in order to ensure efficiency?

These are internal doc IDs assigned by Lucene.

A filter query should probably be orthogonal to the primary query, which means in plain English: unrelated to the primary query. To give an example, I have a field category, which is a required field. In the class of searches where I use a filter on that field, the primary search is for something entirely different, so in most cases, it will not, or not necessarily, bias the primary result to any particular distribution of the category values. I then allow the application to apply filtering by category, incidentally, using faceting, which is a typical usage pattern, I guess.

Yes and no.
There are use-cases where the query is applicable only to the filtered set. For example, when the same index contains many different types of documents. It is just that the intersection may need to do more or less work. -- Regards, Shalin Shekhar Mangar.
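The memory estimate in this exchange is easy to check with a little arithmetic. The sketch below reproduces the 40 MB figure for ID lists and compares it with a bitset that holds one bit per document in the index. The index size of 1,000,000 documents is an assumption made up for the comparison, and the function names are invented; the thread only gives the filter count and the average filter size.

```python
def id_list_bytes(num_filters, docs_per_filter, bytes_per_id=4):
    """Memory for filters stored as lists of 4-byte integer doc IDs."""
    return num_filters * docs_per_filter * bytes_per_id


def bitset_bytes(num_filters, max_doc):
    """Memory for filters stored as bitsets: one bit per document in the index."""
    return num_filters * ((max_doc + 7) // 8)


as_lists = id_list_bytes(100, 100_000)       # figures from the thread
as_bitsets = bitset_bytes(100, 1_000_000)    # assumed index of one million docs

print(f"ID lists: {as_lists / 1_000_000:.1f} MB")
print(f"bitsets:  {as_bitsets / 1_000_000:.1f} MB")
```

Note that a bitset's size depends on the index size rather than on how many documents match, so it only wins when filters match a reasonable fraction of the index.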
Re: fq vs. q
Shalin Shekhar Mangar wrote:

On Tue, Jun 9, 2009 at 7:25 PM, Michael Ludwig m...@as-guides.com wrote:

A filter query should probably be orthogonal to the primary query, which means in plain English: unrelated to the primary query. To give an example, I have a field category, which is a required field. In the class of searches where I use a filter on that field, the primary search is for something entirely different, so in most cases, it will not, or not necessarily, bias the primary result to any particular distribution of the category values. I then allow the application to apply filtering by category, incidentally, using faceting, which is a typical usage pattern, I guess.

Yes and no. There are use-cases where the query is applicable only to the filtered set. For example, when the same index contains many different types of documents. It is just that the intersection may need to do more or less work.

Sorry, I don't understand. I used to think that the engine applies the filter to the primary query result. What you're saying here sounds as if it could also pre-filter my document collection and then apply a query to it (which should yield the same result). What does it mean that the query is applicable only to the filtered set?

And thanks for having clarified the other points!

Michael Ludwig
Re: fq vs. q
On Tue, Jun 9, 2009 at 11:11 PM, Michael Ludwig m...@as-guides.com wrote: Sorry, I don't understand. I used to think that the engine applies the filter to the primary query result. What you're saying here sounds as if it could also pre-filter my document collection to then apply a query to it (which should yield the same result). What does it mean that the query is applicable only to the filtered set? Sorry for not being clear. No, both filters and queries are computed on the entire index. My comment was related to the A filter query should probably be orthogonal to the primary query... part. I meant that both kinds of use-cases are common. -- Regards, Shalin Shekhar Mangar.
Re: fq vs. q
Shalin Shekhar Mangar wrote:

No, both filters and queries are computed on the entire index. My comment was related to the "A filter query should probably be orthogonal to the primary query..." part. I meant that both kinds of use-cases are common.

Got it. Thanks :-)

Michael Ludwig
Re: fq vs. q
It's definitely not proper documentation, but maybe it can give you a hand: http://www.derivante.com/2009/04/27/100x-increase-in-solr-performance-and-throughput/

Martin Davidsson-2 wrote:

I've tried to read up on how to decide, when writing a query, what criteria go in the q parameter and what goes in the fq parameter, to achieve optimal performance. Is there some documentation that describes how each field is treated internally, or even better, some kind of rule of thumb to help me decide how to split things up when querying against one or more fields? In most cases, I'm looking for exact matches, but sometimes an occasional wildcard query shows up too. Thank you! -- Martin

-- View this message in context: http://www.nabble.com/fq-vs.-q-tp23845282p23847845.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: fq vs. q
Wow! That was a good read!

On Wed, Jun 3, 2009 at 2:23 PM, Marc Sturlese marc.sturl...@gmail.com wrote:

It's definitely not proper documentation, but maybe it can give you a hand: http://www.derivante.com/2009/04/27/100x-increase-in-solr-performance-and-throughput/

Martin Davidsson-2 wrote:

I've tried to read up on how to decide, when writing a query, what criteria go in the q parameter and what goes in the fq parameter, to achieve optimal performance. Is there some documentation that describes how each field is treated internally, or even better, some kind of rule of thumb to help me decide how to split things up when querying against one or more fields? In most cases, I'm looking for exact matches, but sometimes an occasional wildcard query shows up too. Thank you! -- Martin
Re: fq vs. q
On Wed, Jun 3, 2009 at 1:53 AM, Marc Sturlese marc.sturl...@gmail.com wrote:

It's definitely not proper documentation, but maybe it can give you a hand: http://www.derivante.com/2009/04/27/100x-increase-in-solr-performance-and-throughput/

Martin Davidsson-2 wrote:

I've tried to read up on how to decide, when writing a query, what criteria go in the q parameter and what goes in the fq parameter, to achieve optimal performance. Is there some documentation that describes how each field is treated internally, or even better, some kind of rule of thumb to help me decide how to split things up when querying against one or more fields? In most cases, I'm looking for exact matches, but sometimes an occasional wildcard query shows up too. Thank you! -- Martin

Thanks, I'd seen that article too. I totally agree that it's worth understanding how things are treated under the hood. That's the kind of literature I'm looking for, I guess. Given that article, I wasn't sure what the query would look like if I need to query against multiple fields. Let's say I have a name field and a brand field and I want to find the Apple iPod. Using only the 'q' param, the query would look like

select?q=brand:Apple AND name:iPod

Is there a better query format that utilizes the fq parameter?

Thanks again -- Martin
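One possible answer, going by the guidance elsewhere in this thread, is to keep the user-driven term in q and move the frequently repeated, non-scoring criterion into fq. The Python sketch below just builds the two request URLs side by side. The helper `select_url` is invented for the example, and treating brand as the stable filter is an assumption about this particular application, not a general rule.

```python
from urllib.parse import urlencode


def select_url(q, fqs=()):
    """Build a relative select URL with one q and zero or more fq parameters."""
    params = [("q", q)] + [("fq", fq) for fq in fqs]
    return "select?" + urlencode(params)


# Everything in q: both clauses are scored and cached as one query result.
only_q = select_url("brand:Apple AND name:iPod")

# Split: name stays in q (scored), brand becomes a reusable cached filter.
with_fq = select_url("name:iPod", ["brand:Apple"])

print(only_q)
print(with_fq)
```

Whether the split pays off depends on how often the brand filter repeats across requests and on whether the brand match should contribute to the score.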