Re: A few random questions about solr queries.

2012-06-03 Thread Erick Erickson
See below:

On Tue, May 29, 2012 at 6:18 AM, santamaria2 <aravinda@contify.com> wrote:
 *1)* With faceting, how does facet.query perform in comparison to
 facet.field? I'm just wondering this as in my use case, I need to facet over
 a field -- which would get me the top n facets for that field, but I also
 need to show the count for a selected filter which might have a relatively
 low count so it doesn't appear in the top n returned facets. So the solution
 would be to 'ensure' its presence by adding a 'facet.query=cat:val' in
 addition to my facet.field=cat.

You have two choices here. Either specify that the response should contain
the top, say, 1,000,000 facet values (facet.limit), which would be a disaster
in some cases, and facet by field, or facet by query. You really don't have any
other choice than to add the facet.query here, so performance is moot.
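A minimal sketch of the "facet by field, plus facet by query" approach, building the request parameters with Python's standard library. The helper `facet_params` and the field names `author` and `cat` are hypothetical; `facet`, `facet.field`, `facet.query` and the per-field `f.<field>.facet.limit` override are standard Solr faceting parameters. Quoting the value keeps a multi-word value such as John Smith together as one phrase.

```python
from urllib.parse import urlencode


def facet_params(field, selected_value, limit=10):
    """Hypothetical helper: facet on `field` and pin a count for one value."""
    # Escape backslashes and double quotes so the value can sit inside a
    # quoted phrase, e.g. author:"John Smith" rather than author:John Smith.
    escaped = selected_value.replace("\\", "\\\\").replace('"', '\\"')
    return [
        # Top `limit` values of the field, ordered by count.
        ("facet.field", field),
        # Per-field override; raising this enormously is the
        # "return the top 1,000,000" option, which can be very expensive.
        (f"f.{field}.facet.limit", str(limit)),
        # Count for the selected value even if it falls outside the top list.
        ("facet.query", f'{field}:"{escaped}"'),
    ]


params = [("q", "*:*"), ("facet", "true")]
params += facet_params("author", "John Smith")
params += facet_params("cat", "val")
print(urlencode(params))
```

The list of tuples (rather than a dict) is deliberate: Solr accepts repeated `facet.field` and `facet.query` parameters, and a dict could hold only one of each.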


 I want to do this to quite a few fields.

 Related/example-based question:
 When I facet over a field, and something gets returned, e.g. John Smith (83),
 and I also 'ensure' this facet's presence by having it in
 facet.query=author:"John Smith", are two different calculations performed?
 Or is the facet returned by facet.field also used by facet.query to obtain
 the count?


I'm pretty sure that two different calculations are performed, but I don't
know for certain. But again, it seems like your use case requires the addition
of the query, so why does it matter?
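Whatever Solr does internally, the two counts are reported in separate sections of the JSON response: `facet_counts.facet_fields` holds the facet.field results and `facet_counts.facet_queries` holds the facet.query results, keyed by the query string as sent. A minimal sketch of reading the count from either place, assuming `author` is an untokenized (string) field, the default `json.nl=flat` list layout, and a value containing no quote characters; the helper name and the sample response are made up for illustration.

```python
def facet_count(response, field, value):
    """Look up the count for `value` in a parsed Solr JSON response."""
    counts = response["facet_counts"]
    # facet.field results: a flat [term, count, term, count, ...] list.
    flat = counts["facet_fields"].get(field, [])
    for term, count in zip(flat[0::2], flat[1::2]):
        if term == value:
            return count
    # Not among the top terms: fall back to the facet.query count.
    # Assumes the query was sent as field:"value" with nothing to escape.
    return counts["facet_queries"].get(f'{field}:"{value}"')


# Made-up response shaped like Solr's output, for illustration only.
sample = {
    "facet_counts": {
        "facet_queries": {'author:"John Smith"': 83, 'author:"Jane Doe"': 2},
        "facet_fields": {"author": ["John Smith", 83, "Someone Else", 40]},
    }
}
print(facet_count(sample, "author", "Jane Doe"))  # prints 2
```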



 *2)* Is there a performance issue if I have around, say, 20 facet.query
 conditions along with 10 facet.fields? 3 of those 10 fields have around
 100,000 possible values. The remaining ones have a few hundred each.


It Depends (tm). You don't say, for instance, how big your index is, or how
much memory you have. Really, the only good way to answer this question is to
try it and _then_ worry about it. So far you've only described your
requirements, so asking about low-level implementation details seems premature
unless and until you see a performance problem.
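In the spirit of "try it", a minimal timing sketch for comparing a heavy faceting request with a lighter one. `build_params`, `median_latency`, the `tag` field and the placeholder queries are all hypothetical; `send` stands for whatever function you already use to issue the HTTP request to Solr, passed in so the timing logic does not assume a particular client library.

```python
import statistics
import time


def build_params(n_queries, fields):
    """Hypothetical helper: a faceting request with n_queries facet.query clauses."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true")]
    params += [("facet.field", f) for f in fields]
    # Placeholder queries on a hypothetical `tag` field; use your real ones.
    params += [("facet.query", f"tag:t{i}") for i in range(n_queries)]
    return params


def median_latency(send, params, runs=5):
    """Median wall-clock seconds for send(params) over `runs` timed calls."""
    send(params)  # untimed warm-up call so a cold start does not skew the numbers
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        send(params)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

With ten field names in a list `fields`, compare `median_latency(send, build_params(20, fields))` against `median_latency(send, build_params(4, fields))` on your real index. Repeating an identical request measures warm-cache behaviour, so real traffic with varied filters may well be slower.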



 *3)* I've rummaged around a bit, looking for info on when to use q vs fq. I
 want to clear my doubts for a certain use case.

 Where should my date range queries go? In q or fq? The default settings in
 my site show results from the past 90 days with buttons to show stuff from
 the last month and week as well. But the user is allowed to use a slider to
 apply any date range... this is allowed, but it's not /that/ common.
 I definitely use fq for filtering various tags. Choosing a tag is a common
 activity.


In addition to Shawn's answer, using fq clauses enables use of the filterCache,
which can substantially increase performance, but see this blog post for some
interesting considerations when using NOW:

http://www.lucidimagination.com/blog/2012/02/23/date-math-now-and-filter-queries/
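The NOW consideration, sketched: Solr resolves NOW to the current millisecond when the query is parsed, so an unrounded date range is a different filter on every request and never gets a filterCache hit. Rounding with /DAY (to midnight, UTC by default) makes the filter identical for every request on the same day. Rounding does not let the next day's filter reuse 89 days of the previous entry, though: when the date changes, the rounded bounds change and a fresh entry is computed, while the old one eventually drops out of the cache. The helper name and the `pubdate` field are hypothetical.

```python
def last_n_days_fq(field, days, round_to_day=True):
    """Hypothetical helper: an fq clause covering roughly the last `days` days."""
    if round_to_day:
        # Midnight `days` days ago through midnight tomorrow: the same
        # resolved range (and so the same cache entry) all day long.
        return f"{field}:[NOW/DAY-{days}DAYS TO NOW/DAY+1DAY]"
    # Resolves to new bounds every millisecond: effectively uncacheable.
    return f"{field}:[NOW-{days}DAYS TO NOW]"


print(last_n_days_fq("pubdate", 90))
```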

Best
Erick

 Should the date range query go in fq? As I mentioned, the default view shows
 stuff from the past 90 days. So on each new day, does this invalidate
 stuff in the cache? Or is stuff stored in the filter cache in some way
 that makes it easy to fetch stuff from the past 89 days when a query is
 performed the next day?



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/A-few-random-questions-about-solr-queries-tp3986562.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: A few random questions about solr queries.

2012-06-01 Thread Shawn Heisey

On 5/29/2012 4:18 AM, santamaria2 wrote:

 *3)* I've rummaged around a bit, looking for info on when to use q vs fq. I
 want to clear my doubts for a certain use case.

 Where should my date range queries go? In q or fq? The default settings in
 my site show results from the past 90 days with buttons to show stuff from
 the last month and week as well. But the user is allowed to use a slider to
 apply any date range... this is allowed, but it's not /that/ common.
 I definitely use fq for filtering various tags. Choosing a tag is a common
 activity.


I can't answer your facet questions, but this one I can.  If you are 
using the default relevancy ranking and you do not want the values in a 
given part of your search to affect the score, put it in a filter query 
(fq).  Also, if you are sorting all your search results in a 
deterministic way rather than using relevancy, use a filter query.


If you do want those values to affect the score, which is normal for 
fulltext fields, put your search clause in the regular query (q).  Most 
of the time, a date range is not something that you want to affect the 
relevancy score, so it is a perfect candidate for filter queries.
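A minimal sketch of that split: the user's full-text input goes in q so it drives the relevancy score, while tags and the date range go in separate fq parameters, which do not affect the score and are each cached independently in the filterCache. The helper and the field names `text`, `tag` and `pubdate` are hypothetical, and the user text is assumed to already be valid query syntax.

```python
from urllib.parse import urlencode


def search_params(user_text, tags, date_fq=None, sort=None):
    """Hypothetical helper splitting one search into q and fq parameters."""
    # Scored part: full-text input, or match-all when there is none.
    params = [("q", user_text or "*:*")]
    # One fq per tag, so a commonly chosen tag becomes a reusable cache entry.
    params += [("fq", f'tag:"{t}"') for t in tags]
    if date_fq:
        params.append(("fq", date_fq))
    if sort:
        # Deterministic ordering, e.g. "pubdate desc", instead of by score.
        params.append(("sort", sort))
    return params


print(urlencode(search_params("text:solr", ["java"], "pubdate:[NOW/DAY-90DAYS TO NOW/DAY+1DAY]")))
```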


Thanks,
Shawn



Re: A few random questions about solr queries.

2012-05-31 Thread santamaria2
A wee bit of clarification on the 2nd question. I meant relative performance,
i.e. would it be much slower to facet over 20 facet.queries and 10 facet.fields
compared to, say, 4 facet.queries and facet.fields? I wonder if this makes
sense...

So... is a bump improper etiquette here?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/A-few-random-questions-about-solr-queries-tp3986562p3986977.html
Sent from the Solr - User mailing list archive at Nabble.com.


A few random questions about solr queries.

2012-05-29 Thread santamaria2
*1)* With faceting, how does facet.query perform in comparison to
facet.field? I'm just wondering this as in my use case, I need to facet over
a field -- which would get me the top n facets for that field, but I also
need to show the count for a selected filter which might have a relatively
low count so it doesn't appear in the top n returned facets. So the solution
would be to 'ensure' its presence by adding a 'facet.query=cat:val' in
addition to my facet.field=cat.

I want to do this to quite a few fields.

Related/example-based question:
When I facet over a field, and something gets returned, e.g. John Smith (83),
and I also 'ensure' this facet's presence by having it in
facet.query=author:"John Smith", are two different calculations performed?
Or is the facet returned by facet.field also used by facet.query to obtain
the count?



*2)* Is there a performance issue if I have around, say, 20 facet.query
conditions along with 10 facet.fields? 3 of those 10 fields have around
100,000 possible values. The remaining ones have a few hundred each.



*3)* I've rummaged around a bit, looking for info on when to use q vs fq. I
want to clear my doubts for a certain use case.

Where should my date range queries go? In q or fq? The default settings in
my site show results from the past 90 days with buttons to show stuff from
the last month and week as well. But the user is allowed to use a slider to
apply any date range... this is allowed, but it's not /that/ common. 
I definitely use fq for filtering various tags. Choosing a tag is a common
activity.

Should the date range query go in fq? As I mentioned, the default view shows
stuff from the past 90 days. So on each new day, does this invalidate
stuff in the cache? Or is stuff stored in the filter cache in some way
that makes it easy to fetch stuff from the past 89 days when a query is
performed the next day?


