We discovered very soon after going to production that Lucene's scores were
often 'too precise'. For example, a page of 25 results might have several
distinct score values, all within 15% of each other, yet to the end user all
25 results were equally relevant. In those cases we wanted the secondary
sort field to determine the order instead, which required writing a custom
score comparator to 'round' the scores. The same thing occurred with
distance sorting.
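
The effect is roughly the following sketch, written here as a plain
post-search re-sort rather than our actual Lucene comparator (SearchHit is
a made-up holder for the raw score and the secondary field, and the 0.15f
bucket size is just an illustration):

import java.util.Comparator;

// Hypothetical holder for one hit: the raw Lucene score plus whatever
// secondary value we sort on (a date, a rank, etc. -- here just a String).
class SearchHit {
    final float score;
    final String secondaryValue;

    SearchHit(float score, String secondaryValue) {
        this.score = score;
        this.secondaryValue = secondaryValue;
    }
}

// Collapses scores into coarse buckets so that hits whose scores differ
// only slightly compare as equal and the secondary value breaks the tie.
class RoundedScoreComparator implements Comparator<SearchHit> {
    private final float bucketSize;   // e.g. 0.15f

    RoundedScoreComparator(float bucketSize) {
        this.bucketSize = bucketSize;
    }

    public int compare(SearchHit a, SearchHit b) {
        int bucketA = Math.round(a.score / bucketSize);
        int bucketB = Math.round(b.score / bucketSize);
        if (bucketA != bucketB) {
            return bucketB - bucketA;                        // higher score first
        }
        return a.secondaryValue.compareTo(b.secondaryValue); // tie-break
    }
}

A page of hits would then be re-sorted with
java.util.Arrays.sort(hits, new RoundedScoreComparator(0.15f)); our real
implementation does the same rounding inside a custom Lucene comparator,
but the idea is the same.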

We also limit the effect of term frequency to help prevent spamming.
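
A common way to get that damping in the Lucene 2.x/3.x API is to override
Similarity.tf() and cap the raw frequency; a minimal sketch with an assumed
cap of 5 (not necessarily the exact curve we use):

import org.apache.lucene.search.DefaultSimilarity;

// Caps raw term frequency so that repeating the same keyword over and
// over stops paying off after a few occurrences.
public class CappedTfSimilarity extends DefaultSimilarity {
    private static final float MAX_TF = 5.0f;   // assumed cap, tune to taste

    @Override
    public float tf(float freq) {
        return super.tf(Math.min(freq, MAX_TF));
    }
}

// Installed on the searcher, e.g.:
//   searcher.setSimilarity(new CappedTfSimilarity());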

In comparison to Avi, we use 'AND' as the default operator for keyword
queries; if no docs are found, the query is automatically retried with
'OR'. This improves precision a bit, and it only happens when the user
provides no operators of their own.
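
The retry logic itself is nothing fancy; a sketch against the Lucene 3.0
QueryParser API (package and constructor details differ slightly between
versions, and "contents" is just a placeholder field name):

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.util.Version;

public class AndThenOrSearch {

    // The real system also skips the fallback when the user typed explicit
    // operators; that check is omitted here for brevity.
    public static TopDocs search(IndexSearcher searcher, Analyzer analyzer,
                                 String userInput, int n)
            throws ParseException, IOException {
        QueryParser parser =
            new QueryParser(Version.LUCENE_30, "contents", analyzer);

        parser.setDefaultOperator(QueryParser.AND_OPERATOR);
        TopDocs hits = searcher.search(parser.parse(userInput), n);
        if (hits.totalHits > 0) {
            return hits;
        }

        // Nothing matched when every term was required: retry with OR.
        parser.setDefaultOperator(QueryParser.OR_OPERATOR);
        return searcher.search(parser.parse(userInput), n);
    }
}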

Lucene's Explanation class has been invaluable in helping me to explain a
particular sort order in many, many situations.
Most of our relevance tuning has occurred after deployment to production.
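
For anyone who hasn't used it, getting an Explanation for a hit is a single
call to IndexSearcher.explain(); e.g. a quick sketch that dumps the
breakdown for a whole page of results:

import java.io.IOException;

import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class ExplainPage {

    // Prints Lucene's nested scoring breakdown (tf, idf, boosts, norms)
    // for each hit on a page -- handy when someone asks why result #3
    // outranks result #4.
    public static void explainTopHits(IndexSearcher searcher, Query query, int n)
            throws IOException {
        TopDocs topDocs = searcher.search(query, n);
        for (ScoreDoc sd : topDocs.scoreDocs) {
            Explanation explanation = searcher.explain(query, sd.doc);
            System.out.println("doc=" + sd.doc + " score=" + sd.score);
            System.out.println(explanation.toString());
        }
    }
}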

Peter

On Thu, Apr 29, 2010 at 10:14 AM, Grant Ingersoll <gsing...@apache.org> wrote:

> I'm putting on a talk at Lucene Eurocon (
> http://lucene-eurocon.org/sessions-track1-day2.html#1) on "Practical
> Relevance" and I'm curious as to what people put in practice for testing and
> improving relevance.  I have my own inclinations, but I don't want to muddy
> the water just yet.  So, if you have a few moments, I'd love to hear
> responses to the following questions.
>
> What worked?
> What didn't work?
> What didn't you understand about it?
> What tools did you use?
> What tools did you wish you had either for debugging relevance or "fixing"
> it?
> How much time did you spend on it?
> How did you avoid over/under tuning?
> What stage of development/testing/production did you decide to do relevance
> tuning?  Was that timing planned or not?
>
>
> Thanks,
> Grant
>
