Regarding part 3 (data quality):
In our search domain (catalog products) we very often face the problem that 
the search data is full of acronyms and abbreviations like:
cable,nym-j,pvc,3x2.5mm²
or
dvd-/cd-/usb-carradio,4x50W,divx,bl

We solved this with a combination of normalization, which improves data 
quality by reducing the number of variations, and a tolerant, sloppy phrase 
search in which a search token only needs to partly match an indexed token.
For this we use a dictionary lookup into the indexed tokens of some fields 
and expand the user's query with a carefully weighted set of search terms.
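
To make the idea concrete, here is a minimal sketch of such a normalization plus dictionary expansion step. It is not our production code: the function names, the split rules and the two weights (exact_weight, partial_weight) are assumptions chosen for illustration only, and a real system would do the lookup against the index's term dictionary instead of a Python set.

```python
import re


def normalize(text):
    """Lowercase a product description and split it into tokens."""
    text = text.lower().replace("²", "2")
    # Split on commas, slashes and whitespace; hyphens and dots stay inside tokens.
    tokens = re.split(r"[,\s/]+", text)
    # Drop dangling hyphens such as the one in "dvd-" and skip empty tokens.
    return [t.strip("-") for t in tokens if t.strip("-")]


def build_dictionary(documents):
    """Collect the set of normalized tokens over all documents."""
    dictionary = set()
    for doc in documents:
        dictionary.update(normalize(doc))
    return dictionary


def expand_query(query, dictionary, exact_weight=1.0, partial_weight=0.4):
    """Return (indexed token, weight) pairs for tokens that contain a query token.

    The weights are hypothetical; partial matches are scaled by how much of
    the indexed token the query token covers.
    """
    expanded = {}
    for q in normalize(query):
        for token in dictionary:
            if token == q:
                weight = exact_weight
            elif q in token:
                weight = partial_weight * len(q) / len(token)
            else:
                continue
            # Keep the best weight if several query tokens hit the same term.
            expanded[token] = max(expanded.get(token, 0.0), weight)
    # Highest weight first, ties broken alphabetically.
    return sorted(expanded.items(), key=lambda kv: (-kv[1], kv[0]))


products = [
    "cable,nym-j,pvc,3x2.5mm²",
    "dvd-/cd-/usb-carradio,4x50W,divx,bl",
]
dictionary = build_dictionary(products)
for term, weight in expand_query("pvc carradio", dictionary):
    print(term, round(weight, 2))
```

With the two example products above, the query "pvc carradio" expands to the exact term "pvc" at full weight and the partial match "usb-carradio" at a lower weight. The linear scan over the dictionary is only acceptable for a sketch; with several million products the lookup itself needs an index.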

It took us several iterations to get this right and fast enough to search 
several million products.
The next step on our list is facets.

Uwe 


-----Original Message-----
From: mbennett.idea...@gmail.com [mailto:mbennett.idea...@gmail.com] On Behalf 
Of Mark Bennett
Sent: Thursday, 29 April 2010 16:59
To: java-user@lucene.apache.org
Subject: Re: Relevancy Practices

Hi Grant,

You're welcome to use any of my slides (Dave's got them), with attribution
of course.

BUT....

Have you considered a section something like "why the hell do you think
Relevancy tweaking is gonna save you!?!?"

Basically that, as a corpus grows exponentially, so do results list sizes,
so ALL relevancy tweaks will eventually fail.  And FACETS (or other
navigators) are the answer.  I've got slides on that as well.

Of course relevancy matters.... but it's only ONE prong of perhaps a
three-pronged approach:
1: Organic Relevancy and top query suggestions
2: Results list Navigators, the best the system can support, and
3: Data quality (spidering, METADATA quality, source weighting, etc)

Mark

--
Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513


On Thu, Apr 29, 2010 at 7:14 AM, Grant Ingersoll <gsing...@apache.org> wrote:

> I'm putting on a talk at Lucene Eurocon (
> http://lucene-eurocon.org/sessions-track1-day2.html#1) on "Practical
> Relevance" and I'm curious as to what people put in practice for testing and
> improving relevance.  I have my own inclinations, but I don't want to muddy
> the water just yet.  So, if you have a few moments, I'd love to hear
> responses to the following questions.
>
> What worked?
> What didn't work?
> What didn't you understand about it?
> What tools did you use?
> What tools did you wish you had either for debugging relevance or "fixing"
> it?
> How much time did you spend on it?
> How did you avoid over/under tuning?
> What stage of development/testing/production did you decide to do relevance
> tuning?  Was that timing planned or not?
>
>
> Thanks,
> Grant
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
