Regarding Part3: Data quality For our search domain (catalog products) we face very often the problem that the search data is full of acronyms and abbreviations like: cable,nym-j,pvc,3x2.5mm² or dvd-/cd-/usb-carradio,4x50W,divx,bl
We solved this by a combination of normalization for better data quality (or less variations) and some tolerant sloppy phrase search where the search token needs only to partly match an indexed token. We use here a dictionary lookup approach into the indexed tokens of some fields and expand the users query with a well weighted set of search terms. It took us some iterations to get this right and fast enough to search in several million products. The next step on our list are facets. Uwe -----Ursprüngliche Nachricht----- Von: mbennett.idea...@gmail.com [mailto:mbennett.idea...@gmail.com] Im Auftrag von Mark Bennett Gesendet: Donnerstag, 29. April 2010 16:59 An: java-user@lucene.apache.org Betreff: Re: Relevancy Practices Hi Grant, You're welcome to use any of my slides (Dave's got them), with attribution of course. BUT.... Have you considered a section something like "why the hell do you think Relevancy tweaking is gonna save you!?!?" Basically that, as a corpus grows exponentially, so do results list sizes, so ALL relevancy tweaks will eventually fail. And FACETS (or other navigators) are the answer. I've got slides on that as well. Of course relevancy matters.... but it's only ONE of perhaps a three pronged approach: 1: Organic Relevancy and top query suggetions 2: Results list Navigators, the best the system can support, and 3: Data quality (spidering, METADATA quality, source weighting, etc) Mark -- Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513 On Thu, Apr 29, 2010 at 7:14 AM, Grant Ingersoll <gsing...@apache.org>wrote: > I'm putting on a talk at Lucene Eurocon ( > http://lucene-eurocon.org/sessions-track1-day2.html#1) on "Practical > Relevance" and I'm curious as to what people put in practice for testing and > improving relevance. I have my own inclinations, but I don't want to muddy > the water just yet. So, if you have a few moments, I'd love to hear > responses to the following questions. > > What worked? > What didn't work? > What didn't you understand about it? > What tools did you use? > What tools did you wish you had either for debugging relevance or "fixing" > it? > How much time did you spend on it? > How did you avoid over/under tuning? > What stage of development/testing/production did you decide to do relevance > tuning? Was that timing planned or not? > > > Thanks, > Grant > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org