Hi qibaoyuan,

I tried your second solution, using the scoring data. This way, I think I can still use MoreLikeThis: every document with a score > X is a possible match :-).
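The "score > X" idea can be illustrated without Lucene at all. The snippet below is only a sketch under stated assumptions: the class BestMatchSketch, its methods and the 0.5 cut-off are hypothetical, and the term-overlap fraction is a simple stand-in for a score, not Lucene's similarity and not the query MoreLikeThis actually builds.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration: rank documents by the fraction of query terms they contain.
final class BestMatchSketch {

    // Lowercase and split on non-alphanumerics (a rough stand-in for an analyzer).
    static Set<String> terms(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Fraction of the distinct query terms that also occur in the document (0.0 to 1.0).
    static double overlap(String query, String doc) {
        Set<String> q = terms(query);
        if (q.isEmpty()) {
            return 0.0;
        }
        Set<String> d = terms(doc);
        int hits = 0;
        for (String t : q) {
            if (d.contains(t)) {
                hits++;
            }
        }
        return (double) hits / q.size();
    }

    // Ids of all documents whose overlap with the query is strictly above the threshold.
    static List<String> matches(String query, Map<String, String> docs, double threshold) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            if (overlap(query, e.getValue()) > threshold) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    // The three indexed documents from the original question.
    static Map<String, String> exampleDocs() {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("Doc 1", "restaurant 't Robbeke fish passoa beer 15 EUR 5 EUR 2 EUR total 22 EUR");
        docs.put("Doc 2", "restaurant De Genieter scampi's fish sticks cola fanta 18 EUR 15 EUR 2 EUR 2 EUR total 37 EUR");
        docs.put("Doc 3", "restaurant 't Stoveke frites meat beer 10 EUR 5 EUR total 15 EUR");
        return docs;
    }

    public static void main(String[] args) {
        String doc4 = "restaurant De Genieter VAT 37 EUR";
        for (Map.Entry<String, String> e : exampleDocs().entrySet()) {
            System.out.println(e.getKey() + ": " + overlap(doc4, e.getValue()));
        }
        System.out.println("Above 0.5: " + matches(doc4, exampleDocs(), 0.5));
    }
}
```

With the example documents, Doc 2 contains 5 of the 6 distinct terms of Doc 4, while Doc 1 and Doc 3 contain only 2 each, so a cut-off of 0.5 keeps Doc 2 alone. Real Lucene scores are not normalised like this, so a suitable X would have to be found by experiment.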
Thanks!
Jochen

2012/9/7 qibaoyuan <qibaoy...@126.com>

> Maybe you could alter MLT to make it work with the AND operator. But I
> don't think there is anything wrong with using the OR operator. Lucene
> will rank all the docs according to the underlying similarity algorithm
> (VSM, BM25, etc.). In your case, Doc 2 will be ranked first because it
> matches the most words in Doc 4. Other docs containing some of the words
> in Doc 4 may be listed too, but will get a lower score.
>
> At 2012-09-07 15:32:23, "Jochen Hebbrecht" <jochenhebbre...@gmail.com> wrote:
> >Hi,
> >
> >Imagine you are indexing the following documents (every line is stored
> >in one single field, analyzed with the default StandardAnalyzer):
> >- Doc 1: restaurant 't Robbeke fish passoa beer 15 EUR 5 EUR 2 EUR total 22 EUR
> >- Doc 2: restaurant De Genieter scampi's fish sticks cola fanta 18 EUR 15 EUR 2 EUR 2 EUR total 37 EUR
> >- Doc 3: restaurant 't Stoveke frites meat beer 10 EUR 5 EUR total 15 EUR
> >
> >Now I have another document with the following field:
> >- Doc 4: restaurant De Genieter VAT 37 EUR
> >
> >I'm wondering if Lucene has a feature to find the "most-matching"
> >document. In my example, the "most-matching" document for Doc 4 is
> >Doc 2.
> >I've played around with "MoreLikeThis", but it seems to create a query
> >with an OR operator between the terms. So it created something like this:
> >"restaurant OR de OR genieter OR VAT OR VAT OR 37 EUR".
> >Lucene would have to match on "restaurant" AND "de" AND "genieter" AND
> >"37" AND "EUR". Well, it shouldn't really be AND'ing all terms, because
> >I'm looking for the best match, and it could be that some terms have to
> >be dropped from the list to get to the best match.
> >
> >Maybe it can generate a kind of percentage/score to tell me which
> >document is the closest to Doc 4? Does Lucene have this kind of feature?
> >
> >Thanks in advance for any answer,
> >Jochen