On 7/25/06, Martin Braun <[EMAIL PROTECTED]> wrote:
> Hi Yonik,
>
> >> I can't figure out what the parameters do. ;)
> >
> > Yes, it will fail without slop... I don't think there is a practical
> > way around that.
>
> I am trying to analyze your WordDelimiterFilter. If I have x-men, after
> analyzing (with catenateAll) I get this:
>
> Analyzing "The x-men story"
>   de.unihd.ub.ftsearch.WordIndexAnalyzer:
>     [the] [x] [men] [xmen] [story]
>
>   1: [the:0->3:word]
>   2: [x:4->5:word]
>   3: [men:6->9:word] [xmen:4->9:word]
>   4: [story:10->15:word]
>
>   1: [the]
>   2: [x]
>   3: [men] [xmen]
>   4: [story]
>
> So a phrase search for "The xmen story" will fail. With a slop of 1 the
> doc will be found. But when generating the query I won't know when to
> use a slop, so adding slop isn't a nice solution.

If you can't tolerate slop, this is a problem. The only 100% solution
that I could think of is to re-index the entire stream (with a very
large position gap in between) for each variant:

  "the x men story"
  "the xmen story"

Problems:
  1) combinatorial explosion very quickly (not practical at all)
  2) messes up idfs pretty badly

Phrase slop is the easiest workaround, especially if you wanted slop
anyway.
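
To make the slop behaviour above concrete, here is a small Python sketch. It is a simplified model of positional phrase matching, not Lucene's actual scorer: the function name phrase_matches and the dict-based index are made up for illustration, and repeated terms within one phrase are not handled specially. A phrase is taken to match when one position can be picked per query term so that (position minus offset in the query) differs by at most the slop.

```python
from itertools import product

def phrase_matches(index, phrase, slop=0):
    """Simplified positional phrase match (illustrative, not Lucene's code).

    index maps each term to the list of positions it occupies in the doc.
    """
    terms = phrase.split()
    if any(t not in index for t in terms):
        return False
    # Try every way of picking one position per query term.
    for combo in product(*(index[t] for t in terms)):
        shifts = [pos - i for i, pos in enumerate(combo)]
        if max(shifts) - min(shifts) <= slop:
            return True
    return False

# Positions from the analyzer output: "xmen" shares position 3 with "men".
index = {"the": [1], "x": [2], "men": [3], "xmen": [3], "story": [4]}

print(phrase_matches(index, "the x men story"))          # True
print(phrase_matches(index, "the xmen story"))           # False, no slop
print(phrase_matches(index, "the xmen story", slop=1))   # True
```

In this model "xmen" sits one position later than an exact phrase would expect, which is why a slop of 1 is enough to recover the match.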

> Would it be a solution to put the concatenated token at both
> positions? Or are there any drawbacks with this?

I considered that too... but it increases false matches, and it still doesn't fix many phrase queries.

  1: [the]
  2: [x] [xmen]
  3: [men] [xmen]
  4: [story]

While "the xmen" and "xmen story" will now both match, "the xmen story"
will still fail to match.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
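
The both-positions layout discussed above can be checked with the same kind of simplified simulation. As before, this is only a sketch of positional matching and not Lucene's implementation; phrase_matches and the dict-based index are hypothetical names used for illustration. It also shows one of the false matches that appear, here the phrase "xmen men".

```python
from itertools import product

def phrase_matches(index, phrase, slop=0):
    """Simplified positional phrase match (illustrative, not Lucene's code)."""
    terms = phrase.split()
    if any(t not in index for t in terms):
        return False
    # Try every way of picking one position per query term.
    for combo in product(*(index[t] for t in terms)):
        shifts = [pos - i for i, pos in enumerate(combo)]
        if max(shifts) - min(shifts) <= slop:
            return True
    return False

# "xmen" is placed at both position 2 (with "x") and position 3 (with "men").
both = {"the": [1], "x": [2], "xmen": [2, 3], "men": [3], "story": [4]}

print(phrase_matches(both, "the xmen"))        # True
print(phrase_matches(both, "xmen story"))      # True
print(phrase_matches(both, "the xmen story"))  # False, still needs slop
print(phrase_matches(both, "xmen men"))        # True, a false match
```

Whichever of its two positions "xmen" is matched at, it cannot be adjacent to both "the" and "story" at once, so the three-word phrase still needs slop.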
