Re: dismax catenated token search
: > Yes, pf should be replaced by a word proximity query that doesn't
: > require all words to match :)

: 2) dismax parameter that throws word catenations into the MaxDisjunction:
: "a b c" would also search for ab and bc.

that doesn't address the inverse problem: when "pain killer" is indexed but the user searches for "painkiller"

I believe both problems can be solved by using the NGramTokenizer on a field in the qf ... but i have not tested this. (i'm not entirely certain what the NGramTokenizer does with whitespace, so it might actually need to be a KeywordTokenizer followed by a Filter that strips out interword whitespace, followed by NGramTokenFilter ... or something like that.)

-Hoss
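A minimal sketch of why the strip-whitespace-then-n-gram idea could cover both directions. This is plain Python, not Lucene or Solr code: the function name `stripped_ngrams` and the gram size of 3 are assumptions chosen for illustration, and the lowercasing step is an extra assumption not mentioned in the message. It only shows that once interword whitespace is removed, "pain killer" and "painkiller" yield the same grams.

```python
def stripped_ngrams(text, n=3):
    """Return character n-grams of `text` after removing all whitespace.

    Hypothetical helper: mimics the idea of a keyword tokenizer, a
    whitespace-stripping filter, and an n-gram filter applied in sequence.
    """
    # Treat the whole input as one token and drop interword whitespace.
    joined = "".join(text.lower().split())
    # Slide a window of length n over the joined string.
    return [joined[i:i + n] for i in range(len(joined) - n + 1)]


indexed = stripped_ngrams("pain killer")
queried = stripped_ngrams("painkiller")
print(indexed == queried)  # both spellings produce identical grams
```

Because the same analysis would be applied at index and query time, either spelling on either side ends up with the same grams. Whether the real tokenizer behaves this way with whitespace is exactly the untested part flagged above.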
Re: dismax catenated token search
On 7/19/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
> On 19-Jul-07, at 2:49 PM, Yonik Seeley wrote:
> > Does anyone have a good idea how to go about searching for
> > concatenated tokens?
> >
> > Say that the index has "painkiller" and the user types in
> > "pain killer" (without the quotes).
> >
> > If one were using the standard request handler, the easiest would be
> > to have the client handle it by sending in both variants:
> > pain OR killer OR painkiller
> > or a variant like
> > "pain killer" OR painkiller
> >
> > But is there any answer when using dismax?
> > Requiring the client to send in pain killer painkiller seems like it
> > may decrease relevance too much if you currently use "pf" (phrase
> > fields) since the phrase "pain killer painkiller" isn't going to match
> > anything.
> >
> > Thoughts?
>
> Yes, pf should be replaced by a word proximity query that doesn't
> require all words to match :)

Some other quick ideas:
1) client issues two separate queries... "pain killer" and "painkiller" and merges results.
2) dismax parameter that throws word catenations into the MaxDisjunction: "a b c" would also search for ab and bc.

-Yonik
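To make idea 2 concrete, here is a rough sketch of how the catenations of adjacent words could be generated on the client or inside a handler. The function name `adjacent_catenations` is hypothetical, and splitting on whitespace is an assumption; no such dismax parameter is described as existing in this thread.

```python
def adjacent_catenations(query):
    """Return the catenation of each pair of adjacent words in `query`.

    Hypothetical helper: "a b c" gives ["ab", "bc"], matching the example
    in the message above. Words are split on whitespace only.
    """
    words = query.split()
    # Pair each word with its right-hand neighbour and join the pair.
    return [left + right for left, right in zip(words, words[1:])]


print(adjacent_catenations("a b c"))        # ['ab', 'bc']
print(adjacent_catenations("pain killer"))  # ['painkiller']
```

The resulting terms would be extra optional clauses alongside the original words. A single-word query produces no catenations, so it would be searched unchanged.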
Re: dismax catenated token search
On 19-Jul-07, at 2:49 PM, Yonik Seeley wrote:
> Does anyone have a good idea how to go about searching for
> concatenated tokens?
>
> Say that the index has "painkiller" and the user types in
> "pain killer" (without the quotes).
>
> If one were using the standard request handler, the easiest would be
> to have the client handle it by sending in both variants:
> pain OR killer OR painkiller
> or a variant like
> "pain killer" OR painkiller
>
> But is there any answer when using dismax?
> Requiring the client to send in pain killer painkiller seems like it
> may decrease relevance too much if you currently use "pf" (phrase
> fields) since the phrase "pain killer painkiller" isn't going to match
> anything.
>
> Thoughts?

Yes, pf should be replaced by a word proximity query that doesn't require all words to match :)

-Mike
dismax catenated token search
Does anyone have a good idea how to go about searching for concatenated tokens?

Say that the index has "painkiller" and the user types in "pain killer" (without the quotes).

If one were using the standard request handler, the easiest would be to have the client handle it by sending in both variants:
pain OR killer OR painkiller
or a variant like
"pain killer" OR painkiller

But is there any answer when using dismax?
Requiring the client to send in pain killer painkiller seems like it may decrease relevance too much if you currently use "pf" (phrase fields) since the phrase "pain killer painkiller" isn't going to match anything.

Thoughts?

-Yonik
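For the standard request handler case, the client-side rewrite described above could look roughly like this. The function name `both_variants` is hypothetical, joining every word into one catenated term is an assumption that only really fits two-word input, and the sketch does not escape query syntax characters.

```python
def both_variants(user_query):
    """Build the two query strings from the message for a multi-word input.

    Hypothetical helper: returns (loose, phrase), where `loose` ORs the
    individual words with their catenation and `phrase` ORs the quoted
    phrase with the catenation. Single-word input is returned unchanged.
    """
    words = user_query.split()
    if len(words) < 2:
        return user_query, user_query
    joined = "".join(words)
    loose = " OR ".join(words + [joined])
    phrase = '"%s" OR %s' % (" ".join(words), joined)
    return loose, phrase


loose, phrase = both_variants("pain killer")
print(loose)   # pain OR killer OR painkiller
print(phrase)  # "pain killer" OR painkiller
```

This only helps with the standard request handler, since dismax does not interpret the OR syntax the same way, which is the open question of the thread.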