Re: dismax catenated token search

2007-07-19 Thread Chris Hostetter

: > Yes, pf should be replaced by a word proximity query that doesn't
: > require all words to match :)

: 2) dismax parameter that throws word catenations into the MaxDisjunction:
:   "a b c" would also search for ab and bc.

that doesn't address the inverse problem: when "pain killer" is indexed
but the user searches for "painkiller"

I believe both problems can be solved by using the NgramTokenizer on a
field in the qf ... but i have not tested this.  (i'm not entirely certain
what the NgramTokenizer does with whitespace, so it might actually need
to be a KeywordTokenizer followed by a Filter that strips out interword
whitespace, followed by an NgramTokenFilter ... or something like that.)
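
To illustrate why an n-gram field might cover both directions, here is a
rough pure-Python sketch of the chain described above (whole input as one
token, interword whitespace stripped, then character n-grams). It is not
Solr or Lucene code: the function name char_ngrams and the gram size of 3
are assumptions made for the example, and the real tokenizer may treat
whitespace differently.

```python
def char_ngrams(text, n=3):
    """Return character n-grams of text after lowercasing and removing whitespace.

    Hypothetical stand-in for: KeywordTokenizer -> whitespace-stripping
    filter -> n-gram filter. Not the actual Lucene implementation.
    """
    squashed = "".join(text.lower().split())  # drop all interword whitespace
    return [squashed[i:i + n] for i in range(len(squashed) - n + 1)]


indexed = char_ngrams("pain killer")   # what the index would hold
queried = char_ngrams("painkiller")    # what the user typed

# Both inputs collapse to the same string, so their grams are identical.
print(indexed == queried)
```

With the whitespace removed first, "pain killer" and "painkiller" yield the
same grams, so either form should match the other regardless of which one was
indexed. The cost is a larger index and looser matching in general.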


-Hoss



Re: dismax catenated token search

2007-07-19 Thread Yonik Seeley

On 7/19/07, Mike Klaas <[EMAIL PROTECTED]> wrote:


> On 19-Jul-07, at 2:49 PM, Yonik Seeley wrote:
>
> > Does anyone have a good idea how to go about searching for
> > concatenated tokens?
> >
> > Say that the index has "painkiller" and the user types in
> > "pain killer" (without the quotes).
> >
> > If one were using the standard request handler, the easiest would be
> > to have the client handle it by sending in both variants:
> > pain OR killer OR painkiller
> >  or a variant like
> > "pain killer" OR painkiller
> >
> > But is there any answer when using dismax?
> > Requiring the client to send in pain killer painkiller seems like it
> > may decrease relevance too much if you currently use "pf" (phrase
> > fields) since the phrase "pain killer painkiller" isn't going to match
> > anything.
> >
> > Thoughts?
>
> Yes, pf should be replaced by a word proximity query that doesn't
> require all words to match :)


Some other quick ideas:
1) client issues two separate queries... "pain killer" and
  "painkiller" and merges results.
2) dismax parameter that throws word catenations into the MaxDisjunction:
  "a b c" would also search for ab and bc.
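
A minimal sketch of what idea 2 would generate, written as client-side
Python rather than as a dismax parameter (no such parameter exists in this
thread; the function names below are made up for the example). It only
joins adjacent word pairs, as in the "a b c" example.

```python
def adjacent_catenations(query):
    """Join each pair of neighbouring words: "a b c" -> ["ab", "bc"]."""
    words = query.split()
    return [left + right for left, right in zip(words, words[1:])]


def expand_terms(query):
    """Original words plus their adjacent catenations, in that order."""
    return query.split() + adjacent_catenations(query)


print(adjacent_catenations("a b c"))
print(expand_terms("pain killer"))
```

How the extra terms should be scored relative to the original words is left
open here; the sketch only shows which terms would be added.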

-Yonik


Re: dismax catenated token search

2007-07-19 Thread Mike Klaas


On 19-Jul-07, at 2:49 PM, Yonik Seeley wrote:

> Does anyone have a good idea how to go about searching for
> concatenated tokens?
>
> Say that the index has "painkiller" and the user types in
> "pain killer" (without the quotes).
>
> If one were using the standard request handler, the easiest would be
> to have the client handle it by sending in both variants:
> pain OR killer OR painkiller
>  or a variant like
> "pain killer" OR painkiller
>
> But is there any answer when using dismax?
> Requiring the client to send in pain killer painkiller seems like it
> may decrease relevance too much if you currently use "pf" (phrase
> fields) since the phrase "pain killer painkiller" isn't going to match
> anything.
>
> Thoughts?

Yes, pf should be replaced by a word proximity query that doesn't
require all words to match :)


-Mike


dismax catenated token search

2007-07-19 Thread Yonik Seeley

Does anyone have a good idea how to go about searching for concatenated tokens?

Say that the index has "painkiller" and the user types in
"pain killer" (without the quotes).

If one were using the standard request handler, the easiest would be
to have the client handle it by sending in both variants:
pain OR killer OR painkiller
 or a variant like
"pain killer" OR painkiller
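
For the standard request handler case, the client-side rewrite could look
roughly like the sketch below. The function name both_variants_query is
hypothetical, and the code assumes the input contains no query-syntax
characters that would need escaping.

```python
def both_variants_query(user_input):
    """Build '"w1 w2 ..." OR w1w2...' from whitespace-separated user input.

    Single-word input is returned unchanged, since there is nothing to join.
    """
    words = user_input.split()
    if len(words) < 2:
        return user_input.strip()
    phrase = '"' + " ".join(words) + '"'   # the words as a quoted phrase
    catenated = "".join(words)             # the words run together
    return phrase + " OR " + catenated


print(both_variants_query("pain killer"))
```
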

But is there any answer when using dismax?
Requiring the client to send in pain killer painkiller seems like it
may decrease relevance too much if you currently use "pf" (phrase
fields) since the phrase "pain killer painkiller" isn't going to match
anything.

Thoughts?

-Yonik