Re: Scoring exact matches higher in a stemmed field

Itamar Syn-Hershko Sat, 17 Jul 2010 11:05:34 -0700

Shai, you got it right. I want to be able to send "b bb" through the QP with my custom analyzer and get back "(b b$) (b bb$)": 2 terms, each with 2 tokens in the same position.

I want this to be a native product of the engine, as opposed to forcing this from the query end. I'm using different types of queries (Bool, DisMax), and I'm actually interested in using the QP itself. Instead of going through all sub-queries post-parsing and boosting terms ending with $, I want some sort of plugin mechanism to do this for me per result. The easiest path would be subclassing Similarity, if only the relevant functions hadn't been deprecated...

Are there any other ways to do so? For example, is this doable with function queries (since access to the actual term is required)?


Itamar.

On 16/7/2010 8:01 PM, Shai Erera wrote:

Depends on which query, no? ;)

Sounds like you want to simulate the QP behavior
http://lucene.apache.org/java/2_4_0/queryparsersyntax.html for
boosting. Meaning, if for the query "b" you want to simulate the query
"b OR b$^2" and have matches of b$ count more than b, then I'd follow
how QP does it - create the query programmatically or something (I'm
not near the code at the moment so I cannot give a more concrete
approach).
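
A minimal sketch of the rewrite described above, in plain Java with no Lucene dependency. It only builds a query string in QP syntax; the class name, method name, and the "$" marker handling are assumptions for illustration, and it assumes the input terms contain no QP special characters. In a real setup the unmarked clause would hold the stem and the marked clause the original word.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of Lucene.
class BoostExactSketch {
    // Turns each whitespace-separated term t into "(t OR t$^boost)".
    static String expand(String userQuery, float boost) {
        List<String> clauses = new ArrayList<>();
        for (String term : userQuery.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // empty input yields no clauses
            }
            // "$" marks the exact (unstemmed) form; "^" is the QP boost operator.
            clauses.add("(" + term + " OR " + term + "$^" + boost + ")");
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(expand("b bb", 2f));
    }
}
```

The resulting string would then be handed to the QP as usual; whether the "$" survives analysis depends on the analyzer in use.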

If you want b and b$ to count the same, then that's already the
behavior - i.e., docs containing both will score higher.

If I misunderstood your question, then please correct me.

Shai

On Friday, July 16, 2010, Itamar Syn-Hershko <ita...@code972.com> wrote:

Hi all,

Consider the following string: "the buffalo buffaloes" [1].

When passed through a stemming analyzer, the resulting tokens would be "buffalo
buffalo" (assuming a good stemmer).

To enable exact searches, say I mark the original term and index it at the same term
position. So "the buffalo buffaloes" -> (buffalo buffalo$) (buffalo
buffaloes$) - now exact searches are allowed on the same field without having 2 different
fields [2].
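
A minimal sketch of that token layout, in plain Java with no Lucene dependency. The stop list, the toy stemmer, and all names here are assumptions for illustration only; a real analyzer would do this in a TokenFilter and would also account for the position gap left by removed stop words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: emits the stem, then the marked original at the same position.
class ExactMarkerSketch {
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "a", "an"));
    static final String MARKER = "$";

    // A token plus its position increment (0 = same position as the previous token).
    static final class Token {
        final String text;
        final int positionIncrement;

        Token(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    // Toy stemmer (an assumption, not a real algorithm): strips a trailing "es" or "s".
    static String stem(String word) {
        if (word.endsWith("es") && word.length() > 4) {
            return word.substring(0, word.length() - 2);
        }
        if (word.endsWith("s") && word.length() > 3) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    static List<Token> analyze(String text) {
        List<Token> out = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (word.isEmpty() || STOP_WORDS.contains(word)) {
                continue; // stop words are dropped without leaving a gap
            }
            out.add(new Token(stem(word), 1));     // stemmed form advances the position
            out.add(new Token(word + MARKER, 0));  // marked original shares that position
        }
        return out;
    }

    public static void main(String[] args) {
        for (Token t : analyze("the buffalo buffaloes")) {
            System.out.println(t.text + " (+" + t.positionIncrement + ")");
        }
    }
}
```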

However, with this approach default scoring isn't working well. What is my best
option for boosting an exact match of this sort, while still using
the same stemming analyzer and without using payloads on the marked token?

In other words - how do I make documents containing "the buffalo buffaloes" considered
more relevant than docs containing the word "buffalo" only once?

The trick here is to boost the marked token if found at search time. While this
sounds easy to do, I can't find the best approach to implementing this, esp.
since Similarity.float Idf(Index.Term term, Searcher searcher) seems to have
been deprecated for some reason.

Itamar.

[1]
http://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo_buffalo_buffalo_Buffalo_buffalo
:)

[2] Rationale:
http://www.code972.com/blog/2010/07/more-flexible-hebrew-indexing-hebmorph/

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

