Scoring exact matches higher in a stemmed field

Itamar Syn-Hershko Fri, 16 Jul 2010 08:30:19 -0700

Hi all,


Consider the following string: "the buffalo buffaloes" [1].

When passed through a stemming analyzer, the resulting tokens would be "buffalo buffalo" (assuming a good stemmer).

To enable exact searches, say I mark the original term and index it at the same term position. So "the buffalo buffaloes" -> (buffalo buffalo$)(buffalo buffaloes$) - now exact searches are allowed on the same field without having 2 different fields [2].
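
A minimal sketch of the indexing scheme described above, in plain Java with no Lucene dependency. The class name, the `$` marker handling, the stop-word list and the toy `stem` method are all assumptions made for illustration; a real analyzer would express "same position" through a position increment of 0 on the marked token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy illustration only: not Lucene code, and the stemmer is a hypothetical stand-in.
class ExactMarkerSketch {
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "a", "an"));

    // A token and its position increment; 0 means "same position as the previous token".
    static final class Token {
        final String text;
        final int positionIncrement;

        Token(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    // Hypothetical stemmer: strips a trailing "es" or "s" and nothing else.
    static String stem(String word) {
        if (word.length() > 4 && word.endsWith("es")) return word.substring(0, word.length() - 2);
        if (word.length() > 3 && word.endsWith("s")) return word.substring(0, word.length() - 1);
        return word;
    }

    // Emits the stem, then the "$"-marked original stacked on the same position.
    // Stop words are dropped; the position gaps they would leave are ignored here.
    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.isEmpty() || STOP_WORDS.contains(word)) continue;
            tokens.add(new Token(stem(word), 1));
            tokens.add(new Token(word + "$", 0));
        }
        return tokens;
    }

    // Renders tokens grouped by position, e.g. "(stem original$)(stem original$)".
    static String render(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Token t : tokens) {
            if (t.positionIncrement > 0) {
                if (sb.length() > 0) sb.append(')');
                sb.append('(');
            } else {
                sb.append(' ');
            }
            sb.append(t.text);
        }
        if (sb.length() > 0) sb.append(')');
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "(buffalo buffalo$)(buffalo buffaloes$)"
        System.out.println(render(analyze("the buffalo buffaloes")));
    }
}
```

A search for the marked form (buffaloes$) then matches only documents containing that exact surface form, while a search for the stem (buffalo) matches both.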

However, with this approach default scoring isn't working well. What is my best option for boosting an exact match of this sort, while still using the same stemming analyzer and without using payloads on the marked token?

In other words - how do I make documents containing "the buffalo buffaloes" considered more relevant than docs containing the word "buffalo" only once?

The trick here is to boost the marked token if found at search time. While this sounds easy to do, I can't find the best approach to implementing this - esp. since Similarity.float Idf(Index.Term term, Searcher searcher) seems to have been deprecated for some reason.
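
To make the intended effect concrete, here is a toy scoring sketch in plain Java. It is not Lucene's scoring formula (no idf, no length norms) and does not use any Lucene API; the class name, the `EXACT_BOOST` value and the simplistic `stem` method are assumptions for illustration. Each query word is expanded into an unboosted stem clause plus a boosted clause on the `$`-marked original, and a document's score is the weighted sum of term frequencies.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy illustration only: a weighted term-frequency sum, not Lucene's Similarity.
class ExactBoostSketch {
    // Assumed boost for a match on the "$"-marked (exact) token.
    static final float EXACT_BOOST = 2.0f;

    // Hypothetical stemmer: strips a trailing "es" or "s" and nothing else.
    static String stem(String word) {
        if (word.length() > 4 && word.endsWith("es")) return word.substring(0, word.length() - 2);
        if (word.length() > 3 && word.endsWith("s")) return word.substring(0, word.length() - 1);
        return word;
    }

    // Index-side tokens: the stem plus the "$"-marked original for every word.
    static List<String> indexTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (w.isEmpty() || w.equals("the")) continue; // "the" treated as a stop word
            out.add(stem(w));
            out.add(w + "$");
        }
        return out;
    }

    // Each query word contributes tf(stem) * 1 + tf(marked original) * EXACT_BOOST.
    static float score(String query, String doc) {
        List<String> docTokens = indexTokens(doc);
        float total = 0f;
        for (String w : query.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (w.isEmpty() || w.equals("the")) continue;
            total += Collections.frequency(docTokens, stem(w));
            total += EXACT_BOOST * Collections.frequency(docTokens, w + "$");
        }
        return total;
    }

    public static void main(String[] args) {
        String query = "the buffalo buffaloes";
        System.out.println(score(query, "the buffalo buffaloes")); // prints 8.0
        System.out.println(score(query, "the buffalo"));           // prints 4.0
    }
}
```

In this toy model the document containing both exact forms outranks the one containing "buffalo" once, because the boosted clause for buffaloes$ only fires on the exact surface form.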



Itamar.

[1] http://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo_buffalo_buffalo_Buffalo_buffalo :)

[2] Rationale: http://www.code972.com/blog/2010/07/more-flexible-hebrew-indexing-hebmorph/



