Hi all,

Consider the following string: "the buffalo buffaloes" [1].


When passed through a stemming analyzer with stop-word removal, the resulting tokens would be "buffalo buffalo" (assuming a good stemmer).


To enable exact searches, say I mark the original term with a trailing "$" and index it at the same term position as its stem. So "the buffalo buffaloes" -> (buffalo buffalo$) (buffalo buffaloes$), where each pair of parentheses is one position. Exact searches are now possible on the same field, without needing two separate fields [2].
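
To make the scheme concrete, here is a minimal plain-Java sketch of the token stream I have in mind. It does not use Lucene at all: the Tok class, the toyStem method and the stop-word set are hypothetical stand-ins for a real analyzer and stemmer, and a position increment of 0 means "same position as the previous token".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class MarkedTokenSketch {
    // Hypothetical token: term text plus a position increment
    // (0 = same position as the previous token).
    static final class Tok {
        final String term;
        final int posInc;
        Tok(String term, int posInc) { this.term = term; this.posInc = posInc; }
        @Override public String toString() { return term + "(+" + posInc + ")"; }
    }

    // Assumed stop-word list, for illustration only.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the"));

    // Toy stand-in for a real stemmer: strips a trailing "es" or "s".
    static String toyStem(String word) {
        if (word.endsWith("es")) return word.substring(0, word.length() - 2);
        if (word.endsWith("s")) return word.substring(0, word.length() - 1);
        return word;
    }

    // Emits the stem, then the "$"-marked original at the same position.
    // Position gaps left by removed stop words are ignored in this sketch.
    static List<Tok> analyze(String text) {
        List<Tok> out = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.isEmpty() || STOP_WORDS.contains(word)) continue;
            out.add(new Tok(toyStem(word), 1));
            out.add(new Tok(word + "$", 0));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("the buffalo buffaloes"));
    }
}
```

In a real analyzer the same idea would be expressed through the position increment of the emitted tokens; the sketch only shows the ordering and increments.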


However, with this approach the default scoring isn't working well. What is my best option for boosting an exact match of this sort, while still using the same stemming analyzer and without using payloads on the marked token?


In other words - how do I make documents containing "the buffalo buffaloes" rank as more relevant than docs containing the word "buffalo" only once?


The trick here is to boost the marked token when it is found at search time. While this sounds easy to do, I can't find the best way to implement it, especially since Similarity's float Idf(Index.Term term, Searcher searcher) seems to have been deprecated for some reason.
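
Roughly, the behaviour I am after looks like the plain-Java sketch below. It is not Lucene code and not Lucene's scoring formula: EXACT_BOOST, toyStem, expand and score are all hypothetical names, and the score is just a sum of per-term weights over the indexed tokens. Each query word is expanded into its stem with weight 1 and its "$"-marked form with a higher weight.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class ExactBoostSketch {
    // Hypothetical boost for the "$"-marked exact token.
    static final float EXACT_BOOST = 2.0f;

    // Toy stand-in for a real stemmer: strips a trailing "es" or "s".
    static String toyStem(String word) {
        if (word.endsWith("es")) return word.substring(0, word.length() - 2);
        if (word.endsWith("s")) return word.substring(0, word.length() - 1);
        return word;
    }

    // Expands each query word into two weighted clauses:
    // the stem (weight 1) and the marked original (weight EXACT_BOOST).
    static Map<String, Float> expand(String query) {
        Map<String, Float> clauses = new LinkedHashMap<>();
        for (String word : query.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.isEmpty()) continue;
            clauses.put(toyStem(word), 1.0f);
            clauses.put(word + "$", EXACT_BOOST);
        }
        return clauses;
    }

    // Toy score: adds a clause's weight for every matching indexed token.
    static float score(Map<String, Float> clauses, List<String> docTokens) {
        float total = 0f;
        for (String token : docTokens) {
            Float weight = clauses.get(token);
            if (weight != null) total += weight;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Float> q = expand("buffaloes");
        List<String> both = Arrays.asList("buffalo", "buffalo$", "buffalo", "buffaloes$");
        List<String> once = Arrays.asList("buffalo", "buffalo$");
        System.out.println(score(q, both) + " vs " + score(q, once));
    }
}
```

With a query for "buffaloes", the document indexed from "the buffalo buffaloes" gets credit for both stem matches plus the boosted exact match, while the document containing "buffalo" once only gets the single stem match.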


Itamar.


[1] http://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo_buffalo_buffalo_Buffalo_buffalo :)

[2] Rationale: http://www.code972.com/blog/2010/07/more-flexible-hebrew-indexing-hebmorph/

