Hi all,
Consider the following string: "the buffalo buffaloes" [1].
When passed through a stemming analyzer, the resulting tokens would be
"buffalo buffalo" (assuming a good stemmer and "the" removed as a stopword).
To enable exact searches, say I mark the original term and index it at
the same term position. So "the buffalo buffaloes" -> (buffalo buffalo$)
(buffalo buffaloes$). Now exact searches are possible on the same field,
without needing two separate fields [2].
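To make the indexing scheme concrete, here is a toy sketch in plain Java (no Lucene dependency). The "$" suffix is the marker described above; the stopword list and the plural-stripping stemmer are hypothetical stand-ins, not a real analyzer. Each surviving word yields its stem plus the "$"-marked original at the same term position, which in a real Lucene TokenFilter would correspond to emitting the marked token with a position increment of 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class MarkedStemIndexer {

    /** One indexed token: the term text plus its term position within the field. */
    static final class Token {
        final String term;
        final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + "@" + position;
        }
    }

    // Hypothetical stopword list; a real analyzer would supply its own.
    static final Set<String> STOPWORDS = Set.of("the", "a", "an");

    // Toy stand-in for a real stemmer (assumption): strips a plural "es" or "s".
    static String stem(String word) {
        if (word.endsWith("es") && word.length() > 4) return word.substring(0, word.length() - 2);
        if (word.endsWith("s") && word.length() > 3) return word.substring(0, word.length() - 1);
        return word;
    }

    /** Emits the stem and the "$"-marked original at the same term position. */
    static List<Token> analyze(String text) {
        List<Token> out = new ArrayList<>();
        String[] words = text.toLowerCase(Locale.ROOT).split("\\s+");
        for (int pos = 0; pos < words.length; pos++) {
            String w = words[pos];
            if (w.isEmpty() || STOPWORDS.contains(w)) continue; // position index still advances
            out.add(new Token(stem(w), pos));  // stemmed form, for ordinary matching
            out.add(new Token(w + "$", pos));  // marked original, for exact matching
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [buffalo@1, buffalo$@1, buffalo@2, buffaloes$@2]
        System.out.println(analyze("the buffalo buffaloes"));
    }
}
```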
However, with this approach the default scoring doesn't work well. What
is my best option for boosting the score of an exact match of this sort,
while still using the same stemming analyzer and without using payloads
on the marked token?
In other words, how do I make documents containing "the buffalo
buffaloes" rank as more relevant than docs containing the word
"buffalo" only once?
The trick here is to boost the marked token if it is found at search
time. While this sounds easy to do, I can't find the best way to
implement it, especially since float Similarity.Idf(Index.Term term,
Searcher searcher) seems to have been deprecated for some reason.
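A toy sketch of the search-time idea, again in plain Java rather than Lucene: each query word is expanded into a clause on its stem (boost 1.0) and a clause on its "$"-marked form (a higher boost), and a deliberately simplified score sums the boosts of the clauses a document contains. EXACT_BOOST, the toy stemmer, and the scoring function are all hypothetical; there is no tf, idf, norms, or coord here, which is precisely the part the default scoring complicates.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class MarkedTermBoost {

    // Hypothetical boost for the "$"-marked exact-form clause; not a tuned value.
    static final float EXACT_BOOST = 2.0f;

    // Same toy stemmer as at index time (assumption, not a real stemmer).
    static String stem(String word) {
        if (word.endsWith("es") && word.length() > 4) return word.substring(0, word.length() - 2);
        if (word.endsWith("s") && word.length() > 3) return word.substring(0, word.length() - 1);
        return word;
    }

    /** Index-side view of a document: its stems plus "$"-marked originals. */
    static Set<String> indexTerms(String text) {
        Set<String> terms = new HashSet<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (w.isEmpty()) continue;
            terms.add(stem(w));
            terms.add(w + "$");
        }
        return terms;
    }

    /** Expands each query word into a stem clause (boost 1.0) and a marked clause (EXACT_BOOST). */
    static Map<String, Float> expandQuery(String query) {
        Map<String, Float> clauses = new LinkedHashMap<>();
        for (String w : query.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (w.isEmpty()) continue;
            clauses.put(stem(w), 1.0f);
            clauses.put(w + "$", EXACT_BOOST);
        }
        return clauses;
    }

    /** Toy score: sum of the boosts of the query clauses present in the document. */
    static float score(Map<String, Float> clauses, Set<String> docTerms) {
        float s = 0f;
        for (Map.Entry<String, Float> c : clauses.entrySet()) {
            if (docTerms.contains(c.getKey())) s += c.getValue();
        }
        return s;
    }

    public static void main(String[] args) {
        Map<String, Float> q = expandQuery("buffaloes");
        System.out.println(q);                                       // {buffalo=1.0, buffaloes$=2.0}
        System.out.println(score(q, indexTerms("buffaloes roam")));  // 3.0: stem + exact form
        System.out.println(score(q, indexTerms("one buffalo")));     // 1.0: stem only
    }
}
```

In Lucene terms this is roughly a BooleanQuery of SHOULD clauses with a higher boost on the marked TermQuery; how that interacts with the default Similarity's idf and coord factors is the open question above.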
Itamar.
[1] http://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo_buffalo_buffalo_Buffalo_buffalo
:)
[2] Rationale:
http://www.code972.com/blog/2010/07/more-flexible-hebrew-indexing-hebmorph/