Shai, you got it right. I want to be able to send "b bb" through the QP with my custom analyzer and get back "(b b$) (b bb$)" -- two positions, each holding two tokens.

I want this to be a native product of the engine, as opposed to forcing it from the query end. I'm using different types of queries (Bool, DisMax), and I'm actually interested in using the QP itself. Instead of going through all sub-queries after parsing and boosting terms ending with $, I want some sort of plugin mechanism to do this for me per result. The easiest path would be subclassing Similarity, if only the relevant methods had not been deprecated...

Are there any other ways to do so? For example, is this doable with function queries (since access to the actual term is required)?

Itamar.

On 16/7/2010 8:01 PM, Shai Erera wrote:
Depends on which query, no? ;)

Sounds like you want to simulate the QP behavior
http://lucene.apache.org/java/2_4_0/queryparsersyntax.html for
boosting. Meaning, if for the query "b" you want to simulate the query
"b OR b$^2" and have matches of b$ count more than b, then I'd follow
how QP does it - create the query programmatically or something (I'm
not near the code at the moment so I cannot give a more concrete
approach).
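
A minimal sketch of the "b OR b$^2" expansion mentioned above, done as plain string rewriting before the text is handed to the query parser. It does not use any Lucene classes; the class name, method names, and the "$" marker constant are hypothetical. It also assumes the analyzer used at parse time leaves both "b" and "b$" untouched, and that the terms contain no characters that need query-syntax escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (no Lucene dependency); all names here are hypothetical.
final class ExactBoostQuerySketch {

    // Assumption: exact-form tokens were indexed with this suffix.
    static final String MARKER = "$";

    // Turns "b" into "(b OR b$^2.0)" for a boost of 2.
    static String expandTerm(String term, float boost) {
        return "(" + term + " OR " + term + MARKER + "^" + boost + ")";
    }

    // Expands every whitespace-separated term of the user query.
    static String expandQuery(String userQuery, float boost) {
        List<String> clauses = new ArrayList<>();
        for (String term : userQuery.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                clauses.add(expandTerm(term, boost));
            }
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        // prints: (b OR b$^2.0) (bb OR bb$^2.0)
        System.out.println(expandQuery("b bb", 2f));
    }
}
```

The resulting string would then be parsed as usual; building the equivalent query objects programmatically is the other route suggested above.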

If you want b and b$ to count the same, then that's already the
behavior - i.e., docs containing both will score higher.

If I misunderstood your question, then please correct me.

Shai

On Friday, July 16, 2010, Itamar Syn-Hershko <ita...@code972.com> wrote:
Hi all,


Consider the following string: "the buffalo buffaloes" [1].


When passed through a stemming analyzer, the resulting tokens would be "buffalo buffalo" (assuming a good stemmer).


To enable exact searches, say I mark the original term and index it at the same term position. So "the buffalo buffaloes" -> (buffalo buffalo$) (buffalo buffaloes$) - now exact searches are allowed on the same field without having 2 different fields [2].
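
A rough sketch of the indexing scheme just described, in plain Java without a Lucene dependency. Each kept word yields its stem at a new position, followed by the original form marked with "$" at the same position (position increment 0). The class and method names, the toy stemmer, and the small stop-word list are all assumptions made for illustration. Position gaps left by removed stop words are ignored for simplicity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java sketch (no Lucene dependency); all names here are hypothetical.
final class MarkedTokenSketch {

    /** A term plus its position increment; 0 means "same position as the previous token". */
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    // Assumption: a tiny stop-word list, enough for the example sentence.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "a", "an"));

    // Assumption: a toy stemmer that only strips a trailing "es" or "s" from longer words.
    static String stem(String word) {
        if (word.length() > 3 && word.endsWith("es")) {
            return word.substring(0, word.length() - 2);
        }
        if (word.length() > 3 && word.endsWith("s")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Emits the stem at a new position, then the marked original at the same position.
    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (word.isEmpty() || STOP_WORDS.contains(word)) {
                continue;
            }
            tokens.add(new Token(stem(word), 1));
            tokens.add(new Token(word + "$", 0));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints: [buffalo(+1), buffalo$(+0), buffalo(+1), buffaloes$(+0)]
        System.out.println(analyze("the buffalo buffaloes"));
    }
}
```

In a real analyzer the same idea would live in a token filter that sets the position increment of the marked token to 0.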


However, with this approach default scoring isn't working well. What is my best option for boosting an exact match of this sort, also when searching with the same stemming analyzer, without using payloads on the marked token?


In other words - how do I make documents containing "the buffalo buffaloes" score as more relevant than docs containing the word "buffalo" only once?


The trick here is to boost the marked token if found at search time. While this sounds easy to do, I can't find the best approach to implementing this - especially since float Similarity.Idf(Index.Term term, Searcher searcher) seems to have been deprecated for some reason.


Itamar.


[1] http://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo_buffalo_buffalo_Buffalo_buffalo :)

[2] Rationale: http://www.code972.com/blog/2010/07/more-flexible-hebrew-indexing-hebmorph/


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

