Is your objective to avoid highlighting matching tokens that are not part of a phrase? I recently received a request to avoid highlighting single tokens that appear in a hit (as opposed to sequences of matched tokens). I have just completed a partial rewrite of getBestTextFragments to allow this. The calling object can now specify the minimum number of tokens that must appear in a sequence before they are highlighted (the default is 1, which replicates the current behavior).

I haven't done much testing, as I only finished the code last night, but if you are interested I have made the code available (along with a patch file) at http://my-family.us/highlighter. To set the minimum sequence size, just call setMinTokenSequence(int) after creating the Highlighter object.
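
To give a rough idea of what a minimum sequence size means, here is a small standalone sketch. It is not the patched Highlighter code and does not use Lucene at all: the class MinSequenceSketch, the methods filterShortRuns and highlight, and the per-token "matched" flags are all hypothetical names made up for illustration. It assumes each token has already been marked as matching the query or not, and keeps only matches that sit in a run of at least minTokenSequence consecutive matched tokens.

```java
// Illustrative sketch only: NOT the patched Lucene Highlighter.
// All names here (MinSequenceSketch, filterShortRuns, highlight) are hypothetical.
class MinSequenceSketch {

    // Returns a mask that keeps a match only if it belongs to a run of at
    // least minTokenSequence consecutive matched tokens.
    static boolean[] filterShortRuns(boolean[] matched, int minTokenSequence) {
        boolean[] keep = new boolean[matched.length];
        int runStart = -1; // index where the current run of matches began, or -1
        // Iterate one step past the end so a run touching the last token is closed.
        for (int i = 0; i <= matched.length; i++) {
            boolean isMatch = i < matched.length && matched[i];
            if (isMatch) {
                if (runStart < 0) {
                    runStart = i;
                }
            } else if (runStart >= 0) {
                if (i - runStart >= minTokenSequence) {
                    for (int j = runStart; j < i; j++) {
                        keep[j] = true;
                    }
                }
                runStart = -1;
            }
        }
        return keep;
    }

    // Wraps every kept token in <B>...</B> and joins the tokens with spaces.
    static String highlight(String[] tokens, boolean[] matched, int minTokenSequence) {
        boolean[] keep = filterShortRuns(matched, minTokenSequence);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(keep[i] ? "<B>" + tokens[i] + "</B>" : tokens[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"toto", "likes", "toto", "titi", "and", "titi"};
        boolean[] matched = {true, false, true, true, false, true};
        // Minimum of 1: every matched token is highlighted.
        System.out.println(highlight(tokens, matched, 1));
        // Minimum of 2: only the adjacent "toto titi" pair is highlighted.
        System.out.println(highlight(tokens, matched, 2));
    }
}
```

With a minimum of 1 every matched token is wrapped, which corresponds to the default described above. With a minimum of 2 the isolated "toto" and "titi" are left alone and only the adjacent pair is wrapped.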

Shane

Harini Raghavan wrote:
I have a requirement to highlight phrases. I came across a reference to this alternate highlighter implementation, but I am unable to find its source files. Can someone please point me to them?

Thanks,
Harini

mark harwood wrote:

See here for a thread reviewing the challenges and possible solutions associated with this problem:
  http://www.mail-archive.com/java-user@lucene.apache.org/msg02543.html

An alternative highlighter implementation was recently contributed here:
  http://issues.apache.org/jira/browse/LUCENE-644?page=all
I've not had the time to study this alternative in detail (I hope to soon), so I can't say whether it will handle Spans correctly.
Cheers,
Mark



----- Original Message ----
From: Pierre Van Ingelandt <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, 5 September, 2006 2:21:56 PM
Subject: Highlighting "really" found terms

Hello,

After a search, I need to highlight only the terms that "really"
correspond to the query.
For instance:
1/ I search for docs with toto and titi in the SAME sentence (using
SpanNotQuery(SpanNearQuery({"toto","titi"}, 99999), "."))
2/ Then I try to highlight the "toto" and "titi" terms found (I use the
QueryScorer from the highlight package)

The problem is that it highlights ALL the titi and toto terms in the
documents (even if they are not in the same sentence).
Is there a way to highlight only the terms really found?

Thanks a lot !

Pierre


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




