Re: Incorrect Token Offset when using multiple fieldable instance

Toph Mon, 30 Jun 2008 16:14:28 -0700

Interesting discussion... glad I'm not the only one with this challenge.

Michael McCandless-2 wrote:
> 
> EG, if you use Highlighter on a  
> multi-valued field indexed with stored field & term vectors and say  
> the first field ended with a stop word that was filtered out, then  
> your offsets will be off and the wrong parts will be highlighted 
> 

I found this post by attempting just this exact thing, and I can confirm,
that yes, the offsets are incorrect for all but the first instance of the
field in the document, so they are useless for highlighting.  I tried
concatenating all instances of the fields, but of course if an instance of
the field ended with punctuation or a stop word, those characters were not
added to the offset.  I'll try the suggested workaround re adding a false
term at the end of each field, but a better API would be if "offset" became
a pair of ints, first being the index of the Field for getFields(name) and
the second being the offset in that instance of the field.

Christopher
-- 
View this message in context: 
http://www.nabble.com/Incorrect-Token-Offset-when-using-multiple-fieldable-instance-tp15833468p18206216.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Incorrect Token Offset when using multiple fieldable instance

Reply via email to