Re: Incorrect Token Offset when using multiple fieldable instance

Michael McCandless Wed, 02 Jul 2008 05:33:26 -0700

This would actually be a fairly large change: it's a change to the
index format and all APIs that handle offsets during indexing &
searching/retrieving.

We could alternatively extend TokenStream so you could query it for
the final offset, then fix indexing to use that value instead of the
endOffset of the last token that it saw.
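
To make the difference concrete, here is a minimal sketch in plain Java
with no Lucene dependency. Everything in it is an assumption made for
illustration: the OffsetDrift class, the whitespace tokenizer, and the
two-word stop list are hypothetical stand-ins, not Lucene's API, and the
model ignores any gap the indexer may add between values. It only
compares advancing the base offset by the last seen token's end offset
against advancing it by the full length of the value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OffsetDrift {
    // Hypothetical stop-word list standing in for a real stop filter.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "a"));

    /** Returns {start, end} offsets, relative to the value, of every kept token. */
    public static List<int[]> tokenize(String value) {
        List<int[]> tokens = new ArrayList<>();
        int i = 0;
        while (i < value.length()) {
            // Skip spaces, then consume one whitespace-delimited word.
            while (i < value.length() && value.charAt(i) == ' ') i++;
            int start = i;
            while (i < value.length() && value.charAt(i) != ' ') i++;
            if (i > start && !STOP_WORDS.contains(value.substring(start, i))) {
                tokens.add(new int[] {start, i});
            }
        }
        return tokens;
    }

    /**
     * Maps per-value token offsets to offsets in the concatenated values.
     * useFinalOffset=false advances the base by the last kept token's end
     * offset; useFinalOffset=true advances it by the value's full length.
     */
    public static List<int[]> globalOffsets(String[] values, boolean useFinalOffset) {
        List<int[]> out = new ArrayList<>();
        int base = 0;
        for (String value : values) {
            int lastEnd = 0;
            for (int[] t : tokenize(value)) {
                out.add(new int[] {base + t[0], base + t[1]});
                lastEnd = t[1];
            }
            base += useFinalOffset ? value.length() : lastEnd;
        }
        return out;
    }

    public static void main(String[] args) {
        String[] values = {"alpha the", "beta"};
        String joined = String.join("", values);
        for (boolean useFinal : new boolean[] {false, true}) {
            // Second kept token overall is "beta" from the second value.
            int[] o = globalOffsets(values, useFinal).get(1);
            System.out.println(useFinal + ": [" + joined.substring(o[0], o[1]) + "]");
        }
    }
}
```

With the values "alpha the" and "beta", the last-token strategy places
"beta" at offsets 5-9 of the concatenated text, which cover " the",
while the final-offset strategy places it at 9-13, which cover "beta".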


Mike

Toph wrote:

Interesting discussion... glad I'm not the only one with this challenge.
Michael McCandless-2 wrote:
EG, if you use Highlighter on a
multi-valued field indexed with stored field & term vectors and say
the first field ended with a stop word that was filtered out, then
your offsets will be off and the wrong parts will be highlighted
I found this post by attempting just this exact thing, and I can
confirm that yes, the offsets are incorrect for all but the first
instance of the field in the document, so they are useless for
highlighting.  I tried concatenating all instances of the fields, but of
course if an instance of the field ended with punctuation or a stop
word, those characters were not added to the offset. I'll try the
suggested workaround re adding a false term at the end of each field,
but a better API would be if "offset" became a pair of ints, first being
the index of the Field for getFields(name) and the second being the
offset in that instance of the field.
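
A rough sketch of what such a pair could look like, in plain Java. The
ValueOffsetDemo and ValueOffset names are hypothetical, nothing like
this exists in Lucene as far as this thread establishes, and the String
array merely stands in for the stored values of one multi-valued field.
The point is that an offset tied to one instance never depends on how
the previous instances ended.

```java
public class ValueOffsetDemo {
    /** Hypothetical offset pair: which field instance, and where inside it. */
    public static final class ValueOffset {
        public final int valueIndex; // index of the instance among the field's values
        public final int start;      // start offset within that instance
        public final int end;        // end offset (exclusive) within that instance

        public ValueOffset(int valueIndex, int start, int end) {
            this.valueIndex = valueIndex;
            this.start = start;
            this.end = end;
        }
    }

    /** Returns the text the offset points at, looked up in its own instance. */
    public static String resolve(String[] values, ValueOffset o) {
        return values[o.valueIndex].substring(o.start, o.end);
    }

    /** Wraps the referenced span in <b>...</b> inside its own instance only. */
    public static String highlight(String[] values, ValueOffset o) {
        String v = values[o.valueIndex];
        return v.substring(0, o.start) + "<b>" + v.substring(o.start, o.end)
                + "</b>" + v.substring(o.end);
    }

    public static void main(String[] args) {
        // First instance ends with a stop word; the second is unaffected by it.
        String[] values = {"alpha the", "beta gamma"};
        ValueOffset hit = new ValueOffset(1, 5, 10);
        System.out.println(resolve(values, hit));
        System.out.println(highlight(values, hit));
    }
}
```

The cost, as noted earlier in the thread, is that every API that stores
or returns offsets would have to carry the extra int.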

Christopher
--
View this message in context: 
http://www.nabble.com/Incorrect-Token-Offset-when-using-multiple-fieldable-instance-tp15833468p18206216.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


