Re: Incorrect Token Offset when using multiple fieldable instance

Toph Wed, 02 Jul 2008 07:20:25 -0700


Michael McCandless-2 wrote:
> 
> 
> This would actually be a fairly large change: it's a change to the  
> index format and all APIs that handle offsets during indexing &  
> searching/retrieving.
> 
>

For now I just changed the offset calculation in DocumentWriter as specified
here by the OP:

> replace DocumentWriter$FieldData#invertField offset = offsetEnd+1; by
> offset = stringValue.length(); 
> 

It has side effects as previously mentioned on this list, e.g. if the
tokenstream is not backed by a stringValue or the Analyzer does not
calculate offsets in the normal way.  But for my purposes it works.

This issue was also discussed previously 
http://lucene.markmail.org/search/?q=offset%20documentwriter#query:offset%20documentwriter+page:1+mid:l6jbfmfisyg5zyre+state:results
here .

Michael McCandless-2 wrote:
> 
> 
> We could alternatively extend TokenStream so you could query it for  
> the final offset, then fix indexing to use that value instead of the  
> endOffset of the last token that it saw.
> 
> 

Querying the tokenstream for the final offset would good, but then would the
change be put into the DocumentWriter directly or available as an option?

Chris
-- 
View this message in context: 
http://www.nabble.com/Incorrect-Token-Offset-when-using-multiple-fieldable-instance-tp15833468p18238566.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Incorrect Token Offset when using multiple fieldable instance

Reply via email to