[ 
https://issues.apache.org/jira/browse/LUCENE-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863550#comment-13863550
 ] 

Robert Muir commented on LUCENE-5386:
-------------------------------------

I'm just gonna throw out a crazier idea for kicks:

Just revisiting what this final offset is: it's nothing more than the number of 
characters read (adjusted by any charfilters), so that if I have a multivalued 
field that, say, ends with a space, the last space is not lost.

It's really sad that a tokenizer should have to implement this final offset 
stuff at all: it's worth thinking about whether the base Tokenizer class could 
do this automatically (e.g. wrap the Reader in a FilterReader, track the number 
of characters read, and implement the final offset by default).

The only classes that would really need to implement anything would be ones 
that do "crazy" stuff (e.g. don't consume the entire Reader), and such filters 
(LimitXXX) already have a consumeAllTokens option to ensure they can be 
well-behaved today.

Just an idea.
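
The FilterReader idea above could be sketched roughly as follows. This is only 
an illustration using plain java.io, not Lucene code: CountingReader and 
finalOffsetOf are hypothetical names, the sketch ignores charfilter offset 
correction, and it assumes the consumer never calls mark()/reset().

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Lucene: counts the chars pulled through it.
// Assumes the consumer never uses mark()/reset(), which would skew the count.
class CountingReader extends FilterReader {
    private int charsRead = 0;

    CountingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c != -1) {
            charsRead++;
        }
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            charsRead += n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        charsRead += (int) skipped;
        return skipped;
    }

    int charsRead() {
        return charsRead;
    }

    // Stand-in for a tokenizer: drains the input in small chunks and reports
    // how many chars were consumed, i.e. the uncorrected final offset.
    static int finalOffsetOf(String text) {
        try (CountingReader reader = new CountingReader(new StringReader(text))) {
            char[] buf = new char[4];
            while (reader.read(buf, 0, buf.length) != -1) {
                // a real tokenizer would emit tokens here
            }
            return reader.charsRead();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The trailing space is counted, so this prints 12.
        System.out.println(finalOffsetOf("hello world "));
    }
}
```

A base class that wrapped its input this way could report the count from end() 
without each tokenizer tracking it, provided the tokenizer reads the whole 
Reader.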

> Make Tokenizers deliver their final offsets
> -------------------------------------------
>
>                 Key: LUCENE-5386
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5386
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Benson Margulies
>
> Tokenizers _must_ have an implementation of #end() in which they set up the 
> final offset. Currently, nothing enforces this. end() has a useful 
> implementation in TokenStream, so just making it abstract is not attractive.
> Proposal: add
>     abstract int finalOffset();
> to Tokenizer, and then make
>     public void end() throws IOException {
>         super.end();
>         int fo = finalOffset();
>         offsetAttr.setOffset(fo, fo);
>     }
> or something to that effect.
> Other alternatives to be considered depending on how this looks.


