Benson Margulies created LUCENE-5386:
----------------------------------------

             Summary: Make Tokenizers deliver their final offsets
                 Key: LUCENE-5386
                 URL: https://issues.apache.org/jira/browse/LUCENE-5386
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Benson Margulies


Tokenizers _must_ implement end() so that it sets the final offset.
Currently, nothing enforces this. TokenStream already provides a useful
implementation of end(), so simply making it abstract is not attractive.

Proposal: add

  abstract int finalOffset(); 

to Tokenizer, and then define end() there as

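    @Override
    public void end() throws IOException {
        super.end();  // keep the existing TokenStream behaviour
        int fo = finalOffset();
        offsetAttr.setOffset(fo, fo);  // start and end both at the final offset
    }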

or something to that effect.
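
To make the shape of the proposal concrete, here is a minimal standalone
sketch of the pattern. It does not use Lucene: SketchTokenStream,
SketchTokenizer, Offsets and WholeInputTokenizer are hypothetical stand-ins
invented for illustration, and the checked IOException of the real end() is
left out for brevity. It only shows how an abstract finalOffset() would force
every concrete tokenizer to supply the value that end() records.

```java
// Standalone model of the proposal; none of these types are Lucene classes.
class FinalOffsetSketch {

    // Simplified stand-in for an offset attribute (hypothetical).
    static final class Offsets {
        int start;
        int end;

        void setOffset(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Stand-in for TokenStream: end() already does something useful,
    // which is why making it abstract is unattractive.
    static abstract class SketchTokenStream {
        boolean endCalled = false;

        void end() {
            endCalled = true;
        }
    }

    // Stand-in for Tokenizer: subclasses cannot compile without
    // providing finalOffset(), and end() uses it for them.
    static abstract class SketchTokenizer extends SketchTokenStream {
        final Offsets offsetAttr = new Offsets();

        protected abstract int finalOffset();

        @Override
        void end() {
            super.end();
            int fo = finalOffset();
            offsetAttr.setOffset(fo, fo);
        }
    }

    // Example subclass: the final offset is simply the input length,
    // so trailing whitespace is accounted for.
    static final class WholeInputTokenizer extends SketchTokenizer {
        private final String input;

        WholeInputTokenizer(String input) {
            this.input = input;
        }

        @Override
        protected int finalOffset() {
            return input.length();
        }
    }

    public static void main(String[] args) {
        WholeInputTokenizer t = new WholeInputTokenizer("ab cd ");
        t.end();
        System.out.println(t.offsetAttr.start + "," + t.offsetAttr.end);
    }
}
```

In this sketch a subclass could still override end() and skip the call to
finalOffset(); whether end() should also be made final is a separate design
question that the proposal leaves open.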

Other alternatives can be considered, depending on how this looks.




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
