[ 
https://issues.apache.org/jira/browse/LUCENE-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867448#comment-13867448
 ] 

Robert Muir commented on LUCENE-5386:
-------------------------------------

Here's a rough parallel: something Uwe did along the same lines for 
TokenFilters to solve a different issue:

It's tricky to remove tokens. Sure, we want to provide infinite flexibility to 
move tokens around, but 99% of people just want to remove them based on some 
simple criteria. So he added FilteringTokenFilter, which exposes just one 
simple API:

{code}
  /** Override this method and return if the current input token should be returned by {@link #incrementToken}. */
  protected abstract boolean accept() throws IOException;
{code}

The base class takes care of tracking the positions of removed tokens, 
handling end(), and all that. Subclasses like StopFilter, TypeTokenFilter, etc. 
are much simpler and don't have to deal with that crazy stuff; they just hold 
the "logic" of what they want to remove (one-liners in most cases).

If someone doesn't like that API, they can always just subclass TokenFilter 
directly. But so far, all the removers are using it :)
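
To make the shape of that concrete, here is a toy sketch of the pattern. It 
is not Lucene's actual code: SimpleToken, SimpleFilteringStream and 
SimpleStopFilter are made-up names, and a token is reduced to a term plus a 
position increment. The point is only that the base class owns the position 
bookkeeping while the subclass supplies accept().

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a token: a term and its position increment.
final class SimpleToken {
    final String term;
    final int positionIncrement;

    SimpleToken(String term, int positionIncrement) {
        this.term = term;
        this.positionIncrement = positionIncrement;
    }
}

// Hypothetical base class: owns the bookkeeping for removed tokens.
abstract class SimpleFilteringStream {
    private final Iterator<SimpleToken> input;
    protected String currentTerm; // the token accept() should look at

    SimpleFilteringStream(List<SimpleToken> input) {
        this.input = input.iterator();
    }

    // The only thing a subclass has to write.
    protected abstract boolean accept();

    // Returns the next accepted token, or null when the input is exhausted.
    final SimpleToken next() {
        int skippedPositions = 0;
        while (input.hasNext()) {
            SimpleToken t = input.next();
            currentTerm = t.term;
            if (accept()) {
                // Fold the positions of removed tokens into the survivor.
                return new SimpleToken(t.term, t.positionIncrement + skippedPositions);
            }
            skippedPositions += t.positionIncrement;
        }
        return null;
    }
}

// A "remover" written against the base class: the logic is one line.
final class SimpleStopFilter extends SimpleFilteringStream {
    private final Set<String> stopWords;

    SimpleStopFilter(List<SimpleToken> input, Set<String> stopWords) {
        super(input);
        this.stopWords = stopWords;
    }

    @Override
    protected boolean accept() {
        return !stopWords.contains(currentTerm);
    }
}

final class FilteringSketchDemo {
    public static void main(String[] args) {
        SimpleStopFilter filter = new SimpleStopFilter(
            Arrays.asList(new SimpleToken("the", 1),
                          new SimpleToken("quick", 1),
                          new SimpleToken("fox", 1)),
            Collections.singleton("the"));
        for (SimpleToken t = filter.next(); t != null; t = filter.next()) {
            System.out.println(t.term + " +" + t.positionIncrement);
        }
    }
}
```

In this toy version, removing "the" leaves "quick" carrying a position 
increment of 2, so the gap is still visible downstream even though the 
subclass never touched positions.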

I think there is probably a similar opportunity here for Tokenizers.
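
As a rough illustration of what that could look like, here is a toy version 
of the finalOffset() idea from the issue description. None of this is Lucene 
API: SketchStream, SketchTokenizer and WhitespaceSketchTokenizer are 
hypothetical names, and offsets are plain int fields instead of an attribute. 
The base class makes end() final and forces subclasses to say what the final 
offset is.

```java
// Hypothetical stand-in for the stream base class with a default end().
abstract class SketchStream {
    protected int startOffset;
    protected int endOffset;

    // Stand-in for the useful default end() behavior: reset state.
    void end() {
        startOffset = 0;
        endOffset = 0;
    }
}

// Hypothetical tokenizer base class: end() always sets the final offset.
abstract class SketchTokenizer extends SketchStream {
    // Subclasses cannot forget this; the compiler enforces it.
    protected abstract int finalOffset();

    @Override
    final void end() {
        super.end();
        int fo = finalOffset();
        startOffset = fo;
        endOffset = fo;
    }
}

// A trivial whitespace tokenizer written against the base class.
final class WhitespaceSketchTokenizer extends SketchTokenizer {
    private final String text;
    private int pos = 0;

    WhitespaceSketchTokenizer(String text) {
        this.text = text;
    }

    // Returns the next token and records its offsets, or null at the end.
    String nextToken() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        if (pos >= text.length()) {
            return null;
        }
        int start = pos;
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        startOffset = start;
        endOffset = pos;
        return text.substring(start, pos);
    }

    // The final offset covers trailing whitespace too: the full input length.
    @Override
    protected int finalOffset() {
        return text.length();
    }
}

final class FinalOffsetSketchDemo {
    public static void main(String[] args) {
        WhitespaceSketchTokenizer tok = new WhitespaceSketchTokenizer("hello world  ");
        for (String t = tok.nextToken(); t != null; t = tok.nextToken()) {
            System.out.println(t + " [" + tok.startOffset + "," + tok.endOffset + ")");
        }
        tok.end();
        System.out.println("final offset: " + tok.endOffset);
    }
}
```

The trade-off is the same as with FilteringTokenFilter: anyone who needs 
something unusual in end() would have to bypass the convenience base class.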

> Make Tokenizers deliver their final offsets
> -------------------------------------------
>
>                 Key: LUCENE-5386
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5386
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Benson Margulies
>
> Tokenizers _must_ have an implementation of #end() in which they set up the 
> final offset. Currently, nothing enforces this. end() has a useful 
> implementation in TokenStream, so just making it abstract is not attractive.
> Proposal: add
>   abstract int finalOffset(); 
> to tokenizer, and then make
>   void end() {
>     super.end();
>     int fo = finalOffset();
>     offsetAttr.setOffsets(fo, fo);
>   }
> or something to that effect.
> Other alternative to be considered depending on how this looks.



