[ https://issues.apache.org/jira/browse/LUCENE-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867448#comment-13867448 ]
Robert Muir commented on LUCENE-5386:
-------------------------------------

Here's sorta a parallel that Uwe did along the same lines for TokenFilters to solve another issue: it's tricky to remove tokens. Sure, we want to provide infinite flexibility to move tokens around, but 99% of people just want to remove them based on some simple criteria.

So he added FilteringTokenFilter, and it just exposes a simple API:

{code}
/** Override this method and return if the current input token should be returned by {@link #incrementToken}. */
protected abstract boolean accept() throws IOException;
{code}

The base class takes care of actually tracking all the positions, the removal, end(), and all that. Subclasses like StopFilter, TypeTokenFilter, etc. are much simpler and don't have to deal with that crazy stuff; they just have the "logic" of what they want to remove (e.g. one-liners in most cases).

If someone doesn't like that API, they can always just subclass TokenFilter directly. But so far, all the removers are using it :)

I think basically there is probably a similar opportunity here.

> Make Tokenizers deliver their final offsets
> -------------------------------------------
>
>                 Key: LUCENE-5386
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5386
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Benson Margulies
>
> Tokenizers _must_ have an implementation of #end() in which they set up the
> final offset. Currently, nothing enforces this. end() has a useful
> implementation in TokenStream, so just making it abstract is not attractive.
> Proposal: add
>   abstract int finalOffset();
> to tokenizer, and then make
>   void end() {
>     super.end();
>     int fo = finalOffset();
>     offsetAttr.setOffsets(fo, fo);
>   }
> or something to that effect.
> Other alternative to be considered depending on how this looks.
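For illustration only, here is a minimal, self-contained sketch of the template-method shape the proposal describes: the base class owns end() and subclasses only supply finalOffset(). None of these classes are Lucene's. Offsets, SketchTokenizer, and WhitespaceSketchTokenizer are hypothetical stand-ins, the super.end() call from the proposal is omitted because the sketch has no TokenStream parent, and making end() final is an assumption rather than part of the proposal.

```java
/** Hypothetical stand-in for an offset attribute; NOT Lucene's OffsetAttribute. */
final class Offsets {
    int start;
    int end;

    void setOffsets(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

/** Hypothetical base class: it owns end(), subclasses only report the final offset. */
abstract class SketchTokenizer {
    protected final Offsets offsetAttr = new Offsets();

    /** The one thing a subclass must supply. */
    protected abstract int finalOffset();

    /** Assumption: final here, so a subclass cannot forget to set the final offset. */
    public final void end() {
        int fo = finalOffset();
        offsetAttr.setOffsets(fo, fo);
    }

    public int startOffset() { return offsetAttr.start; }

    public int endOffset() { return offsetAttr.end; }
}

/** Toy whitespace tokenizer over a String, just to exercise the base class. */
final class WhitespaceSketchTokenizer extends SketchTokenizer {
    private final String input;
    private int pos = 0;

    WhitespaceSketchTokenizer(String input) {
        this.input = input;
    }

    /** Advances to the next token and records its offsets; false when input is exhausted. */
    boolean incrementToken() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        if (pos >= input.length()) {
            return false;
        }
        int start = pos;
        while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        offsetAttr.setOffsets(start, pos);
        return true;
    }

    /** The final offset covers the whole input, including trailing whitespace. */
    @Override
    protected int finalOffset() {
        return input.length();
    }
}
```

With input such as "foo bar  " (two trailing spaces), the last token ends at offset 7, but after end() both offsets are 9. That difference is the detail a tokenizer without an end() implementation would silently get wrong.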