Github user jimczi commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/384#discussion_r191594957

--- Diff: lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/ConcatenateGraphFilter.java ---
@@ -31,80 +33,106 @@
 import org.apache.lucene.util.IOUtils;
 import org.apache.lucene.util.IntsRef;
 import org.apache.lucene.util.automaton.Automaton;
-import org.apache.lucene.util.automaton.FiniteStringsIterator;
 import org.apache.lucene.util.automaton.LimitedFiniteStringsIterator;
 import org.apache.lucene.util.automaton.Operations;
+import org.apache.lucene.util.automaton.TooComplexToDeterminizeException;
 import org.apache.lucene.util.automaton.Transition;
 import org.apache.lucene.util.fst.Util;

-import static org.apache.lucene.search.suggest.document.CompletionAnalyzer.DEFAULT_MAX_GRAPH_EXPANSIONS;
-import static org.apache.lucene.search.suggest.document.CompletionAnalyzer.DEFAULT_PRESERVE_POSITION_INCREMENTS;
-import static org.apache.lucene.search.suggest.document.CompletionAnalyzer.DEFAULT_PRESERVE_SEP;
-import static org.apache.lucene.search.suggest.document.CompletionAnalyzer.SEP_LABEL;
-
 /**
- * Token stream which converts a provided token stream to an automaton.
- * The accepted strings enumeration from the automaton are available through the
- * {@link org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute} attribute
- * The token stream uses a {@link org.apache.lucene.analysis.tokenattributes.PayloadAttribute} to store
- * a completion's payload (see {@link CompletionTokenStream#setPayload(org.apache.lucene.util.BytesRef)})
+ * Concatenates/Joins every incoming token with a separator into one output token for every path through the
+ * token stream (which is a graph). In simple cases this yields one token, but in the presence of any tokens with
+ * a zero positionIncrement (e.g. synonyms) it will be more.
+ * This filter uses the token bytes, position increment,
+ * and position length of the incoming stream. Other attributes are not used or manipulated.
  *
  * @lucene.experimental
  */
-public final class CompletionTokenStream extends TokenStream {
+public final class ConcatenateGraphFilter extends TokenFilter {
--- End diff --

Converting this token stream into a token filter is trappy. It requires rewriting the logic completely. For instance, you removed the `close` implementation, which means `close` is now called twice on the input stream, since it is already closed here: https://github.com/apache/lucene-solr/pull/384/files#diff-04d4c6352889dbffd0eb4d1e6ecd6097R197. You also added a call to `super()`, which was deliberately avoided in the original impl:

````
// Don't call the super(input) ctor - this is a true delegate and has a new attribute source since we consume
// the input stream entirely in the first call to incrementToken
````

My idea was to move this token stream only if minor changes were required. If it needs a complete rewrite, then we should probably reconsider.
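The double-close trap can be sketched with a minimal self-contained example. These are hypothetical stand-in classes, not Lucene's actual `TokenStream`/`TokenFilter`; they only model the relevant behavior: the decorator base class forwards `close()` to the wrapped stream, while the subclass's own logic also closes the input after consuming it eagerly.

```java
// Minimal sketch of the trap (hypothetical classes, not Lucene's API).
class Stream {
    int closeCalls = 0;                     // counts how often close() runs
    void close() { closeCalls++; }
}

// Analogous to TokenFilter: the base class closes its delegate.
class Filter extends Stream {
    final Stream input;
    Filter(Stream input) { this.input = input; }
    @Override void close() { input.close(); }
}

// Analogous to the converted filter: it consumes (and closes) the input
// up front, but does not override close(), so Filter.close() fires too.
class ConsumingFilter extends Filter {
    ConsumingFilter(Stream input) { super(input); }
    void consumeFully() { input.close(); }  // eager consumption closes input
}

public class DoubleCloseDemo {
    public static void main(String[] args) {
        Stream in = new Stream();
        ConsumingFilter f = new ConsumingFilter(in);
        f.consumeFully();                    // first close on the input
        f.close();                           // inherited close() closes it again
        System.out.println(in.closeCalls);   // prints 2: the double close
    }
}
```

This is why the original implementation both overrode `close()` and skipped `super(input)`: as a true delegate with its own attribute source, it had to manage the wrapped stream's lifecycle itself.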