Github user jimczi commented on a diff in the pull request:
https://github.com/apache/lucene-solr/pull/384#discussion_r191594957
--- Diff:
lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/ConcatenateGraphFilter.java
---
@@ -31,80 +33,106 @@
import org.apache.lucene.util.IOUtils;
import org.apache.lucene.util.IntsRef;
import org.apache.lucene.util.automaton.Automaton;
-import org.apache.lucene.util.automaton.FiniteStringsIterator;
import org.apache.lucene.util.automaton.LimitedFiniteStringsIterator;
import org.apache.lucene.util.automaton.Operations;
+import org.apache.lucene.util.automaton.TooComplexToDeterminizeException;
import org.apache.lucene.util.automaton.Transition;
import org.apache.lucene.util.fst.Util;
-import static
org.apache.lucene.search.suggest.document.CompletionAnalyzer.DEFAULT_MAX_GRAPH_EXPANSIONS;
-import static
org.apache.lucene.search.suggest.document.CompletionAnalyzer.DEFAULT_PRESERVE_POSITION_INCREMENTS;
-import static
org.apache.lucene.search.suggest.document.CompletionAnalyzer.DEFAULT_PRESERVE_SEP;
-import static
org.apache.lucene.search.suggest.document.CompletionAnalyzer.SEP_LABEL;
-
/**
- * Token stream which converts a provided token stream to an automaton.
- * The accepted strings enumeration from the automaton are available
through the
- * {@link
org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute} attribute
- * The token stream uses a {@link
org.apache.lucene.analysis.tokenattributes.PayloadAttribute} to store
- * a completion's payload (see {@link
CompletionTokenStream#setPayload(org.apache.lucene.util.BytesRef)})
+ * Concatenates/Joins every incoming token with a separator into one
output token for every path through the
+ * token stream (which is a graph). In simple cases this yields one
token, but in the presence of any tokens with
+ * a zero positionIncrmeent (e.g. synonyms) it will be more. This filter
uses the token bytes, position increment,
+ * and position length of the incoming stream. Other attributes are not
used or manipulated.
*
* @lucene.experimental
*/
-public final class CompletionTokenStream extends TokenStream {
+public final class ConcatenateGraphFilter extends TokenFilter {
--- End diff --
Converting this token stream into a token filter is trappy. It requires to
rewrite the logic completely. For instance you removed the `close`
implementation, this means that it is called twice on the input stream since it
is already closed here
https://github.com/apache/lucene-solr/pull/384/files#diff-04d4c6352889dbffd0eb4d1e6ecd6097R197.
You also added a call to super() which was avoided clearly in the original
impl:
````
// Don't call the super(input) ctor - this is a true delegate and has a new
attribute source since we consume
// the input stream entirely in the first call to incrementToken
````
My idea was to move this token stream if only minor changes are required.
If it needs a complete rewrite then we should probably reconsider.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]