Github user jimczi commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/384#discussion_r191594957

--- Diff: lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/ConcatenateGraphFilter.java ---
@@ -31,80 +33,106 @@
 import org.apache.lucene.util.IOUtils;
 import org.apache.lucene.util.IntsRef;
 import org.apache.lucene.util.automaton.Automaton;
-import org.apache.lucene.util.automaton.FiniteStringsIterator;
 import org.apache.lucene.util.automaton.LimitedFiniteStringsIterator;
 import org.apache.lucene.util.automaton.Operations;
+import org.apache.lucene.util.automaton.TooComplexToDeterminizeException;
 import org.apache.lucene.util.automaton.Transition;
 import org.apache.lucene.util.fst.Util;

-import static org.apache.lucene.search.suggest.document.CompletionAnalyzer.DEFAULT_MAX_GRAPH_EXPANSIONS;
-import static org.apache.lucene.search.suggest.document.CompletionAnalyzer.DEFAULT_PRESERVE_POSITION_INCREMENTS;
-import static org.apache.lucene.search.suggest.document.CompletionAnalyzer.DEFAULT_PRESERVE_SEP;
-import static org.apache.lucene.search.suggest.document.CompletionAnalyzer.SEP_LABEL;
-
 /**
- * Token stream which converts a provided token stream to an automaton.
- * The accepted strings enumeration from the automaton are available through the
- * {@link org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute} attribute
- * The token stream uses a {@link org.apache.lucene.analysis.tokenattributes.PayloadAttribute} to store
- * a completion's payload (see {@link CompletionTokenStream#setPayload(org.apache.lucene.util.BytesRef)})
+ * Concatenates/Joins every incoming token with a separator into one output token for every path through the
+ * token stream (which is a graph). In simple cases this yields one token, but in the presence of any tokens with
+ * a zero positionIncrement (e.g. synonyms) it will be more.
+ * This filter uses the token bytes, position increment,
+ * and position length of the incoming stream. Other attributes are not used or manipulated.
  *
  * @lucene.experimental
  */
-public final class CompletionTokenStream extends TokenStream {
+public final class ConcatenateGraphFilter extends TokenFilter {
--- End diff --

Converting this token stream into a token filter is trappy. It requires rewriting the logic completely. For instance, you removed the `close` implementation, which means `close` is now called twice on the input stream, since it is already closed here: https://github.com/apache/lucene-solr/pull/384/files#diff-04d4c6352889dbffd0eb4d1e6ecd6097R197. You also added a call to `super()`, which was deliberately avoided in the original impl:

````
// Don't call the super(input) ctor - this is a true delegate and has a new attribute source since we consume
// the input stream entirely in the first call to incrementToken
````

My idea was to move this token stream only if minor changes were required. If it needs a complete rewrite, then we should probably reconsider.
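The double-close trap can be sketched with a minimal self-contained example. These are hypothetical stand-in classes, not Lucene's actual `TokenStream`/`TokenFilter`; they only model the relevant behavior: the decorator base class forwards `close()` to the wrapped stream, while the subclass's own logic also closes the input after consuming it eagerly.

```java
// Minimal sketch of the trap (hypothetical classes, not Lucene's API).
class Stream {
    int closeCalls = 0;                     // counts how often close() runs
    void close() { closeCalls++; }
}

// Analogous to TokenFilter: the base class closes its delegate.
class Filter extends Stream {
    final Stream input;
    Filter(Stream input) { this.input = input; }
    @Override void close() { input.close(); }
}

// Analogous to the converted filter: it consumes (and closes) the input
// up front, but does not override close(), so Filter.close() fires too.
class ConsumingFilter extends Filter {
    ConsumingFilter(Stream input) { super(input); }
    void consumeFully() { input.close(); }  // eager consumption closes input
}

public class DoubleCloseDemo {
    public static void main(String[] args) {
        Stream in = new Stream();
        ConsumingFilter f = new ConsumingFilter(in);
        f.consumeFully();                    // first close on the input
        f.close();                           // inherited close() closes it again
        System.out.println(in.closeCalls);   // prints 2: the double close
    }
}
```

This is why the original implementation both overrode `close()` and skipped `super(input)`: as a true delegate with its own attribute source, it had to manage the wrapped stream's lifecycle itself.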