[jira] [Commented] (LUCENE-7603) Support Graph Token Streams in QueryBuilder

ASF GitHub Bot (JIRA) Fri, 30 Dec 2016 09:27:37 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788036#comment-15788036
 ]


ASF GitHub Bot commented on LUCENE-7603:
----------------------------------------

Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/129#discussion_r94243010
  
    --- Diff: 
lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java
 ---
    @@ -210,85 +199,41 @@ private void finish() {
        */
       private void finish(int maxDeterminizedStates) {
         Automaton automaton = builder.finish();
    -
    -    // System.out.println("before det:\n" + automaton.toDot());
    -
    -    Transition t = new Transition();
    -
    -    // TODO: should we add "eps back to initial node" for all states,
    -    // and det that?  then we don't need to revisit initial node at
    -    // every position?  but automaton could blow up?  And, this makes it
    -    // harder to skip useless positions at search time?
    -
    -    if (anyTermID != -1) {
    -
    -      // Make sure there are no leading or trailing ANY:
    -      int count = automaton.initTransition(0, t);
    -      for (int i = 0; i < count; i++) {
    -        automaton.getNextTransition(t);
    -        if (anyTermID >= t.min && anyTermID <= t.max) {
    -          throw new IllegalStateException("automaton cannot lead with an 
ANY transition");
    -        }
    -      }
    -
    -      int numStates = automaton.getNumStates();
    -      for (int i = 0; i < numStates; i++) {
    -        count = automaton.initTransition(i, t);
    -        for (int j = 0; j < count; j++) {
    -          automaton.getNextTransition(t);
    -          if (automaton.isAccept(t.dest) && anyTermID >= t.min && 
anyTermID <= t.max) {
    -            throw new IllegalStateException("automaton cannot end with an 
ANY transition");
    -          }
    -        }
    -      }
    -
    -      int termCount = termToID.size();
    -
    -      // We have to carefully translate these transitions so automaton
    -      // realizes they also match all other terms:
    -      Automaton newAutomaton = new Automaton();
    -      for (int i = 0; i < numStates; i++) {
    -        newAutomaton.createState();
    -        newAutomaton.setAccept(i, automaton.isAccept(i));
    -      }
    -
    -      for (int i = 0; i < numStates; i++) {
    -        count = automaton.initTransition(i, t);
    -        for (int j = 0; j < count; j++) {
    -          automaton.getNextTransition(t);
    -          int min, max;
    -          if (t.min <= anyTermID && anyTermID <= t.max) {
    -            // Match any term
    -            min = 0;
    -            max = termCount - 1;
    -          } else {
    -            min = t.min;
    -            max = t.max;
    -          }
    -          newAutomaton.addTransition(t.source, t.dest, min, max);
    -        }
    -      }
    -      newAutomaton.finishState();
    -      automaton = newAutomaton;
    -    }
    -
         det = Operations.removeDeadStates(Operations.determinize(automaton, 
maxDeterminizedStates));
       }
     
    -  private int getTermID(BytesRef term) {
    -    Integer id = termToID.get(term);
    -    if (id == null) {
    -      id = termToID.size();
    -      if (term != null) {
    -        term = BytesRef.deepCopyOf(term);
    -      }
    -      termToID.put(term, id);
    +  /**
    +   * Gets an integer id for a given term.
    +   *
    +   * If there is no position gaps for this token then we can reuse the id 
for the same term if it appeared at another
    +   * position without a gap.  If we have a position gap generate a new id 
so we can keep track of the position
    +   * increment.
    +   */
    +  private int getTermID(int incr, int prevIncr, BytesRef term) {
    +    assert term != null;
    +    boolean isStackedGap = incr == 0 && prevIncr > 1;
    +    boolean hasGap = incr > 1;
    +    term = BytesRef.deepCopyOf(term);
    --- End diff --
    
    The deepCopyOf is only needed if you generate a new ID, not for an existing 
one.  
    
    BTW... have you seen BytesRefHash?  I think re-using that could minimize 
the code here to deal with this stuff.


> Support Graph Token Streams in QueryBuilder
> -------------------------------------------
>
>                 Key: LUCENE-7603
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7603
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/queryparser, core/search
>            Reporter: Matt Weber
>
> With [LUCENE-6664|https://issues.apache.org/jira/browse/LUCENE-6664] we can 
> use multi-term synonyms query time.  A "graph token stream" will be created 
> which which is nothing more than using the position length attribute on 
> stacked tokens to indicate how many positions a token should span.  Currently 
> the position length attribute on tokens is ignored during query parsing.  
> This issue will add support for handling these graph token streams inside the 
> QueryBuilder utility class used by query parsers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-7603) Support Graph Token Streams in QueryBuilder

Reply via email to