[jira] [Commented] (LUCENE-7603) Support Graph Token Streams in QueryBuilder

ASF GitHub Bot (JIRA) Fri, 30 Dec 2016 02:20:08 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15787395#comment-15787395
 ]


ASF GitHub Bot commented on LUCENE-7603:
----------------------------------------

Github user mikemccand commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/129#discussion_r94216469
  
    --- Diff: 
lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java
 ---
    @@ -210,82 +215,20 @@ private void finish() {
        */
       private void finish(int maxDeterminizedStates) {
         Automaton automaton = builder.finish();
    -
    -    // System.out.println("before det:\n" + automaton.toDot());
    -
    -    Transition t = new Transition();
    -
    -    // TODO: should we add "eps back to initial node" for all states,
    -    // and det that?  then we don't need to revisit initial node at
    -    // every position?  but automaton could blow up?  And, this makes it
    -    // harder to skip useless positions at search time?
    -
    -    if (anyTermID != -1) {
    -
    -      // Make sure there are no leading or trailing ANY:
    -      int count = automaton.initTransition(0, t);
    -      for (int i = 0; i < count; i++) {
    -        automaton.getNextTransition(t);
    -        if (anyTermID >= t.min && anyTermID <= t.max) {
    -          throw new IllegalStateException("automaton cannot lead with an 
ANY transition");
    -        }
    -      }
    -
    -      int numStates = automaton.getNumStates();
    -      for (int i = 0; i < numStates; i++) {
    -        count = automaton.initTransition(i, t);
    -        for (int j = 0; j < count; j++) {
    -          automaton.getNextTransition(t);
    -          if (automaton.isAccept(t.dest) && anyTermID >= t.min && 
anyTermID <= t.max) {
    -            throw new IllegalStateException("automaton cannot end with an 
ANY transition");
    -          }
    -        }
    -      }
    -
    -      int termCount = termToID.size();
    -
    -      // We have to carefully translate these transitions so automaton
    -      // realizes they also match all other terms:
    -      Automaton newAutomaton = new Automaton();
    -      for (int i = 0; i < numStates; i++) {
    -        newAutomaton.createState();
    -        newAutomaton.setAccept(i, automaton.isAccept(i));
    -      }
    -
    -      for (int i = 0; i < numStates; i++) {
    -        count = automaton.initTransition(i, t);
    -        for (int j = 0; j < count; j++) {
    -          automaton.getNextTransition(t);
    -          int min, max;
    -          if (t.min <= anyTermID && anyTermID <= t.max) {
    -            // Match any term
    -            min = 0;
    -            max = termCount - 1;
    -          } else {
    -            min = t.min;
    -            max = t.max;
    -          }
    -          newAutomaton.addTransition(t.source, t.dest, min, max);
    -        }
    -      }
    -      newAutomaton.finishState();
    -      automaton = newAutomaton;
    -    }
    -
         det = Operations.removeDeadStates(Operations.determinize(automaton, 
maxDeterminizedStates));
       }
     
    -  private int getTermID(BytesRef term) {
    +  private int getTermID(int incr, BytesRef term) {
         Integer id = termToID.get(term);
    -    if (id == null) {
    +    if (incr > 1 || id == null) {
    --- End diff --
    
    Hmm doesn't this mean that if the same term shows up, but with different 
`incr`, that it will get different `id` assigned?  But I think that is actually 
fine, since nowhere here do we depend on / expect that the same term must have 
the same id.


> Support Graph Token Streams in QueryBuilder
> -------------------------------------------
>
>                 Key: LUCENE-7603
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7603
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/queryparser, core/search
>            Reporter: Matt Weber
>
> With [LUCENE-6664|https://issues.apache.org/jira/browse/LUCENE-6664] we can 
> use multi-term synonyms query time.  A "graph token stream" will be created 
> which which is nothing more than using the position length attribute on 
> stacked tokens to indicate how many positions a token should span.  Currently 
> the position length attribute on tokens is ignored during query parsing.  
> This issue will add support for handling these graph token streams inside the 
> QueryBuilder utility class used by query parsers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-7603) Support Graph Token Streams in QueryBuilder

Reply via email to