[GitHub] lucene-solr pull request #129: LUCENE-7603: Support Graph Token Streams in Q...

mikemccand Fri, 30 Dec 2016 02:20:08 -0800

Github user mikemccand commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/129#discussion_r94216469
  
    --- Diff: lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java ---
    @@ -210,82 +215,20 @@ private void finish() {
        */
       private void finish(int maxDeterminizedStates) {
         Automaton automaton = builder.finish();
    -
    -    // System.out.println("before det:\n" + automaton.toDot());
    -
    -    Transition t = new Transition();
    -
    -    // TODO: should we add "eps back to initial node" for all states,
    -    // and det that?  then we don't need to revisit initial node at
    -    // every position?  but automaton could blow up?  And, this makes it
    -    // harder to skip useless positions at search time?
    -
    -    if (anyTermID != -1) {
    -
    -      // Make sure there are no leading or trailing ANY:
    -      int count = automaton.initTransition(0, t);
    -      for (int i = 0; i < count; i++) {
    -        automaton.getNextTransition(t);
    -        if (anyTermID >= t.min && anyTermID <= t.max) {
    -          throw new IllegalStateException("automaton cannot lead with an ANY transition");
    -        }
    -      }
    -
    -      int numStates = automaton.getNumStates();
    -      for (int i = 0; i < numStates; i++) {
    -        count = automaton.initTransition(i, t);
    -        for (int j = 0; j < count; j++) {
    -          automaton.getNextTransition(t);
    -          if (automaton.isAccept(t.dest) && anyTermID >= t.min && anyTermID <= t.max) {
    -            throw new IllegalStateException("automaton cannot end with an ANY transition");
    -          }
    -        }
    -      }
    -
    -      int termCount = termToID.size();
    -
    -      // We have to carefully translate these transitions so automaton
    -      // realizes they also match all other terms:
    -      Automaton newAutomaton = new Automaton();
    -      for (int i = 0; i < numStates; i++) {
    -        newAutomaton.createState();
    -        newAutomaton.setAccept(i, automaton.isAccept(i));
    -      }
    -
    -      for (int i = 0; i < numStates; i++) {
    -        count = automaton.initTransition(i, t);
    -        for (int j = 0; j < count; j++) {
    -          automaton.getNextTransition(t);
    -          int min, max;
    -          if (t.min <= anyTermID && anyTermID <= t.max) {
    -            // Match any term
    -            min = 0;
    -            max = termCount - 1;
    -          } else {
    -            min = t.min;
    -            max = t.max;
    -          }
    -          newAutomaton.addTransition(t.source, t.dest, min, max);
    -        }
    -      }
    -      newAutomaton.finishState();
    -      automaton = newAutomaton;
    -    }
    -
         det = Operations.removeDeadStates(Operations.determinize(automaton, maxDeterminizedStates));
       }
     
    -  private int getTermID(BytesRef term) {
    +  private int getTermID(int incr, BytesRef term) {
         Integer id = termToID.get(term);
    -    if (id == null) {
    +    if (incr > 1 || id == null) {
    --- End diff --
    
    Hmm, doesn't this mean that if the same term shows up again with
    `incr > 1`, it will get a different `id` assigned, even though it already
    has one?  But I think that is actually fine, since nowhere here do we
    depend on / expect that the same term must have the same id.
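
To make the question concrete, here is a minimal sketch of the id assignment as I read it. The diff cuts off right after the `if`, so the body below is an assumption (allocate the next free id and record it), and `TermIdSketch`, `idToTerm`, and `termFor` are hypothetical names; `String` stands in for `BytesRef` to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the id bookkeeping; not the actual Lucene class.
class TermIdSketch {
  private final Map<String, Integer> termToID = new HashMap<>();
  private final List<String> idToTerm = new ArrayList<>();

  int getTermID(int incr, String term) {
    Integer id = termToID.get(term);
    if (incr > 1 || id == null) {
      // Assumed body: allocate the next free id and record both directions.
      id = idToTerm.size();
      termToID.put(term, id);
      idToTerm.add(term);
    }
    return id;
  }

  // Reverse lookup: every id still resolves to exactly one term.
  String termFor(int id) {
    return idToTerm.get(id);
  }
}
```

Under these assumptions, two calls with `incr == 1` for the same term reuse one id, while a call with `incr == 2` allocates a fresh id for that term. Both ids still map back to the same term, which is why the duplication looks harmless as long as nothing relies on a term having a single id.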



