[
https://issues.apache.org/jira/browse/LUCENE-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15787395#comment-15787395
]
ASF GitHub Bot commented on LUCENE-7603:
----------------------------------------
Github user mikemccand commented on a diff in the pull request:
https://github.com/apache/lucene-solr/pull/129#discussion_r94216469
--- Diff: lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java ---
@@ -210,82 +215,20 @@ private void finish() {
*/
private void finish(int maxDeterminizedStates) {
Automaton automaton = builder.finish();
-
- // System.out.println("before det:\n" + automaton.toDot());
-
- Transition t = new Transition();
-
- // TODO: should we add "eps back to initial node" for all states,
- // and det that? then we don't need to revisit initial node at
- // every position? but automaton could blow up? And, this makes it
- // harder to skip useless positions at search time?
-
- if (anyTermID != -1) {
-
- // Make sure there are no leading or trailing ANY:
- int count = automaton.initTransition(0, t);
- for (int i = 0; i < count; i++) {
- automaton.getNextTransition(t);
- if (anyTermID >= t.min && anyTermID <= t.max) {
- throw new IllegalStateException("automaton cannot lead with an ANY transition");
- }
- }
-
- int numStates = automaton.getNumStates();
- for (int i = 0; i < numStates; i++) {
- count = automaton.initTransition(i, t);
- for (int j = 0; j < count; j++) {
- automaton.getNextTransition(t);
- if (automaton.isAccept(t.dest) && anyTermID >= t.min && anyTermID <= t.max) {
- throw new IllegalStateException("automaton cannot end with an ANY transition");
- }
- }
- }
-
- int termCount = termToID.size();
-
- // We have to carefully translate these transitions so automaton
- // realizes they also match all other terms:
- Automaton newAutomaton = new Automaton();
- for (int i = 0; i < numStates; i++) {
- newAutomaton.createState();
- newAutomaton.setAccept(i, automaton.isAccept(i));
- }
-
- for (int i = 0; i < numStates; i++) {
- count = automaton.initTransition(i, t);
- for (int j = 0; j < count; j++) {
- automaton.getNextTransition(t);
- int min, max;
- if (t.min <= anyTermID && anyTermID <= t.max) {
- // Match any term
- min = 0;
- max = termCount - 1;
- } else {
- min = t.min;
- max = t.max;
- }
- newAutomaton.addTransition(t.source, t.dest, min, max);
- }
- }
- newAutomaton.finishState();
- automaton = newAutomaton;
- }
-
det = Operations.removeDeadStates(Operations.determinize(automaton, maxDeterminizedStates));
}
- private int getTermID(BytesRef term) {
+ private int getTermID(int incr, BytesRef term) {
Integer id = termToID.get(term);
- if (id == null) {
+ if (incr > 1 || id == null) {
--- End diff --
Hmm, doesn't this mean that if the same term shows up again, but with a
different `incr`, it will get a different `id` assigned? I think that is
actually fine, though, since nothing here depends on or expects the same term
always having the same id.
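To make the question concrete, here is a minimal standalone sketch of that behavior. Only the condition `incr > 1 || id == null` is taken from the diff; the class name `TermIdSketch`, the `String` keys (in place of `BytesRef`), the `nextId` counter, and the way the map is updated are all assumptions for illustration, not the code in the pull request.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the id assignment discussed above.
// Only the `incr > 1 || id == null` check comes from the diff; everything
// else (String keys, nextId counter, map update) is assumed.
final class TermIdSketch {
  private final Map<String, Integer> termToID = new HashMap<>();
  private int nextId = 0;

  int getTermID(int incr, String term) {
    Integer id = termToID.get(term);
    if (incr > 1 || id == null) {
      // A position increment above 1 forces a fresh id, even for a known term.
      id = nextId++;
      termToID.put(term, id);
    }
    return id;
  }

  public static void main(String[] args) {
    TermIdSketch ids = new TermIdSketch();
    int first = ids.getTermID(1, "wifi");  // unseen term: new id
    int again = ids.getTermID(1, "wifi");  // same term, incr == 1: id reused
    int gap = ids.getTermID(2, "wifi");    // same term, incr > 1: new id
    System.out.println(first + " " + again + " " + gap);
  }
}
```

Under these assumptions the term-to-id mapping is no longer one-to-one, which is harmless only as long as no caller treats the id as a stable key for the term.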
> Support Graph Token Streams in QueryBuilder
> -------------------------------------------
>
> Key: LUCENE-7603
> URL: https://issues.apache.org/jira/browse/LUCENE-7603
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/queryparser, core/search
> Reporter: Matt Weber
>
> With [LUCENE-6664|https://issues.apache.org/jira/browse/LUCENE-6664] we can
> use multi-term synonyms at query time. A "graph token stream" will be created,
> which is nothing more than using the position length attribute on
> stacked tokens to indicate how many positions a token should span. Currently
> the position length attribute on tokens is ignored during query parsing.
> This issue will add support for handling these graph token streams inside the
> QueryBuilder utility class used by query parsers.