markharwood commented on a change in pull request #1541:
URL: https://github.com/apache/lucene-solr/pull/1541#discussion_r444284423
##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java
##########
@@ -743,6 +792,30 @@ private Automaton toAutomatonInternal(Map<String,Automaton> automata,
     }
     return a;
   }
+  private Automaton toCaseInsensitiveChar(int codepoint, int maxDeterminizedStates) {
+    Automaton case1 = Automata.makeChar(codepoint);
+    int altCase = Character.isLowerCase(codepoint) ? Character.toUpperCase(codepoint) : Character.toLowerCase(codepoint);
+    Automaton result;
+    if (altCase != codepoint) {
+      result = Operations.union(case1, Automata.makeChar(altCase));
+      result = MinimizationOperations.minimize(result, maxDeterminizedStates);
+    } else {
+      result = case1;
+    }
+    return result;
+  }
Review comment:
An alternative would be an overhaul of RegExp:
* introducing a Builder class for the parser, with named properties for settings
* separating the RegExp parser logic from the parsed objects (currently they are the same class)
* separating the rendering functions (toString, toAutomaton, toStringTree) from the parsed objects

I'm not sure we're at the tipping point where all of that would make sense.
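As a rough illustration of the three points above, here is a minimal sketch of what such a split might look like. Everything in it is hypothetical: the names `RegExpOverhaulSketch`, `Parser`, `Builder`, `Node`, `CharNode`, `ConcatNode`, and `Renderer` do not exist in Lucene, and the toy grammar (every codepoint is a literal) stands in for the real RegExp syntax. It depends only on the JDK, so it shows the shape of the design rather than any actual automaton construction.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: none of these types exist in Lucene today.
final class RegExpOverhaulSketch {

  /** Parsed objects: immutable data only, with no parsing or rendering logic. */
  public interface Node {}

  public static final class CharNode implements Node {
    public final int codepoint;
    public final boolean caseInsensitive;

    public CharNode(int codepoint, boolean caseInsensitive) {
      this.codepoint = codepoint;
      this.caseInsensitive = caseInsensitive;
    }
  }

  public static final class ConcatNode implements Node {
    public final List<Node> children;

    public ConcatNode(List<Node> children) {
      this.children = Collections.unmodifiableList(new ArrayList<>(children));
    }
  }

  /** Parser, configured through a builder with named properties. */
  public static final class Parser {
    private final boolean caseInsensitive;

    private Parser(Builder builder) {
      this.caseInsensitive = builder.caseInsensitive;
    }

    public static Builder builder() {
      return new Builder();
    }

    /** Toy grammar (an assumption): every codepoint is a literal character. */
    public Node parse(String pattern) {
      List<Node> children = new ArrayList<>();
      pattern.codePoints().forEach(cp -> children.add(new CharNode(cp, caseInsensitive)));
      return new ConcatNode(children);
    }

    public static final class Builder {
      private boolean caseInsensitive = false;

      public Builder caseInsensitive(boolean value) {
        this.caseInsensitive = value;
        return this;
      }

      public Parser build() {
        return new Parser(this);
      }
    }
  }

  /** Rendering functions live outside the parsed objects. */
  public static final class Renderer {
    private Renderer() {}

    public static String toStringTree(Node node) {
      StringBuilder sb = new StringBuilder();
      render(node, 0, sb);
      return sb.toString();
    }

    private static void render(Node node, int depth, StringBuilder sb) {
      for (int i = 0; i < depth; i++) sb.append("  ");
      if (node instanceof CharNode) {
        CharNode c = (CharNode) node;
        sb.append("CHAR ").appendCodePoint(c.codepoint);
        if (c.caseInsensitive) sb.append(" (case-insensitive)");
        sb.append('\n');
      } else if (node instanceof ConcatNode) {
        sb.append("CONCAT\n");
        for (Node child : ((ConcatNode) node).children) render(child, depth + 1, sb);
      } else {
        throw new IllegalArgumentException("unknown node type: " + node);
      }
    }
  }

  public static void main(String[] args) {
    // Settings are named at the call site instead of being packed into constructor arguments.
    Parser parser = Parser.builder().caseInsensitive(true).build();
    System.out.print(Renderer.toStringTree(parser.parse("ab")));
  }
}
```

With this shape, a new setting such as case insensitivity would become one more builder property, and a new output format would become one more renderer, without touching the node classes. Whether that is worth the churn is the open question.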