ChrisHegarty opened a new issue, #13706:
URL: https://github.com/apache/lucene/issues/13706

   There are a number of optimizations in Elasticsearch that depend upon the 
automaton from a `RegExp` being total (accepting all strings) [1] [2]. Changes 
in the upcoming Lucene 10, which no longer minimizes the automaton returned by 
`RegExp`, have broken the assumption that these optimizations build upon, at 
least as they stand today, and I'm not sure how best to replicate the 
functionality in Lucene 10.
   
   For example, this is fine:
   ```
   RegExp r = new RegExp("@");
   Automaton a = r.toAutomaton();
   assertTrue(a.isDeterministic());
   assertTrue(Operations.isTotal(a));
   ```
   But this is not:
   ```
   RegExp r = new RegExp(".*");
   Automaton a = r.toAutomaton();
   assertTrue(a.isDeterministic());
   assertTrue(Operations.isTotal(a));  // <<< isTotal returns false
   ```
   
   Without an API to minimize (since `MinimizationOperations` is now 
test-only), I'm not sure how to re-code such optimizations. Should we attempt 
to provide our own minimize implementation? Or should `RegExp` return a total 
automaton for `.*`?
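
   One possible direction that avoids minimization altogether is to test 
totality by reachability: a deterministic automaton accepts all strings exactly 
when every reachable state is accepting and has a transition for every symbol. 
The sketch below illustrates this on a toy DFA over a two-symbol alphabet. It 
does not use Lucene's `Automaton` class; the `TotalityCheck` class, its array 
encoding, and the two-state shape used for the non-minimal `.*` case are all 
assumptions made for illustration only.

   ```java
   import java.util.ArrayDeque;
   import java.util.Deque;

   /** Toy DFA totality check; hypothetical helper, not part of Lucene. */
   public final class TotalityCheck {

       /**
        * transitions[s][c] is the target state for symbol c from state s,
        * or -1 if there is no such transition. State 0 is the initial state.
        */
       public static boolean isTotal(int[][] transitions, boolean[] accept) {
           if (accept.length == 0) {
               return false; // an automaton with no states accepts nothing
           }
           boolean[] seen = new boolean[accept.length];
           Deque<Integer> work = new ArrayDeque<>();
           seen[0] = true;
           work.push(0);
           while (!work.isEmpty()) {
               int s = work.pop();
               if (!accept[s]) {
                   return false; // the string reaching s is rejected
               }
               for (int t : transitions[s]) {
                   if (t < 0) {
                       return false; // some extension of that string is rejected
                   }
                   if (!seen[t]) {
                       seen[t] = true;
                       work.push(t);
                   }
               }
           }
           return true;
       }

       public static void main(String[] args) {
           // Minimal form: one accepting state looping on every symbol.
           int[][] minimal = {{0, 0}};
           // Equivalent but non-minimal: two accepting states (assumed shape).
           int[][] nonMinimal = {{1, 1}, {1, 1}};
           // Not total: state 0 has no transition on symbol 1.
           int[][] partial = {{1, -1}, {1, 1}};

           System.out.println(isTotal(minimal, new boolean[] {true}));          // true
           System.out.println(isTotal(nonMinimal, new boolean[] {true, true})); // true
           System.out.println(isTotal(partial, new boolean[] {true, true}));    // false
       }
   }
   ```

   The check is linear in the number of states and transitions, and it gives 
the same answer for the minimal and non-minimal forms, which a purely 
structural "single state with one full-range transition" test would not.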
   
   [1] 
https://github.com/elastic/elasticsearch/blob/0426e1fbd5dbf1eb9dae07f9af3592569165f5de/x-pack/plugin/wildcard/src/main/java/org/elasticsearch/xpack/wildcard/mapper/WildcardFieldMapper.java#L383
   [2] 
https://github.com/elastic/elasticsearch/blob/0426e1fbd5dbf1eb9dae07f9af3592569165f5de/x-pack/plugin/esql-core/src/main/java/org/elasticsearch/xpack/esql/core/expression/predicate/regex/AbstractStringPattern.java#L30



