ChrisHegarty opened a new issue, #13706: URL: https://github.com/apache/lucene/issues/13706
There are a number of optimizations in Elasticsearch that depend on the automaton from a `RegExp` being total, that is, accepting all strings [1] [2]. Changes in the upcoming Lucene 10, which no longer minimizes the automaton returned by `RegExp`, have broken the assumption that these optimizations build upon, at least as they stand today, and I'm not sure how best to replicate the functionality in Lucene 10.

For example, this is fine:

```
RegExp r = new RegExp("@");
Automaton a = r.toAutomaton();
assertTrue(a.isDeterministic());
assertTrue(Operations.isTotal(a));
```

while this is not:

```
RegExp r = new RegExp(".*");
Automaton a = r.toAutomaton();
assertTrue(a.isDeterministic());
assertTrue(Operations.isTotal(a)); // <<< isTotal returns false
```

Without an API to minimize (since `MinimizationOperations` is now test-only), I'm not sure how to re-code such optimizations, whether we should attempt to provide our own minimize implementation, or whether `RegExp` should return a total automaton for `.*`.

[1] https://github.com/elastic/elasticsearch/blob/0426e1fbd5dbf1eb9dae07f9af3592569165f5de/x-pack/plugin/wildcard/src/main/java/org/elasticsearch/xpack/wildcard/mapper/WildcardFieldMapper.java#L383
[2] https://github.com/elastic/elasticsearch/blob/0426e1fbd5dbf1eb9dae07f9af3592569165f5de/x-pack/plugin/esql-core/src/main/java/org/elasticsearch/xpack/esql/core/expression/predicate/regex/AbstractStringPattern.java#L30
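
To illustrate the gap between a structural totality check and actually accepting all strings, here is a minimal, self-contained sketch. It uses a hypothetical `ToyDfa` class over a two-symbol alphabet, not Lucene's `Automaton` or `Operations` API, and the two-state shape assumed for a non-minimized `.*` automaton is illustrative only. For a deterministic automaton, accepting every string is equivalent to every reachable state being accepting and having a transition for every symbol, so a check along these lines might not need minimization at all.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy DFA over symbols 0..k-1 with start state 0. Hypothetical; not Lucene's Automaton. */
final class ToyDfa {
    static final int NO_TRANSITION = -1;

    final int[][] next;     // next[state][symbol] -> destination state, or NO_TRANSITION
    final boolean[] accept; // accept[state] -> true if the state is accepting

    ToyDfa(int[][] next, boolean[] accept) {
        this.next = next;
        this.accept = accept;
    }

    /** Structural check: exactly one state, accepting, looping to itself on every symbol. */
    boolean isTotalStructurally() {
        if (next.length != 1 || !accept[0]) {
            return false;
        }
        for (int dest : next[0]) {
            if (dest != 0) {
                return false;
            }
        }
        return true;
    }

    /** Semantic check: every reachable state accepts and has a transition for every symbol. */
    boolean acceptsAllStrings() {
        if (next.length == 0) {
            return false;
        }
        boolean[] seen = new boolean[next.length];
        Deque<Integer> work = new ArrayDeque<>();
        seen[0] = true;
        work.push(0);
        while (!work.isEmpty()) {
            int state = work.pop();
            if (!accept[state]) {
                return false; // the string that reaches this state is rejected
            }
            for (int dest : next[state]) {
                if (dest == NO_TRANSITION) {
                    return false; // some extension of the current string has nowhere to go
                }
                if (!seen[dest]) {
                    seen[dest] = true;
                    work.push(dest);
                }
            }
        }
        return true;
    }

    /** Minimal "any string" automaton: one accepting state with a self-loop. */
    static ToyDfa minimalAnyString() {
        return new ToyDfa(new int[][] {{0, 0}}, new boolean[] {true});
    }

    /** Assumed non-minimal shape: accepting start state leading to an accepting loop state. */
    static ToyDfa nonMinimalAnyString() {
        return new ToyDfa(new int[][] {{1, 1}, {1, 1}}, new boolean[] {true, true});
    }
}
```

In this sketch both automata accept all strings, but only the single-state one passes the structural check, which mirrors the behavior described above for `@` versus `.*`.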