rmuir commented on issue #13706: URL: https://github.com/apache/lucene/issues/13706#issuecomment-2325216584
I like `sameLanguage` too, but I don't like the potential quadratic cost, considering we currently expect the calculation to be fast, and it is called on every automaton. I think it should be avoided in production code?

As for your counterexample, it is actually difficult to create such an automaton; you managed it with union! E.g. if you just create a state and add several ranges instead of one big range, they will be collapsed into a single range when you `finishState`: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/automaton/Automaton.java#L232

That's why I thought there is something to be said for a very simple, constant-time check that is practical as opposed to perfect: it will work for the stuff coming out of the regex parser, or for "normal" stuff coming from the API (e.g. repeat). For that, it needs to check two states (or the same state twice) instead of one.

But if you are able to implement it in linear time in a way that solves all the cases, that would be great; let's do that instead.
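
To make the two ideas above concrete, here is a rough, self-contained sketch. It does not use Lucene's classes: `TotalCheckSketch`, `Range`, `collapse`, `acceptsAnyChar` and `looksTotal` are all hypothetical names, and the check shown is only one possible reading of "check two states", not the code that exists or was merged in Lucene. `collapse` imitates the range merging that happens on `finishState`; `looksTotal` is the cheap, deliberately incomplete check.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified model for illustration; this is NOT Lucene's Automaton class.
final class TotalCheckSketch {

  /** Transition: code points min..max (inclusive) lead to state dest. */
  static final class Range {
    final int dest, min, max;

    Range(int dest, int min, int max) {
      this.dest = dest;
      this.min = min;
      this.max = max;
    }
  }

  /** Sorts by (dest, min), then merges overlapping or adjacent ranges that share a dest. */
  static List<Range> collapse(List<Range> ranges) {
    List<Range> sorted = new ArrayList<>(ranges);
    sorted.sort(Comparator.<Range>comparingInt(r -> r.dest).thenComparingInt(r -> r.min));
    List<Range> out = new ArrayList<>();
    for (Range r : sorted) {
      Range last = out.isEmpty() ? null : out.get(out.size() - 1);
      // Compare as long so that max + 1 cannot overflow.
      if (last != null && last.dest == r.dest && (long) r.min <= (long) last.max + 1) {
        out.set(out.size() - 1, new Range(last.dest, last.min, Math.max(last.max, r.max)));
      } else {
        out.add(r);
      }
    }
    return out;
  }

  /** True if the state is accepting and its only transition spans the whole alphabet. */
  static boolean acceptsAnyChar(List<Range> state, boolean accepting, int minCp, int maxCp) {
    return accepting
        && state.size() == 1
        && state.get(0).min == minCp
        && state.get(0).max == maxCp;
  }

  /**
   * Constant-time, intentionally incomplete check (assumes at least one state, already
   * collapsed): state 0 must accept any character and lead to a state, possibly state 0
   * itself, that also accepts any character and loops back to itself.
   */
  static boolean looksTotal(List<List<Range>> states, boolean[] accept, int minCp, int maxCp) {
    if (!acceptsAnyChar(states.get(0), accept[0], minCp, maxCp)) {
      return false;
    }
    int next = states.get(0).get(0).dest;
    return acceptsAnyChar(states.get(next), accept[next], minCp, maxCp)
        && states.get(next).get(0).dest == next;
  }
}
```

In this model, several adjacent ranges added to one state with the same destination end up as a single full-alphabet range after `collapse`, so `looksTotal` still recognizes them. An automaton whose alphabet coverage is split across transitions to different states (the kind a union can produce) would be reported as not total, which is the "practical as opposed to perfect" trade-off.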