Gus Heck created LUCENE-9696:
--------------------------------

             Summary: RegExp with group references
                 Key: LUCENE-9696
                 URL: https://issues.apache.org/jira/browse/LUCENE-9696
             Project: Lucene - Core
          Issue Type: Wish
            Reporter: Gus Heck


PatternTypingFilter presently relies on java.util.regex, but LUCENE-7465 
found performance benefits from using our own RegExp class instead. 
Unfortunately, RegExp does not currently report matching subgroups, which is 
key to PatternTypingFilter's use (and probably useful in other endeavors as 
well). What's needed is reporting of subgroups such that:

new RegExp("foo(.+)") --> converted to a run automaton etc. --> match found 
for "foobar" --> somehow reports getGroup(1) as "bar"

And getGroup() can be called on some object reasonably accessible to the code 
using RegExp in the first place.
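
For reference, the behavior being asked for is roughly what java.util.regex 
already provides. The sketch below uses only java.util.regex, not Lucene's 
RegExp; the class name GroupReferenceSketch and the helper group() are 
hypothetical names made up for illustration. It shows the group semantics a 
RegExp-based getGroup() would presumably need to reproduce.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: uses java.util.regex, not Lucene's RegExp.
class GroupReferenceSketch {

    // Returns the text captured by the given group when the whole input
    // matches the regex, or null when the input does not match at all.
    static String group(String regex, String input, int groupIndex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(groupIndex) : null;
    }

    public static void main(String[] args) {
        // Group 1 of "foo(.+)" against "foobar" is the text after "foo".
        System.out.println(group("foo(.+)", "foobar", 1));
    }
}
```

The open question for this ticket is how to get the same answer out of an 
automaton-based match rather than a backtracking matcher.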

Clearly there's a lot to be worked out there, since the normal usage pattern 
converts things to a DFA / run automaton etc., and subgroups are not a 
natural concept for those classes. But if this could be achieved without 
losing the performance benefits, that would be interesting :).

Opening this Wish ticket as encouraged by [~mikemccand] in LUCENE-9575. I 
won't be able to work on it any time soon, so I encourage anyone else 
interested to pick it up or to drop links or ideas in here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
