Hi,

There have been many requests on this list which were regular expression
related but not exactly pcre related. We have worked on some of them, e.g.
glob matching, but we usually haven't fully finished them and honestly never
felt like they should be part of pcre. I always suggested that they should go
into some other project which focuses on doing such things. So I have started
a new project called repan (regular expression pattern analyzer), which should
do a lot of things except pattern matching. The code is available here:

https://github.com/zherczeg/repan

The core features are:
- Parsers for multiple regexp flavors. Currently it has a pcre and a javascript
parser, so it is possible to run javascript regexps correctly with pcre without
implementing anything in pcre. I think it would be good to remove the
half-completed \uxxxx support from pcre, since there is no need for it anymore.
In the long run other parsers could be added, e.g. glob or perl6.
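
As a rough illustration of what such a flavor translation involves, here is a
minimal Python sketch that rewrites javascript \uXXXX escapes into the pcre
\x{XXXX} form. It is not repan's code or API; the function name is made up for
this example, and it assumes no surrogate pairs and no \u{...} (javascript "u"
flag) escapes. A \u that is not followed by four hex digits is left untouched.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def js_unicode_escapes_to_pcre(pattern: str) -> str:
    """Rewrite javascript \\uXXXX escapes as pcre \\x{XXXX} escapes."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            digits = pattern[i + 2:i + 6]
            if nxt == "u" and len(digits) == 4 and all(d in HEX_DIGITS for d in digits):
                out.append("\\x{" + digits + "}")
                i += 6
            else:
                # Any other escape, including an escaped backslash, is
                # copied as a two-character unit so that "\\u0041"
                # (a literal backslash followed by "u0041") is not touched.
                out.append(ch + nxt)
                i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(js_unicode_escapes_to_pcre(r"caf\u00e9"))   # caf\x{00e9}
print(js_unicode_escapes_to_pcre(r"\\u0041"))     # \\u0041 (unchanged)
```

A real parser works on a syntax tree rather than on text, which is what makes
it possible to handle the remaining differences between the two flavors.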

- Do guided or unguided optimizations. Currently repan can do two of them. The
first is smart removal of capturing brackets. This is useful if the application
only needs the full match. The optimization removes only those brackets which
are not referenced by backreferences, recursions or conditional blocks, and it
also updates these references, so it is more advanced than the no-capture
(?n:...) flag. The resulting pattern should run faster, since there is no need
to store capturing bracket data during matching.
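
The idea can be sketched in a few lines of Python. This is only an
illustration under strong assumptions, not repan's implementation: the function
names are hypothetical, only unnamed groups and numeric backreferences (\1, \2,
...) are handled, and anything else that could refer to a group (named groups,
recursions, conditional blocks, \g, \k, inline flags, comments, \Q...\E) is
rejected with an error instead of being processed.

```python
import re

_BACKREF = re.compile(r"\\([1-9][0-9]*)")
# Group openers that never capture; any other "(?" or "(*" form is rejected.
_NON_CAPTURING = ("(?:", "(?=", "(?!", "(?<=", "(?<!", "(?>")


def tokenize(pattern):
    """Split a pattern into ("open", None), ("ref", n) and ("text", s) tokens."""
    tokens, i, in_class = [], 0, False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            if pattern[i + 1] in "gkQ":
                raise ValueError("unsupported escape in this sketch")
            # Inside a character class \1 is not a backreference.
            m = None if in_class else _BACKREF.match(pattern, i)
            if m:
                tokens.append(("ref", int(m.group(1))))
                i = m.end()
            else:
                tokens.append(("text", pattern[i:i + 2]))
                i += 2
        elif in_class:
            in_class = ch != "]"
            tokens.append(("text", ch))
            i += 1
        elif ch == "[":
            # A "]" right after "[" or "[^" is a literal member of the class.
            j = i + 1
            if pattern.startswith("^", j):
                j += 1
            if pattern.startswith("]", j):
                j += 1
            tokens.append(("text", pattern[i:j]))
            i, in_class = j, True
        elif ch == "(":
            if pattern.startswith(("(?", "(*"), i):
                if not pattern.startswith(_NON_CAPTURING, i):
                    raise ValueError("unsupported group syntax in this sketch")
                tokens.append(("text", ch))
            else:
                tokens.append(("open", None))
            i += 1
        else:
            tokens.append(("text", ch))
            i += 1
    return tokens


def strip_unreferenced_captures(pattern):
    """Turn unreferenced capturing groups into (?:...) and renumber the rest."""
    tokens = tokenize(pattern)
    referenced = {value for kind, value in tokens if kind == "ref"}
    total = sum(1 for kind, _ in tokens if kind == "open")
    kept = [n for n in range(1, total + 1) if n in referenced]
    new_number = {old: new for new, old in enumerate(kept, start=1)}

    out, old = [], 0
    for kind, value in tokens:
        if kind == "open":
            old += 1
            out.append("(" if old in new_number else "(?:")
        elif kind == "ref":
            # Numbers that match no group are left exactly as written.
            out.append("\\" + str(new_number.get(value, value)))
        else:
            out.append(value)
    return "".join(out)


print(strip_unreferenced_captures(r"(a)(b)(c)\2"))   # (?:a)(b)(?:c)\1
print(strip_unreferenced_captures(r"(\d+)-(\d+)"))   # (?:\d+)-(?:\d+)
```

In the first example only the second group is referenced, so it is kept and
becomes group 1, and the backreference is rewritten from \2 to \1.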

The other is removal of unnecessary non-capturing brackets (i.e. those without
repeat, alternatives or modifiers). Unfortunately these are part of the pcre
byte code, and have some performance overhead during interpreted matching.
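
A deliberately conservative Python sketch of this second optimization follows.
Again, this is an illustration and not repan's code: the function name is
hypothetical, and the sketch assumes a pattern without comments, \Q...\E
sections or extended (x) mode. A (?:...) group is dropped only when it has no
top-level alternative, no modifier, and no repeat after it. The group is also
kept whenever dropping it could glue neighbouring text into a different token,
e.g. after an escape such as \1, or around a "{".

```python
QUANTIFIER_START = {"*", "+", "?", "{"}


def drop_redundant_groups(pattern: str) -> str:
    out = []      # output fragments; a group opener may be blanked later
    stack = []    # open groups: [opener index in out, removable, has "|"]
    i, n = 0, len(pattern)
    in_class = brace_open = prev_escape = False
    while i < n:
        ch = pattern[i]
        is_escape = False
        if ch == "\\" and i + 1 < n:
            out.append(pattern[i:i + 2])
            i += 2
            is_escape = True
        elif in_class:
            in_class = ch != "]"
            out.append(ch)
            i += 1
        elif ch == "[":
            # A "]" right after "[" or "[^" is a literal member of the class.
            j = i + 1
            if pattern.startswith("^", j):
                j += 1
            if pattern.startswith("]", j):
                j += 1
            out.append(pattern[i:j])
            i, in_class = j, True
        elif ch == "(":
            plain = pattern.startswith("(?:", i)
            first = pattern[i + 3:i + 4]
            removable = (plain and not prev_escape and not brace_open
                         and first not in QUANTIFIER_START)
            stack.append([len(out), removable, False])
            out.append("(?:" if plain else "(")
            i += 3 if plain else 1
        elif ch == ")" and stack:
            opener, removable, has_alt = stack.pop()
            nxt = pattern[i + 1:i + 2]
            glue_risk = prev_escape and nxt.isalnum()
            if (removable and not has_alt and not brace_open
                    and not glue_risk and nxt not in QUANTIFIER_START):
                out[opener] = ""          # drop "(?:" and skip the ")"
                is_escape = prev_escape   # last emitted token is unchanged
            else:
                out.append(ch)
            i += 1
        else:
            if ch == "|" and stack:
                stack[-1][2] = True
            elif ch == "{":
                brace_open = True
            elif ch == "}":
                brace_open = False
            out.append(ch)
            i += 1
        prev_escape = is_escape
    return "".join(out)


print(drop_redundant_groups(r"a(?:b)c"))       # abc
print(drop_redundant_groups(r"x(?:(?:y))z"))   # xyz
print(drop_redundant_groups(r"(?:ab)+"))       # (?:ab)+   (repeat: kept)
print(drop_redundant_groups(r"(?:a|b)c"))      # (?:a|b)c  (alternatives: kept)
print(drop_redundant_groups(r"(a)\1(?:0)"))    # (a)\1(?:0) (would become \10)
```

The last example shows why a purely textual removal is risky, and why doing
this on a parsed representation of the pattern is the safer approach.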

In the future more of these could be added.

- The last part is constructing new patterns. My aim is constructing pcre
patterns only, but others might want to work on other flavors as well.

The project is new, so there are probably several bugs and missing features.
But at least there is some code available now. Contributions are welcome; the
project is (and will be) far less complex than pcre.

Regards,
Zoltan
 
-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev 
