This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Generalised Additions to Regexs

=head1 VERSION

  Maintainer: Richard Proctor <[EMAIL PROTECTED]>
  Date: 22 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 274
  Version: 1
  Status: Developing

=head1 ABSTRACT

This RFC proposes a general mechanism for adding new capabilities to regexes.

=head1 DESCRIPTION

Given that expansion of regexes could include (+...) and (*...), I have
been thinking about providing a general-purpose way of adding
functionality.  Hence I propose that the entire (+...) syntax be
kept free from formal specification and reserved for this. (+ = addition)

A module or anything that wants to support some enhanced syntax
registers something that handles "regex enhancements".

At regex compile time, whenever (+foo) is found, perl calls
each of the registered regex enhancements in turn, as follows:

1) Each enhancement is passed the foo string as a parameter, exactly as
written.  (There is an issue of actually finding the end of the generic foo.)

2) The regex enhancement can either recognise the content or not.

3) If not, the enhancement returns undef and perl goes on to the next regex
enhancement.  (Are the enhancements handled as a stack (last checked
first) or a list (first checked first)?  How are they scoped?  A job here
for the OO/scoping fanatics.)

4) If perl runs out of registered regex enhancements it reports an error.  

5) If an enhancement recognises the content, it can do either of:

a) Return a replacement regex, expanded using existing capabilities; perl
then passes this back through the regex compiler.

b) Return a coderef that is called at run time when the regex gets to this
point.  The referenced code needs enough access to the regex internals
to be able to see the current sub-expression, request more characters, and
access the relevant flags, with visibility of greediness.  It may also need
a coderef that is similarly called when the regex is being unwound during
backtracking.  These features would also be of interest to the existing
code inside regexes.
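
As an illustration only, steps 1 to 5(a) can be approximated in today's Perl
by pre-processing the pattern text before it is compiled.  Everything named
here (register_enhancement, expand_enhancements, the "octet" keyword) is
hypothetical and not part of any proposed API.  The sketch assumes the stack
ordering (last registered, first checked), assumes foo contains no
parentheses so that its end is trivial to find, does not feed the
replacement back through the expander, and makes no attempt at case (b).

```perl
use strict;
use warnings;

# Hypothetical registry of enhancement handlers, newest first (a stack).
my @enhancements;

sub register_enhancement {
    my ($handler) = @_;
    unshift @enhancements, $handler;
}

# Try each handler in turn; the first defined return value wins.
sub dispatch_enhancement {
    my ($content) = @_;
    for my $handler (@enhancements) {
        my $replacement = $handler->($content);
        return $replacement if defined $replacement;    # step 5(a)
    }
    die "Unrecognised regex enhancement (+$content)\n"; # step 4
}

# Expand every (+foo) in a pattern string before compiling it.
# Assumption: foo contains no parentheses.
sub expand_enhancements {
    my ($pattern) = @_;
    $pattern =~ s{\(\+([^()]*)\)}{ dispatch_enhancement($1) }ge;
    return $pattern;
}

# Example handler: (+octet) expands to an ordinary sub-pattern;
# anything else is declined by returning undef (step 3).
register_enhancement(sub {
    my ($content) = @_;
    return undef unless $content eq 'octet';
    return '(?:25[0-5]|2[0-4]\d|1?\d?\d)';
});

my $re_text = expand_enhancements('^(+octet)\.(+octet)$');
my $re      = qr/$re_text/;
print "matched\n" if '192.7' =~ $re;
```

A real implementation would do this inside the regex compiler rather than on
the pattern string, and would need an answer to the scoping question raised
in step 3.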


Thinking on from that, the last case should be generalised (it is rather
like my (?*{...}) from RFC 198, or an enhancement to (??{...})).  If so, cases
(a) and (b) are the same, as case (b) is just a matter of returning a
(?*{...}) containing the appropriate code.

Following on, since (?{...}) and similar code blocks are evaluated
on the forward match, it would be a good idea to likewise support a
code block that is ignored on the forward match but executed when the
match is unwound due to backtracking.  Thus (?{ foo })(?\{ bar })
executes foo in the forward case and bar if it unwinds.  I don't
care at the moment what the syntax is; what about the concepts?
Think, for example, of foo putting something on a stack (e.g. the bracket
to match [RFC 145]) and bar taking it off.
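
For comparison, the nearest thing in existing Perl is that a value set with
local inside (?{...}) is restored automatically when the regex backtracks
past that block.  The sketch below, adapted from the counting example in
perlre, shows that implicit unwinding; it is not the proposed (?\{...})
syntax, and it only restores state rather than running arbitrary code on
the way back.  The variable names are illustrative.

```perl
use strict;
use warnings;

# Package variables are needed because local() cannot act on lexicals.
our ($cnt, $res);

my $text = 'a' x 8;

# The starred group first swallows all eight letters, bumping $cnt each
# time, then has to give four of them back so that "aaaa" can match.
# Each step given back also undoes the matching local() assignment.
$text =~ m<
    (?{ $cnt = 0 })
    (?: a (?{ local $cnt = $cnt + 1 }) )*
    aaaa
    (?{ $res = $cnt })
>x;

print "iterations kept: $res\n";    # prints "iterations kept: 4"
```

An explicit unwind block would let bar do more than restore a value, for
instance popping an entry from a separate stack as in the RFC 145 case.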

Note:

I don't consider this RFC complete, but after posting this on the regex list
to no effect I am making it an RFC to see if it gets a little more feedback...

=head1 MIGRATION

This is a new feature, so there are no compatibility problems.

=head1 IMPLEMENTATION

This has not been looked at in detail, but the description above gives
some idea of how it might operate.

=head1 REFERENCES

RFC 145 - Bracket matching

RFC 198 - Boolean Regexes


