On Wed, Aug 30, 2000 at 04:07:51PM -0400, mike mulligan wrote:
> Can this be repackaged in such a way that it is a more natural extension of
> the existing regexp language?
> 
> The RFC notes that the look-behind construct (?<= pattern) can almost be
> used.  Two issues:  1. as currently implemented, the pattern must be of
> fixed length.  2. this is a zero-width assertion.
> 
> Speculation says the fixed length limitation was done because it offered a
> relatively quick hack.  A fixed length pattern allows you to go back in the
> matched-against string that many characters and match the pattern forwards.
> If the regexp engine could "go backwards", then the fixed-length restriction
> would be lifted.

Yes.  This was my goal.

> 
> The zero-width assertion might be an issue.  The RFC's example doesn't
> really get into this.

> 
> > Imagine a very long input string containing data such as this:
> >     ... GCAAGAATTGAACTGTAG ...
> > If you want to match text that includes the string GAAC, but only when it
> > follows GAATT or any one of a large number of other different
> > possibilities,
> 
> If it is important to be able to do both:
> 
>   $large = join '|', @possible;
>   $data =~ / (?<= $large) GAAC /x;   # Don't care which @possible?
> 
> and
> 
>   $data =~ m/ ($large) GAAC /x;   # Need $1 to say which @possible
> 
> Then perhaps a back-reference-setting look-behind could be implemented?
> Don't have an obvious syntax to use (back-tick == back-reference?), but
> something like:
> 
>   $data =~ m/ (?`<= $large) GAAC /x;   # Need $1 to say which @possible
> 
> 
> Does this enhanced look-behind satisfy the RFC's needs?
> 

As you say, we cannot overload the (?<= ... ) syntax, since that denotes
a zero-width assertion, and we want backreferences, which means that we
must consume part of the string, so the lookbehind I want can no longer
be a zero-width assertion.
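
To make the distinction concrete, here is a minimal sketch using the
sample data from the RFC and only constructs that exist today.  The
variable names are mine and purely illustrative.

```perl
use strict;
use warnings;

# Sample data taken from the RFC excerpt quoted above.
my $data = 'GCAAGAATTGAACTGTAG';

# Zero-width look-behind: GAATT must precede GAAC, but only GAAC is
# part of the matched text.
my $zero_width = $data =~ / (?<= GAATT ) GAAC /x ? $& : undef;

# Ordinary capture: GAATT is consumed, so it shows up both in the
# matched text and in $1.
my ($consumed, $prefix);
if ($data =~ / (GAATT) GAAC /x) {
    ($consumed, $prefix) = ($&, $1);
}

print "zero-width: $zero_width\n";
print "consuming:  $consumed (\$1 = $prefix)\n";
```

The first match covers only GAAC; the second covers GAATTGAAC and
leaves GAATT in $1.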

If we use the syntax (?`<= $variable_length), either the engine
has to start all over from the beginning of the string, which would
defeat the purpose, or it has to match the regexp in
$variable_length in reverse, starting from the end of the (?`<= ... )
part of the regexp and going to the beginning.
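
The "match in reverse" idea can be approximated by hand today by
reversing both the data and the alternatives.  The following is only a
sketch under my own assumptions: the contents of @possible are made up,
and reversing whole alternatives like this only works for literal
strings, not for arbitrary regexps.

```perl
use strict;
use warnings;

my $data     = 'GCAAGAATTGAACTGTAG';
my @possible = ('GAATT', 'CAAG', 'TT');   # hypothetical prefixes, varying length

# Reverse the data and every alternative, so that the "look-behind"
# part can be matched forwards, directly after the reversed GAAC.
my $rev_data = reverse $data;
my $rev_alt  = join '|', map { quotemeta(scalar reverse $_) } @possible;

my ($which, $start);
if ($rev_data =~ / CAAG ($rev_alt) /x) {
    $which = reverse $1;                  # prefix, back in original order
    $start = length($data) - $+[0];       # offset of the prefix in $data
    print "GAAC is preceded by $which, starting at offset $start\n";
}
```

Here $which ends up as GAATT and $start as 4, without the engine
having to rescan the string from the beginning for each alternative.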

I thought that it was clearer to get the user to reverse the regexp
herself using (?r), so that she knows exactly what is going on, but
perhaps you are right and it would be better to build on the behavior
of (?<= ... ).

I will include your suggestion in the next version ... the syntax
might seem less eccentric than the currently proposed one.

I would propose that your version of the syntax might also function in
the middle of a regexp: /GHI(?`<=DEF)JKL(?`<=^ABC)MNO/ would match the
start of the alphabet (fixed-length example used for simplicity).

Any ideas for something better than (?`<= ... )?  How about just 
(?` ... ) or does that set up the expectation that there should be a
closing ` somewhere?

I suppose (?<= ... ) would need to be retained for zero-width
assertions.

Peter
