Re: RFC 112 (v3) Asignment within a regex
In <[EMAIL PROTECTED]>, "Richard Proctor" writes:
:In general all assignments should wait to the very end, and then assign
:them all. [...] If the expression finally fails the localised values
:would unroll.

Ah, I hadn't anticipated that - I had assumed you would get whatever was
the last value set. Please can you make sure this is clearly explained
in the next version of the RFC?

Hugo
Re: RFC 112 (v3) Asignment within a regex
> On Fri, 29 Sep 2000 01:02:40 +0100, Hugo wrote:
>
> >It also isn't clear what parts of the expression are interpolated at
> >compile time; what should the following leave in %foo?
> >
> >  %foo = ();
> >  $bar = "one";
> >  "twothree" =~ / (?$bar=two) (?$foo{$bar}=three) /x;
>
> It's not just that. You act as if this assignment takes place
> whenever a submatch succeeds. So:
>
>   "twofour" =~ /(?$bar=two)($foo=three)/;
>
> Will $bar be set to "two", and $foo undef? I think not. Assignment
> should be postponed till the very end, when the match finally
> succeeds, as a whole.

In general all assignments should wait to the very end, and then assign
them all. However before code callouts (?{...}) and enemies, the named
assignments that are currently defined should be made (localised) so that
the code can refer to them by name. If the expression finally fails the
localised values would unroll.

> Therefore, I think that allowing just any l-value on the left of the "="
> sign, is not practical. Or is it?

I think any simple scalar value is reasonable.

> OTOH I would rather have that all submatches would be assigned to a
> hash, not to global or lexical variables. I have no clue about what
> syntax that would need.

That is in RFC 150; I think there is a case for both.

Richard
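The deferred, localised behaviour Richard describes can be approximated in
existing Perl 5 with (?{ ... }) code blocks and local, along the lines of
the counting example in perlre. The following is only a sketch of that
idea, not the syntax proposed in the RFC. The names match_named and
%committed are made up for illustration, and the pattern is Bart's
two/three example rewritten with ordinary brackets.

```perl
use strict;
use warnings;

our ($bar, $foo);   # package variables, so that local applies to them
our %committed;     # hypothetical holder, filled only on overall success

# Hypothetical helper: returns a hash ref of named values, or undef on failure.
sub match_named {
    my ($text) = @_;
    %committed = ();
    my $ok = $text =~ /
        (two)   (?{ local $bar = $^N })   # provisional value, undone on backtracking
        (three) (?{ local $foo = $^N })   # $^N is the most recently closed group
        (?{ %committed = (bar => $bar, foo => $foo) })   # reached only at the very end
    /x;
    return $ok ? +{ %committed } : undef;
}

my $failed  = match_named("twofour");    # "three" never matches
my $matched = match_named("twothree");   # whole pattern matches
```

Here the localised $bar and $foo are visible to later code blocks by name,
they unroll if the match fails, and nothing is committed until the end of
the pattern is reached. Once the match is over, $bar and $foo themselves
are restored to their previous values, which is why the sketch copies them
into %committed.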
Re: RFC 112 (v3) Asignment within a regex
On Fri, 29 Sep 2000 01:02:40 +0100, Hugo wrote:

>It also isn't clear what parts of the expression are interpolated at
>compile time; what should the following leave in %foo?
>
>  %foo = ();
>  $bar = "one";
>  "twothree" =~ / (?$bar=two) (?$foo{$bar}=three) /x;

It's not just that. You act as if this assignment takes place
whenever a submatch succeeds. So:

  "twofour" =~ /(?$bar=two)($foo=three)/;

Will $bar be set to "two", and $foo undef? I think not. Assignment
should be postponed till the very end, when the match finally
succeeds, as a whole.

Therefore, I think that allowing just any l-value on the left of the "="
sign, is not practical. Or is it?

OTOH I would rather have that all submatches would be assigned to a
hash, not to global or lexical variables. I have no clue about what
syntax that would need.

--
Bart.
Re: RFC 112 (v3) Asignment within a regex
In <[EMAIL PROTECTED]>, Perl6 RFC Librarian writes:
:=head1 TITLE
:
:Asignment within a regex

This document could do with running through a spellchecker.

:Potentially the $foo could be any scalar LHS, as in (?$foo{$bar}= ... )!,
:likewise the '=' could be any asignment operator.

It isn't clear what the significance of the '!' is in that example.

It also isn't clear what parts of the expression are interpolated at
compile time; what should the following leave in %foo?

  %foo = ();
  $bar = "one";
  "twothree" =~ / (?$bar=two) (?$foo{$bar}=three) /x;

:=head2 Scoping
:
:The question of scoping for these assignments has been raised, but I don't
:currently have a feel for the "best" way to handle this. Input welcome.

I think it should be defined to act the same as in (??{...}), whenever
we get around to defining that.

:=head1 IMPLENTATION
:
:Currently all $scalars in regexes are expanded before the main regex compiler
:gets to analyse the syntax. This problem also affects several other RFCs
:(166 for example). The expansion of variables in regexes needs for these
:(and other RFCs) to be driven from within the regex compiler so that the
:regex can expand as and where appropriate. Changing this should not affect
:any existing behaviour.

That may not be necessary for this case; it may be enough just to tweak
the parser slightly, to detect '(?$' (and maybe '(?\$'). Don't forget
that the parser already successfully skips past '$' when we need it to.

Hugo
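The compile-time interpolation at issue here can be seen in existing
Perl 5 with qr//. This is only a small sketch of current behaviour, with
variable names chosen for illustration. It does not settle what the
proposed (?$foo{$bar}= ... ) should do.

```perl
use strict;
use warnings;

my $bar = "one";
my $re  = qr/\A(?:$bar)\z/;   # $bar is expanded here, when the pattern is built

$bar = "two";                 # changing $bar afterwards does not alter $re

my $still_matches_one = ("one" =~ $re) ? 1 : 0;
my $now_matches_two   = ("two" =~ $re) ? 1 : 0;
my $pattern_text      = "$re";   # the stringified pattern holds the text "one"
```

Because the variable is replaced by its value before the pattern is
compiled, a construct such as (?$bar=two) would need the special handling
discussed above, so that the regex compiler sees the variable name rather
than its contents.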
RFC 112 (v3) Asignment within a regex
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Asignment within a regex

=head1 VERSION

  Maintainer: Richard Proctor <[EMAIL PROTECTED]>
  Date: 16 Aug 2000
  Last Modified: 23 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 112
  Version: 3
  Status: Developing

=head1 ABSTRACT

Provide a simple way of naming and picking out information from a regex
without having to count the brackets.

=head1 DESCRIPTION

If a regex is complex, counting the bracketed sub-expressions to find the
ones you wish to pick out can be messy. It is also prone to maintainability
problems if and when you wish to add to the expression. Using (?:) to
suppress capturing by brackets helps, but it still gets "complex".

I would sometimes rather just pick out the bits I want within the regex
itself.

Suggested syntax: (?$foo= ... ) would assign the string that is matched by
the pattern ... to $foo when the pattern matches. These assignments would
be made left to right after the match has succeeded but before processing a
replacement or other results (or prior to some (?{...}) or (??{...})
code). There may be whitespace between the $foo and the "=".

Potentially the $foo could be any scalar LHS, as in (?$foo{$bar}= ... )!,
likewise the '=' could be any assignment operator.

The camel and the docs include this example:

  if (/Time: (..):(..):(..)/) {
      $hours = $1;
      $minutes = $2;
      $seconds = $3;
  }

This then becomes:

  /Time: (?$hours=..):(?$minutes=..):(?$seconds=..)/

This is more maintainable than counting the brackets and easier to
understand for a complex regex. And one does not have to worry about the
scope of $1 etc.

=head2 Named Backrefs

The first versions of this RFC did not allow for backrefs. I now think this
was a shortcoming. It can be done with (??{quotemeta $foo}), but I find this
clumsy; a better way of using a named backref might be (?\$foo).
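For comparison, the Time example and a named backref can be written with
the named captures that Perl 5.10 and later provide. This is a sketch of
that existing feature, not of the (?$foo= ... ) syntax suggested above.
The input strings and the group name "chunk" are made up for
illustration.

```perl
use strict;
use warnings;

my $line = "Time: 12:34:56";
my ($hours, $minutes, $seconds);

# Named groups stand in for the suggested (?$hours=..) and friends.
# The values are copied out of %+ only after the whole match has succeeded.
if ($line =~ /Time: (?<hours>..):(?<minutes>..):(?<seconds>..)/) {
    $hours   = $+{hours};
    $minutes = $+{minutes};
    $seconds = $+{seconds};
}

# \k<name> refers back to a named group, much as (?\$foo) would.
my $repeated;
if ("abcabc" =~ /(?<chunk>...)\k<chunk>/) {
    $repeated = $+{chunk};
}
```

As with the suggested syntax, adding another bracketed group to the
pattern does not disturb the names, so there is no bracket counting to
keep up to date.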
=head2 Scoping

The question of scoping for these assignments has been raised, but I don't
currently have a feel for the "best" way to handle this. Input welcome.

=head2 Brackets

When this method is used to capture the wanted content, it might be
desirable to stop ordinary brackets from capturing, without needing to use
(?:...). I therefore suggest, as an enhancement to regexes, a /b (bracket?)
flag under which ordinary brackets just group, without capture - in effect
they all behave as (?:...).

=head1 CHANGES

V3 - added bit about backrefs, and brackets.

=head1 IMPLEMENTATION

Currently all $scalars in regexes are expanded before the main regex compiler
gets to analyse the syntax. This problem also affects several other RFCs
(166 for example). For these (and other) RFCs, the expansion of variables in
regexes needs to be driven from within the regex compiler, so that the
regex can expand as and where appropriate. Changing this should not affect
any existing behaviour.

=head1 REFERENCES

I brought this up on p5p a couple of years ago, but it was lost in the noise...

RFC 166: Alternative lists and quoting of things

Perlstorm #0040
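The /b flag suggested under Brackets has a close analogue in later Perl 5:
the /n modifier, added in Perl 5.22, stops ordinary brackets from
capturing while named groups still capture. The sketch below assumes Perl
5.22 or later and uses a made-up input string. It illustrates /n only,
not the proposed /b.

```perl
use strict;
use warnings;
use 5.022;   # the /n modifier first appeared in Perl 5.22

my $text = "modified 2000-09-23";

# Without /n the plain brackets capture, so the pattern has two groups.
$text =~ /(released|modified) (?<date>\d{4}-\d\d-\d\d)/ or die "no match";
my $groups_default = $#+;

# With /n the plain brackets only group; the named group is the sole capture.
$text =~ /(released|modified) (?<date>\d{4}-\d\d-\d\d)/n or die "no match";
my $groups_with_n = $#+;
my $date = $+{date};
```

$#+ gives the number of capture groups in the last successful match, so
comparing the two counts shows the plain brackets dropping out of the
numbering under /n.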