Re: RFC 308 (v1) Ban Perl hooks into regexes
Perl6 RFC Librarian [EMAIL PROTECTED] writes:

> This and other RFCs are available on the web at
> http://dev.perl.org/rfc/
>
> =head1 TITLE
>
> Ban Perl hooks into regexes
>
> =head1 VERSION
>
>   Maintainer: Simon Cozens [EMAIL PROTECTED]
>   Date: 25 Sep 2000
>   Mailing List: [EMAIL PROTECTED]
>   Number: 308
>   Version: 1
>   Status: Developing
>
> =head1 ABSTRACT
>
> Remove C<?{ code }>, C<??{ code }> and friends.
>
> =head1 DESCRIPTION
>
> The regular expression engine may well be rewritten from scratch or
> borrowed from somewhere else. One of the scarier things we've seen
> recently is that Perl's engine casts back its Kraken tentacles into
> Perl and executes Perl code. This is spooky, tangled, and incestuous.
> (Although admittedly fun.)

It's *loads* of fun. Though admittedly, I've not used it in any *real* code yet...

> It would be preferable to keep the regular expression engine as
> self-contained as possible, if nothing else to enable it to be used
> either outside Perl or inside standalone translated Perl programs
> without a Perl runtime.
>
> To do this, we'll have to remove the bits of the engine that call
> Perl code. In short: C<?{ code }> and C<??{ code }> must die.

You don't *have* to remove 'em. You can just throw an exception during compilation if some hypothetical 'no regex subs' pragma is there.

-- 
Piers

'063039183598121887134041122600:1917131105:Jaercunrlkso tPh.'=~/^(.{6})* (.{6})[^:]*:(..)*(..).*:(??{'.{'.$2%$4.'}'})(.)(??{print$5})/x;print"\n"
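For readers who have not met the two constructs under discussion, the following is a minimal sketch of what they do in Perl 5. It assumes a reasonably recent perl (5.18 or later). The names `$count`, `$all_a` and `$counted` are illustrative and do not come from the RFC.

```perl
use strict;
use warnings;

# (?{ code }) runs arbitrary Perl each time the engine passes that point.
our $count = 0;
my $all_a = "aaa" =~ /^(?:a(?{ $count++ }))*$/;

# (??{ code }) runs Perl and uses the returned string as a sub-pattern:
# here, a digit N must be followed by exactly N "x" characters.
my $counted = qr/^(\d)(??{ 'x{' . $1 . '}' })$/;

print "code block ran $count times\n";
print "3xxx ", ("3xxx" =~ $counted ? "matches" : "does not match"), "\n";
print "3xx ",  ("3xx"  =~ $counted ? "matches" : "does not match"), "\n";
```

Both constructs call back into the Perl interpreter from inside the matcher, which is the coupling the RFC objects to.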
Re: RFC 170 (v2) Generalize =~ to a special apply-to assignment operator
On Sun, Sep 17, 2000 at 05:41:57AM -, Perl6 RFC Librarian wrote:

> Some criticized it as being too sugary, since this:
>
>     $string =~ quotemeta;    # $string = quotemeta $string;
>
> Is not as clear as the original. However, there is fairly similar
> precedent in:
>
>     $x += 5;                 # $x = $x + 5;

Looks great on scalars, but...

    @foo =~ shift;      # @foo = $foo[0] ?
    @foo =~ unshift;    # @foo = $foo[-1] ?

Although I have to admit I like:

    @foo =~ grep !/\S/;

But I'm not very keen on the idea of

    %foo =~ keys;

-- 
A formal parsing algorithm should not always be used.
    -- D. Gries
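To make the comparison concrete, here is a small sketch of what the proposed forms would have to mean, written in ordinary Perl 5. The sample data and the variable `@queue` are made up for illustration; the `shift` case shows one plausible reading of the ambiguity raised above, not a semantics the RFC specifies.

```perl
use strict;
use warnings;

my $string = "a.b*c";
$string = quotemeta $string;         # RFC 170 spelling: $string =~ quotemeta;

my @foo = ("x", " ", "", "\t", "y z");
@foo = grep { !/\S/ } @foo;          # RFC 170 spelling: @foo =~ grep !/\S/;

# shift returns the removed element, so assigning its result back
# leaves a one-element array rather than a shortened one.
my @queue = (1, 2, 3);
@queue = shift @queue;

print "$string\n";
print scalar(@foo), " blank entries, queue is now (@queue)\n";
```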
Re: RFC 308 (v1) Ban Perl hooks into regexes
On 25 Sep 2000 20:14:52 -, Perl6 RFC Librarian wrote:

> Remove C<?{ code }>, C<??{ code }> and friends.

I'm putting the finishing touches on an RFC to drop (?{...}) and replace it with something far more localized, hence cleaner: assertions, also in Perl code. That way,

    /(?<!\d)(\d+)(?{$1 < 256})/

would only match integers between 0 and 255.

Communications between Perl code snippets inside a regex would be strongly discouraged.

-- 
Bart.
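The proposed assertion does not exist in Perl 5, but its effect can be approximated. The sketch below assumes the conditional form `(?(?{ COND })|(?!))` as a stand-in: it matches nothing when COND is true and forces a backtrack otherwise. The variable names are illustrative.

```perl
use strict;
use warnings;

# Assumed emulation of the proposed assertion, not the RFC's own syntax.
my $below_256 = qr/(?<!\d)(\d+)(?(?{ $1 < 256 })|(?!))/;

my ($small) = "port 80"  =~ $below_256;
my ($big)   = "code 300" =~ $below_256;

print "port 80  -> $small\n";
print "code 300 -> $big\n";
```

Note that a veto only makes the engine backtrack: on "code 300" the pattern gives up a digit and settles for "30" rather than failing outright.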
Re: RFC 308 (v1) Ban Perl hooks into regexes
On 25 Sep 2000 20:14:52 -, Perl6 RFC Librarian wrote:

> > Remove C<?{ code }>, C<??{ code }> and friends.
>
> I'm putting the finishing touches on an RFC to drop (?{...}) and
> replace it with something far more localized, hence cleaner:
> assertions, also in Perl code. That way,
>
>     /(?<!\d)(\d+)(?{$1 < 256})/
>
> would only match integers between 0 and 255.
>
> Communications between Perl code snippets inside a regex would be
> strongly discouraged.

I can't believe that there currently isn't a means of killing a back-track based on perl-code. Looking through perlre it seems like you're right. I'm not really crazy about breaking backward compatibility like this though. It shouldn't be too hard to find another character sequence to perform your above job.

Beyond that, there's a growing rift between reg-ex extenders and purifiers. I assume the functionality you're trying to produce above is to find the first bare number that is less than 256 (your above would match the 25 in 256). That is easily fixed by inserting (?!\d) between the second and third aggregates. If you were to be more strict, you could more simply apply \b(\d+)\b...

In any case, the above is not as intuitive to the casual observer as this might be:

    while ( /(\d+)/g ) {
        if ( $1 < 256 ) {
            $answer = $1;
            last;
        }
    }

Likewise, complex matching tokens are the realm of a parser (I'm almost getting tired of saying that). Please be kind to your local maintainer: don't proliferate n'th order code complexities such as recursive or conditional reg-ex's. Yes, I can mandate that my work doesn't use them, but it doesn't mean that CPAN won't (and I often have to reverse engineer CPAN modules to figure out why something isn't working).

That said, nobody should touch the various relative reg-ex operators. I look at reg-ex as a tokenizer, and things like (?>...), which optimizes reading, and (?!..), etc. are very useful in this realm.

Just my $0.02

-Michael
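The two alternatives suggested in this message can be sketched side by side. The conditional `(?(?{ COND })|(?!))` is an assumed Perl 5 stand-in for the proposed assertion, and `first_below_256` is a hypothetical helper name wrapping the loop.

```perl
use strict;
use warnings;

# Guarded variant: (?!\d) stops the engine from settling for a prefix
# such as "25" in "256".
my $strict_below_256 = qr/(?<!\d)(\d+)(?!\d)(?(?{ $1 < 256 })|(?!))/;

# Loop variant: scan every number and test it in ordinary Perl.
sub first_below_256 {
    my ($text) = @_;
    while ($text =~ /(\d+)/g) {
        return $1 if $1 < 256;
    }
    return;
}

my ($guarded) = "ids 256 and 255" =~ $strict_below_256;
my $looped    = first_below_256("ids 256 and 255");
print "guarded: $guarded, looped: $looped\n";
```

Both forms skip 256 and report 255; the loop keeps the numeric test outside the pattern, which is the readability argument made above.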
Re: RFC 308 (v1) Ban Perl hooks into regexes
On Tue, 26 Sep 2000 13:32:37 -0400, Michael Maraist wrote:

> I can't believe that there currently isn't a means of killing a
> back-track based on perl-code. Looking through perlre it seems like
> you're right.

There is, but as MJD wrote: "it ain't pretty".

Now, semantic checks or assertions would be the only reason why I'd expect to be able to execute perl code every time a part of a regex is successfully parsed. Simply look at RFC 197: a syntactic extension to regexes just to check if a number is within a range! That is absurd, isn't it? Would a simple way to include localized tests, *any* test, make more sense?

> I'm not really crazy about breaking backward compatibility like this
> though. It shouldn't be too hard to find another character sequence
> to perform your above job.

Me neither. But many prominent people in the Perl World have expressed their amazement when they found out that embedding Perl in a regex wasn't aimed at just this kind of test. (?{...}) hasn't even been tried out yet by many people, let alone used in production code. (?{...}) is notorious for dumping core.

I can't see why it can't be recycled. After all, it still executes Perl code.

> Beyond that, there's a growing rift between reg-ex extenders and
> purifiers. I assume the functionality you're trying to produce above
> is to find the first bare number that is less than 256 (your above
> would match the 25 in 256).

You're forgetting about greediness. This test simply answers the question: "will this do?" If the answer is always yes, the regex will *always* match the same thing as it would do without this assertion.

Compare it to other assertions, such as /\b/, anchors (/^/ and /$/), and lookahead and lookbehind. These too don't really control what it would match. They can only express their veto.

> In any case, the above is not as intuitive to the casual observer as
> this might be:
>
>     while ( /(\d+)/g ) {
>         if ( $1 < 256 ) {
>             $answer = $1;
>             last;
>         }
>     }

Maybe for this simple example. But the same can be said of lookahead and lookbehind. It takes a *bit* of getting used to, but it's very simple, and very powerful. IMO.

> Likewise, complex matching tokens are the realm of a parser (I'm
> almost getting tired of saying that). Please be kind to your local
> maintainer: don't proliferate n'th order code complexities such as
> recursive or conditional reg-ex's.

I said nothing of recursive regexes.

Again, just look at RFC 197, and see what complex rules people would like to cram into a regex. Or look at the examples in Friedl's book, to see what contortions people put themselves through, just to make sure that they only match numbers between 0 and 23:

    /[01]?[0-9]|2[0-3]/
    /[01]?[4-9]|[012]?[0-3]/

So you think these are easy on the maintainer? I think not. A simple boolean expression, "match a number and it must be 23 or less", is far simpler, at least to me.

-- 
Bart.
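The contrast drawn here can be checked mechanically. The sketch below anchors the first pattern quoted above so that it must match the whole string, and compares it with an assumed Perl 5 emulation of "match a number and it must be 23 or less" using `(?(?{ COND })|(?!))`. The variable names and the 0..30 test range are illustrative choices.

```perl
use strict;
use warnings;

# Pure-pattern version, anchored to the whole string.
my $hour_by_pattern = qr/^(?:[01]?[0-9]|2[0-3])$/;

# Assumed emulation of the boolean test inside the pattern.
my $hour_by_test = qr/^(\d+)(?(?{ $1 <= 23 })|(?!))$/;

# Collect every value in 0..30 on which the two disagree.
my @disagree;
for my $n (0 .. 30) {
    my $p = $n =~ $hour_by_pattern ? 1 : 0;
    my $t = $n =~ $hour_by_test    ? 1 : 0;
    push @disagree, $n if $p != $t;
}

print @disagree ? "differ on: @disagree\n" : "no disagreement on 0..30\n";
```

The two are not interchangeable on padded input: a string such as "007" passes the numeric test but not the character-class pattern, so which one is "right" depends on the input format being validated.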
Re: RFC 308 (v1) Ban Perl hooks into regexes
> There is, but as MJD wrote: "it ain't pretty".
>
> Now, semantic checks or assertions would be the only reason why I'd
> expect to be able to execute perl code every time a part of a regex
> is successfully parsed. Simply look at RFC 197: a syntactic extension
> to regexes just to check if a number is within a range! That is
> absurd, isn't it? Would a simple way to include localized tests,
> *any* test, make more sense?

I'm trying to stick to a general philosophy of what's in a reg-ex, and I can almost justify assertions since, as you say, \b, ^, $, (?=), etc. are this very sort of thing. I've been avoiding most of this discussion because it's been so odd that I can't believe these proposals will ultimately get accepted. Given the argument that it's unlikely that (?{code}) has been implemented in production, I can almost see changing its semantics. From what I understand, the point would be to run some sort of perl-code and return defined / undefined, where undefined forces a back-track.

As you said, we shouldn't encourage full-fledged execution (since core dumps are common). I can definitely see simple optimizations such as (?{$1 op const}), though other interesting things such as (?{exists $keywords{ $1 }}) might proliferate. That would expand to the general purpose (?{ isKeyword( $1 ) }), which then allows function calls within the reg-ex, which is just asking for trouble.

One restriction might be to disallow various op-codes within the reg-ex assertion. Namely user-function calls, reg-ex's, and most OS or IO operations.

A very common thing could be an optimal /(?>\d+)(?{MIN < $1 && $1 < MAX})/, where MIN and MAX are constants.

-Michael
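As a rough illustration of the `(?{exists $keywords{ $1 }})` idea, the sketch below again uses the conditional `(?(?{ COND })|(?!))` as an assumed Perl 5 stand-in for the proposed assertion. The contents of `%keywords` are invented for the example; the thread names no specific keywords.

```perl
use strict;
use warnings;

# Illustrative keyword table.
our %keywords = map { $_ => 1 } qw(if else while);

# Veto any word that is not in the table.
my $keyword = qr/\b([a-z]+)\b(?(?{ exists $keywords{$1} })|(?!))/;

my ($found) = "print while waiting" =~ $keyword;
print defined $found ? "keyword: $found\n" : "no keyword\n";
```

A hash lookup like this stays within the "simple test" category; replacing it with a call to a user function is the step the message above warns about.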
Re: RFC 308 (v1) Ban Perl hooks into regexes
In 005501c027eb$43bafe60$[EMAIL PROTECTED], "Michael Maraist" writes:
:As you said, we shouldn't encourage full-fledged execution (since core dumps
:are common).

Let's not redefine the language just because there are bugs to fix. Surely it is better to concentrate first on fixing the bugs so that we can then more fairly judge whether the feature is useful enough to justify its existence.

:One restriction might be to disallow various op-codes within the reg-ex
:assertion. Namely user-function calls, reg-ex's, and most OS or IO
:operations.

That seems quite unreasonable. Why do you _want_ to restrict someone from calling isKeyword($1) within the regexp, which will then read the keyword patterns from a file and check $1 against those patterns using regexps? It seems like an entirely reasonable and useful thing to do.

Hugo
Re: RFC 308 (v1) Ban Perl hooks into regexes
In [EMAIL PROTECTED], Bart Lateur writes:
:On 25 Sep 2000 20:14:52 -, Perl6 RFC Librarian wrote:
:
:>Remove C<?{ code }>, C<??{ code }> and friends.
:
:I'm putting the finishing touches on an RFC to drop (?{...}) and replace
:it with something far more localized, hence cleaner: assertions, also in
:Perl code. That way,
:
:	/(?<!\d)(\d+)(?{$1 < 256})/
:
:would only match integers between 0 and 255.

I'd like to suggest an alternative semantic for this: rename (??{ code }) to (?{ code }), and use the newly freed (??{ code }) for the assertions.

(I was about to write an RFC for just that, so I'm glad I can save a bit of time. :)

Hugo
Re: RFC 170 (v2) Generalize =~ to a special apply-to assignment operator
Simon Cozens wrote:

> Looks great on scalars, but...
>
>     @foo =~ shift;      # @foo = $foo[0] ?
>     @foo =~ unshift;    # @foo = $foo[-1] ?

Yes, if you wanted to do something that twisted. :-) It probably makes more sense to do something like these:

    @array =~ reverse;
    @vals =~ sort { $a <=> $b };
    @file =~ grep !/^#/;

> Although I have to admit I like:
>
>     @foo =~ grep !/\S/;

Exactly!

> But I'm not very keen on the idea of
>
>     %foo =~ keys;

Again, that depends on whether or not you're Really Evil. ;-)

-Nate
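For reference, these are the Perl 5 statements the three suggested forms would abbreviate, assuming the "apply and assign back" reading of RFC 170. The sample data is invented for the example.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
@array = reverse @array;              # proposed: @array =~ reverse;

my @vals = (10, 9, 100);
@vals = sort { $a <=> $b } @vals;     # proposed: @vals =~ sort { $a <=> $b };

my @file = ("# comment\n", "code\n", "#!shebang\n");
@file = grep { !/^#/ } @file;         # proposed: @file =~ grep !/^#/;

print "@array | @vals | @file";
```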