Re: RFC 308 (v1) Ban Perl hooks into regexes
> I'm trying to stick to a general philosophy of what's in a reg-ex, and I can
> almost justify assertions since as you say, \d, ^, $, (?=), etc are these
> very sort of things. I've been avoiding most of this discussion because
> it's been so odd, I can't believe they'll ultimately get accepted. Given
> the argument that it's unlikely that (?{code}) has been implemented in
> production, I can almost see changing its semantics. From what I
> understand, the point would be to run some sort of perl-code and return
> defined / undefined, where undefined forces a back-track.

The proposal that MJD and I were working on still has a lot of rough edges, which may not be resolvable before the deadline. It proposes a mechanism which allows the programmer to set up a block in which the flow of control is determined by the success or failure of statements within the block. Regex matches always determine whether flow continues forward or backs up; arbitrary Perl code does whatever it does; and a special function (which we called 'test' for lack of a better name) allows the programmer to use true/false conditions to force backtracking as required.

MJD has had to withdraw from the development of the RFC, and it is not absolutely complete, but I'm still interested in trying to see if it can be generalized sufficiently to be a useful extension to the language.

--- Joe M.
Re: RFC 308 (v1) Ban Perl hooks into regexes
In <[EMAIL PROTECTED]>, Tom Christiansen writes:
:>I consider recursive regexps very useful:
:>
:>    $a = qr{ (?> [^()]+ ) | \( (??{ $a }) \) };
:
:Yes, they're "useful", but darned tricky sometimes, and in
:ways other than simple regex-related stuff. For example,
:consider what happens if you do
:
:    my $regex = qr{ (?> [^()]+ ) | \( (??{ $regex }) \) };
:
:That doesn't work due to differing scopings on either side
:of the assignment.

Yes, this is a problem. But it bites people in other situations as well:

    my $fib = sub { $_[0] < 2 ? 1 : &$fib($_[0] - 1) };

I haven't kept up with the non-regexp RFCs, but I hope someone has suggested an alternative scoping that would permit these cases to refer to the just-introduced variable. Perhaps we should special-case qr{} and sub{} - I can't offhand think of another area that suffers from this, and I don't think these two areas would suffer from an inability to refer to the same-name variable in an outlying scope.

A useful alternative might be a different special case. Plucking random grammar, perhaps:

    my $regex = qr{ (?> [^()]+ ) | \( ^^ \) }x;

Certainly I think a simple self-reference is likely to be a common enough use that it would help to avoid the full deferred eval infrastructure, even when it works properly.

:And clearly a non-regex approach could be more legible for
:recursive parsing.

Like any aspect of programming, if you use it regularly it will become easier to read. And comments are a wonderful thing.

Hugo
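Editorial sketch, not part of the original message: the `^^` syntax above was only a suggestion. Perl 5.10 and later offer a comparable direct self-reference, `(?1)`, which recurses into capture group 1 without a deferred eval. The function name `is_balanced_direct` is made up for this illustration.

```perl
use strict;
use warnings;

# Check that parentheses in a string are balanced, using the (?1)
# recursion construct (Perl 5.10+) instead of (??{ ... }).
sub is_balanced_direct {
    my ($s) = @_;
    return $s =~ m{
        \A
        (                      # group 1: a balanced body
            (?:
                (?> [^()]+ )   # a run of other characters, never given back
              |
                \( (?1) \)     # an opening paren, group 1 again, a closing paren
            )*
        )
        \z
    }x ? 1 : 0;
}

print is_balanced_direct("a(b(c))d") ? "balanced\n" : "unbalanced\n";
```

Because the recursion names a group rather than a variable, the scoping question raised in this message does not come up at all.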
Re: RFC 308 (v1) Ban Perl hooks into regexes
>I consider recursive regexps very useful: > > $a = qr{ (?> [^()]+ ) | \( (??{ $a }) \) }; Yes, they're "useful", but darned tricky sometimes, and in ways other than simple regex-related stuff. For example, consider what happens if you do my $regex = qr{ (?> [^()]+ ) | \( (??{ $regex }) \) }; That doesn't work due to differing scopings on either side of the assignment. And clearly a non-regex approach could be more legible for recursive parsing. --tom
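Editorial sketch, not part of the original message: the scoping problem arises because a variable declared with `my` only becomes visible in the statement after its declaration. The usual Perl 5 workaround is to declare first and assign second. The names `$nested` and `is_balanced_deferred` are invented for this illustration, and the pattern adds a `(?: ... )*` wrapper so that it can match whole strings.

```perl
use strict;
use warnings;

# Declare the variable on its own line, so the name is already in scope
# when the code block inside the pattern is compiled.
my $nested;
$nested = qr{
    (?:
        (?> [^()]+ )              # a run of non-parenthesis characters
      |
        \( (??{ $nested }) \)     # parenthesised text, matched by $nested itself
    )*
}x;

sub is_balanced_deferred {
    my ($s) = @_;
    return $s =~ /\A $nested \z/x ? 1 : 0;
}

print is_balanced_deferred("f(x(y))") ? "balanced\n" : "unbalanced\n";
```

With `my $nested = qr{ ... (??{ $nested }) ... }` on a single line, the inner `$nested` would refer to a different, outer variable, and under `use strict` the code would not compile.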
Re: RFC 308 (v1) Ban Perl hooks into regexes
i keep parsing the subject of this rfc as 'ban perl books' :) uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture, Software Engineering, Perl, Internet, UNIX Consulting The Perl Books Page --- http://www.sysarch.com/cgi-bin/perl_books The Best Search Engine on the Net -- http://www.northernlight.com
Re: RFC 308 (v1) Ban Perl hooks into regexes
In <[EMAIL PROTECTED]>, Bart Lateur writes: :On 25 Sep 2000 20:14:52 -, Perl6 RFC Librarian wrote: : :>Remove C, C and friends. : :I'm putting the finishing touches on an RFC to drop (?{...}) and replace :it with something far more localized, hence cleaner: assertions, also in :Perl code. That way, : : /(?
Re: RFC 308 (v1) Ban Perl hooks into regexes
In <005501c027eb$43bafe60$[EMAIL PROTECTED]>, "Michael Maraist" writes: :As you said, we shouldn't encourage full-fledged execution (since core dumps :are common). Let's not redefine the language just because there are bugs to fix. Surely it is better to concentrate first on fixing the bugs so that we can then more fairly judge whether the feature is useful enough to justify its existence. :One restriction might be to disallow various op-codes within the reg-ex :assertion. Namely user-function calls, reg-ex's, and most OS or IO :operations. That seems quite unreasonable. Why do you _want_ to restrict someone from calling isKeyword($1) within the regexp, which will then read the keyword patterns from a file and check $1 against those patterns using regexps? It seems like an entirely reasonable and useful thing to do. Hugo
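Editorial sketch, not part of the original message: one way to call a user function from inside a Perl 5 pattern and let its result veto the match is a conditional whose condition is a code block and whose "yes" branch is `(?!)`, which always fails. The keyword table and the names `is_keyword` and `first_keyword` are assumptions made for this example; the `isKeyword` described above would read its patterns from a file.

```perl
use strict;
use warnings;

# Hypothetical keyword table standing in for patterns read from a file.
my %keywords = map { $_ => 1 } qw(if else while);

sub is_keyword {
    my ($word) = @_;
    return exists $keywords{$word} ? 1 : 0;
}

# Return the first whole lowercase word that is_keyword() accepts, or undef.
sub first_keyword {
    my ($text) = @_;
    # If the code block is true (the word is NOT a keyword), the "yes"
    # branch (?!) fails and the engine backtracks or moves on.
    return $text =~ / \b ([a-z]+) \b (?(?{ !is_keyword($1) }) (?!) ) /x
        ? $1
        : undef;
}

my $found = first_keyword("x = 1 while y");
print defined $found ? "found: $found\n" : "no keyword\n";
```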
Re: RFC 308 (v1) Ban Perl hooks into regexes
> There is, but as MJD wrote: "it ain't pretty". Now, semantic checks or
> assertions would be the only reason why I'd expect to be able to execute
> perl code every time a part of a regex is successfully parsed. Simply
> look at RFC 197: a syntactic extension to regexes just to check if a
> number is within a range! That is absurd, isn't it? Would a simple way
> to include localized tests, *any* test, make more sense?

I'm trying to stick to a general philosophy of what's in a reg-ex, and I can almost justify assertions since as you say, \d, ^, $, (?=), etc are these very sort of things. I've been avoiding most of this discussion because it's been so odd, I can't believe they'll ultimately get accepted. Given the argument that it's unlikely that (?{code}) has been implemented in production, I can almost see changing its semantics. From what I understand, the point would be to run some sort of perl-code and return defined / undefined, where undefined forces a back-track.

As you said, we shouldn't encourage full-fledged execution (since core dumps are common). I can definitely see simple optimizations such as (?{$1 op const}), though other interesting things such as (?{exists $keywords{ $1 }}) might proliferate. That would expand to the general purpose (?{ isKeyword( $1 ) }), which then allows function calls within the reg-ex, which is just asking for trouble.

One restriction might be to disallow various op-codes within the reg-ex assertion. Namely user-function calls, reg-ex's, and most OS or IO operations. A very common thing could be an optimal /((?>\d+))(?{MIN < $1 && $1 < MAX})/, where MIN and MAX are constants.

-Michael
Re: RFC 308 (v1) Ban Perl hooks into regexes
On Tue, 26 Sep 2000 13:32:37 -0400, Michael Maraist wrote:

>I can't believe that there currently isn't a means of killing a back-track
>based on perl-code. Looking through perlre it seems like you're right.

There is, but as MJD wrote: "it ain't pretty". Now, semantic checks or assertions would be the only reason why I'd expect to be able to execute perl code every time a part of a regex is successfully parsed. Simply look at RFC 197: a syntactic extension to regexes just to check if a number is within a range! That is absurd, isn't it? Would a simple way to include localized tests, *any* test, make more sense?

>I'm
>not really crazy about breaking backward compatibility like this though. It
>shouldn't be too hard to find another character sequence to perform your
>above job.

Me neither. But many prominent people in the Perl World have expressed their amazement when they found out that the purpose of embedding Perl in a regex wasn't to do just this kind of test. (?{...}) hasn't even been tried out yet by many people, let alone used in production code. (?{...}) is notorious for dumping core. I can't see why it can't be recycled. After all, it still executes Perl code.

>Beyond that, there's a growing rift between reg-ex extenders and purifiers.
>I assume the functionality you're trying to produce above is to find the
>first bare number that is less than 256 (your above would match the 25 in
>256)..

You're forgetting about greediness. This test simply answers the question: "will this do?" If the answer is always yes, the regex will *always* match the same thing as it would without this assertion. Compare it to other assertions, such as /\b/, anchors (/^/ and /$/), and lookahead and lookbehind. These too don't really control what it would match. They can only express their veto.

>In any case, the above is not very intuitive to the casual observers as
>might be
>
>while ( /(\d+)/g ) {
>  if ( $1 < 256 ) {
>    $answer = $1;
>    last;
>  }
>}

Maybe for this simple example. But the same can be said of lookahead and lookbehind. It takes a *bit* of getting used to, but it's very simple, and very powerful. IMO.

>Likewise, complex matching tokens are the realm of a parser (I'm almost
>getting tired of saying that). Please be kind to your local maintainer,
>don't proliferate n'th order code complexities such as recursive or
>conditional reg-ex's.

I said nothing of recursive regexes. Again, just look at RFC 197, and see what complex rules people would like to cram into a regex. Or look at the examples in Friedl's book, to see what contortions people put themselves through, just to make sure that they only match numbers between 0 and 23:

    /[01]?[0-9]|2[0-3]/
    /[01]?[4-9]|[012]?[0-3]/

So you think these are easy on the maintainer? I think not. A simple boolean expression, "match a number and it must be 23 or less", is far simpler, at least to me.

--
Bart.
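Editorial sketch, not part of the original message: the "veto" behaviour and the greediness point can both be tried in today's Perl 5 with a code-block conditional. This is not the syntax the planned RFC would have proposed (that example is cut off in this archive); it only approximates the semantics with existing constructs. The function names are invented for this illustration.

```perl
use strict;
use warnings;

# Veto any capture of 256 or more: when the code block is true, the
# "yes" branch (?!) fails, and the engine backtracks into (\d+).
sub first_below_256 {
    my ($text) = @_;
    return $text =~ / (\d+) (?(?{ $1 >= 256 }) (?!) ) /x ? $1 : undef;
}

# Same test, but \b on both sides stops (\d+) from settling on part
# of a longer number.
sub first_whole_below_256 {
    my ($text) = @_;
    return $text =~ / \b (\d+) \b (?(?{ $1 >= 256 }) (?!) ) /x ? $1 : undef;
}

print first_below_256("256"), "\n";                   # a prefix of 256
print first_whole_below_256("256 and 42"), "\n";      # a whole number
```

The first function shows the case Michael raised: on "256" the greedy `\d+` gives back one digit and the assertion then accepts "25". The assertion itself never chooses what is matched; it only rejects candidates, which is the sense in which it "can only express its veto".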
Re: RFC 308 (v1) Ban Perl hooks into regexes
> On 25 Sep 2000 20:14:52 -, Perl6 RFC Librarian wrote:
>
> >Remove C, C and friends.
>
> I'm putting the finishing touches on an RFC to drop (?{...}) and replace
> it with something far more localized, hence cleaner: assertions, also in
> Perl code. That way,
>
> /(?
> would only match integers between 0 and 255.
>
> Communications between Perl code snippets inside a regex would be
> strongly discouraged.

I can't believe that there currently isn't a means of killing a back-track based on perl-code. Looking through perlre it seems like you're right.

I'm not really crazy about breaking backward compatibility like this though. It shouldn't be too hard to find another character sequence to perform your above job.

Beyond that, there's a growing rift between reg-ex extenders and purifiers. I assume the functionality you're trying to produce above is to find the first bare number that is less than 256 (your above would match the 25 in 256). Easily fixed by inserting (?!\d) between the second and third aggregates. If you were to be more strict, you could more simply apply \b(\d+)\b...

In any case, the above is not very intuitive to the casual observers as might be

    while ( /(\d+)/g ) {
      if ( $1 < 256 ) {
        $answer = $1;
        last;
      }
    }

Likewise, complex matching tokens are the realm of a parser (I'm almost getting tired of saying that). Please be kind to your local maintainer, don't proliferate n'th order code complexities such as recursive or conditional reg-ex's. Yes, I can mandate that my work doesn't use them, but it doesn't mean that CPAN won't (and I often have to reverse engineer CPAN modules to figure out why something isn't working).

That said, nobody should touch the various relative reg-ex operators. I look at reg-ex as a tokenizer, and things like (?>...) which optimizes reading, and (?
Re: RFC 308 (v1) Ban Perl hooks into regexes
On 25 Sep 2000 20:14:52 -, Perl6 RFC Librarian wrote: >Remove C, C and friends. I'm putting the finishing touches on an RFC to drop (?{...}) and replace it with something far more localized, hence cleaner: assertions, also in Perl code. That way, /(?
Re: RFC 308 (v1) Ban Perl hooks into regexes
Perl6 RFC Librarian <[EMAIL PROTECTED]> writes: > This and other RFCs are available on the web at > http://dev.perl.org/rfc/ > > =head1 TITLE > > Ban Perl hooks into regexes > > =head1 VERSION > > Maintainer: Simon Cozens <[EMAIL PROTECTED]> > Date: 25 Sep 2000 > Mailing List: [EMAIL PROTECTED] > Number: 308 > Version: 1 > Status: Developing > > =head1 ABSTRACT > > Remove C, C and friends. > > =head1 DESCRIPTION > > The regular expression engine may well be rewritten from scratch or > borrowed from somewhere else. One of the scarier things we've seen > recently is that Perl's engine casts back its Krakken tentacles into Perl > and executes Perl code. This is spooky, tangled, and incestuous. > (Although admittedly fun.) It's *loads* of fun. Though admittedly, I've not used it in any *real* code yet... > It would be preferable to keep the regular expression engine as > self-contained as possible, if nothing else to enable it to be used > either outside Perl or inside standalone translated Perl programs > without a Perl runtime. > > To do this, we'll have to remove the bits of the engine that call > Perl code. In short: C and C must die. You don't *have* to remove 'em. You can just throw an exception during compilation if some hypothetical 'no regex subs' pragma is there. -- Piers '063039183598121887134041122600:1917131105:Jaercunrlkso tPh.'=~/^(.{6})* (.{6})[^:]*:(..)*(..).*:(??{'.{'.$2%$4.'}'})(.)(??{print$5})/x;print"\n"
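Editorial sketch, not part of the original message: Perl 5 already refuses embedded code at pattern-compile time in one situation, which is close in spirit to the hypothetical pragma suggested above. Unless `use re 'eval'` is in effect, a pattern interpolated from a string at run time may not contain `(?{ ... })` or `(??{ ... })`, and compiling it throws an exception. The function name `compile_plain` is invented for this illustration.

```perl
use strict;
use warnings;

# Compile a pattern held in a string. Without "use re 'eval'" in scope,
# Perl dies if the string contains an embedded code block.
# Returns (compiled regex, '') on success or (undef, error message) on failure.
sub compile_plain {
    my ($source) = @_;
    my $re = eval { qr/$source/ };
    return ($re, $@);
}

for my $source ('a+b', 'a(?{ 1 })b') {
    my ($re, $err) = compile_plain($source);
    print defined $re ? "accepted: $source\n" : "rejected: $source\n  $err";
}
```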
Re: RFC 308 (v1) Ban Perl hooks into regexes
From: "Simon Cozens" <[EMAIL PROTECTED]>
> > A lot of what is trying to happen in (?{..}) and friends is parsing.
>
> That's not the problem that I'm trying to solve. The problem I'm trying
> to solve is interdependence. Parsing is neither here nor there.

Well, I recognize that your focus was not on parsing. However, I don't feel that perl-abstractness is a key deliverable of perl. My comment was primarily on how the world might be a better place with reg-ex's not getting into algorithms that are better solved elsewhere. I just thought it might help your cause if you expanded your rationale.

-Michael
Re: RFC 308 (v1) Ban Perl hooks into regexes
From: "Hugo" <[EMAIL PROTECTED]> > :Remove C, C and friends. > > Whoops, I missed this bit - what 'friends' do you mean? Going by the topic, I would assume it involves (?(cond) true-exp | false-exp). There's also the $^R or what-ever it was that is the result of (?{ }). Basically the code-like operations found in perl 5.005 and 5.6's perlre. -Michael
Re: RFC 308 (v1) Ban Perl hooks into regexes
> On Mon, Sep 25, 2000 at 08:56:47PM +, Mark-Jason Dominus wrote: > > I think the proposal that Joe McMahon and I are finishing up now will > > make these obsolete anyway. > > Good! The less I have to maintain the better... Sorry, I meant that it would make (??...) and (?{...}) obsolete, not that it will make your RFC obsolete. Our proposal is agnostic about whether (??...) and (?{...}) should be eliminated.
Re: RFC 308 (v1) Ban Perl hooks into regexes
On Mon, Sep 25, 2000 at 04:55:18PM -0400, Michael Maraist wrote: > A lot of what is trying to happen in (?{..}) and friends is parsing. That's not the problem that I'm trying to solve. The problem I'm trying to solve is interdependence. Parsing is neither here nor there. -- Intel engineering seem to have misheard Intel marketing strategy. The phrase was "Divide and conquer" not "Divide and cock up" (By [EMAIL PROTECTED], Alan Cox)
Re: RFC 308 (v1) Ban Perl hooks into regexes
On Mon, Sep 25, 2000 at 08:56:47PM +, Mark-Jason Dominus wrote: > I think the proposal that Joe McMahon and I are finishing up now will > make these obsolete anyway. Good! The less I have to maintain the better... -- Keep the number of passes in a compiler to a minimum. -- D. Gries
Re: RFC 308 (v1) Ban Perl hooks into regexes
On Mon, Sep 25, 2000 at 11:31:08PM +0100, Hugo wrote: > In <[EMAIL PROTECTED]>, Perl6 RFC Librarian writes: > :=head1 ABSTRACT > : > :Remove C, C and friends. > > Whoops, I missed this bit - what 'friends' do you mean? Whatever even more bizarre extensions people will have suggested by now... -- DEC diagnostics would run on a dead whale. -- Mel Ferentz
Re: RFC 308 (v1) Ban Perl hooks into regexes
In <[EMAIL PROTECTED]>, Perl6 RFC Librarian writes: :=head1 ABSTRACT : :Remove C, C and friends. Whoops, I missed this bit - what 'friends' do you mean? Hugo
Re: RFC 308 (v1) Ban Perl hooks into regexes
In <[EMAIL PROTECTED]>, Perl6 RFC Librarian writes: :It would be preferable to keep the regular expression engine as :self-contained as possible, if nothing else to enable it to be used :either outside Perl or inside standalone translated Perl programs :without a Perl runtime. : :To do this, we'll have to remove the bits of the engine that call :Perl code. In short: C and C must die. I would have thought it more reasonable, if you wish to create standalone translated Perl programs without a Perl runtime, to fail with a helpful error if you encounter a construct that won't permit it. You'll need to remove chunks of eval() and do() as well, otherwise, and probably more besides. In the context of a more shareable regexp engine, I would like to see (? and (?? stay, but they need to be implemented more cleanly. You could handle them quite nicely, I think, with just three well-defined external hooks: one to find the matching brace at the end of the code, one to parse the code, and one to run the code. Anyone wishing to re-use the regexp library could then choose either to keep the default drop-in replacements for those hooks (that die) or provide their own equivalents to the perl usage. I consider recursive regexps very useful: $a = qr{ (?> [^()]+ ) | \( (??{ $a }) \) }; .. and I class re-eval in general in the arena of 'making hard things possible'. But whether or not they stay, it would probably also be useful to have a more direct way of expressing simple recursive regexps such as the above without resorting to a costly eval. When I've tried to come up with an appropriate restriction, however, I find it very difficult to pick a dividing line. Hugo
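Editorial sketch, not part of the original message: the three-hook idea can be pictured as a table of callbacks whose defaults die, which a host such as perl overrides. Everything here is an assumption made for illustration: the hook names (`find_code_end`, `parse_code`, `run_code`), the toy host that looks code up in a table of named callbacks, and the helper `eval_embedded`. No real regex library exposes this interface.

```perl
use strict;
use warnings;

# Default hooks for a host that cannot run embedded code: all three die.
my $unsupported = sub { die "embedded code is not supported by this host\n" };
my %DEFAULT_HOOKS = (
    find_code_end => $unsupported,   # locate the brace that ends the code
    parse_code    => $unsupported,   # turn the code text into something runnable
    run_code      => $unsupported,   # run it and return its result
);

sub make_hooks {
    my (%overrides) = @_;
    return { %DEFAULT_HOOKS, %overrides };
}

# A toy host: "code" is just the name of a pre-registered callback.
my %callbacks = (always => sub { 1 }, never => sub { 0 });
my $toy_hooks = make_hooks(
    find_code_end => sub { my ($src, $start) = @_; return index($src, '}', $start) },
    parse_code    => sub {
        my ($text) = @_;
        $text =~ s/^\s+|\s+$//g;
        my $cb = $callbacks{$text} or die "unknown callback '$text'\n";
        return $cb;
    },
    run_code      => sub { my ($cb) = @_; return $cb->() },
);

# What an engine might do on meeting "(?{"; $start is the offset just past it.
sub eval_embedded {
    my ($hooks, $src, $start) = @_;
    my $end  = $hooks->{find_code_end}->($src, $start);
    my $code = $hooks->{parse_code}->(substr($src, $start, $end - $start));
    return $hooks->{run_code}->($code);
}

print eval_embedded($toy_hooks, '(?{ always })', 3), "\n";
```

A program that reuses the engine without supplying hooks gets a clear error as soon as a pattern contains embedded code, which matches the "default drop-in replacements (that die)" behaviour described above.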
Re: RFC 308 (v1) Ban Perl hooks into regexes
I think the proposal that Joe McMahon and I are finishing up now will make these obsolete anyway.
Re: RFC 308 (v1) Ban Perl hooks into regexes
> Ban Perl hooks into regexes
>
> =head1 ABSTRACT
>
> Remove C, C and friends.

At first, I thought you were crazy, then I read

>It would be preferable to keep the regular expression engine as
>self-contained as possible, if nothing else to enable it to be used
>either outside Perl or inside standalone translated Perl programs
>without a Perl runtime.

Which makes a lot of sense in the development field. Tom has mentioned that the reg-ex engine is getting really out of hand; it's hard enough to document clearly, much less be understandable to the maintainer (or even the debugger).

A lot of what is trying to happen in (?{..}) and friends is parsing. To quote Star Trek Undiscovered Country, "Just because we can do a thing, doesn't mean we should." Tom and I have commented that parsing should be done in a PARSER, not a lexer (like our beloved reg-ex engine). RecDescent and Yacc do a wonderful job of providing parsing power within perl.

I'd suggest you modify your RFC to summarize the above; that (?{}) and friends are parsers, and we already have RecDescent / etc. which are much easier to understand, and don't require too much additional overhead.

Other than the inherent coolness of having hooks into the reg-ex code, I don't really see much real use from it other than debugging; eg (?{ print "Still here\n" }). I could go either way on the topic, but I'm definitely of the opinion that we shouldn't continue down this dark path any further.

-Michael
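Editorial sketch, not part of the original message: the "regex as tokenizer, parser in ordinary code" division of labour can be shown on the balanced-parentheses problem discussed elsewhere in this thread. The grammar and the names `parse_text` and `is_balanced_parsed` are assumptions made for this example. It is hand-written recursive descent and does not use Parse::RecDescent.

```perl
use strict;
use warnings;

# Grammar assumed for this illustration:
#   text  := ( chunk | group )*
#   group := '(' text ')'
#   chunk := one or more characters other than '(' and ')'
#
# Regexes only recognise single tokens at the current position (\G with
# /gc); the nesting is handled by an ordinary recursive subroutine.
sub parse_text {
    my ($ref) = @_;                       # reference to the string being parsed
    while (1) {
        if ($$ref =~ /\G[^()]+/gc) {      # chunk
            next;
        }
        elsif ($$ref =~ /\G\(/gc) {       # start of a group
            parse_text($ref)    or return 0;
            $$ref =~ /\G\)/gc   or return 0;
        }
        else {
            return 1;                     # nothing more to consume at this level
        }
    }
}

sub is_balanced_parsed {
    my ($s) = @_;
    pos($s) = 0;
    return parse_text(\$s) && pos($s) == length($s) ? 1 : 0;
}

print is_balanced_parsed("a(b(c))d") ? "balanced\n" : "unbalanced\n";
```

Whether this reads more easily than a recursive pattern is the matter of taste the thread is arguing about. The sketch only shows that neither `(?{ ... })` nor `(??{ ... })` is needed for it.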
RFC 308 (v1) Ban Perl hooks into regexes
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Ban Perl hooks into regexes

=head1 VERSION

  Maintainer: Simon Cozens <[EMAIL PROTECTED]>
  Date: 25 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 308
  Version: 1
  Status: Developing

=head1 ABSTRACT

Remove C<(?{...})>, C<(??{...})> and friends.

=head1 DESCRIPTION

The regular expression engine may well be rewritten from scratch or
borrowed from somewhere else. One of the scarier things we've seen
recently is that Perl's engine casts back its Kraken tentacles into Perl
and executes Perl code. This is spooky, tangled, and incestuous.
(Although admittedly fun.)

It would be preferable to keep the regular expression engine as
self-contained as possible, if nothing else to enable it to be used
either outside Perl or inside standalone translated Perl programs
without a Perl runtime.

To do this, we'll have to remove the bits of the engine that call
Perl code. In short: C<(?{...})> and C<(??{...})> must die.

=head1 IMPLEMENTATION

It's more of an unimplementation really.

=head1 REFERENCES

None.