Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-26 Thread Piers Cawley

Perl6 RFC Librarian [EMAIL PROTECTED] writes:

 This and other RFCs are available on the web at
   http://dev.perl.org/rfc/
 
 =head1 TITLE
 
 Ban Perl hooks into regexes
 
 =head1 VERSION
 
   Maintainer: Simon Cozens [EMAIL PROTECTED]
   Date: 25 Sep 2000 
   Mailing List: [EMAIL PROTECTED]
   Number: 308
   Version: 1
   Status: Developing
 
 =head1 ABSTRACT
 
 Remove C<?{ code }>, C<??{ code }> and friends.
 
 =head1 DESCRIPTION
 
 The regular expression engine may well be rewritten from scratch or
 borrowed from somewhere else. One of the scarier things we've seen
 recently is that Perl's engine casts back its Kraken tentacles into Perl
 and executes Perl code. This is spooky, tangled, and incestuous.
 (Although admittedly fun.)

It's *loads* of fun. Though admittedly, I've not used it in any *real*
code yet...

 It would be preferable to keep the regular expression engine as
 self-contained as possible, if nothing else to enable it to be used
 either outside Perl or inside standalone translated Perl programs
 without a Perl runtime.
 
 To do this, we'll have to remove the bits of the engine that call 
 Perl code. In short: C<?{ code }> and C<??{ code }> must die.

You don't *have* to remove 'em. You can just throw an exception during
compilation if some hypothetical 'no regex subs' pragma is there.
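For comparison, Perl 5 already has a gate that works roughly this way, though it is opt-in rather than opt-out: a pattern built by interpolating a plain string that contains a code block is rejected unless `use re 'eval'` is in scope. The sketch below only illustrates that existing behaviour; the variable names are made up, and it is not the hypothetical pragma itself.

```perl
use strict;
use warnings;

# Pattern text that arrives as a plain string and contains a code block.
my $from_user = '(?{ 1 })';

# Default: interpolating such a string into a pattern is a fatal error,
# which eval {} traps here.
my $default = eval { my $re = qr/a$from_user/; 1 } ? 'compiled' : 'rejected';

# Opt-in: "use re 'eval'" is lexically scoped and lifts the restriction
# for this block only.
my $opt_in = do {
    use re 'eval';
    eval { my $re = qr/a$from_user/; 1 } ? 'compiled' : 'rejected';
};

print "default: $default, with re 'eval': $opt_in\n";
```

A 'no regex subs' pragma would presumably invert the default and extend the check to literal patterns as well.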

-- 
Piers
'063039183598121887134041122600:1917131105:Jaercunrlkso tPh.'=~/^(.{6})*
(.{6})[^:]*:(..)*(..).*:(??{'.{'.$2%$4.'}'})(.)(??{print$5})/x;print"\n"





Re: RFC 170 (v2) Generalize =~ to a special apply-to assignment operator

2000-09-26 Thread Simon Cozens

On Sun, Sep 17, 2000 at 05:41:57AM -, Perl6 RFC Librarian wrote:
. Some criticized it as being too sugary, since this:
 
$string =~ quotemeta;# $string = quotemeta $string;
 
 Is not as clear as the original. However, there is fairly similar
 precedent in:
 
$x += 5; # $x = $x + 5;

Looks great on scalars, but...

@foo =~ shift;   # @foo = $foo[0]  ?
@foo =~ unshift; # @foo = $foo[-1] ?

Although I have to admit I like:

@foo =~ grep !/\S/;

But I'm not very keen on the idea of

%foo =~ keys;

-- 
A formal parsing algorithm should not always be used.
-- D. Gries



Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-26 Thread Bart Lateur

On 25 Sep 2000 20:14:52 -, Perl6 RFC Librarian wrote:

>Remove C<?{ code }>, C<??{ code }> and friends.

I'm putting the finishing touches on an RFC to drop (?{...}) and replace
it with something far more localized, hence cleaner: assertions, also in
Perl code. That way,

/(?<!\d)(\d+)(?{$1 < 256})/

would only match integers between 0 and 255.

Communications between Perl code snippets inside a regex would be
strongly discouraged.
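A rough Perl 5 approximation of such an assertion, assuming the conditional construct `(?(?{ COND })|(?!))` as a stand-in for the proposed syntax (the proposal itself is not implemented in Perl 5; `$loose`, `$strict` and `first_match` are names invented for this sketch):

```perl
use strict;
use warnings;

# (?(?{ COND })|(?!)) matches the empty string when COND is true and
# fails otherwise, which makes the engine backtrack.
my $loose  = qr/(?<!\d)(\d+)(?(?{ $1 < 256 })|(?!))/;

# Same pattern plus a trailing (?!\d), so \d+ cannot settle for a prefix
# of a longer number.
my $strict = qr/(?<!\d)(\d+)(?!\d)(?(?{ $1 < 256 })|(?!))/;

# Return the first captured number, or undef if nothing matches.
sub first_match {
    my ($re, $text) = @_;
    return $text =~ $re ? $1 : undef;
}

for my $text ('port 80', 'value 256', 'id 1000 then 42') {
    my ($l, $s) = map { first_match($_, $text) // 'no match' } $loose, $strict;
    print "$text: loose=$l strict=$s\n";
}
```

Because the assertion can only veto, a failed test makes \d+ give back digits: the loose form finds 25 inside 256, while the strict form skips 256 and 1000 entirely.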

-- 
Bart.



Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-26 Thread Michael Maraist


 On 25 Sep 2000 20:14:52 -, Perl6 RFC Librarian wrote:

 Remove C<?{ code }>, C<??{ code }> and friends.

 I'm putting the finishing touches on an RFC to drop (?{...}) and replace
 it with something far more localized, hence cleaner: assertions, also in
 Perl code. That way,

 /(?<!\d)(\d+)(?{$1 < 256})/

 would only match integers between 0 and 255.

 Communications between Perl code snippets inside a regex would be
 strongly discouraged.

I can't believe that there currently isn't a means of killing a back-track
based on perl-code.  Looking through perlre it seems like you're right.  I'm
not really crazy about breaking backward compatibility like this though.  It
shouldn't be too hard to find another character sequence to perform your
above job.

Beyond that, there's a growing rift between reg-ex extenders and purifiers.
I assume the functionality you're trying to produce above is to find the
first bare number that is less than 256 (your above would match the 25 in
256). Easily fixed by inserting (?!\d) between the second and third
aggregates.  If you were to be more strict, you could more simply apply
\b(\d+)\b...

In any case, the above is not as intuitive to the casual observer as
this might be:

while ( /(\d+)/g ) {
  if ( $1 < 256 ) {
    $answer = $1;
    last;
  }
}

Likewise, complex matching tokens are the realm of a parser (I'm almost
getting tired of saying that).  Please be kind to your local maintainer,
don't proliferate n'th order code complexities such as recursive or
conditional reg-ex's.  Yes, I can mandate that my work doesn't use them, but
it doesn't mean that CPAN won't (and I often have to reverse engineer CPAN
modules to figure out why something isn't working).

That said, nobody should touch the various relative reg-ex operators.  I
look at reg-ex as a tokenizer, and things like (?>...) which optimizes
reading, and (?!..), etc are very useful in this realm.

Just my $0.02

-Michael




Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-26 Thread Bart Lateur

On Tue, 26 Sep 2000 13:32:37 -0400, Michael Maraist wrote:



>I can't believe that there currently isn't a means of killing a back-track
>based on perl-code.  Looking through perlre it seems like you're right.

There is, but as MJD wrote: "it ain't pretty". Now, semantic checks or
assertions would be the only reason why I'd expect to be able to execute
perl code every time a part of a regex is successfully parsed. Simply
look at RFC 197: a syntactic extension to regexes just to check if a
number is within a range! That is absurd, isn't it? Would a simple way
to include localized tests, *any* test, make more sense?

>I'm
>not really crazy about breaking backward compatibility like this though.  It
>shouldn't be too hard to find another character sequence to perform your
>above job.

Me neither. But many prominent people in the Perl World have expressed
their amazement when they found out that the purpose of embedding Perl
in a regex wasn't just to do this kind of test. (?{...}) hasn't
even been tried out yet by many people, let alone that they'd use it in
production code. (?{...}) is notorious for dumping core. I can't see why
it can't be recycled. After all, it still executes Perl code.

>Beyond that, there's a growing rift between reg-ex extenders and purifiers.
>I assume the functionality you're trying to produce above is to find the
>first bare number that is less than 256 (your above would match the 25 in
>256).

You're forgetting about greediness. This test simply answers the
question: "will this do?" If the answer is always yes, the regex will
*always* match the same thing as it would do without this assertion.
Compare it to other assertions, such as /\b/, anchors (/^/ and /$/), and
lookahead and lookbehind. These too don't really control what it would
match. They can only express their veto.

>In any case, the above is not very intuitive to the casual observers as
>might be
>
>while ( /(\d+)/g ) {
>  if ( $1 < 256 ) {
>    $answer = $1;
>    last;
>  }
>}

Maybe for this simple example. But the same can be said of lookahead and
lookbehind. It takes a *bit* of getting used to, but it's very simple,
and very powerful. IMO.

>Likewise, complex matching tokens are the realm of a parser (I'm almost
>getting tired of saying that).  Please be kind to your local maintainer,
>don't proliferate n'th order code complexities such as recursive or
>conditional reg-ex's.

I said nothing of recursive regexes. Again, just look at RFC 197, and
see what complex rules people would like to cram into a regex. Or look
at the examples in Friedl's book, to see what contortions people put
themselves through, just to make sure that they only match numbers
between 0 and 23:

/[01]?[0-9]|2[0-3]/
/[01]?[4-9]|[012]?[0-3]/

So you think these are easy on the maintainer? I think not. A simple
boolean expression, "match a number and it must be 23 or less", is far
simpler, at least to me.
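To make the comparison concrete, here is a small sketch in today's Perl 5 that checks an hour both ways: once with the range spelled out in character classes, and once as "capture a number, then test it". The function names are invented for the example, and the anchors and grouping are additions so that whole strings are validated.

```perl
use strict;
use warnings;

# Range encoded in character classes, in the style of the Friedl examples.
sub hour_by_classes {
    my ($s) = @_;
    return $s =~ /\A(?:[01]?[0-9]|2[0-3])\z/ ? 1 : 0;
}

# "Match a number and it must be 23 or less."
sub hour_by_test {
    my ($s) = @_;
    return ($s =~ /\A(\d{1,2})\z/ && $1 <= 23) ? 1 : 0;
}

# Compare the two on 0..99 plus the zero-padded forms 00..09.
my @disagree = grep { hour_by_classes($_) != hour_by_test($_) }
               (0 .. 99, map { sprintf '%02d', $_ } 0 .. 9);

print @disagree ? "differ on: @disagree\n" : "both agree on every input tried\n";
```

Changing the upper bound means editing one number in the second version, but rethinking the character classes in the first.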

-- 
Bart.



Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-26 Thread Michael Maraist

 There is, but as MJD wrote: "it ain't pretty". Now, semantic checks or
 assertions would be the only reason why I'd expect to be able to execute
 perl code every time a part of a regex is successfully parsed. Simply
 look at RFC 197: a syntactic extension to regexes just to check if a
 number is within a range! That is absurd, isn't it? Would a simple way
 to include localized tests, *any* test, make more sense?

I'm trying to stick to a general philosophy of what's in a reg-ex, and I can
almost justify assertions since as you say, \d, ^, $, (?=), etc are these
very sort of things.  I've been avoiding most of this discussion because
it's been so odd, I can't believe they'll ultimately get accepted.  Given
the argument that it's unlikely that (?{code}) has been implemented in
production, I can almost see changing its semantics.  From what I
understand, the point would be to run some sort of perl-code and return
defined / undefined, where undefined forces a back-track.

As you said, we shouldn't encourage full-fledged execution (since core dumps
are common).  I can definitely see simple optimizations such as (?{$1 op
const}), though other interesting things such as (?{exists $keywords{ $1 }})
might proliferate.  That would expand to the general purpose (?{
isKeyword( $1 ) }), which then allows function calls within the reg-ex,
which is just asking for trouble.

One restriction might be to disallow various op-codes within the reg-ex
assertion.  Namely user-function calls, reg-ex's, and most OS or IO
operations.

A very common thing could be an optimal /(?>\d+)(?{MIN < $1 && $1 < MAX})/,
where MIN and MAX are constants.
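A sketch of that range check in current Perl 5, under a few assumptions: the conditional `(?(?{ ... })|(?!))` stands in for the proposed assertion, an outer capture group is added so that `$1` is actually set, a lookbehind is added so a match cannot start in the middle of a longer number, and the values of MIN and MAX are arbitrary.

```perl
use strict;
use warnings;
use constant { MIN => 10, MAX => 100 };

# The atomic group (?>\d+) never gives digits back once the test fails;
# the lookbehind stops the engine from retrying inside the same number.
my $in_range = qr/(?<!\d)((?>\d+))(?(?{ $1 > MIN && $1 < MAX })|(?!))/;

# Return the first number strictly between MIN and MAX, or undef.
sub first_in_range {
    my ($text) = @_;
    return $text =~ $in_range ? $1 : undef;
}

print first_in_range('7 1000 55 12'), "\n";
```

With only the atomic group and no lookbehind, a rejected 1000 could still yield 00 from a later starting position, so both pieces are needed here.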

-Michael




Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-26 Thread Hugo

In 005501c027eb$43bafe60$[EMAIL PROTECTED], "Michael Maraist" writes:
:As you said, we shouldn't encourage full-fledged execution (since core dumps
:are common).

Let's not redefine the language just because there are bugs to fix.
Surely it is better to concentrate first on fixing the bugs so that
we can then more fairly judge whether the feature is useful enough
to justify its existence.

:One restriction might be to disallow various op-codes within the reg-ex
:assertion.  Namely user-function calls, reg-ex's, and most OS or IO
:operations.

That seems quite unreasonable. Why do you _want_ to restrict someone
from calling isKeyword($1) within the regexp, which will then read
the keyword patterns from a file and check $1 against those patterns
using regexps? It seems like an entirely reasonable and useful thing
to do.
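As an illustration of the kind of call being defended, here is a minimal Perl 5 sketch in which a user function vetoes a match from inside the pattern. It assumes a plain hash lookup rather than patterns read from a file, and `is_keyword`, `%KEYWORDS` and `first_keyword` are all hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical keyword table; a real one might be loaded from a file.
my %KEYWORDS = map { $_ => 1 } qw(if else while for);

sub is_keyword {
    my ($word) = @_;
    return exists $KEYWORDS{$word};
}

# Match a whole word, then let is_keyword() accept or veto it.
my $keyword = qr/\b(\w+)\b(?(?{ is_keyword($1) })|(?!))/;

# Return the first whole word that is a keyword, or undef.
sub first_keyword {
    my ($text) = @_;
    return $text =~ $keyword ? $1 : undef;
}

print first_keyword('iffy while x'), "\n";
```

The \b anchors matter: without them, backtracking would let the pattern settle for the "if" at the start of "iffy".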

Hugo



Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-26 Thread Hugo

In [EMAIL PROTECTED], Bart Lateur writes:
:On 25 Sep 2000 20:14:52 -, Perl6 RFC Librarian wrote:
:
:Remove C<?{ code }>, C<??{ code }> and friends.
:
:I'm putting the finishing touches on an RFC to drop (?{...}) and replace
:it with something far more localized, hence cleaner: assertions, also in
:Perl code. That way,
:
:   /(?<!\d)(\d+)(?{$1 < 256})/
:
:would only match integers between 0 and 255.

I'd like to suggest an alternative semantic for this: rename
(??{ code }) to (?{ code }), and use the newly freed (??{ code })
for the assertions. (I was about to write an RFC for just that, so
I'm glad I can save a bit of time. :)

Hugo



Re: RFC 170 (v2) Generalize =~ to a special apply-to assignment operator

2000-09-26 Thread Nathan Wiger

Simon Cozens wrote:
 
 Looks great on scalars, but...
 
 @foo =~ shift;   # @foo = $foo[0]  ?
 @foo =~ unshift; # @foo = $foo[-1] ?

Yes, if you wanted to do something that twisted. :-) It probably makes
more sense to do something like these:

   @array =~ reverse;
   @vals =~ sort { $a <=> $b };
   @file =~ grep !/^#/;
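For reference, this is what those three statements have to be written as in Perl 5 today, assuming the RFC's reading that `@x =~ func` means `@x = func @x`. The sample data is made up.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my @vals  = (10, 9, 100);
my @file  = ('# comment', 'code', '#x', 'more');

@array = reverse @array;              # @array =~ reverse;
@vals  = sort { $a <=> $b } @vals;    # @vals  =~ sort { $a <=> $b };
@file  = grep { !/^#/ } @file;        # @file  =~ grep !/^#/;

print "@array\n@vals\n@file\n";
```

In each case the array name appears twice, which is the repetition the proposed operator is meant to remove.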
 
 Although I have to admit I like:
 
 @foo =~ grep !/\S/;

Exactly!
 
 But I'm not very keen on the idea of
 
 %foo =~ keys;

Again, that depends on whether or not you're Really Evil. ;-)

-Nate