Apoc 5 questions/comments

Dave Storrs Thu, 06 Jun 2002 22:41:02 -0700

Well, A5 definitely has my head spinning.  The new features seem amazingly
powerful...it almost feels like we're going to have two equally powerful,
equally complex languages living side-by-side:  one of them is called
"Perl" and the other one is called "Regexes".  Although they may talk to
one another, I really did come away feeling like they were completely
separate animals.


I admit I'm a bit nervous about that...so far, I'm completely sold on
(basically) all the new features and changes in Perl 6, and I'm eagerly
anticipating working with them.  But this level of change...I don't know.
I've spent a lot of time getting to be (reasonably) good at Perl regular
expressions, and I don't like the thought of throwing out all or most of
that effort.  Somehow, this feels like we're trying to roll all of Prolog
into Perl, and I'm not sure I personally want to go there (note the
"personally"...YMMV).

For now, I'm just going to defer worrying about it until I see Exegesis 5,
since past experience has shown me that there is a good chance that all my
fears will be shown to be groundless once concrete examples are being
demonstrated.


In any case, I do have some specific questions:

-----------------

Page 8:
        s:3x:3rd /foo/bar/
        That changes the 3rd, 6th, and 9th occurrences.

Just to verify, this:

s:3rd /foo<3>/bar/

...would do the 3rd, 4th, and 5th, correct?
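
For reference, here is how I read the quoted A5 example (every third occurrence, at most three substitutions). This is only a rough sketch in Python, not Perl 6; the helper name sub_every_nth and its parameters are made up for illustration.

```python
import re

def sub_every_nth(pattern, repl, text, nth, times):
    """Replace every `nth` match of `pattern`, at most `times` times.

    Hypothetical helper modelling my reading of s:3x:3rd /foo/bar/.
    """
    seen = 0   # matches encountered so far
    done = 0   # replacements performed so far

    def pick(match):
        nonlocal seen, done
        seen += 1
        if seen % nth == 0 and done < times:
            done += 1
            return repl
        return match.group(0)  # leave this occurrence untouched

    return re.sub(pattern, pick, text)

# Ten occurrences of "foo"; the 3rd, 6th, and 9th should change.
result = sub_every_nth(r"foo", "bar", "foo " * 10, nth=3, times=3)
print(result)
```

If my alternative reading is right, the 3rd, 4th, and 5th occurrences would change instead, which this sketch does not model.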

-----------------

Page 8:

The u1-u3 mods all say "level 1 support".  I assume this was a typo, and
they should read (u1 => 'level 1', u2 => 'level 2', u3 => 'level 3').

-----------------

Can modifiers abut the delimiter?

s:3x /foo/bar    # most (all?) examples looked like this
s:3x/foo/bar     # is this legal?

-----------------

Can we please have a 'reverse x' modifier that means "treat whitespace as
literals"?  Yes, we are living in a Unicode world now and your data could
theoretically be coming in from a different character set than expected.
But there are times when it won't...when (for example), you wrote the data
out yourself, or you're operating on files that are generated and
maintained purely in-house, so they are guaranteed to be in the same
character set as the Perl source code you're writing.  I understand the
arguments for the way the defaults are set.  I even agree with them.  But
you will NEVER convince me that the first example below is not easier to
read than any of the alternatives:

/FATAL ERROR\:    Process (\d+) received signal\: (\d+)/
/FATAL ERROR\:\ \ \ \ Process\ (\d+)\ received\ signal\:\ (\d+)/
/FATAL ERROR\: \h+ Process \h+ (\d+) \h+ received \h+ signal: \h+ (\d+)/
/FATAL ERROR\: \s+ Process \s+ (\d+) \s+ received \s+ signal: \s+ (\d+)/

(Yes, I know that the last one matches vertical whitespace and
therefore means something slightly different than the others.)
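
As a point of comparison from outside Perl 6: Python's re module treats pattern whitespace literally by default and only ignores it under re.VERBOSE. That is roughly the 'reverse x' behaviour I'm asking for. The sketch below is Python, not a proposal for Perl 6 syntax, and the sample log line is invented.

```python
import re

# Invented sample line, with four spaces after "FATAL ERROR:".
line = "FATAL ERROR:    Process 1234 received signal: 11"

# Default mode: whitespace in the pattern is literal.
literal = re.compile(r"FATAL ERROR:    Process (\d+) received signal: (\d+)")

# Verbose mode: pattern whitespace is ignored, so every space that
# should match has to be escaped or put in a character class.
verbose = re.compile(
    r"FATAL\ ERROR: [ \t]+ Process [ \t]+ (\d+) [ \t]+ received [ \t]+ signal: [ \t]+ (\d+)",
    re.VERBOSE,
)

literal_groups = literal.match(line).groups()
verbose_groups = verbose.match(line).groups()
print(literal_groups, verbose_groups)
```

Both patterns capture the same two numbers; the difference is purely in how readable the pattern is.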

If this means that we need to store a byte or two to remember what
character set the originally-read-in code was in before being converted to
UTF-8 (or whatever we're using internally), so that we know what character
set to assume "literal ws" refers to...well, that seems like a small
price to pay for a lot of convenience.

-----------------

Page 9:
        my $foo = ?/.../;  # boolean context, return whether matched,
        my $foo = +/.../;  # numeric context, return count of matches
        my $foo = _/.../;  # string context, return captured/matched string

This 'initial character to force evaluation' rule initially seemed
annoying, but the more I think about it, the more I like it; one
character isn't much to type, and it makes it extremely clear why you're
doing the match (i.e., what you're trying to get back).  Kudos to our
Fearless Language Designer!
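
To spell out the three results I understand these prefixes to ask for, here is a loose analogy in Python (not Perl 6). The sample text and variable names are mine, and the mapping to ?, +, and _ is my assumption about the intent.

```python
import re

text = "foo bar foo"
pattern = re.compile(r"foo")

# Roughly ?/.../ : did the pattern match at all?
matched = bool(pattern.search(text))

# Roughly +/.../ : how many matches were there?
count = len(pattern.findall(text))

# Roughly _/.../ : what string was matched?
first = pattern.search(text)
string = first.group(0) if first else ""

print(matched, count, string)
```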

-----------------

I am a little unclear on what the difference is between these two:
        my @foo = <$rx>;
        my @foo = m/<$rx>/;

If I understand correctly, it works like this:

my @stuff;
$_ = "foofoofoo";
$rx = /:each foo/;

for (0..2) { @stuff = <$rx> }
    # the line above is equivalent to the following 3 lines:
@stuff = ('foo', 'foo', 'foo');
@stuff = ();
@stuff = ();

for (0..2) { @stuff = m/<$rx>/ }
    # the line above is equivalent to the following 3 lines:
@stuff = ('foo', 'foo', 'foo');
@stuff = ('foo', 'foo', 'foo');
@stuff = ('foo', 'foo', 'foo');

Is that correct?
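
The distinction I have in mind is between one stateful iterator that gets used up and a match that starts afresh each time. A rough Python model of that guess (not Perl 6, and only an assumption about how <$rx> behaves):

```python
import re

text = "foofoofoo"
rx = re.compile(r"foo")

# One shared iterator: the first pass drains it, later passes get nothing.
shared = rx.finditer(text)
runs_shared = [[m.group(0) for m in shared] for _ in range(3)]

# A fresh match on every pass: each pass sees all three occurrences.
runs_fresh = [rx.findall(text) for _ in range(3)]

print(runs_shared)
print(runs_fresh)
```

The first list models what I expect from the bare <$rx> loop, the second what I expect from the m/<$rx>/ loop.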

-----------------

Page 10:

        You could also use the {'...'} construct for comments, but then
        you risk warnings about "useless use of a string in void context".

Could we automagically turn off that warning inside such constructs, when
the only thing there was a string?  (Perhaps there could be a switch
that prevented it from being turned off, if people really wanted to
see it; if so, make it be OFF by default, so it needs to be enabled,
much like 'use strict'.)

-----------------

Page 11:

        / pattern ::: { code() or fail } /  # fails entire rule

Farther down:

        A pattern nested within a closure is classified as its own rule,
        however, so it never gets the chance to pass out of a {...}
        closure.

If I understand correctly, that means that this:

/ pattern ::: { $regex or fail } /

would NOT fail the entire rule...correct?  The only reason that the
'code() or fail' construct fails the entire rule is because the code is
immediately present within the closure and is not "interpolated" (that
isn't the right word, I know) in through a subrule.

-----------------

Page 12:

        When the entire match succeeds, the top-level node is returned as
        a result object....  The name of the result object is $0.

If the name of the result object returned from a successful match is $0,
where is the name of the currently-executing program stored?

-----------------

Page 12:

        my &rx := /(xxx)/;

Should that be a $ instead of a & on the rx variable?

-----------------

Page 13:

        / $2:=(.*?), \h* $1:=(.*) /

Does this imply that $1, $2, etc. are now read-write outside of regexen?

-----------------

Page 23:

        $string =~ (@array = split);

I would like to move that this be a deprecated usage right from day one
(or undocumented, either one). It's fine if it works, but it's so
counterintuitive that it should simply be one of the weird crannies of the
language that fall out of the parsing rules.

Vale tanto.

-----------------

How are 'fail' and 'die' different inside a regex?

Can subroutines that aren't used in regexen use 'fail' to throw an
exception?  If so, how is it different from 'die' when used outside a
regex?



David Storrs
