Re: Perl6 Operator List

Larry Wall Fri, 25 Oct 2002 19:46:35 -0700

On Fri, 25 Oct 2002, Michael Lazzaro wrote:
: What's the Official Perl difference between a named unary op and a 
: one-arg universal method?


I didn't give the other half of the answer.  A method is a term,
not an operator.  It's the . in front of it that's the operator...

It's just that, in indirect-object syntax, the colon on

    length $object:

is optional, so it looks a lot like a unary operator.  But I think
the precedence is probably LISTOP, not UNIOP, at least if we
stick with the Perl 5 approach of any listop grabbing all the
available args to the right.  I don't see a good way to keep
the precedence of length and friends at UNIOP--at least, not
without having universal subroutines that just pass their
single arguments off to the object as a real method.  The
question is whether there's any way to keep Perl 5's

    print length $a, "\n";

so it still parses as expected.  It seems a bit silly to have to
declare a universal sub like

    sub *length ($x) { $x.length() }

On the other hand, we would like to make methods obey the same
argument parsing rules as subs.  Which means that the above behavior
could be implied for untyped objects via

    class Object {
        method length ($x) {...}
    }

without having to declare a universal sub.  But that depends on our
assumption that any method call's syntax can be determined by looking
at the type of its left side.  That has ramifications if the declared
type of the left side is a base class and we really want to call a
method in a derived class that exceeds the contract of the base class.

We can probably defer some of these decisions till run time, such
as whether to interpret an @foo argument in scalar or list context.
But changing the precedence of length from a LISTOP to a UNIOP can't
be deferred that way.  Which is why we either need the parser to know
the uniop declaration of

    sub *length ($x) { $x.length() }

or we have to make

    print length $a, "\n";

illegal, and require people to say one of:

    print length($a), "\n";
    print (length $a), "\n";
    print $a.length, "\n";

if there's a following list.  The latter approach seems quite a bit
cleaner, in that it doesn't require either the parser or the programmer
to maintain special knowledge about a unary function called "length".
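
To make the LISTOP/UNIOP difference concrete, here is a toy model in
Python (not Perl, and not anything proposed in this thread).  The helper
names parse_call and parse_print and the tuple layout are hypothetical,
and arguments are assumed to be already split on commas.  It only shows
who ends up owning the "\n" under each precedence class.

```python
# Toy model of the parse of:  print length $a, "\n";
# LISTOP and UNIOP are the precedence classes named in the message;
# everything else here (function names, tuple layout) is illustrative.

def parse_call(name, args, kind):
    """Return (call, leftover) for `name` applied to a flat argument list.

    LISTOP: `name` grabs every argument to its right.
    UNIOP:  `name` grabs exactly one argument; the rest go back to the caller.
    """
    if kind == "LISTOP":
        return (name, list(args)), []
    if kind == "UNIOP":
        return (name, list(args[:1])), list(args[1:])
    raise ValueError(f"unknown precedence class: {kind}")


def parse_print(inner_name, args, kind):
    """Model `print inner_name args...`, where print itself is a listop."""
    inner, leftover = parse_call(inner_name, args, kind)
    return ("print", [inner] + leftover)


args = ["$a", '"\\n"']
as_uniop = parse_print("length", args, "UNIOP")
as_listop = parse_print("length", args, "LISTOP")

print(as_uniop)   # length owns only $a; print owns the newline
print(as_listop)  # length owns both $a and the newline
```

Under UNIOP the newline stays with print, which is the Perl 5 behavior;
under LISTOP it is swallowed by length, which is why the precedence has
to be known at parse time.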

I think we also need to fix this:

    print (length $a), "\n";

The problem with Perl 5's rule, "If it looks like a function, it *is*
a function", is that the above doesn't actually look like a function
to most people.  I'm thinking we need a rule that says you can't put
a space before a dereferencing (...), just as you can't with {...}
or [...].  If you want to, then, as with {...} or [...] you have to
use .(...) instead.  That is,

    print .(length $a), "\n";

means

    print(length $a), "\n";

but

    print (length $a), "\n";

means

    print( (length $a), "\n" );
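
The space rule above can be sketched as a tiny classifier, again in
Python as a stand-in.  The function name classify_parens and the set of
characters treated as "term position" are assumptions made for this
sketch; a real lexer would track much more state, and parentheses inside
string literals are not handled.

```python
# Sketch of the proposed rule: a "(" that directly follows a name (or a
# dot) is a subscript-style call; a "(" after whitespace is a plain
# parenthesized term.  TERM_CONTEXT is a simplification for this toy.

TERM_CONTEXT = set(" \t\n([{,")


def classify_parens(source):
    """Label every "(" in `source` as "call" or "term"."""
    labels = []
    for i, ch in enumerate(source):
        if ch != "(":
            continue
        if i == 0 or source[i - 1] in TERM_CONTEXT:
            labels.append("term")   # nothing to attach to: regular parens
        else:
            labels.append("call")   # attached to what precedes, incl. ".("
    return labels


for line in ['print(length $a), "\\n"',
             'print .(length $a), "\\n"',
             'print (length $a), "\\n"',
             'print( (length $a), "\\n" )']:
    print(classify_parens(line), line)
```

The first two lines yield a single "call", the third a single "term",
and the last a "call" followed by a "term", matching the four readings
listed above.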

If we ever allow a syntax like C++'s foo<bar> for who knows what
purpose, then it would have to follow the same rules, since it would
otherwise be ambiguous with a < operator.  So maybe we should start
telling people not to say things like $a<$b when they mean $a < $b.
One could argue that this rule should be followed for all bracketing
syntax, including Unicode.  That would be consistent, at least.  The
real name of subscripts is then always with the dot:

    operator:.[]        # subscript []
    operator:.{}        # subscript {}
    operator:.()        # subscript () aka function args
    operator:.<>        # subscript <> (reserved)
    ...

    operator:[]         # array composer
    operator:{}         # hash or closure
    operator:()         # regular parens
    operator:<>         # an op that screws up <, <<, <=, and <=>   :-)
    ...

That's assuming that matched brackets are always recognized and assumed to
have an expression in the middle.

Actually, it's not clear that operator:<> would mess up binary <
and friends.  It looks as if those four are really:

    term:[]             # array composer
    term:{}             # hash or closure
    term:()             # regular parens
    term:<>             # the input symbol AKA call the iterator
    ...

So we note that we can actually get away with having all of:

    operator:.<>
    operator:<
    term:<>

without ambiguity (assuming a consistent space rule).  However,
if we ever had

    operator:{

we couldn't do the trick of assuming an implicit operator before a
block in

    if $a eq $b  {...}

But now note how we could have all three of

    $a++        # operator:.++
    $a ++ $b    # operator:++
    ++$b        # term:++

by applying the rule to non-bracketing characters as well.  Basically,
operator:.op vs operator:op allows us to distinguish postfix ops from
binary ops, if we want.  That might be cool.
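
A rough sketch of how a lexer might tell the three ++ forms apart,
written in Python as a stand-in.  The token grammar, the function name
classify, and the labels "postfix", "binary", and "prefix" are
assumptions for illustration.  The only inputs are the two things the
rule relies on: whether a term is expected, and whether whitespace
precedes the operator.

```python
import re

# Only three token kinds are modeled: whitespace, $variables, and "++".
TOKEN = re.compile(r"(?P<ws>\s+)|(?P<var>\$\w+)|(?P<op>\+\+)")


def classify(source):
    """Label each token as "term", "prefix", "postfix", or "binary"."""
    labels = []
    expect_term = True   # lexer state: are we waiting for a term?
    saw_space = False    # was the previous token whitespace?
    pos = 0
    while pos < len(source):
        m = TOKEN.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {source[pos]!r}")
        pos = m.end()
        kind = m.lastgroup
        if kind == "ws":
            saw_space = True
            continue
        if kind == "var":
            if not expect_term:
                raise ValueError("two terms in a row")
            labels.append("term")
            expect_term = False
        elif expect_term:
            labels.append("prefix")    # ++$b : still waiting for the term
        elif saw_space:
            labels.append("binary")    # $a ++ $b : a term must follow
            expect_term = True
        else:
            labels.append("postfix")   # $a++ : attached, no term follows
        saw_space = False
    return labels


for expr in ["$a++", "$a ++ $b", "++$b"]:
    print(expr, classify(expr))
```

Note that the prefix case is decided by lexer state alone, while the
postfix and binary cases differ only in the preceding whitespace.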

But we have a problem if we want to specify a binary operator that begins
with dot.  So it probably has to be:

    postfix:++
    infix:++
    prefix:++

or some such.  That still leaves us with a problem if they define:

    postfix:!   # factorial
    infix:!     # xor superposition
    prefix:!    # logical negation

The problem is now that you can't say

    $x .!       # still factorial!

if you want to put space before the postfix !, because we commandeered the
dot for bitops.  Hmm.  Maybe that was a mistake.  Something to ponder.

I expect we can't just rely on bracket properties in Unicode for understanding
how to parse operators though, or we couldn't write things like:

    term:''     # single quoted string.

Either we need a placeholder, or something that says to treat matched chars
as bracket chars.  The placeholder is more general, particularly if it
specifies the grammar rule to parse the inside:

    term:'<singletext>'
    term:(<expr>)

Note also that we do have to distinguish term from prefix, since they leave
the lexer in a different state of expectation afterwards.

So we end up not with:

    postfix:<>
    infix:<
    term:<>

but rather

    postfix:<lt><mumble><gt>
    infix:<lt>
    term:<lt><expr><gt>

Now you can finally have your

    infix:<ws>

operator.  :-)

It will, of course, totally break the lexer.  Your choice, though.

Larry
