RFC 128 (v4) Subroutines: Extend subroutine contexts to include name parameters and lazy arguments

Perl6 RFC Librarian Mon, 25 Sep 2000 11:24:27 -0700
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Subroutines: Extend subroutine contexts to include name parameters and lazy arguments

=head1 VERSION

  Maintainer: Damian Conway <[EMAIL PROTECTED]>
  Date: 17 Aug 2000
  Last Modified: 25 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 128
  Version: 4
  Status: Frozen

=head1 ABSTRACT

This RFC proposes that subroutine argument context specifiers be
extended in several ways, including allowing parameters to be typed and
named, and that a syntax be provided for binding arguments to named
parameters.

=head1 CHANGES

Added section describing named parameter interaction with named higher-order
function placeholders.

=head1 DESCRIPTION

It is proposed that the existing subroutine "prototype" mechanism
be replaced by optional formal parameter lists that allow parameters
to be named and their contexts specified.

The syntax for this would be:

        sub subname ( type context(s) parameter_name : parameter_attributes ,
                      type context(s) parameter_name : parameter_attributes ,
                      type context(s) parameter_name : parameter_attributes ;
                      # end of required parameters
                      type context(s) parameter_name : parameter_attributes ,
                      # etc.
                    ) : subroutine_attributes
        { body }

Each of the four components of a parameter specification -- type,
context, name, and attributes -- would be optional.

=head2 Contexts

The context specifiers would be:

        $       parameter is scalar
        @       parameter is array (eats remaining args)
        %       parameter is hash (eats remaining args)
        /       parameter is qr'd string
        &       parameter is subroutine reference or block
        *       parameter is typeglob (assuming they still exist)
        ""      parameter is bareword or character string
        ()      parameter is an explicitly parenthesized list

Note that any of these specifiers may appear in any position in a
parameter list (especially C<&>, which would no longer be constrained to
the first position).


The following prefix context modifier would be available:

        \             parameter must be a reference,
                      argument is magically en-referenced if necessary


The following context attributes would be available:

        :lazy         argument is lazily evaluated

        :uncurried    (& only) terminate curry propagation on argument

        :noautoviv    argument that is a (possibly nested) hash element or array
                      element is not autovivified.

        :repeat{m,n}  argument is variadic within the specified range

The following subsections describe each of these in detail.


The following grouping operator would also be available:

        (...)   specifies that the argument(s) are to be 
                treated collectively (i.e. by modifiers and attributes)


=head3 Automagically en-referenced arguments

The C<\> modifier causes the modified parameter to automagically
convert its corresponding argument to a reference without list flattening.
The most common usage is in passing hashes and arrays as a single argument.

Note that the semantics of the C<\> modifier would be altered
slightly from those of Perl 5, so that a reference is I<always> passed for
that parameter. It would, of course, retain its magical
en-referencing coercion:

        \$         argument must be scalar ref or start with $,
                   scalar var magically en-referenced

        \@         argument must be array ref or start with @,
                   array var magically en-referenced

        \%         argument must be hash ref or start with %,
                   hash var magically en-referenced

        \/         argument must be qr'd string or /.../ or m/.../
                   /.../ or m/.../ magically qr'd to en-reference

        \&         argument must be sub reference, curried function, or block
                   block converted to anonymous sub ref

        \*         argument must be typeglob ref or start with *,
                   typeglob magically en-referenced

        \""        argument must be a string reference or a bareword,
                   bareword magically stringified and en-referenced

        \()        argument must be a parenthesized list or an anonymous
                   list constructor
                   parenthesized list is magically en-referenced
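
Perl 5 prototypes already offer a restricted form of this coercion. The following is a minimal sketch, and the subroutine C<count_elems> is hypothetical. A C<(\@)> prototype delivers the caller's array as one reference, with no flattening:

```perl
use strict;
use warnings;

# (\@) prototype: the caller's array arrives as a single reference.
# The prototype must already be declared when the call is compiled.
sub count_elems (\@) {
    my ($aref) = @_;          # a reference, not the flattened elements
    return scalar @$aref;
}

my @x = (10, 20, 30);
my $n = count_elems(@x);      # compiled as count_elems(\@x)
```

The proposed semantics would accept an explicit reference here, but Perl 5 rejects a call such as C<count_elems(\@x)> at compile time. This is the case mentioned under MIGRATION ISSUES.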



=head3 Lazy evaluation

If the C<lazy> attribute is used for a particular parameter, that parameter
is lazily evaluated. This means that it is only evaluated when the
corresponding named parameter (see below) -- or the corresponding element
of @_ -- is first accessed in some way, after which the evaluated value
is stored in the element in the usual way. Passing the parameter to another
subroutine or returning it as an lvalue does not count as an access.
Evaluating it in an C<eval> block always counts.

If the C<lazy> attribute is applied to a C<@> parameter (which eats the
remaining arguments), those remaining arguments are not evaluated
until the corresponding element of the array is accessed. Iteration
through such an array (i.e. in a C<for> or C<foreach>) only evaluates
one element per iteration.

If the C<lazy> attribute is applied to a C<%> parameter (which eats the
remaining arguments), the odd-numbered arguments (which become keys) are
immediately evaluated, but the even-numbered arguments (which become values)
are not evaluated until the corresponding entry of the hash is accessed.
Iteration through such a hash (i.e. via C<each> or C<values>) only
evaluates one element per iteration.
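
Perl 5 has no lazy parameters, but a tied hash can approximate the C<%> behaviour described above. In this sketch the package C<LazyValueHash> and the helpers C<thunk> and C<lazy_hash> are hypothetical. The caller wraps each value in a closure. The keys are ordinary strings, evaluated at the call. Each value closure runs on its first fetch, and its result then replaces the closure:

```perl
use strict;
use warnings;

package LazyValueHash;
use Tie::Hash;                      # provides Tie::StdHash
our @ISA = ('Tie::StdHash');

# Mark a closure as an unevaluated value.
sub thunk {
    my ($code) = @_;
    return bless \$code, 'LazyValueHash::Thunk';
}

# Evaluate a value on its first fetch, then store the result in place.
sub FETCH {
    my ($self, $key) = @_;
    my $v = $self->{$key};
    if (ref $v eq 'LazyValueHash::Thunk') {
        $v = $self->{$key} = $$v->();
    }
    return $v;
}

package main;

# Keys arrive already evaluated; values arrive as closures.
sub lazy_hash {
    my %pairs = @_;
    tie my %h, 'LazyValueHash';
    $h{$_} = LazyValueHash::thunk($pairs{$_}) for keys %pairs;
    return \%h;
}
```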

For example:

        sub firstdef(@:lazy) { defined($_) && return $_ for (@_); }

        sub enervate($:lazy) { return $_[0] }

        sub Klingon::OP_TERNARY ($,$:lazy,$:lazy) 
        {
                if ( $_[0]->debaseToTerran() ) { return eval{$_[1]} }
                return eval{$_[2]};
        }

Note the use of explicit C<eval>s in the last example, to force the
lazy arguments to evaluate before being returned.
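
Perl 5 can approximate C<:lazy> scalars by wrapping each argument in a closure that is evaluated at most once and then cached. This mirrors the store-after-first-access rule described above. The following is a minimal sketch. It assumes a hypothetical C<Lazy> class and a Perl 5 version of C<firstdef> whose callers pass C<Lazy> objects explicitly:

```perl
use strict;
use warnings;

package Lazy;

# Wrap an argument expression as a closure, not yet evaluated.
sub new {
    my ($class, $code) = @_;
    return bless { code => $code, done => 0, value => undef }, $class;
}

# First access evaluates the closure; later accesses reuse the stored value.
sub value {
    my ($self) = @_;
    unless ($self->{done}) {
        $self->{value} = $self->{code}->();
        $self->{done}  = 1;
    }
    return $self->{value};
}

package main;

# Arguments after the first defined one are never evaluated.
sub firstdef {
    for my $arg (@_) {
        my $v = $arg->value;
        return $v if defined $v;
    }
    return;
}
```

The explicit C<< Lazy->new(sub {...}) >> wrappers are the boilerplate that the C<:lazy> attribute would remove.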


=head3 Controlling curry propagation

RFC 23 proposes the addition of higher order functions, via argument/operand
placeholders. However, when a subroutine call includes a curried argument,
there is an ambiguity as to how far "outwards" the currying should propagate.
For example:

        $num_nodes = traverse( $root, $sum += ^_ );

might mean:

        $num_nodes = sub{ traverse( $root, $sum += $_[0] ) };

if currying continued to the outermost subroutine, or:

        $num_nodes = traverse( $root, sub{$sum += $_[0]} );

if it were restricted to the second argument.

As the former interpretation is the proposed default behaviour, some
syntactic means of requesting the latter interpretation is required.

It is proposed that a parameter context attribute -- C<uncurried> -- be
added to handle this. Any parameter with the C<uncurried> attribute would
prevent curry propagation to the surrounding subroutine call.
Thus, with the declaration:

        sub traverse ($,$:uncurried);

the call:

        $num_nodes = traverse( $root, $sum += ^_ );

would be equivalent to:

        $num_nodes = traverse( $root, sub{$sum += $_[0]} );

whereas the declaration:

        sub traverse ($,$);

would allow the curried argument to "infect" the entire surrounding call:

        $num_nodes = sub{ traverse( $root, $sum += $_[0] ) };

Note that the curry control only applies to the argument whose parameter
has the C<uncurried> attribute. So:

        sub traverse ($,$:uncurried);
        $num_nodes = traverse( ^_ , $sum += ^_ );

means:

        $num_nodes = sub { traverse( $_[0], sub{$sum += $_[0]} ) };

The currying of the second argument is restricted to its argument slot, whilst
the currying of the first argument propagates outwards to encompass the entire
call to C<traverse>.
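
The two readings can be compared in Perl 5 by writing out the closures by hand. In this sketch the tree layout and the body of C<traverse> are assumptions. Only the shape of the two closures follows the examples above:

```perl
use strict;
use warnings;

# Hypothetical traverse(): calls $callback on each node's value in a
# hash-based tree and returns the number of nodes visited.
sub traverse {
    my ($node, $callback) = @_;
    $callback->($node->{value});
    my $count = 1;
    $count += traverse($_, $callback) for @{ $node->{children} || [] };
    return $count;
}

my $root = { value => 1, children => [ { value => 2 }, { value => 3 } ] };

# Restricted reading (:uncurried): only the argument becomes a closure.
my $sum = 0;
my $num_nodes = traverse( $root, sub { $sum += $_[0] } );

# Propagated reading (default): the whole call becomes a closure.
my $sum2 = 0;
my $propagated = sub { traverse( $root, $sum2 += $_[0] ) };
```

Under the propagated reading, nothing is traversed until C<$propagated> is called. Even then, C<traverse> receives a number rather than a callback. That is why the C<uncurried> form is the one wanted here.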


=head3 Variadic parameter lists

It would be possible to specify a parameter that stands for a variable
number of arguments, within given bounds, using the variadic attribute
C<repeat{I<m>,I<n>}>.

A parameter specification such as:

        sub max($:repeat{2,20}) { ... }

is equivalent to:

        sub max($,$;$,$,$,$,$,$,$,$,$,$,$,$,$,$,$,$,$,$) { ... }

That is, the C<:repeat> attribute specifies the minimum and maximum
number of arguments that the specified (scalar) parameter may represent.
                
If I<m> is omitted it is zero; if I<n> is omitted it is ~0 (maximum
unsigned integer).

For example, to specify a subroutine named C<most> that takes two or more
magically en-referenced arrays and returns the contents of the longest one:

        sub most ( \@:repeat{2,} ) {
                my $max = shift;
                for (@_) {
                        $max = $_ if @$max < @$_;
                }
                return @$max;
        }

        my @most = most @x, @y, @z;
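
A Perl 5 prototype cannot repeat C<\@> an open-ended number of times. A Perl 5 version of this subroutine therefore has to take explicit array references and check the repeat bounds itself. A minimal sketch:

```perl
use strict;
use warnings;

# Perl 5 sketch of most( \@:repeat{2,} ): callers pass array refs,
# and the repeat bounds (n omitted, so it defaults to ~0) are checked by hand.
sub most {
    my ($min, $max) = (2, ~0);
    die "most() expects between $min and $max array refs\n"
        if @_ < $min || @_ > $max;
    my $biggest = shift;
    for (@_) {
        $biggest = $_ if @$biggest < @$_;
    }
    return @$biggest;
}
```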


Or consider a subroutine that takes an alternating sequence of pairs of:

=over 4

=item *

a lazily evaluated, non-curry-propagating scalar expression, followed by

=item * 

a bareword

=back

which then returns the stringified form of the first bareword whose
preceding expression evaluates to true:

        sub first ( ($:lazy uncurried, ""):repeat{,} ) {
                while (my ($true, $str) = splice @_, 0, 2) {
                        return $str if $true;
                }
        }

        my $first = first
                        $x < 10 => little,
                        $x < 20 => middle,
                        $x < 30 => large;

Note the use of grouping parentheses to cause the alternating
scalar/bareword sequence to be repeated.
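
A Perl 5 approximation passes each condition as an anonymous sub, so that conditions after the first true one are never evaluated. It passes the barewords as quoted strings. The name C<first_true> is hypothetical, chosen to avoid confusion with C<List::Util::first>:

```perl
use strict;
use warnings;

# Conditions are closures (the :lazy part); strings are plain values.
sub first_true {
    die "first_true() expects condition/string pairs\n" if @_ % 2;
    while (my ($cond, $str) = splice @_, 0, 2) {
        return $str if $cond->();
    }
    return;
}
```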


=head3 Preventing argument autovivification

When entries of nested hashes are passed to a subroutine:

        func( $hash{key}{subkey}{subsubkey} );

the intermediate entries in the nested hash (i.e. C<$hash{key}> and
C<$hash{key}{subkey}> in the above example) are autovivified, whether or
not the argument value itself is ever accessed within the subroutine.
This is particularly galling if one or more of the nested hashes is
undefined, since it means the higher-level entries will have keys
created unnecessarily.

Specifying the C<:noautoviv> attribute on a subroutine parameter
would cause the corresponding argument to be evaluated in a special
"non-autovivifying" context, unless it is used as an lvalue.

In such a non-autovivifying context, the non-existence of any
intermediate nested hash would cause the entire nested hash access to
immediately evaluate to C<undef>, without any autovivification.

For example:

        sub func1 ( $:noautoviv ) { ... }
        sub func2 ( $ )           { ... }

        my %hash;
        print keys %hash;                       # prints ""

        func1( $hash{key}{subkey} );
        print keys %hash;                       # prints ""

        func2( $hash{key}{subkey} );
        print keys %hash;                       # prints "key"
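
In Perl 5, the only way to avoid this autovivification is to test each level with C<exists> before descending. The helper C<dig> below is a hypothetical sketch of what a non-autovivifying access would do. It is shown next to the autovivifying call from the example above:

```perl
use strict;
use warnings;

# Hypothetical helper: read a nested hash element without autovivifying,
# returning undef as soon as an intermediate level is missing.
sub dig {
    my ($h, @keys) = @_;
    for my $k (@keys) {
        return undef unless ref $h eq 'HASH' && exists $h->{$k};
        $h = $h->{$k};
    }
    return $h;
}

sub func2 { return $_[0] }

my %hash;
my $value = dig(\%hash, 'key', 'subkey');   # undef, %hash untouched
my @after_dig = keys %hash;

func2( $hash{key}{subkey} );                # autovivifies $hash{key}
my @after_call = keys %hash;
```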


If the parameter is used as an lvalue within the subroutine, then
autovivification is still applied (at the point where the argument
is used as an lvalue). For example:

        sub func3 ( $:noautoviv ) {
                if (rand > 0.5) { $_[0] = 0 }   # autovivifies argument
                else            { print $_[0] } # does not autovivify argument
        }


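A C<\> modifier defeats C<:noautoviv>, because taking a reference to a
nested element autovivifies that element and its parents:

        sub func4 ( \$:noautoviv ) {    # always autovivifies (compiler warning)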
                ...
        }


Note that this implies that C<:noautoviv> parameters are automatically C<:lazy>.


=head3 Block parameters and arguments

As noted above, C<&> parameters could appear in any position in the parameter
list, allowing raw blocks as arguments anywhere in the argument list.

It is proposed that raw blocks that are subroutine arguments need not
be separated by commas from adjacent arguments (on either side):

        sub on ( "", & ) {
                $handler{$_[0]} = $_[1];
        }

        # and later...

        on Error::Numeric { die $@; };
        on Error::Range   { $_[0]--; };
        on Error          { ref($_[0])->handle(); };

Furthermore, it is proposed that if a subroutine's parameter list ends
in a C<&> and the subroutine is called in a void context, the
following semicolon be optional:

        on Error::Numeric {
                die $@;
        }

        on Error::Range {
                $_[0]--;
        }

        on Error {
                ref($_[0])->handle();
        }
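
For comparison, Perl 5 accepts a bare block for an C<&> prototype only in the first position. A handler registered after its name must therefore be written as an explicit anonymous sub, with a separating comma. The following sketches the same handler table; the handler bodies are illustrative:

```perl
use strict;
use warnings;

my %handler;

# Perl 5 version of on(): the name comes first, so the handler
# cannot be a bare block and must be an explicit "sub { ... }".
sub on {
    my ($name, $code) = @_;
    $handler{$name} = $code;
}

on 'Error::Range', sub { return $_[0] - 1 };
on 'Error',        sub { return "handled $_[0]" };
```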


=head3 Context classes

The revised syntax would also allow I<context classes> to be specified.
A context class aggregates two or more alternative contexts, allowing
any one of them to be the context for the corresponding argument.

For example:

        sub mymap ([\/&$], @) {...}
        
Here, the first argument must be either a /.../ pattern (or qr), or a
block (or sub ref), or a scalar. In parsing that argument, the various
possible contexts are considered left-to-right and the first context
that allows the argument to be parsed is used.

Note that context classes may also have attributes:

        sub mymap ([\/&$]:lazy uncurried, @) {...}

In this example, no matter what the first argument is, it is lazily evaluated
and does not propagate currying.

A context class may only contain context specifiers that yield scalar
parameters. Hence, a context class may contain any of the following
specifiers (any of which may also have C<lazy> or C<uncurried> attributes):

        $       /       \$      \/
        &       *       \&      \*      
        ""              \""     \()
                        \@      \%

but not:

        @       %       ()

A context class always yields a scalar parameter.
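
Perl 5 offers only a limited analogue, the C<\[...]> prototype (for example C<\[$@%]>), which accepts any one of the listed variable types. More generally, a Perl 5 subroutine has to inspect a scalar argument at run time, whereas the proposal chooses the context at parse time. The following sketches that run-time check. The name C<describe_mapper> is hypothetical. Note that Perl 5's C<ref> reports a C<qr//> object as C<Regexp>:

```perl
use strict;
use warnings;

# Run-time stand-in for the context class [\/&$].
sub describe_mapper {
    my ($mapper) = @_;
    my $kind = ref $mapper;
    return 'pattern' if $kind eq 'Regexp';   # qr/.../
    return 'code'    if $kind eq 'CODE';     # sub { ... }
    return 'scalar'  if $kind eq '';         # plain value
    return 'other';
}
```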


=head2 Parameter names

Each parameter may optionally (and independently) be given a name.
This name is specified after the parameter's context specifier.
The declaration of a parameter name creates a lexical variable of the
same name in the scope of the subroutine body. Named C<@> and C<%>
parameters create a lexical array or hash respectively. All other
named parameters create a lexical scalar.

For example:

        sub doublemap (&mapsub, @args) {        # creates my($mapsub,@args)
                my @mapped;
                push @mapped, $mapsub->(splice @args, 0, 2) while @args;
                return @mapped;
        }

Note that the context specifier can still be any valid specifier:

        sub lazymap ([&\/$]mapper : lazy uncurried, $max, @args:lazy) {
                my @mapped;
                switch (ref $mapper) {
                        case 'CODE'  { push @mapped, $mapper->(shift @args)
                                                while @args && $max--; }
                        case 'REGEX' { push @mapped, shift(@args) =~ m/$mapper/
                                                while @args && $max--; }
                        case ''      { while (@args && $max--) {
                                                shift @args;
                                                push @mapped, $mapper;
                                       } }
                }
                return @mapped;
        }


=head3 Named arguments

It is further proposed that arguments may be passed by name, and that
named arguments may be passed in any order.

An argument would be associated with a named parameter by prefixing it
with a standard Perl label (i.e. an identifier-colon sequence). For example:

        @mapped = doublemap(args: @list, mapsub: ^a+^b);

On encountering labelled arguments in a subroutine call, the interpreter
would examine the named parameters to determine their contexts,
evaluate the labelled arguments (in left-to-right sequence) in the
context specified by the corresponding named parameters (or I<not>
evaluate them for lazy contexts!). The resulting values would then be
assigned to the corresponding named parameters.

Any unlabelled arguments would then be evaluated and assigned (again in
left-to-right sequence) to any remaining parameters. Those nameless
evaluations would be carried out in the respective contexts specified by
the remaining parameters.

It would be an error to:

        * Define two named parameters with the same name, unless they
          can be distinguished by context. 

        * Label two arguments with the same name, unless there are 
          two context-distinguishable named parameters of that name.

If a subroutine was called with a labelled argument for which there was
no named parameter, the label would be ignored and the argument treated
as unlabelled, unless the subroutine had been declared with a
C<strict_args> attribute.
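
The two-pass binding rule can be sketched in Perl 5 as follows. The function C<bind_args> is hypothetical. It represents each argument as a C<[label, value]> pair, with an undefined label for unlabelled arguments. It assumes scalar parameters only, so the sketch does not model contexts, slurpy C<@>/C<%> parameters, laziness, or context-distinguished duplicate names:

```perl
use strict;
use warnings;

# Hypothetical binder: $params is an array ref of parameter names,
# $args an array ref of [ $label_or_undef, $value ] pairs.
sub bind_args {
    my ($params, $args, $strict_args) = @_;
    my %is_param = map { $_ => 1 } @$params;
    my (%bound, @unlabelled);

    # Pass 1: labelled arguments, left to right.
    for my $arg (@$args) {
        my ($label, $value) = @$arg;
        if (defined $label && $is_param{$label}) {
            die "Duplicate argument label '$label'\n" if exists $bound{$label};
            $bound{$label} = $value;
        }
        elsif (defined $label && $strict_args) {
            die "No parameter named '$label'\n";
        }
        else {
            push @unlabelled, $value;    # label ignored: treat as unlabelled
        }
    }

    # Pass 2: unlabelled arguments fill the remaining parameters in order
    # (any surplus unlabelled arguments are ignored in this sketch).
    for my $name (grep { !exists $bound{$_} } @$params) {
        last unless @unlabelled;
        $bound{$name} = shift @unlabelled;
    }
    return \%bound;
}
```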


=head3 Interaction with named placeholders

It is further proposed that when named placeholders are used to curry a
function, the resulting subroutine would have named parameters. If the
curried function mixed named, ordinal, and anonymous placeholders, the
resulting subroutine would have a mixture of named and unnamed parameters.

For example:

        my $selector = ^condition ? ^2 : ^_;

would be equivalent to:

        my $selector = sub ($condition,$,$) { $condition ? $_[2] : $_[1] };

This would make currying out the condition clearer:

        my $select_on_val = $selector->(condition: $val);
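
In Perl 5, the equivalent subroutine and its partial application have to be written out by hand. The following is a minimal sketch. Because Perl 5 has no named arguments, it binds the condition positionally:

```perl
use strict;
use warnings;

# Perl 5 equivalent of  sub ($condition,$,$) { $condition ? $_[2] : $_[1] }
my $selector = sub {
    my ($condition) = @_;
    return $condition ? $_[2] : $_[1];
};

# Currying out the condition by hand: fix $val as the first argument.
my $val = 0;
my $select_on_val = sub { $selector->($val, @_) };
```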


=head2 Types

It is proposed that parameters may be given types: either the name of
a class, or the name of a builtin type (such as 'ARRAY', 'HASH',
'CODE', etc.)

If a parameter has a type (C<T>) then the following additional
constraints are placed upon it and its value:

=over 4

=item 1.

The parameter's specified (or implicit) context must yield a scalar value.

=item 2.

The scalar value of the bound argument (say, $val) must satisfy
C<UNIVERSAL::isa($val,'T')>.

=item 3.

If the parameter is named, the corresponding lexical variable will be
typed to class C<T>, unless C<T> is the name of a built-in type:
'SCALAR', 'HASH', 'CODE', etc.  (and maybe even then, if typed lexicals
were to be extended to built-in types)

=item 4.

If the subroutine has the attribute C<:multi>, then the typed parameter
takes part in the multiple dispatching of the subroutine (see forthcoming
RFC).

=back

For example:

        sub traverse (Tree $root, $subref:uncurried) {...}

This specifies that the first argument must be a Tree object, or an object of a
class derived from Tree. The corresponding lexical variable would be equivalent
to:

        my Tree $root;
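
Constraint 2 corresponds directly to a Perl 5 run-time check. In this sketch the subclass C<Tree::Balanced> and the helper C<check_param_type> are hypothetical. When called as a function, C<UNIVERSAL::isa> also returns true for an unblessed reference of the named builtin type:

```perl
use strict;
use warnings;

package Tree;
sub new { my ($class) = @_; return bless {}, $class }

package Tree::Balanced;              # hypothetical subclass of Tree
our @ISA = ('Tree');

package main;

# Enforce UNIVERSAL::isa($val, $type), as in constraint 2.
sub check_param_type {
    my ($val, $type) = @_;
    die "argument is not of type $type\n" unless UNIVERSAL::isa($val, $type);
    return $val;
}

# Perl 5 stand-in for  sub traverse (Tree $root, $subref:uncurried)
sub traverse {
    my ($root, $subref) = @_;
    check_param_type($root, 'Tree');
    return $subref->($root);
}
```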


=head3 Using builtin type names

The ability to specify the names of builtin types as parameter types offers
additional flexibility in controlling argument interpretation. For example,
the specification:

        sub demo(ARRAY $a, @b) {...}    # version 1

constrains the argument to be an array reference, but does not invoke a 
magical en-referencing context, the way this would:

        sub demo(\@a, @b) {...}         # version 2

Thus, a call like:

        demo(@LOL);

will succeed under version 1 (binding $LOL[0] to $a,
and the rest of @LOL to @b), provided $LOL[0] is an array reference.

Under version 2, the call to C<demo> would fail, since C<\@LOL> will be
bound to $a and there will be nothing left to bind to @b.
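
The difference in binding can be reproduced in Perl 5. Version 1 uses an explicit reference check, and version 2 uses a C<(\@;@)> prototype. Perl 5 does not reject the version 2 call; it simply binds an empty list, which is the situation the proposal would treat as an error. The names C<demo_v1> and C<demo_v2> are hypothetical:

```perl
use strict;
use warnings;

# Version 1 analogue: the first element must already be an array ref.
sub demo_v1 {
    my ($aref, @rest) = @_;
    die "first argument must be an ARRAY ref\n" unless ref $aref eq 'ARRAY';
    return (scalar @$aref, scalar @rest);
}

# Version 2 analogue: the prototype en-references the caller's array.
sub demo_v2 (\@;@) {
    my ($aref, @rest) = @_;
    return (scalar @$aref, scalar @rest);
}

my @LOL = ([1, 2], [3], [4, 5, 6]);
my @v1 = demo_v1(@LOL);   # binds $LOL[0], rest of @LOL to @rest
my @v2 = demo_v2(@LOL);   # binds \@LOL, nothing left for @rest
```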



=head2 Banishment of the term "prototype"

It is further proposed that parameter lists I<never> be referred to
as "prototypes", and that use of the term be a flameworthy offence.
The preferred nomenclature would be "parameter list", or perhaps 
"signature".


=head1 MIGRATION ISSUES

This proposal has the potential to break a small number of cases
where a backslashed context specifier would now match a reference
argument that it previously complained about.

Also, the suggested regularization of semantics for backslash means
that a C<\$> argument is passed as a reference, not a value.


=head1 IMPLEMENTATION

Definitely S.E.P.


=head1 REFERENCES

RFC 21 (v1): Replace C<wantarray> with a generic C<want> function

RFC 22 (v1): Builtin switch statement

RFC 23 (v2): Higher order functions

RFC 57 (v1): Subroutine prototypes and parameters

RFC 84 (v1): Replace => (stringifying comma) with => (pair constructor)

RFC 97 (v1): prototype-based method overloading

[Numerous other RFCs make use of, or refer to, this mechanism]