New EEP draft: Pinning operator ^ in patterns

Richard Carlsson Thu, 24 Dec 2020 13:37:09 -0800

The ^ operator allows you to annotate already-bound pattern variables as
^X, like in Elixir. This is less error prone when code is being refactored
and moved around so that variables previously new in a pattern may become
bound, or vice versa, and makes it easier for the reader to see the intent
of the code.


See also https://github.com/erlang/otp/pull/2951

Ho ho ho,

        /Richard & the good folks at WhatsApp

    Author: Richard carlsson <carlsson.richard(at)gmail(dot)com>
    Status: Draft
    Type: Standards Track
    Created: 21-Dec-2020
    Erlang-Version: 24
    Post-History: 24-Dec-2020
****
EEP XXX: Pinning operator ^ in patterns
----


Abstract
========

This EEP proposes the addition of a new unary operator `^` for
explicitly marking variables in patterns as being already bound.  This
is known as "pinning" in Elixir - see [Elixir doc][the Elixir
documentation].

For example:

    f(X, Y) ->
        case X of
            {a, Y} -> ok;
            _ -> error
        end.

could be written more explicitly:

    f(X, Y) ->
        case X of
            {a, ^Y} -> ok;
            _ -> error
        end.

In Elixir, this operator is strictly necessary for being able to refer
to the value of a bound variable as part of a pattern, because
variables in patterns are always regarded as being new shadowing
instances (like in Erlang's fun clause heads), unless explicitly
pinned.

In Erlang, they would be optional, but are still a good idea because
they make programs more robust under edits and refactorings, and
furthermore allow the use of pinned variables in fun clause heads and
in comprehension generator patterns.


Specification
=============

A new unary operator `^` is added to Erlang, called the "pinning
operator".  It may only be used in patterns, and only on variables.
Its meaning is that the "pinned" variable is to be interpreted in the
enclosing environment of the pattern, and its value used in its place
for that position in the pattern.

In current Erlang, this behaviour is what happens automatically in
ordinary matching constructs if the variable is already bound in the
enclosing environment.  In the following example:

    f(X, Y) ->
        case X of
            {a, Y} -> {ok, Y};
            _ -> error
        end.

the use of `Y` in the pattern is regarded as a reference to the
function parameter `Y`, instead of as introducing a new variable, and
the `Y` in the clause body is then that same parameter.  Therefore,
annotating the pattern variable as `^Y` in this case does not change
the behaviour of the program, but makes the intent explicit:

    f(X, Y) ->
        case X of
            {a, ^Y} -> {ok, Y};
            _ -> error
        end.

For fun expressions and list comprehension generator patterns, the
pinning operator makes the language more expressive.  Take the
following Erlang code:

    f(X, Y) ->
        F = fun ({a, Y}) -> {ok, Y};
                (_) -> error
            end,
        F(X).

Here, the occurrence of `Y` in the clause head of the fun `F` is a new
variable instance, shadowing the `Y` parameter of `f(X, Y)`, and the
fun clause will match any value in that position.  The `Y` in the
clause body is the one bound in the clause head.  However, using the
pinning operator, we can selectively match on variables bound in the
outer scope:

    f(X, Y) ->
        F = fun ({a, ^Y})  -> {ok, Y};
                (_) -> error
            end,
        F(X).

In this case, there is no new binding of `Y`, and the use of `Y` in
the fun clause body refers to the function parameter.  But it is also
possible to combine pinning and shadowing in the same pattern:

    f(X, Y) ->
        F = fun ({a, ^Y, Y})  -> {ok, Y};
                (_) -> error
            end,
        F(X).

In this case, the pinned field refers to the value of the function
function parameter, but there is also a new shadowing binding of `Y`
to the third field of the tuple.  The use in the fun clause body now
refers to the shadowing instance.

Generator patterns in list comprehensions or binary comprehensions
follow the same rules as fun clause heads, so with pinning we can for
example write the following code:

    f(X, Y) ->
        [{b, Y} || {a, ^Y, Y} <- X].

where the `Y` in `{b, Y}` is the shadowing instance bound to the third
element of the pattern tuple.

Finally, a new compiler flag `warn_unpinned_vars` is added, disabled
by default, which if enabled makes the compiler emit warnings about
all uses of already bound variables in patterns that are not
explicitly annotated with the `^` operator.  This allows users to
migrate their code module by module towards using explicit pinning in
all their code.  If pinning becomes the norm in Erlang, this flag
could be turned on by default, and eventually, the pinning operator
could become strictly required for referring to already bound
variables in patterns.


Rationale
=========

The explicit pinning of variables in patterns make programs more
readable, because the intent of the code becomes clear.  When already
bound variables are used in Erlang without any annotation, anyone
reading a piece of code must first study it closely to understand
which variables will be bound at the point of a pattern, before they
can tell whether any pattern variable is a new binding or implies an
equality assertion.  This is easy to miss even for experienced
Erlangers, be it during code reviews or while trying to understand a
piece of poorly commented code.

Perhaps more importantly, pinning also makes programs more robust
under edits and refactorings.  Take our previous example, and add a
print statement:

    f(X, Y) ->
        io:format("checking: ~p", [Y]),
        case X of
            {a, Y} -> {ok, Y};
            _ -> error
        end.

Suppose someone renames the function parameter from `Y` to `Z` and
updates the print statement but forgets to update the use in the case
clause.  Without an explicit pinning annotation, the change would be
quietly allowed, but the `Y` in the pattern would be interpreted as a
new variable that will match any value, which will then be used in the
body.  This changes the behaviour of the program.  If the use in the
pattern had been annotated as `^Y`, the compiler would have generated
an error "Y is unbound" and the mistake would have been caught.

When code is being modified to add a feature or fix a bug, a
programmer might want to introduce a new variable for a temporary
result.  In a long function body, this risks introducing a new bug.
Consider the following:

    g(Stuff) ->
       ...
       Thing = case ... of
                   {a, T} -> T;
                   _ -> 0
               end,
       ...
       {ok, [Thing|Stuff]}.

Here, `T` is a new variable, clearly intended as just a temporary and
local variable for extracting the second element of the tuple.  But
suppose that someone adds a binding of the name `T` further up in the
function body, without noticing that the name is already in use:

    g(Stuff) ->
       ...
       T = q(Stuff) + 1,
       io:format("~p", [p(T)]),
       ...
       Thing = case ... of
                   {a, T} -> T;
                   _ -> 0
               end,
       ...
       {ok, [Thing|Stuff]}.

Now the first clause of the case switch will only match if the second
element of the tuple has the exact same value as the previously
defined `T`.  Again, the compiler quietly accepts this change, while
if it had been instructed to warn about all non-annotated uses of
already bound variables in patterns, this mistake would have been
detected.


Shadowing in Funs and Comprehensions
------------------------------------

In funs and comprehensions, pinning also lets us do things that
otherwise requires additional temporary variables.  Consider the
following code:

    f(X, Y) ->
        F = fun ({a, Y}) -> {ok, Y};
                (_) -> error
            end,
        F(X).

Since the `Y` in the clause head of the fun is a new shadowing
instance, the pattern will match any value in that position.  To match
only the value passed as `Y` to `f`, a clause guard must be added, and
a temporary variable be used to access the outer `Y`:

    f(X, Y) ->
        OuterY = Y,
        F = fun ({a, Y}) when Y =:= OuterY -> {ok, Y};
                (_) -> error
            end,
        F(X).

We could instead rename the inner use of `Y` to avoid shadowing, but
the equality test must still be written as an explicit guard:

    f(X, Y) ->
        F = fun ({a, Z}) when Z =:= Y -> {ok, Y};
                (_) -> error
            end,
        F(X).

With the help of the pinning operator, such things are no longer a
concern, and we can simply write:

    f(X, Y) ->
        F = fun ({a, ^Y}) -> {ok, Y};
                (_) -> error
            end,
        F(X).

Furthermore, in the odd case that the pattern would both need to
access the surrounding definition of `Y` as well as introduce a new
shadowing binding, this can be easily written using pinning:

    f(X, Y) ->
        F = fun ({a, ^Y, Y})  -> {ok, Y};
                (_) -> error
            end,
        F(X).

but in current Erlang, two separate temporary variables would be
required:

    f(X, Y) ->
        OuterY = Y,
        F = fun ({a, Temp, Y}) when Temp =:= OuterY -> {ok, Y};
                (_) -> error
            end,
        F(X).

As explained before, the same goes for patterns in generators of
comprehensions.



Backwards Compatibility
=======================

The addition of a new and previously unused operator `^` does not
affect the meaning of existing code, and the compiler will not emit
any new warnings or errors for existing code, unless explicitly
enabled with `warn_unpinned_vars`.  This change is therefore fully
backwards compatible.



Implementation
==============

The implementation can be found in [PR #2951][pr].



Copyright
=========

This document has been placed in the public domain.


[Elixir doc]: https://elixir-lang.org/getting-started/pattern-matching.html#the-pin-operator
    "Elixir pattern matching - the pin operator"

[pr]: https://github.com/erlang/otp/pull/2951
    "#2951: Add a new operator ^ for pinning of pattern variables"



[EmacsVar]: <> "Local Variables:"
[EmacsVar]: <> "mode: indented-text"
[EmacsVar]: <> "indent-tabs-mode: nil"
[EmacsVar]: <> "sentence-end-double-space: t"
[EmacsVar]: <> "fill-column: 70"
[EmacsVar]: <> "coding: utf-8"
[EmacsVar]: <> "End:"
[VimVar]: <> " vim: set fileencoding=utf-8 expandtab shiftwidth=4 softtabstop=4: "

_______________________________________________
eeps mailing list
[email protected]
http://erlang.org/mailman/listinfo/eeps

New EEP draft: Pinning operator ^ in patterns

Reply via email to