[elixir-core:11436] Re: [Proposal] Overload capture operator to support tagged variable captures

Christopher Keele Wed, 28 Jun 2023 17:22:34 -0700

An obvious modification to this proposal would be to introduce a new 
operator for this application, instead of overloading the capture operator. 
I avoided it because I liked the association with capturing *something*, I 
like keeping the language surface area slim, and I'm not competent enough 
with leex tokenizing or erlang to implement an entirely new construct in 
the parser, so repurposing & in my prototype got me very far.


However, I would like to make it clear that a new operator is on the table 
for this proposal. It would allow us to define more precise precedence, 
binding, and parsing rules without lots of checks in the compiler for what 
we are capturing.

AFAICT the remaining punctuation characters on the standard English QWERTY 
keyboard that do not correspond to semantically meaningful tokens in Elixir 
are the backtick (`), the question mark (?), and the dollar sign ($).

   - Using the backtick is interesting/complicated because of its association 
   with macro unquoting in other languages. I don't like how difficult it is 
   to notice visually compared to what this syntax is doing.
   - Using the question mark is problematic because of its association with 
   optionality and null-chaining-avoidance in other languages. I don't like 
   how it semantically overlaps with the convention of ending predicate method 
   names in Elixir with it.
   - Using the dollar sign is promising. It is associated with all sorts of 
   odd jobs in other languages, such as the rarely used global variable sigil 
   in Ruby, the heavily used local variable sigil in PHP, and the DOM ID accessor 
   in browser JavaScript. However, I don't like how it is a very localized 
   character, so it might be difficult to type on international or regional 
   keyboards. That said, in my research on keyboards over the years, most of the 
   commonly used keyboards have it, and because of its prevalence in other 
   languages' syntax, most programmers have solutions for typing it regardless.


On Wednesday, June 28, 2023 at 6:56:18 PM UTC-5 Christopher Keele wrote:

> This is a formalization of my concept here 
> <https://groups.google.com/g/elixir-lang-core/c/oFbaOT7rTeU/m/BWF24zoAAgAJ>, 
> as a first-class proposal for explicit discussion/feedback, since I now 
> have a working prototype 
> <https://github.com/elixir-lang/elixir/compare/main...christhekeele:elixir:tagged-variable-capture>
> .
>
> *Goal*
>
> The aim of this proposal is to support a commonly-requested feature: 
> *short-hand 
> construction and pattern matching of key/value pairs of associative data 
> structures, based on variable names* in the current scope.
>
> *Context*
>
> Similar shorthand syntax sugar exists in many programming languages today, 
> known variously as:
>
>    - Field Punning <https://dev.realworldocaml.org/records.html> — OCaml
>    - Record Puns 
>    <https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/record_puns.html> 
>    — Haskell
>    - Object Property Value Shorthand 
>    <https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Object_initializer#property_definitions>
>    — ES6 Javascript
>
> This feature has been in discussion for a decade, on this mailing list (1 
> <https://groups.google.com/g/elixir-lang-core/c/4w9eOeLvt-8/m/WOkoPSMm6kEJ>, 
> 2 
> <https://groups.google.com/g/elixir-lang-core/c/NoUo2gqQR3I/m/WTpArTGMKSIJ>, 
> 3 
> <https://groups.google.com/g/elixir-lang-core/c/3XrVXEVSixc/m/NHU2M4QFAQAJ>, 
> 4 
> <https://groups.google.com/g/elixir-lang-core/c/OvSQkvXxsmk/m/bKKHbBxiCwAJ>, 
> 5 
> <https://groups.google.com/g/elixir-lang-core/c/XxnrGgZsyVc/m/1W-d_XAlBgAJ>
> , 6 <https://groups.google.com/g/elixir-lang-core/c/oFbaOT7rTeU>) and the 
> Elixir forum (1 
> <https://elixirforum.com/t/proposal-add-field-puns-map-shorthand-to-elixir/15452>,
>  
> 2 
> <https://elixirforum.com/t/shorthand-for-passing-variables-by-name/30583>, 
> 3 
> <https://elixirforum.com/t/if-you-could-change-one-thing-in-elixir-language-what-you-would-change/19902/17>,
>  
> 4 
> <https://elixirforum.com/t/has-map-shorthand-syntax-in-other-languages-caused-you-any-problems/15403>,
>  
> 5 
> <https://elixirforum.com/t/es6-ish-property-value-shorthands-for-maps/1524>, 
> 6 
> <https://elixirforum.com/t/struct-creation-pattern-matching-short-hand/7544>),
>  
> and has motivated many libraries (1 
> <https://github.com/whatyouhide/short_maps>, 2 
> <https://github.com/meyercm/shorter_maps>, 3 
> <https://hex.pm/packages/shorthand>, 4 <https://hex.pm/packages/synex>). 
> These narrow margins cannot fit the full history of possibilities, 
> proposals, and problems with this feature, and I will not attempt to 
> summarize them all. For context, I suggest reading this mailing list 
> proposal 
> <https://groups.google.com/g/elixir-lang-core/c/XxnrGgZsyVc/m/1W-d_XAlBgAJ> 
> and this community discussion 
> <https://elixirforum.com/t/proposal-add-field-puns-map-shorthand-to-elixir/15452>
>  in 
> particular.
>
> However, in summary, this particular proposal tries to solve a couple of 
> past sticking points:
>
>    1. Atom vs String 
>    <https://groups.google.com/g/elixir-lang-core/c/NoUo2gqQR3I/m/IpZQHbZk4xEJ> 
>    key support
>    2. Visual clarity 
>    <https://groups.google.com/g/elixir-lang-core/c/XxnrGgZsyVc/m/NBkAVto0BAAJ> 
>    that atom/string matching is occurring
>    3. Limitations of string-based sigil parsing 
>    <https://groups.google.com/g/elixir-lang-core/c/XxnrGgZsyVc/m/TiZw6xM3BAAJ>
>    4. Easy confusion 
>    <https://groups.google.com/g/elixir-lang-core/c/XxnrGgZsyVc/m/WRhXxHDfBAAJ> 
>    with tuples
>
> I have a working fork of Elixir here 
> <https://github.com/christhekeele/elixir/tree/tagged-variable-capture> 
> where this proposed syntax can be experimented with. Be warned, it is buggy.
>
> *Proposal: Tagged Variable Captures*
>
> I propose we overload the unary capture operator (*&*) to accept 
> compile-time atoms and strings as arguments, for example *&:foo* and 
> *&"bar"*. This would *expand at compile time* into *a tagged tuple with 
> the atom/string and a variable reference*. For now, I am calling this a 
> *"tagged-variable 
> capture"*  to differentiate it from a function capture.
>
> For the purposes of this proposal, assume:
>
> {foo, bar} = {1, 2}
>
> Additionally,
>
>    - Lines beginning with # ==  indicate what the compiler expands an 
>    expression to.
>    - Lines beginning with # =>  represent the result of evaluating that 
>    expression.
>    - Lines beginning with *# !> * represent an exception.
>
> *Bare Captures*
>
> I'm not sure if we should support *bare* tagged-variable capture, but it 
> is illustrative for this proposal, so I left it in my prototype. It would 
> look like:
>
> &:foo
> # == *{:foo, foo}*
> # => {:foo, 1}
> &"foo"
> # == *{"foo", foo}*
> # => {"foo", 1}
>
> If bare usage is supported, this expansion would work as expected in match 
> and guard contexts as well, since it expands before variable references are 
> resolved:
>
> {:foo, baz} = &:foo
> *# == {:foo, baz} = {:foo, foo}*
> # => {:foo, 1}
> baz
> # => 1
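The expansion described above can be approximated in today's Elixir with an ordinary macro, which may help when reasoning about the proposal without building the fork. This is only a minimal sketch: the module `TaggedCapture`, the macro `tag/1`, and the demo module are hypothetical names invented for illustration, and the sketch does not reproduce the proposed `&` syntax or the parser change needed inside map literals.

```elixir
defmodule TaggedCapture do
  # Hypothetical helper for illustration; not part of the proposal or of Elixir.
  # tag(:foo) expands to {:foo, foo} and tag("foo") expands to {"foo", foo},
  # where foo is the caller's variable of that name (nil context, so the
  # variable is deliberately not hygienic).
  defmacro tag(key) when is_atom(key) do
    {key, Macro.var(key, nil)}
  end

  defmacro tag(key) when is_binary(key) do
    {key, Macro.var(String.to_atom(key), nil)}
  end
end

defmodule TaggedCaptureDemo do
  import TaggedCapture

  def build do
    {foo, bar} = {1, 2}
    # Expands to [{:foo, foo}, {"bar", bar}]
    [tag(:foo), tag("bar")]
  end

  def destructure(pairs) do
    # Expands to [{:foo, foo}, {"bar", bar}] = pairs, binding foo and bar
    [tag(:foo), tag("bar")] = pairs
    {foo, bar}
  end
end

TaggedCaptureDemo.build()
# => [{:foo, 1}, {"bar", 2}]
TaggedCaptureDemo.destructure([{:foo, 10}, {"bar", 20}])
# => {10, 20}
```

Because the macro only accepts literal atoms and strings, anything else (such as an interpolated string) fails at compile time, which mirrors the "compile-time atoms and strings" restriction in the proposal.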
>
> *List Captures*
>
> Since capture expressions are allowed in lists, this can be used to 
> construct Keyword lists from the local variable scope elegantly:
>
> list = [&:foo, &:bar]
> # == *list = [{:foo, foo}, {:bar, bar}]*
> # => [foo: 1, bar: 2]
>
> This would work with other list operators like *|*:
>
> baz = 3
> list = [&:baz | list]
> # == list = [{:baz, baz} | list]
> # => [baz: 3, foo: 1, bar: 2]
>
> And list destructuring:
>
> {foo, bar, baz} = {nil, nil, nil}
> [&:baz, &:foo, &:bar] = list
> *# == [{:baz, baz}, {:foo, foo}, {:bar, bar}] = list*
> # => [baz: 3, foo: 1, bar: 2]
> {foo, bar, baz}
> # => {1, 2, 3}
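One consequence worth keeping in mind: since a list capture desugars to an ordinary list pattern, the match would presumably be sensitive to both order and length, exactly like the hand-written form today. The sketch below uses only the desugared pattern in current Elixir; the module name `KeywordMatchDemo` is hypothetical, and `case` is used instead of `=` so that a failed match returns a value rather than raising.

```elixir
defmodule KeywordMatchDemo do
  # Desugared form of the pattern [&:baz, &:foo, &:bar]
  def destructure(list) do
    case list do
      [{:baz, baz}, {:foo, foo}, {:bar, bar}] -> {:ok, {foo, bar, baz}}
      _other -> :no_match
    end
  end
end

KeywordMatchDemo.destructure(baz: 3, foo: 1, bar: 2)
# => {:ok, {1, 2, 3}}
KeywordMatchDemo.destructure(foo: 1, bar: 2, baz: 3)
# => :no_match (same keys, different order)
KeywordMatchDemo.destructure(baz: 3, foo: 1, bar: 2, qux: 4)
# => :no_match (extra element)
```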
>
> *Map Captures*
>
> With a small change to the parser, 
> <https://github.com/elixir-lang/elixir/commit/0a4f5376c0f9b4db7d71514d05df6b8b6abc96a9>
>  
> we can allow this expression inside map literals. Because each such expression 
> is individually expanded into a tagged tuple before the map's association 
> list as a whole is processed, this syntax works in all existing 
> map/struct constructs, like map construction:
>
> map = %{&:foo, &"bar"}
> *# == %{:foo => foo, "bar" => bar}*
> # => %{:foo => 1, "bar" => 2}
>
> Map updates:
>
> foo = 3
> map = %{map | &:foo}
> *# == %{map | :foo => foo}*
> # => %{:foo => 3, "bar" => 2}
>
> And map destructuring:
>
> {foo, bar} = {nil, nil}
> %{&:foo, &"bar"} = map
> *# == %{:foo => foo, "bar" => bar} = map*
> # => %{:foo => 3, "bar" => 2}
> {foo, bar}
> # => {3, 2}
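The desugared forms on the `# ==` lines are already valid Elixir, so the results claimed above can be checked without the fork. A small sketch, with the module name `MapCaptureDesugared` made up for illustration:

```elixir
defmodule MapCaptureDesugared do
  # Desugared form of: map = %{&:foo, &"bar"}
  def build(foo, bar), do: %{:foo => foo, "bar" => bar}

  # Desugared form of: %{map | &:foo}
  def update(map, foo), do: %{map | :foo => foo}

  # Desugared form of: %{&:foo, &"bar"} = map
  def destructure(map) do
    %{:foo => foo, "bar" => bar} = map
    {foo, bar}
  end
end

map = MapCaptureDesugared.build(1, 2)
# => %{:foo => 1, "bar" => 2}
map = MapCaptureDesugared.update(map, 3)
# => %{:foo => 3, "bar" => 2}
MapCaptureDesugared.destructure(map)
# => {3, 2}
```

Unlike the list case, a map pattern only requires the listed keys to be present, so the destructuring form would keep matching if the map gained additional keys.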
>
> *Considerations*
>
> Though just based on an errant thought 
> <https://groups.google.com/g/elixir-lang-core/c/oFbaOT7rTeU/m/BWF24zoAAgAJ> 
> that popped into my head yesterday, I'm unreasonably pleased with how well 
> this works and reads in practice. I will present my thoughts here, though 
> again I encourage you to grab my branch 
> <https://github.com/christhekeele/elixir/tree/tagged-variable-capture>, 
> compile 
> it from source 
> <https://github.com/christhekeele/elixir/tree/tagged-variable-capture#compiling-from-source>,
>  and 
> play with it yourself!
>
> *Pro: solves existing pain points*
>
> As mentioned, this solves flaws previous proposals suffer from:
>
>    1. Atom vs String 
>    <https://groups.google.com/g/elixir-lang-core/c/NoUo2gqQR3I/m/IpZQHbZk4xEJ> 
>    key support
>    This supports both.
>    2. Visual clarity 
>    <https://groups.google.com/g/elixir-lang-core/c/XxnrGgZsyVc/m/NBkAVto0BAAJ> 
>    that atom/string matching is occurring
>    This leverages the appropriate literal in question within the syntax 
>    sugar.
>    3. Limitations of string-based sigil parsing 
>    <https://groups.google.com/g/elixir-lang-core/c/XxnrGgZsyVc/m/TiZw6xM3BAAJ>
>    This is compiler-expansion-native.
>    4. Easy confusion 
>    <https://groups.google.com/g/elixir-lang-core/c/XxnrGgZsyVc/m/WRhXxHDfBAAJ> 
>    with tuples
>    %{&:foo, &"bar"} is visually very different from {foo, bar}, rather than 
>    differing by only one character, as a bare %{foo, bar} shorthand would.
>    
> Additionally, it solves my main complaint with historical proposals: 
> syntax to combine a variable identifier with a literal must either obscure 
> that we are building an identifier, or obscure the atom/string typing of the 
> literal.
>
> I'm proposing overloading the capture operator rather than introducing a 
> new operator because the capture operator already has a semantic 
> association with messing with variable scope, via the nested integer-based 
> positional function argument syntax (ex *& &1*).
>
> By using the capture operator we indicate that we are messing with an 
> identifier in scope, but via a literal atom/string we want to associate 
> with, to get the best of both worlds.
>
> *Pro: works with existing code*
>
> The capture operator today has well-defined compile-time-error semantics 
> if you try to pass it an atom or a string. All compiling Elixir code today 
> will continue to compile as before.
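A quick way to see this on a stock Elixir install is sketched below. It assumes an unmodified release (on the fork the second step would compile instead), and the exact error text varies between versions, so only the exception type is checked.

```elixir
# The parser already accepts the syntax and produces an ordinary AST node...
{:ok, ast} = Code.string_to_quoted("&:foo")
# ast is {:&, [line: 1], [:foo]}

# ...but expansion rejects it, so no code that compiles today relies on it.
result =
  try do
    Code.eval_string("&:foo")
    :compiled
  rescue
    CompileError -> :compile_error
  end
# result is :compile_error on an unmodified Elixir
```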
>
> *Pro: works with existing tooling*
>
> By overloading an existing operator, this approach works seamlessly for me 
> with the syntax highlighters I have tried it with so far, and reasonably 
> well with the formatter.
>
> In my experimentation I've found that the formatter wants to rewrite *&:baz* 
> to *(&:baz)* pretty often. That's good, because there are several edge 
> cases in my prototype where not doing so causes it to behave strangely; I'm 
> sure it's resolving ambiguities that would occur in function captures that 
> impact my proposal in ways I have yet to fully anticipate.
>
> *Pro: minimizes surface area of the language*
>
> By overloading the capture operator instead of introducing a new operator 
> or sigil, we are able to keep the surface area of this feature slim.
>
> *Cons: overloads the capture operator*
>
> Of course, many of the virtues of this proposal come from overloading the 
> capture operator. But it is an already semantically fraught syntactic sugar 
> construct that causes confusion for newcomers, and this would place more 
> strain on it.
>
> We would need more than the meager error message modification 
> <https://github.com/elixir-lang/elixir/commit/3d83d21ada860d03cece8c6f90dbcf7bf9e737ec#diff-92b98063d1e86837fae15261896c265ab502b8d556141aaf1c34e67a3ef3717cL199-R207>
>  in my prototype: we would also need to update the documentation, and 
> should anticipate a new wave of questions from the community upon release.
>
> This inelegance really shows when considering embedding a tagged variable 
> capture inside an anonymous function capture, ex *& &1 = &:foo*. In my 
> prototype I've chosen to allow this rather than error on "nested captures 
> not allowed" (would probably become: "nested *function* captures not 
> allowed"), but I'm not sure I found all the edge-cases of mixing them in 
> all possible constructions.
>
> Additionally, since my proposal now allows the capture operator as an 
> associative element inside map literal parsing, the syntax error reported 
> when providing a function capture as an associative element would be 
> generated during expansion rather than during parsing. I am not fluent 
> enough in leex to have updated the parser to preserve the exact old error. 
> Serendipitously, what it reports in my prototype today is pretty good 
> regardless, though I prefer the old behaviour:
>
> Old:
> %{& &1}
> # !> ** (SyntaxError) syntax error before '}'
> # !> |
> # !> 1 | %{& &1}
> # !> | ^
> New:
> %{& &1}
> # !> error: expected key-value pairs in a map, got: & &1
> # !> ** (CompileError) cannot compile code (errors have been logged)
>
> *Cons: here there be dragons I cannot see*
>
> I'm quite sure a full implementation would require a lot more knowledge of 
> the compiler than I am able to provide. For example, *&:foo = &:foo* raises 
> an exception where *(&:foo) = &:foo* behaves as expected. I also find the 
> variable/context/binding environment implementation in the Erlang part of 
> the compiler during expansion to be impenetrable, and I'm sure my prototype 
> fails on edge cases there.
>
> *Open Question: the pin operator*
>
> As this feature constructs a variable ref for you, it is not clear if/how 
> we should support attempts to pin the generated variable to avoid new 
> bindings. In my prototype, I have tried to support the pin operator via the 
> *&^:atom *syntax, though I'm pretty sure it's super buggy on bare 
> out-of-data-structure cases and I only got it far enough to work in 
> function heads for basic function head map pattern matching.
>
> *Open Question: charlists*
>
> I did not add support for charlist tagged variable captures in my 
> prototype, as it would be more involved to differentiate a capture of a list 
> meant to become a tagged tuple from a list representing the AST of a 
> function capture. I would not lose a lot of sleep over this.
>
> *Open Question: allowed contexts*
>
> Would we even want to allow this syntax construct outside of map literals? 
> Or list literals?
>
> I can certainly see people abusing the 
> bare-outside-of-associative-datastructure syntax to make some nigh 
> impenetrable code where it's really unclear where assignment and pattern 
> matching are occurring, and relatedly this is where I see a lot of odd 
> edge-case behaviour in my prototype. I allowed it to speed up the 
> implementation, but it merits more discussion.
>
> On the other hand, this does seem like an... interesting use-case:
>
> error = "rate limit exceeded"
> &:error # return error tuple
>
> *Thanks for reading! What do you think?*
>
>
