Re: [PHP-DEV] [RFC[ Property accessor hooks, take 2

Larry Garfield Thu, 22 Feb 2024 15:57:52 -0800

On Wed, Feb 21, 2024, at 11:02 PM, Matthew Weier O'Phinney wrote:
> On Wed, Feb 21, 2024 at 12:57 PM Larry Garfield <la...@garfieldtech.com> 
> wrote:
>> After much on-again/off-again work, Ilija and I are back with a more 
>> polished property access hooks/interface properties RFC.  It’s 99% unchanged 
>> from last summer; the PR is now essentially complete and more robust, and we 
>> were able to squish the last remaining edge cases.
>> 
>> Baring any major changes, we plan to bring this to a vote in mid-March.
>> 
>> https://wiki.php.net/rfc/property-hooks
>> 
>> It’s long, but that’s because we’re handling every edge case we could think 
>> of.  Properties involve dealing with both references and inheritance, both 
>> of which have complex implications.  We believe we’ve identified the most 
>> logical handling for all cases, though.
>
> Once again in reading the proposal, the first thing I'm struck by are 
> the magic "$field" and "$value" variables inside accessors. The first 
> time they are used, they're used without explanation, and they're 
> jarring.
>
> Additionally, once you start defining the behavior of accessors... you 
> don't start with the basics, but instead jump into some of the more 
> esoteric usage, which does nothing to help with the questions I have.
>
> So, first:
>
> - Start with the most basic, most expected usage for each of reading 
> and writing properties.
> - I need a better argument for why the $field and $value variables 
> exist. Saying they're macros doesn't help those not deep into 
> internals. As a user, why do they exist?

For $field, it's not a requirement.  It's mostly for copy-paste convenience.  A 
number of people have struggled on this point so if the consensus is to leave 
out $field and just use $this->propName directly, we can accept that.  They can 
be re-added if reusable hook packages are added in the future (as noted in 
Future Scope).

For $value, it's to avoid boilerplate.  For the majority case, you'll be just 
operating on an individual value trivially.  Checking it's range, or 
uppercasing it, or whatever.  Requiring the developer to provide a name 
explicitly is just extra work; it's much the same as how PHP doesn't require 
you to pass $this as the first argument to a method explicitly, the way Python 
and Rust do.  It's just understood that $this exists, and once you learn that 
it's obvious.

On the occasions where you do want to specify an alternate name for some 
reason, or more likely you're providing a wider type, you can.  But in the 
typical case it would just be one more thing for the dev to have to type out.  
This is especially true in what I expect to be a common case, which is promoted 
constructor arguments with an extra validator set hook on them.

It also introduces some ambiguity.  If I specify only the name, does that mean 
I'm widening the type to mixed?  Or just that I'm omitting the name?  If 
specifying the name is rare, that's not really a big deal.  If it's required in 
every case, it's a confusion point  in every case.

In the interest of transparency. for comparison:

* Kotlin appears to require an argument name, but by convention recommends 
using "value".
* Swift makes it optional, with a default name of "newValue".  (Same logic as 
in the RFC.)
* C# ... as far as I can tell, doesn't support a custom name at all; it's 
always called "value", implicitly.

> Second: you don't have examples of defining BOTH get and set OTHER than 
> when using expressions for both accessors or a mix. I'm actually 
> unclear what the syntax is when both are defined. Is there supposed to 
> be a `;` terminating each? Or  a `,`? Or just an empty line? Again, 
> this is one of the more common scenarios. It needs to be covered early, 
> and clearly.

... huh.  I thought we had one in there somewhere.  I will add one, thanks.  
Though to clarify, there's no separator character.

public string $foo {
    get {
        // ...
    }
    set {
        // ...
    }
}

> Third: the caveats around usage with arrays... give me pause. While I'm 
> personally trying to not use arrays as much as possible, a lot of code 
> I see or contribute to still does, and the fact that an array property 
> that uses a write accessor doesn't allow the same level of access as a 
> normal array property is something I see leading to a lot of confusion 
> and errors. I don't have a solution, but I worry that this one thing 
> alone could be enough to prevent the passage of the RFC.

We completely agree that it's a suboptimal situation.  But as explained, it is 
the way it is because it's not possible (as far as we can tell) to fully 
support hooks on array properties.  If you can think of one, please share, 
because we'd love to make this part better.  I don't like it either, but we 
haven't found a way around it.  And that caveat doesn't seem like a good enough 
reason to not support hooks everywhere we actually can.

> Fourth: the syntax around inheritance is not intuitive, as it does not 
> work in the same way as the rest of the language. I'm talking about 
> this:
>
>     public int $x {
>         get: 2 * parent::$x::get()
>     }
>
> Why do we need to use the accessors here? Why wouldn't it just be 
> `parent::$x`?

Almost.  Ilija spent some time looking into this, and it's possible with 
caveats.

First, there's then no way to differentiate between "access parent hook" and 
"read the static property $x on the parent".  Arguably that's not a common 
case, but it is a point of confusion.

The larger issue is that parent::$x can't be just a stand-in for the backing 
value in all cases.  While supporting that for the = operator is 
straightforward enough, it wouldn't give us access to ++, --, <=, and the dozen 
or so other operators that could conceivably apply.  In theory those could all 
be implemented manually, but Ilija described that as "hundreds to thousands of 
lines of code" to do, which... is not time or code well spent. :-)  Especially 
as this is a very edge-case situation to begin with.  (In practice, I expect 
auto-generated ORM proxy code to be the primary user of accessing a parent 
property from a child hook.  I can think of very few other cases where I'd want 
to use it.)

So we have the choice between making $a = parent::$prop and parent::$prop = $a 
work, but *nothing else*, inexplicably (creating confusion) or the slightly 
longer syntax that wouldn't support those other operations anyway so there's no 
confusion.

We feel the current approach is the better trade off, but if the consensus 
generally is for the shorter-but-inconsistent syntax, that can be changed.

> I want to be clear: I really like the idea behind this feature, and 
> overall, I appreciate the design. From a user perspective, though, the 
> above are things that I found jarring as they vary quite a bit from our 
> current language design.
>> 
>> Note the FAQ question at the end, which explains some design choices.
>> 
>> There’s one outstanding question, which is slightly painful to ask: 
>> Originally, this RFC was called “property accessors,” which is the 
>> terminology used by most languages.  During early development, when we had 4 
>> accessors like Swift, we changed the name to “hooks” to better indicate that 
>> one was “hooking into” the property lifecycle.  However, later refinement 
>> brought it back down to 2 operations, get and set.  That makes the “hooks” 
>> name less applicable, and inconsistent with what other languages call it.
>> 
>> However, changing it back at this point would be a non-small amount of grunt 
>> work. There would be no functional changes from doing so, but it’s lots of 
>> renaming things both in the PR and the RFC. We are willing to do so if the 
>> consensus is that it would be beneficial, but want to ask before putting in 
>> the effort.
>
> I personally would go with just "accessors". 

On Thu, Feb 22, 2024, at 12:06 AM, Deleu wrote:

> This is a long, huge and comprehensive work, congratz to the authors.
>
> It clearly shows that so much thought and work has been put into it 
> that it makes me cautious to even ask for further clarification. 
>
>> Javascript have similar features via a different syntax, although that 
>> syntax would not be viable for PHP
>
> Why not?

> It feels quite a natural syntax for PHP and from someone oblivious to 
> the internal work, it appears to be a slight marginal change to the 
> existing RFC. Given the extensive work of this RFC, it seems pretty 
> obvious that this syntax will not work, I just don't know why.

I've added an FAQ section explaining why the Python/JS approach wouldn't really 
work.  To be clear, Ilija and I spent 100+ hours doing research and design 
before we started implementation (back in mid-late 2022).  We did seriously 
consider the JS-style syntax, but in the end we found it created more problems 
than it solves.  For the type of language PHP is (explicit typed properties), 
doing it on the property itself is a much cleaner approach.

On Thu, Feb 22, 2024, at 1:14 AM, Robert Landers wrote:

> I apologize if this has already been covered:
>
>> There are two shorthand notations supported, beyond the optional argument to 
>> set.
>> First, if a hook's body is a single expression, then the { } and return 
>> statement may be omitted and replaced with =>, just like with arrow 
>> functions.
>
> Does => do any special auto-capturing of variables like arrow
> functions or is it just a shorthand? 

No, there is nothing to capture.  Inside the hook body you have $this the same 
as any other method.

> Also, is this a meaningful
> shorthand to the example a little further down:
>
> public string $phone {
>     set = $this->sanitizePhone(...);
> }
>
> or do we always have to write it out?
>
> public string $phone {
>     set => $field = $this->sanitizePhone($value);
> }

Currently no, the set is always void; you have to write the value yourself.  
See the FAQ section on the topic for a detailed explanation.

However, I just had a long discussion with Ilija and there is one possibility 
we could consider: Use the return value only on the shorthand 
(arrow-function-like) syntax.

So you could do either of these, which would be equivalent:

set {
  $this->phone = $this->santizePhone($value);
}

set => $this->santizePhone($value);

This would have the advantage of offering a return-to-set mechanism, as well as 
even shorter syntax in the simple case (and no question of $field vs 
$this->propName).  But it would have the disadvantage of being potentially 
inconsistent between the short and long version.  It also would mean the short 
version is incompatible with virtual properties; using a short-set would create 
a backing value, so it's non-virtual.  But since "simple validation, maybe in a 
promoted constructor property" is likely to be one of the main use cases, it 
would simplify that case.

Not sure if that's a trade up or not.  Thoughts from the list?

> Would __PROPERTY__ be set inside sanitizePhone() as well?

No.  Like __CLASS__, it is materialized at compile time and has no meaning 
outside of its intended scope.

> You mention several ways values are displayed (whether or not they use
> the get-hook), but what does the default implementation of
> `__debugInfo()` look like now (or is that out of scope or a silly
> question?)

var_dump() shows the underlying backing value, bypassing get, since it's 
debugging the object state.  If you implement __debugInfo(), that's a method 
like any other and you can do what you like, though you'll be reading through 
get hooks (just like using __serialize()).

> For attributes, it would be nice to be able to target hooks
> specifically with attributes instead of also all methods (e.g., a
> Attribute::TARGET_GET_HOOK const). For example, if I were writing a
> serialization library, I may want to specify #[UseRawValue] only on
> getters to ensure that only the raw value is serialized instead of the
> getter (which may be specific to the application logic, or
> #[GetFromMethod] to tell the serialization library to get the value
> from a completely different method. It wouldn't make sense to target
> just any method with that attribute.

This feels very niche, honestly.  Those would naturally have to be sub-cases of 
TARGET_METHOD anyway, so method-targeted attributes would need to be supported 
regardless.  That makes hook-specific-targeting an easy and non-breaking add-on 
for a future RFC if it turns out to be useful in practice.

--Larry Garfield

Re: [PHP-DEV] [RFC[ Property accessor hooks, take 2

Reply via email to