Re: [PHP-DEV] [RFC[ Property accessor hooks, take 2

Stephen Reay Fri, 01 Mar 2024 08:17:54 -0800

> On 28 Feb 2024, at 21:25, Larry Garfield <la...@garfieldtech.com> wrote:
> 
> On Wed, Feb 28, 2024, at 7:49 AM, Stephen Reay wrote:
> 
> *snip*
> 
>>> == Re virtual properties:
>>> 
>>> Ilija and I talked this through, and there's pros and cons to a `virtual` 
>>> keyword.  Ilija also suggested a `backed` keyword, which forces a backed 
>>> property to exist even if it's not used in the hook itself.
>>> 
>>> * Adding `virtual` adds more work for the developer, but more clarity.  It 
>>> would also mean $this->$propName or $this->{__PROPERTY__} would work "as 
>>> expected", since there's no auto-detection for virtual-ness.  On the 
>>> downside, if you have a could-be-virtual property but never actually use 
>>> the backing value, you have an extra backing value hanging around in memory 
>>> that is inaccessible normally, but will still show up in some serialization 
>>> formats, which could be unexpected.  If you omit one of the hooks and 
>>> forget to mark it virtual, you'll still get the default of the other 
>>> operation, which could be unexpected.  (Mostly this would be for a 
>>> virtual-get that accidentally has a default setter because you forgot to 
>>> mark it `virtual`.)
>>> * Doing autodetection as now, but with an added "make a backing value 
>>> anyway" flag would resolve the use case of "My set hook just calls a 
>>> method, and that method sets the property, but since the hook doesn't 
>>> mention the property it doesn't get created" problem.  It would also allow 
>>> for $this->$propName to work if a property is explicitly backed.  On the 
>>> flipside, it's one more thing to think about, and the above example it 
>>> solves would be trivially solved by having the method just return the value 
>>> to set and letting the set hook do the actual write, which is arguably 
>>> better and more reliable code anyway.
>>> * The status quo (auto-detection based on the presence of $this->propName). 
>>>  This has the advantage it "just works" in the 95% case, without having to 
>>> think about anything extra.  The downside is it does have some odd edge 
>>> cases, like needing $this->propName to be explicitly used.  
>>> 
>>> I don't think any is an obvious winner.  My personal preference would be 
>>> for status quo (auto-detect) or explicit-virtual always.  I could probably 
>>> live with either, though I think I'd personally favor status quo.  Thoughts 
>>> from others?
>>> 
>> 
>> I agree that a flag to make the field *virtual* (thus disabling the 
>> backing store) makes more sense than a flag to make it backed; It's 
>> also easier to understand when comparing hooked properties with regular 
>> properties (essentially, backed is the default, you have to opt-in to 
>> it being virtual). I don't think the edge cases of "auto" make it 
>> worthwhile just to not need "virtual".
> 
> Uh, I can't tell if you're saying "use status quo" or "use the virtual 
> keyword."  Please clarify. :-)

I'm saying I think an explicit `virtual` keyword is the better option of the 
three (because the benefits of the 'auto' mode don't outweigh the edge cases it 
introduces).
> 
>>> == Re reference-get
>>> 
>>> Allowing backed properties to have a reference return creates a situation 
>>> where any writes would then bypass the set hook, and thus any validation 
>>> implemented there.  That is, it makes the validation unreliable.  A major 
>>> footgun.  The question is, do we favor caveat-emptor flexibility or 
>>> correct-by-construction safety?  Personally I always lead toward the 
>>> latter, though PHP in general is... schizophrenic about it, I'd say. :-)
>>> 
>>> At this point, we'd much rather leave it blocked to avoid the issue; it's 
>>> easier to enable that loophole in the future if we really want it than to 
>>> get rid of it if it turns out to have been a bad idea.
>>> 
>>> There is one edge case that *might* make sense: If there is no set hook 
>>> defined, then there's no set hook to worry about bypassing.  So it may be 
>>> safe to allow &get on backed properties IFF there is no set hook.  I worry 
>>> that is "one more quirky edge case", though, so as above it may be better 
>>> to skip for now as it's easier to add later than remove.  But if the 
>>> consensus is to do that, we're open to it.  (Question for everyone.)
>>> 
>> 
>> I don't have strong feeling about this, but in general I usually tend 
>> to prefer options that are consistent, and give power/options to the 
>> developer. If references are opt-in anyway, I see that as accepting the 
>> trade-offs. If a developer doesn't want to allow by-ref modifications 
>> of the property, why would they make it referenceable in the first 
>> place? This sounds a bit like disallowing regular public properties 
>> because they might be modified outside the class - that's kind of the 
>> point, surely.
>> 
>>> == Re 
>>> 
>>> == Re arrays
>>> 
>>>> Regarding arrays, have you considered allowing array-index writes if
>>> an &get hook is defined? i.e. "$x->foo['bar'] = 42;" could be treated
>>> as semantically equivalent to "$_temp =& $x->foo; $_temp['bar'] = 42;
>>> unset($_temp);"
>>> 
>>> That's already discussed in the RFC:
>>> 
>>>> The simplest approach would be to copy the array, modify it accordingly, 
>>>> and pass it to set hook. This would have a large and obscure performance 
>>>> penalty, and a set implementation would have no way of knowing which 
>>>> values had changed, leading to, for instance, code needing to revalidate 
>>>> all elements even if only one has changed.
>>> 
>>> Unless we were OK with that bypassing the set hook entirely if defined, 
>>> which, as noted above, means any safety guarantees provided by a set hook 
>>> are bypassed, leading to untrustworthy code.
>>> 
>>> == Re hook shorthands and return values
>>> 
>>> Ilija and I have been discussing this for a bit, and we've both budged a 
>>> little. :-)  Here's our counter-proposal:
> 
> *snip*
> 
>> I think the examples given are clear, and the lack of the top-level 
>> short closure-esque version makes it more obvious. Forgive me, I must 
>> have missed some of the previous comments - is there a reason the 
>> 'full' setter can't return a value, for the sake of consistency? I 
>> understand that you don't want "return to set" to be the *only* option, 
>> for the sake of e.g. change/audit logging type functionality (i.e. set 
>> and then some action to record that the change was made), but it seems 
>> a little odd and inconsistent to me that the return value of a short 
>> closure would be used when the return value of the long version isn't. 
>> This isn't really a major issue, I'm just curious if there was some 
>> explanation about it?
> 
> Mainly because it introduces a lot more complexity.  As far as I'm aware, 
> determining at runtime whether or not to make use of the return value is 
> impossible.  (Ie, we cannot differentiate between "return", "no return 
> statement", and "return null".)  So it would require compile time branching 
> to generate two different pathways after lexically detecting if there is a 
> "return" token present somewhere in the hook body.  That is probably doable, 
> technically, but introduces more complexity to an already necessarily-large 
> RFC.  It's also something that is simple enough to add later as an option 
> without any BC breaks or implications for other parts of the design, so it's 
> safe to punt.  (Some things are harder to punt on than others, as noted, but 
> this one is easy/safe to skip for now.)
> 

Right, i hadn't considered that whole "everything implicitly returns, even if 
it's null" scenario. Makes sense. The inconsistency is still a bit jarring but 
I understand the reasoning now, thanks.

>>> == Re the $value variable in set
> 
> *snip*
> 
>>> So what makes the most sense to me is to keep $value optional, but IF you 
>>> specify an alternate name, you must also specify a type (which may be 
>>> wider).  So these are equivalent:
>>> 
>>> public int $foo { set (int $value) => $value + 1 }
>>> public int $foo { set => $value + 1 }
>>> 
>>> And only those forms are legal.  But you could also do this, if the 
>>> situation called for it:
>>> 
>>> public int $foo { set(int|float $num) => floor($num) + 1; }
>>> 
>>> This "all or nothing" approach seems like it strikes the best balance, 
>>> gives the most flexibility where needed while still having the least 
>>> redundancy when not needed, and when a name/type is provided, its behavior 
>>> is the same as for a method being inherited.
>>> 
>>> Does that sound acceptable?  (Again, question for everyone.)
>>> 
>> 
>> My only question with this is the same as I had in an earlier reply 
>> (and I'm not sure it was ever answered directly?), and you allude to 
>> this yourself: everywhere *else*, `($var)` means a parameter with type 
>> `mixed`. Why is the type *required* here, when you've specifically said 
>> you want to avoid boilerplate? If we're going to assume people can 
>> understand that `(implicit property-type $value) is implicit, surely we 
>> can also assume that they will understand "specifying a parameter 
>> without a type" means the parameter has no type (i.e. is `mixed`).
>> 
>> Again, for myself I'd be likely to type it (or regular parameters, 
>> properties, etc) as `mixed` if that's what I want *anyway*, but the 
>> inconsistency here seems odd, unless there's some until-now unknown 
>> drive to deprecate type-less parameters/properties/etc.
> 
> If we went this route, then an untyped set param would likely imply "mixed", 
> just like on methods.  Which, since mixed is the super type of everything, 
> would still technically work, but would weaken the type enforcement and thus 
> static analysis potential.  (Just as a method param can be widened in a child 
> class all the way to mixed/omitted, and it would be unwise for all the same 
> reasons.)
> 

Having a `mixed` param in the set hook shouldn't weaken the actual backing 
parameter though - when the hook writes to `$this->prop`, the parent type is 
still enforced, *surely*? If not, why not?

As for how static analysis tools handle this concept - I'd have thought it's 
too early to suggest what static analysis tools will or won't support given how 
much they already support based on less-formal syntax like docblocks. It's 
*already* possible to have a property that is reported (to static analysis  
tools/IDEs/etc) as a fixed type, but accepts a wider type on write, with the 
current mish-mash of typed properties, docblocks, magic getters and setters, 
and the bizarre `unset` behaviour. This is simply converting that into 
standardised syntax. The RFC itself proposes a scenario where a wider type is 
accepted in the set hook. I find it hard to believe that a static analysis tool 
can model "this property is `Foo` but it accepts `string|Foo` on write" but not 
"this property is `Foo` but it accepts `mixed` on write". Heck if that's such a 
problem what is said tool going to do when someone *explicitly* widens the 
parameter to `mixed` in a set hook? 

I would argue that if the language is providing better support for typed 
properties by adding 'hooks' like this, the need for static analysis of those 
specific parts reduces greatly - if someone wants to accept `mixed` when 
storing as a string, and convert it in the hook, and the language can **enforce 
those types at runtime**, why should some hypothetical static analysis be a 
hangup for that? 

> In the RFC as currently written, omitted means "derive from the property," 
> which is a concept that doesn't exist in methods; the closest equivalent 
> would be if omitting a type in a child method parameter meant "use the 
> parent's type implicitly," which is not how that works right now.

For the third time: I'm well aware of how parameter types work everywhere else, 
and that's why I'm asking why the same behaviour isn't being followed here? 

- You've said you want to avoid boilerplate; and
- You've said you would expect most people to just use the implicit `same-type 
$value` parameter; and
- You've acknowledged that the existing 'standard' is that a parameter without 
a type is considered `mixed`; and
- You've acknowledged in your RFC that there is a use-case for wanting to 
accept a wider type than what a property stores internally.

So why then is it so unacceptable that the existing standard be followed, such 
that a set hook with an "untyped" parameter would be treated as `mixed` as it 
is everywhere else? 

Yes, I know you said "widening to `mixed` is unwise". I don't seem to recall 
amongst all the type-related previous RFCs, any that suggested that child 
parameters widening to `mixed` (either explicitly or implicitly) should be 
deprecated, so I'm sorry but I don't see much value in that argument.

> 
> --Larry Garfield
> 


Cheers

Stephen
Re: [PHP-DEV] [RFC[ Property accessor hooks, take 2

Reply via email to