Re: [PHP-DEV] Coercive Scalar Type Hints RFC

Pierre Joye Sat, 21 Feb 2015 12:18:07 -0800

On Sat, Feb 21, 2015 at 11:11 AM, Zeev Suraski <z...@zend.com> wrote:
> Sorry for the previous prematurely sent email, looks like I found a new
> keyboard shortcut :)
>
>> -----Original Message-----
>> From: Anthony Ferrara [mailto:ircmax...@gmail.com]
>> Sent: Saturday, February 21, 2015 8:12 PM
>> To: Zeev Suraski
>> Cc: PHP internals
>> Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC
>>
>> Zeev,
>>
>> First off, thanks for putting forward a proposal. I look forward to a
>> patch that can be experimented with.
>>
>> There are a few concerns that I have about the proposal however:
>>
>> > Proponents of Strict STH cite numerous advantages, primarily around
>> > code safety/security. In their view, the conversion rules proposed by
>> > Dynamic STH can easily allow ‘garbage’ input to be silently converted
>> > into arguments that the callee will accept – but that may, in many
>> > cases, hide difficult-to-find bugs or otherwise result in unexpected
>> > behavior.
>>
>> I think that's partially mis-stating the concern.
>
> I don't think it's mis-stating the key concern.  At least not based on what
> I've heard from most people here over the last few months.


I think this argument should be avoided from now on.

We surely can go wild and provide names and numbers, each better than
the other. But at the end of the day, we vote on a proposal. A
proposal written by one or many persons (name them or leave them)
with a clear specification, patch, impact (backed by tests), etc. Any
other random estimations or popularity contests are pointless and
counterproductive.


>> > Proponents of Dynamic STH bring up consistency with the rest of the
>> > language, including some fundamental type-juggling aspects that have
>> > been key tenets of PHP since its inception. Strict STH, in their view,
>> > is inconsistent with these tenets.
>>
>> Dynamic STH is apparently consistent with the rest of the language's
>> treatment of scalar types. It's inconsistent with the rest of the
>> language's treatment of parameters.
>
> Not in the way Andrea proposed it, IIRC.  She opted to go for consistency
> with internal functions.  Either way, at the risk of being shot for talking
> about spiritual things, Dynamic STH is consistent with the dynamic spirit of
> PHP, even if there are some discrepancies between its rule-set and the
> implicit typing rules that govern expressions.  Note that in this RFC I'm
> actually suggesting a possible way forward that will align *all* aspects of
> PHP, including implicit casting - and have them all governed by a single set
> of rules.

You did not answer my questions about BC. Changing the way we do it is
much more likely to break things than providing a choice to move to
another mode. I am in favor of not breaking things by default and
giving the option to actually use strict typing when desired (yes, I
repeat myself here).

>> However there's an important point to make here: a lot of best practice
>> has been pushing against the way PHP treats scalar types in certain
>> cases. Specifically around == vs === and using strict comparison mode in
>> in_array, etc.
>
> I think you're correct on comparisons, but not so much on the rest.  Dynamic
> use of scalars in expressions is still exceptionally common in PHP code.
> Even with comparisons, == is still very common - and you'd use == vs. ===
> depending on what you need.

I do not think using legacy code to determine which (optional)
features should be implemented in PHP is the right way. Really not.

>> So while it appears consistent with the rest of PHP, it only does so if
>> you ignore a large part of both the language and the way it's commonly
>> used.
>
> Let's agree to disagree.  That's one thing we can always agree on!  :)

I am not sure there is something to agree on, but rather something to
actually validate against existing code. We can't do it until this RFC
has a patch.

>> In reality, the only thing PHP's type system is consistent at is being
>> inconsistent.
>
> I'd have to partially agree with you here;  But if you read the RFC through
> including its future recommendations, you'd see it's perhaps the first
> attempt in 20 years to fix that.  Instead of doing that through the
> introduction of a 3rd (albeit simplistic) rule-set that only pays attention
> to zval.type - a creation of a single set of rules that will be consistent
> across the whole language, beginning with userland and internal functions.

I agree we should fix that. However, I disagree that the fix should be
allowed to break BC. Many proposed that back in the 5.0 days and we did
not agree on changing it. The situation now is no different.

>> In the "Changes To Internal Functions" section, I think all three types
>> are significantly flawed:
>>
>> 1. "Just Do It" - This is problematic because a very large chunk of code
>> that worked in 5.x will all of a sudden not work in 7.0. This will likely
>> create a python 2/3 issue, as it would require a LOT of code to be
>> changed to make it compatible.
>>
>> 2. "Emit E_DEPRECATED" - This is problematic because raising errors (even
>> if suppressed) is not cheap. And the potential for raising one for a
>> non-trivial percentage of every native function call has the potential to
>> have a MASSIVE performance impact for code designed for 5.x. Without a
>> patch to test, it can't really be codified, but it would be a shame to
>> lose the performance gains made with 7 because we're triggering 100's,
>> 1000's or 10000's of errors in a single application run...
>>
>> 3. "Just Do It but give users an option to not" - This has the problems
>> that E_DEPRECATED has, but it also gets us back to having fundamental
>> code behavior controlled by an INI setting, which for a very long time
>> this community has generally seen as a bad thing (especially for
>> portability and code re-use).
>
> I do too, and I was upfront about their cons, not just pros.  And yet, they
> all bring us to a much better outcome within a relatively short period of
> time (in the lifetime of a language) than the Dual Mode will.



>> > Further, the two sets can cause the same functions to behave
>> > differently depending on where they're being called
>>
>> I think that's misleading. The functions will always behave the same.
>> The difference is how you get data into the function. The behavior
>> difference is in your code, not the end function.
>
> I'll be happy to get a suggestion from you on how to reword that.
> Ultimately, from the layman user's point of view, she'd be calling foo()
> from one place and have it accept her arguments, and foo() from another
> place and have it reject the very same arguments.
>
>> > For example, a “32” (string) value coming back from an integer column
>> > in a database table, would not be accepted as valid input for a
>> > function expecting an integer.
>>
>> There's an important point to consider here. You're relying on information
>> outside of the program to determine program correctness.
>> So to say "coming back from an integer column" requires concrete
>> knowledge and information that you can't possibly have in the program.
>> What happens when some DBA changes the column type to a string type.
>> The data will still work for a while, but then suddenly break without
>> warning when a non-integer value comes in. Because the value-information
>> comes from outside.
>
> Of course we're relying on information coming from outside, as we all know,
> this is one of the most common use cases for PHP.
> While theoretically you're right, in practice, in the vast majority of cases
> it wouldn't play out like that.  The string column won't be tested
> exclusively with "123" inputs.  As soon as there's a non-numeric-string
> input, it'll fail.  That's likely to happen very early in the process, and
> that's before considering that if there's such a huge mismatch between the
> semantic meaning of the column and what the function expects - the problem
> is likely to be found even sooner, since the function will simply not
> perform its intended job.
>
> On the flip-side, imagine that same developer using strict types.  Feeding
> the function that integer in string form gets rejected.  What are her
> options?  The developer is likely  to just explicitly cast the value into an
> int, giving up on any and all sanitization that coercive types would offer
> her, happily accepting "Apples" and "100 Dalmatians" as valid inputs.  That,
> on the other hand, is a *very* likely scenario.

You are underestimating the knowledge and experience of our users. I
do not think developers looking for strict types will do what you are
suggesting.
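
To make the cast behaviour under discussion concrete, here is a minimal
sketch. It is not taken from either RFC or from any patch, and the variable
names are made up for illustration. It only contrasts a blind explicit cast
with one existing way to validate first, `filter_var()` with
`FILTER_VALIDATE_INT`:

```php
<?php
// An explicit cast never fails: it keeps any leading digits and drops the rest.
$castDalmatians = (int) "100 Dalmatians"; // 100
$castApples     = (int) "Apples";         // 0

// Validation rejects those strings and accepts a clean numeric string.
$validDalmatians = filter_var("100 Dalmatians", FILTER_VALIDATE_INT); // false
$validApples     = filter_var("Apples", FILTER_VALIDATE_INT);         // false
$validThirtyTwo  = filter_var("32", FILTER_VALIDATE_INT);             // 32

var_dump($castDalmatians, $castApples, $validDalmatians, $validApples, $validThirtyTwo);
```

Whether developers would reach for the cast or for validation is exactly
the point being argued; the sketch only shows that both options exist.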

>> With strict mode, you'd have to embed a cast (smart or explicit) to
>> convert to an integer at the point the data comes in.
>
> First, I'm not aware of smart/safe casts being available or proposed at this
> point.
> Secondly, why at the point the data comes in?  That would be ideal for
> static analyzers, but it's probably a lot more common that it will be done
> at the first point in time where it gets rejected.
>
>> Additionally, with the dual-mode proposal DB interactions can be in weak
>> mode and have the exact behavior you're describing here. Giving the user
>> the choice, rather than making assumptions.
>
> This is bound to be misquoted and used against me, but I don't think it's a
> good idea to give the user the choice in such a way.  I could have sworn
> that you tweeted the quote about perfection being not when there's nothing
> left to add, but nothing left to remove, but perhaps it was someone else.
> Either way, two modes are worse than one, if we can come up with a good
> single unified mode that addresses *most* cases.

I disagree. One mode remains untouched and fully BC, which ensures a
smooth and fast (or at least less slow) migration to 7. The second mode
will attract the non-negligible number of users looking forward to
having such a mode.

On the other hand, changing the default mode is very likely to be a
real pain during migration. These are not the kind of things that are
easy to catch, and fixing them cannot be automated the way code
conversions and the like can.

> Remember you can always implement custom type checking to your heart's
> content.  You can easily implement if (!is_int($foo)) { exit; } in the
> not-so-common-cases where accepting "42" as 42 might be disastrous.
> However, on the caller side, forcing people to clutter their code with
> casts - many casts - either explicit casts or custom ones - is going to
> affect a lot more developers in a lot more places.  The bang for the buck of
> adding strict mode is just not there, in my humble opinion of course.

Now you are misleading readers about what is proposed. A user of a
library will never ever be forced to cast. This is not what the dual
proposal does and it is not what it will do. The strict mode is
confined to the given library files and code, where the library
authors decided to enable it. I would very much appreciate it if you
stopped using this as an argument, as it is simply not correct. We do
not need more confusion about the respective proposals.
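
A minimal sketch of that confinement, assuming the `declare(strict_types=1)`
directive from the dual-mode RFC and an engine that implements it. The
function `add_one()` and the temporary library file are hypothetical and
exist only to keep the example in a single script:

```php
<?php
// This script plays the role of the library *user*: it has no declare(),
// so calls made from it use the default weak (coercive) mode.

// Hypothetical library file, written to a temp path so the sketch is
// self-contained.
$libraryFile = tempnam(sys_get_temp_dir(), 'lib');
file_put_contents($libraryFile, <<<'PHP'
<?php
declare(strict_types=1); // strict mode for calls made *from* this file only

function add_one(int $n): int
{
    return $n + 1;
}
PHP
);

require $libraryFile;
unlink($libraryFile);

// The caller passes a numeric string and is not forced to cast.
$result = add_one("42");
var_dump($result); // int(43)
```

The strictness chosen by the library author applies to calls written inside
the library file; the calling script keeps its own mode.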


>> > Strict zval.type based STH effectively eliminates this behavior, moving
>> > the burden of worrying about type conversion to the user.
>>
>> Correct. And you say that as if it's a bad thing. Being explicit about
>> type conversions isn't what you'd do in a 10 line-of-code script where
>> you can realize what the types are by just thinking about it. But on
>> large scale systems exposing the type conversions to the user gives the
>> power to actually understand the codebase when you can't fit the whole
>> thing in your head at the same time.
>
> I have a hard time connecting to the 'power' approach.  I think developers
> want their code to work, with minimal effort, and be secure.  Coercive
> scalar type hints will do an excellent job at that.  Strict type hints will
> be more work, are bound to trigger a lot of "Oh come on" responses, and as a
> special bonus - proliferate the use of explicit casts.  Let me top that -
> you'd have developers who think they're security conscious, because they're
> using strict mode - with code that's full of explicit casts.

Again, speculation based on random numbers or code reviews.

>> > It is our position that there is no difference at all between strict
>> > and coercive typing in terms of potential future AOT/JIT development -
>> > none at all
>>
>> So really what you're saying is that you disagree with me publicly. A
>> statement which I said on the side, and I said should not impact RFC or
>> voting in any way. And is in no part in my RFC at all. Yet brought up
>> again.
>
> We listed all what we believe to be misconceptions that were brought up on
> internals.  As recently as yesterday, you had a PHP power user (Larry) that
> was under the strong impression Strict STH would yield substantial
> performance benefits.

We agreed that performance is totally irrelevant to this discussion.
And this time my team has provided numbers to back this statement,
after Dmitry's reply worrying (a bit) about the performance impact,
for the first time in this discussion. So let's move on from this
aspect, it is a waste of time :)

> Given that it was claimed in the past, and since we
> can't assume every voter reads every last word that's written on internals@
> threads, it was important to list that here even if it's not mentioned in
> the Strict/Dual mode RFC.

We could add this statement in all related RFCs: "performance is not
impacted by this RFC, in any direction". And move on.

> It's also worth mentioning that there are people who *assume* that strict
> type hints can somehow help performance, without being domain experts at
> neither the engine nor JIT, even if they weren't exposed to the explicit
> statements that suggested that on blogs and on internals@ - adding to the
> importance of making it clear that there are no performance benefits to that
> approach.

Everyone, even "experts", is pretty much assuming a lot of things
about type hinting. We should focus on the design and concept behind
it: not on potential advantages or other "upcoming" new features, but
on the actual benefits of each proposal from an implementation, clarity,
taste or application requirements point of view.

>> > Static Analysis. It is the position of several Strict STH proponents
>> > that Strict STH can help static analysis in certain cases. For the
>> > same reasons mentioned above about JIT, we don't believe that is the
>> > case
>>
>> This is patently false.
>
> It's actually patently true.  We don't believe that is the case.  QED.

Both are true and false, let's call it the Schroedinger Question of the
day. Refer to my previous line for this question as well. This is not
in the scope of these RFCs.

> While at it, can we stop using that 'patently false', and stick for
> constructive wording such as 'I disagree'?

I see nothing wrong with "patently". I see much more wrong with totally
ignoring feedback, ideas, replies, etc. while playing the politically
correct writer. But I am being OT again.

Cheers,
-- 
Pierre

@pierrejoye | http://www.libgd.org
