Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)

Stanislav Malyshev Sun, 22 Feb 2015 18:30:06 -0800

Hi!

> It rejects code because doing code generation on the dynamic case is
> significantly harder and more resource intensive. Could that be built
> in? Sure. But it's a very significant difference from generating the
> static code.


I can appreciate that. Dynamic typing is hard to translate into
statically typed code efficiently. But I don't see how that is related
to PHP having strict types - surely even strict types do not make PHP
statically typed. In fact, I don't see how they improve much - so far
you've shown me code examples that your compiler *wouldn't* handle. I
don't see how not being able to handle code is an advantage. Could I see
examples of code that the strict model *can* handle and that work better
in that model?

> And even if we generated native code for the dynamic code, it would
> still need variants, and hence ZPP at runtime. Hence the static code
> has a significant performance benefit in that we can indeed bypass
> type checks as shown in the PECL example a few messages up (more than
> a few).

I don't see how you can bypass type checks unless you know the variable
types at the time of the call, from some external source or some
information you collected about the code. If you know that, you could
just as well generate the same check-less code for the weak/dynamic
model.

> Passing a float to an integer parameter would result in a runtime
> E_RECOVERABLE_ERROR if the float has "dataloss".
> 
> So in the case I cited: foo($someint / 2), that will generate an
> E_RECOVERABLE_ERROR in Zeev's proposal, as well as in the static
> typing mode of mine.

It sounds like you've missed the part of my reply where I was saying
that I am considering the case of even numbers.

>>> With coercive typing as proposed in Ze'ev's RFC, that would need to
>>> happen anyway. In both proposals that would generate a runtime error.
>>
>> No, it wouldn't need to happen since no-DL conversion is allowed.
> 
> Sure it would. 3/2 is 1.5. Which would fatal if I passed it to
> foo(int) under Zeev's RFC. Because of data loss.

Again, you seem to miss the part where I said that we're considering a
non-DL (no data loss) case. For the DL case, both behave the same, so
there's indeed no difference (while you claimed there's some advantage
for the strict model?)
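
To make the DL / non-DL distinction concrete, here is a toy model in
Python. It is only a sketch: it is not PHP, it is not the actual
implementation of either RFC, and the names strict_int, coercive_int and
outcome are made up for illustration. It assumes the strict rule accepts
only values that already are integers, and the coercive rule also
accepts a float whose conversion loses no data.

```python
# Toy model of the two parameter-checking rules under discussion.
# Not PHP semantics; all names here are hypothetical.

def strict_int(value):
    """Strict rule: accept only values that already are ints."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"strict: expected int, got {type(value).__name__}")
    return value


def coercive_int(value):
    """Coercive rule: also accept a float if converting loses no data."""
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    if isinstance(value, float) and value.is_integer():
        return int(value)
    raise TypeError(f"coercive: {value!r} cannot become int without data loss")


def outcome(rule, value):
    """Apply a rule and report either the accepted value or 'error'."""
    try:
        return rule(value)
    except TypeError:
        return "error"
```

In this model the two rules agree on 1.5 (both reject it) and on any
real integer (both accept it). They differ only on a float such as 2.0,
which carries no data loss: the coercive rule accepts it and the strict
rule rejects it.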

> This very particular case, yes, because of the simplicity of the types
> involved. But with strict typing you only need to look at 1 success
> case, but with coercive typing you need to look at many more.

I do not see why you can ignore the fact that your assumptions about the
variable types could be wrong with strict typing. PHP is not a
statically typed language, so unless you can prove definitively that the
variable absolutely cannot be anything other than the prescribed type
(prior to the call), you still need to have code that accounts for the
other possibility. If you can, however, prove that, both strict and
dynamic typing would behave exactly the same!

You could, of course, build your static analyzer in a way that would
reject all code where it cannot prove all types - however, I hope you
understand that this is not an option for PHP core?

> Also, in many (I'd argue most) cases coercive has to either issue a
> warning (it doesn't know) or error on valid and functioning code.
> Example:
> 
> function isdivisibleby2(string $foo): bool {
>     if (preg_match('(\D)', $foo)) {
>         return false;
>     }
>     return 0 == ($foo % 2);
> }
> 
> function something2(string $foo): int {
>     if (!isdivisibleby2($foo)) {
>         return 10;
>     }
>     return foo($foo / 2);
> }
> 
> This code would never raise a runtime error in Zeev's coercive
> proposal. However, when looking at it statically, you can't tell
> (unless you've got a regex decompiler).
> So static analysis on dynamic types will either error on valid code,
> or not error on invalid code (and I'm not even talking about the
> halting problem here).

True, but PHP is built on dynamic types, and neither proposal changes
that. So you either propose to make PHP fully statically typed (which I
hope you do not) or say static analysis is not perfect - with which I
wholeheartedly agree, but then again CS is full of unsolvable problems,
and static analysis is, unfortunately, ultimately reducible to one of
them, so no surprise there. The same case would, of course, be true with
strict and non-strict runtime typing - simply because PHP is not
statically typed.

> Whereas with strict typing, the error would appear in both cases
> (static and runtime). And you could fix it.

If you are saying that you can construct code containing an error that
will be missed by coercive typing but would fail (not necessarily
because of this specific error, but because of a type mismatch) with
strict typing, that is of course trivially true. But so what? This in no
way proves strict typing caught the error - to prove that, the type
failure would have to be causally connected to the error, and in your
examples it is not.

Moreover, you bring an example of code that is actually not wrong,
practically speaking (as it divides by 2 a number that is actually
divisible by 2), and say that it produces an error and that this is
good? I fail to see how that is a good thing.

> More dangerous?

Yes, of course, explicit casts would be more dangerous since they may
hide errors, as I have shown you in one of the past emails. Explicit
casts are much more powerful than implicit ones, and thus are more
dangerous if used inappropriately (such as to override a type system
that prevents one from doing the right thing).

> I've shown it a few times in this thread. So far nobody has said "not
> possible" to the code sample I showed above. But I'll quote it here
> again:
> 
> PHP_FUNCTION(test_strict) {
>     zend_bool valid_return = 0;
>     if (!zend_parse_parameters(...)) {
>         return;
>     }
>     internal_test_strict(&valid_return);
> }
> 
> void internal_test_strict(zend_bool *valid_return) {
>     //outer_code
>     zend_bool foo_valid = 0;
>     internal_strict_foo(x, &foo_valid);
>     if (!foo_valid) {
>         throw_error();
>     }
> }
> 
> That has a demonstrable performance benefit. And while it may be
> possible with a limited subset of dynamic types, the analyzer is
> significantly harder to build (and uses more resources) to determine
> the types as effectively as you'd need to with strict types.

I'm not sure I understand - where exactly in this code does the
performance benefit happen? And how does internal_test_strict get the x
variable? What type does it have (in C)? What ensures that it is indeed
of that type, and that the value that corresponds to it in PHP is always
of the same type and not some other? Something is missing here.

Also missing - *after* all of the above - is an explanation of why the
same code cannot be generated in the coercive model.

> You want to write non-robust code, great! That's what weak/coercive
> mode is for. You want the ability to have some type sanity? That's
> what strict mode is for.

No, in fact there's absolutely no difference in the initial code for
both models with regard to odd numbers. Same for the first iteration
(with (int)). But in the final code, the strict model is actually
*worse* than the weak one - because it was set up for failure by the
addition of (int). It is exactly the opposite of your claim - in your
example, it is the weak/coercive model that is more robust, and the
(int) added by your model is what hides the errors.

> NOOO, don't misunderstand me. The majority of the cases of a type
> mismatch indicate that you're doing something wrong. In fact, I'd

How do you know that? It looks like a case of circular logic - the
analyzer is good because if it says something is wrong then it is wrong.
We haven't proven that yet.

> argue there are only 2 reasons to use an explicit cast:
> 
> 1. Being explicit that you want to drop precision: $x / $y
> 2. Because of internal functions returning improper types: floor($x + $y)

These are *valid* reasons. But your model adds another one - "without
the cast the code just doesn't work even if I *know* the value is OK,
but the type does not match". That, as I keep repeating, is the worst
flaw of this model: there are cases where people know the values are
right but your model doesn't, and the only way to make it know is to use
a sledgehammer - the explicit cast.

> The majority of type errors, the correct action is to not add a cast
> but to fix the types.

Nope, at least not in a dynamic language where most inputs and outputs
are strings, as PHP is.

> Look at basically every other typed language. Do you see casts
> everywhere? No. Because type errors are **fixed**. Not just blindly
> casted.

Actually, I do see a number of casts - e.g., when processing data from
files which contain sets of integers in Python, I'd have to do something
like x = [int(n) for n in x] all the time.

But that again does not relate to the advantage of strict typing - those
languages are fully strictly typed from the start. PHP is not, and
realistically you can't expect any production code to be even
majority-typed for the next 5 to 10 years.
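
For illustration, a minimal sketch of the kind of Python code I mean.
The helper name read_ints and the sample data are made up; the only
point is that values read from a file arrive as strings and have to be
converted explicitly with int().

```python
import io


def read_ints(stream):
    """Read whitespace-separated integers from a text stream."""
    # Everything read from the stream is a str; int() is the explicit cast.
    return [int(n) for n in stream.read().split()]


# io.StringIO stands in for a real file opened in text mode.
data = io.StringIO("10 20\n30\n")
values = read_ints(data)
```

Note that int() raises ValueError on input such as "1.5" or "abc", so
this cast is itself a checked conversion rather than a silent one.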

> No, but it does VASTLY increase the complexity and resource
> consumption of such a analyzer (dealing with coercive types).

I'm sorry, but nothing you have said so far provides any proof of such a
VAST increase. In fact, I cannot see how it matters at all. Yes, such an
analyzer would not reject some code it otherwise would reject, but that
by itself is not a VAST increase in complexity. You need to account for
the dynamic nature of PHP anyway, if your analyzer is going to be worth
anything.

> I showed you several. I'm not going to go in circles because we have a
> failure in communication.

It is true that you showed me several examples of code; however, I do
not see how these examples prove any of your claims except for the
trivial one that strict typing would reject some code that dynamic
typing would not.

-- 
Stas Malyshev
smalys...@gmail.com

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
