Re: [PHP-DEV] Asymmetric visibility Reflection API problems

2024-10-04 Thread Ilija Tovilo
Hi Valentin

Thank you for bringing up your concerns.

On Thu, Oct 3, 2024 at 3:32 PM Valentin Udaltsov
 wrote:
>
> We are going to talk about the asymmetric visibility RFC [2] and the
> new `ReflectionProperty::is(Private|Protected)Set()`
> methods. Consider a class and reflected facts about it's properties [3]:
>
> ```php
> class A
> {
> // isPrivateSet() = true (OK)
> public private(set) mixed $public_private_set;
>
> // isPrivateSet() = false (I would expect true)
> private private(set) mixed $private_private_set;
>
> // isPrivateSet() = false (I would expect true)
> private mixed $private;
>
> // isProtectedSet() = true (OK, explained in the RFC)
> public readonly mixed $public_readonly;
>
> // isProtectedSet() = false (I would expect true)
> protected readonly mixed $protected_readonly;
>
> // isProtectedSet() = false (I would expect true)
> protected protected(set) readonly mixed $protected_protected_set_readonly;
>
> // isPrivateSet() = false (OK), isProtectedSet() = false (OK)
> public bool $virtual_no_set_hook { get => true; }
> }
> ```

As mentioned on GitHub, I do understand and agree that this may seem
confusing without more context.
However, this behavior for flags is really not unique to asymmetric visibility.
For example:

```php
interface I {
public function test();
}

var_dump((new ReflectionMethod('I', 'test'))->isAbstract());
//> bool(true)
```

The `abstract` flag is not (and cannot) be specified explicitly, but it is added
implicitly. Similarly, the `protected(set)` in `protected protected(set)` is
redundant, so the compiler removes it to omit additional visibility checks. For
the same reason, we don't even have an internal flag (at runtime) for
`public(set)`, as it is always removed at compile time.

As mentioned, these methods are generally quite "dumb" and simply expose these
internal flags to userland. This is reflected by `getModifiers()`, which returns
a bitmask over all flags. `getModifiers()` should stay consistent with
`isPublic()`, as well as all the other methods. This will be increasingly
difficult if we start tweaking the implementation of these methods.

I have proposed to instead solve this by introducing new functions,
`ReflectionProperty::isReadable()` and `isReadable()`. They would not only
handle visibility and asymmetric visibility, but also scope, virtual-ness,
hooks and readonly (including __clone). Additionally, if some new
concept is ever
added, the functions may adapt. I have created a simple PoC [1].

> Here's what I propose:
>
> - `ReflectionProperty::isPublic()` returns `true` only if the property
> is symmetrically public. For `public readonly $prop`
> it will return `false`, because it's implicitly `public protected(set)
> readonly $prop`. This is a BC break, but a
> smaller one, given that libraries now take "readonly" into account.
> - `ReflectionProperty::isProtected()` returns `true` only if the
> property is symmetrically protected. For `protected readonly $prop`
> it will return ``true``, because it's implicitly `protected
> protected(set) readonly $prop`. Fully backward compatible.
> - `ReflectionProperty::isPrivate()` returns `true` only if the
> property is symmetrically private. Fully backward compatible.
> - Add `ReflectionProperty::isPublicGet()`: returns `true` if property
> is `public` or `public XXX(set)`.
> - Add `ReflectionProperty::isProtectedGet()`: returns `true` if
> property is `protected` or `protected XXX(set)`.
> - Add `ReflectionProperty::isPrivateGet()`: returns `true` if property
> is `private` or `private private(set)`.
> - Add `ReflectionProperty::isPublicSet()`: returns `true` if property
> is symmetrically public (asymmetric public
> set is not possible).
> - Change `ReflectionProperty::isProtectedSet()`: returns `true` if
> property is symmetrically protected or has an
> asymmetric protected set.
> - Change `ReflectionProperty::isPrivate()`: returns `true` if property
> is symmetrically private or has an asymmetric
> private set.

I very much disagree with changing `isPublic` and friends. It would be a
considerable BC break which isn't acceptable in the RC phase. I also don't think
it's necessarily more correct. With this RFC, properties have two visibilities:

1. The visibility of the entire property, which we are all used to.
2. The visibility of the set operation, which was introduced with
asymmetric visibility.

Essentially, the get operation continues to inherit the visibility from the
property. This is what `isPublic()` and friends are referring to. Of course, you
may argue that an equally valid mental model is that the property no longer has
a visibility, but an additional get visibility. However, this does not
accurately reflect the implementation, nor, more importantly, the syntax.

> The most difficult question is whether we can have these or similar
> changes in 8.4 after RC1... Does a BC break
> in the current implementation (if we agree that it exists

Re: [PHP-DEV] VCS Account Request: ayesh

2024-09-26 Thread Ilija Tovilo
On Thu, Sep 26, 2024 at 1:45 PM Tim Düsterhus  wrote:
>
> Am 2024-09-25 21:26, schrieb Ayesh Karunaratne:
> > I see the account is already approved.
>
> Unless you did not yet accept the invitation to the GitHub organization,
> it appears that you were not invited there yet. If that is the case,
> someone should send the invite.

Thanks for the hint, I sent Ayesh an invite to the php-src team.

Ilija


Re: [PHP-DEV] Which IDE do you recommend for php-src development?

2024-09-15 Thread Ilija Tovilo
Hi Carlos

On Sat, Sep 14, 2024 at 11:44 PM Barel  wrote:
>
> For C/C++ development I usually use CLion from Jetbrains but I tried to use 
> it with php-src and was unable to get it to work properly. CLion really 
> insists on using CMake and has only quite limited support for makefiles. 
> After trying to get it to work unsuccessfully I am ready to try something 
> else.
>
> So which IDE would you recommend for php-src development? I understand that 
> people probably have many different preferences but I wondered if there was 
> something that most php internals developers used.

I tried to like CLion, but it was extremely laggy for me every time I
tried it. php-src contains some rather large files, and even some dsl
(zend_vm_def.h) that didn't work well at all. I also didn't find a way
to add simple highlighting support for PHP test files.

VSCode works surprisingly well, I've used it for C development for
years. I created a simple guide for setting it up, including gdb. It
shouldn't take you more than 10 minutes. [1]

Zed is also already a decent option that essentially just works out of
the box. Unfortunately, it's currently lacking debugging support. [2]

> One important feature would be to easily work with the project running on a 
> docker container

Can't help you there, I don't use Docker for C development.

Ilija

[1] http://php.github.io/php-src/introduction/ides/visual-studio-code.html
[2] https://zed.dev/


Re: [PHP-DEV] json_encode() and Generators / RFC Karma

2024-08-30 Thread Ilija Tovilo
On Fri, Aug 30, 2024 at 12:19 PM Philip Hofstetter
 wrote:
>
> Username is pilif

RFC karma was granted. Good luck with the RFC!


Re: [PHP-DEV] json_encode() and Generators / RFC Karma

2024-08-30 Thread Ilija Tovilo
Hi Philip

On Fri, Aug 30, 2024 at 9:37 AM Philip Hofstetter
 wrote:
>
> If you think this is a worthwhile thing to do an RFC for, I would kindly ask 
> for somebody to grant me RFC karma, so I can start working on one.

To grant you RFC karma, I need to know your wiki.php.net username.


Ilija


Re: [PHP-DEV] [RFC] Default expression

2024-08-25 Thread Ilija Tovilo
Hi Rowan

On Sun, Aug 25, 2024 at 6:06 PM Rowan Tommins [IMSoP]
 wrote:
>
> On 25/08/2024 16:29, Bilge wrote:
> > You can write, `include(1 + 1);`, because `include()` accepts an
> > expression. You will get: "Failed opening '2' for inclusion". Should
> > we restrict that? No, because that's just how expressions work in any
> > context where they're allowed.
>
>
> I think a better comparison might be the "new in initializers" and
> "fetch property in const expressions" RFCs, which both forbid uses which
> would naturally be allowed by the grammar. The rationale in those cases
> was laid out in
> https://wiki.php.net/rfc/new_in_initializers#unsupported_positions and
> https://wiki.php.net/rfc/fetch_property_in_const_expressions#supporting_all_objects

I don't agree with that. Constant expressions in PHP already only
support a subset of operations that expressions do. However, default
is proposed to be a true expression, i.e. one that compiles to
opcodes. Looking at the `expr` nonterminal [1] I can't see any
productions that are restricted in the context they can be used in,
even though plenty of them are nonsensical (e.g. exit(1) + 2).

Furthermore, new in initializers was disallowed in some contexts not
because it would be nonsensical, but because it posed technical
difficulties.

I also believe some of the rules you've laid out would be hard to enforce.

> 1) The expression should be reasonably guaranteed to produce the same type as 
> the actual default.

Even the simple cases of ??, ?: can easily break this rule.

Furthermore, context restriction is easily circumvented. E.g.

foo((int) default); // This is not allowed
foo((int) match (true) { default => default }); // Let me just do that

I'm not sure context restriction is worthwhile, if 1. we can't do it
properly anyway and 2. there are no technical reasons to do so.

Ilija

[1] 
https://github.com/php/php-src/blob/3f4028d3d9d63e1dae012a9c350141493b30825f/Zend/zend_language_parser.y#L1198


Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-24 Thread Ilija Tovilo
Hi Stephen

On Sat, Aug 24, 2024 at 1:54 PM Stephen Reay  wrote:
>
> Thanks for clarifying. Out of curiosity, how much optimisation do you imagine 
> would be possible if the lookups were done the same was as classes (ie no 
> fallback, names must be local, qualified or imported with `use`)?

I haven't measured this case specifically, but if unqualified calls to
local functions are indeed rare (which the last analysis seems to
indicate), then it should make barely any difference. Of course, if
your code makes lots of use of them, then the story might be
different. That said, the penalty of an ambiguous internal call is
much higher than that of a user, local call, given that internal calls
sometimes have special optimizations or can even be entirely executed
at compile time. For local calls, it will simply lead to a double
lookup on first execution.

> I am aware this is a BC break. But if it's kosher to discuss introducing a 
> never ending BC break I don't see why this isn't a valid discussion either. 
> It would give *everyone* that elusive 2-4% performance boost, would resolve 
> any ambiguity about which function a person intended to call (the claimed 
> security issue) and would bring consistency with the way classes/etc are 
> referenced.

>From my analysis, there were 2 967 unqualified calls to local
functions in the top 1 000 repositories. (Disclaimer: There might be a
"use function" at the top for some of these, the analysis isn't that
sophisticated.)

I also ran the script to check for unqualified calls to global
functions (or at least functions that weren't statically visible in
that scope in any of the repositories files), and there were ~139 000
of them. It seems like this is quite a different beast. To summarize:

1. Flipping lookup order: ~a few dozens of changes
2. Global only: ~3 000 changes
3. Local only: ~139 000 changes

While much of this can be automated, huge diffs still require
reviewing time, and can lead to many merge conflicts which also take
time to resolve. I would definitely prefer to go with 1. or
potentially 2.

Ilija


Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-24 Thread Ilija Tovilo
Stephen

On Sat, Aug 24, 2024 at 2:00 PM Stephen Reay  wrote:
>
> When I said this thread reads like an April fools joke that wasn't a 
> challenge you know.

We *just* had somebody temporarily banned for ad-hominem attacks like
a week ago. Please familiarize yourself with the mailing list rules.
They apply to everyone.

https://github.com/php/php-src/blob/master/docs/mailinglist-rules.md

Most significantly:

> a. Make everybody happier, ...

and

> 1. Do not post when you are angry. Any post can wait a few hours. Review your 
> post after a good breather, or a good nights sleep.

Ilija


Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-23 Thread Ilija Tovilo
On Fri, Aug 23, 2024 at 9:41 PM Rowan Tommins [IMSoP]
 wrote:
>
> On 23 August 2024 18:32:41 BST, Ilija Tovilo  wrote:
> >IMO, 1. is too drastic. As people have mentioned, there are tools to
> >automate disambiguation. But unless we gain some other benefit from
> >dropping the lookup entirely, why do it?
>
> I can think of a few disadvantages of "global first":
>
> - Fewer code bases will be affected, but working out which ones is harder. 
> The easiest migration will probably be to make sure all calls to namespaced 
> functions are fully qualified, as though it was "global only".

To talk about more concrete numbers, I now also analyzed how many
relative calls to local functions there are in the top 1000 composer
packages.

https://gist.github.com/iluuu1994/9d4bbbcd5f378d221851efa4e82b1f63

There were 4229 calls to local functions that were statically visible.
Of those, 1534 came from thecodingmachine/safe, which I'm excluding
again for a fair comparison. The remaining 2695 calls were split
across 210 files and 27 repositories, which is less than I expected.

The calls that need to be fixed by swapping the lookup order are a
subset of these calls, namely only the ones also clashing with some
global function. Hence, the process of identifying them doesn't seem
fundamentally different. Whether the above are "few enough" to justify
the BC break, I don't know.

> - The engine won't be able to optimise calls where the name exists locally 
> but not globally, because a userland global function could be defined at any 
> time.

When relying on the lookup, the lookup will be slower. But if the
hypothesis is that there are few people relying on this in the first
place, it shouldn't be an issue. It's also worth noting that many of
the optimizations don't apply anyway, because the global function is
also unknown and hence a user function, with an unknown signature.

> - Unlike with the current way around, there's unlikely to be a use case for 
> shadowing a namespaced name with a global one; it will just be a gotcha that 
> trips people up occasionally.

Indeed. But this is a downside of both these approaches.

> None of these seem like showstoppers to me, but since we can so easily go one 
> step further to "global only", and avoid them, why wouldn't we?
>
> Your answer to that seems to be that you think "global only" is a bigger BC 
> break, but I wonder how much difference it really makes. As in, how many 
> codebases are using unqualified calls to reference a namespaced function, but 
> *not* shadowing a global name?

I hope this provides some additional insight. Looking at the analysis,
I'm not completely opposed to your approach. There are some open
questions. For example, how do we handle functions declared and called
in the same file?

namespace Foo;
function bar() {}
bar();

Without a local fallback, it seems odd for this call to fail. An
option might be to auto-use Foo\bar when it is declared, although that
will require a separate pass over the top functions so that functions
don't become order-dependent.

Ilija


Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-23 Thread Ilija Tovilo
On Fri, Aug 23, 2024 at 8:19 PM John Coggeshall  wrote:
>
> 1. People who don't think BC is a problem, and would like to drop
> either the global or local lookup entirely, requiring disambiguation.
>
> There is also an option of swapping the priority, making local lookups 
> secondary to global lookups -- and to override that behavior you would 
> require disambiguation. It doesn't have to mean drop the lookup entirely.

Right, that's my proposal. My point was that few (nobody except you
and me, I believe) showed particular excitement for this approach.


Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-23 Thread Ilija Tovilo
On Fri, Aug 23, 2024 at 8:11 PM Stephen Reay  wrote:
>
> On 24 Aug 2024, at 00:32, Ilija Tovilo  wrote:
>
> > I obviously also disagree with 3. as I wouldn't have sent this
> > proposal otherwise. :) Performance improvements are hard to come by
> > nowadays. It was measured on real codebases (Symfony and Laravel).
>
> Just to make sure I'm not misunderstanding something here - Symphony and 
> Laravel (or anyone else) can get the exact same performance benefit you're 
> talking about, if they use either a leafing `\` or `use function...` in their 
> own code, today, correct?

As I wrote in my e-mail:

> For some context: Before proposing this change, I asked Symfony if
> they were interested in disambiguating calls to improve performance,
> given I did some work to make fully qualified calls faster. But they
> were not, stating that the change would be too verbose for their
> liking.
>
> Making unqualified calls to mean local would force Symfony into making
> the change, which is not the approach I'm interested in taking.

Yes, the full performance benefits can be achieved by prefixing your
entire codebase, which includes not only your own code but also the
code of your framework. How much you benefit from just converting your
own code of course depends on how much you rely on vendor code.

> I'm not going to pretend to know how many files either project has that uses 
> global functions, but neither of those two existing options seems 
> particularly "hard to come by" in my mind.

The "hard to come by" part is referring to the engine, which is quite
optimized for the current semantics. Some of PHPs quirky semantics
make it hard to improve it further, this being one of them.


Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-23 Thread Ilija Tovilo
On Fri, Aug 23, 2024 at 5:49 PM Rowan Tommins [IMSoP]
 wrote:
>
> Other proposals aim to shift that balance - leaving some inconsistency, but 
> less compatibility break.
>
> And most users don't object to using a leading backslash, they just (quite 
> reasonably) have no idea what impact it has on the ability of the engine to 
> optimise their code.

For some context: Before proposing this change, I asked Symfony if
they were interested in disambiguating calls to improve performance,
given I did some work to make fully qualified calls faster. But they
were not, stating that the change would be too verbose for their
liking.

Making unqualified calls to mean local would force Symfony into making
the change, which is not the approach I'm interested in taking. Making
them global would likely reduce breakage by much, but not nearly as
much as keeping the fallback.

>From reading the responses, it seems we have three primary camps:

1. People who don't think BC is a problem, and would like to drop
either the global or local lookup entirely, requiring disambiguation.
2. People who do think BC is a problem, and would like some opt-in
mechanism to explicitly pick global or local scope.
3. People who aren't convinced that the performance improvements are
worth it to begin with, or that the developers themselves are
responsible for disambiguation.

IMO, 1. is too drastic. As people have mentioned, there are tools to
automate disambiguation. But unless we gain some other benefit from
dropping the lookup entirely, why do it? Consistency with class
lookups is a factor, but is it enough to break a large portion of
codebases? The summed up time of every maintainer installing and
running a tool that modifies a large portion of the codebase, and then
dealing with conflicts in existing branches is not miniscule. Fixing
local calls will also require context from other files to correctly
disambiguate. I'm not aware if any tools actually consider context, or
just take the naive approach of making known, internal calls global,
and leaving the rest.

2. misses the point of the immediate performance gains without
modifications to the codebase. Even if the disambiguation itself is a
one-liner, it still needs to be added to every codebase and every
file, and still requires fixing actual local calls that may be made
within the same file.

I obviously also disagree with 3. as I wouldn't have sent this
proposal otherwise. :) Performance improvements are hard to come by
nowadays. It was measured on real codebases (Symfony and Laravel).

Ilija


Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-21 Thread Ilija Tovilo
Hi John

On Wed, Aug 21, 2024 at 8:02 PM John Coggeshall  wrote:
>
> This is an attack vector for every application and I would argue should be a 
> real concern for the vast majority of applications  out there -- any which 
> rely on namespace-based frameworks and composer packages from untrustworthy 
> sources. It's not just Wordpress -- literally every single PHP application 
> that uses a publicly available framework and consumes external composer 
> packages should be FQing their internal function calls. The natural behavior 
> of the language shouldn't be the insecure way of doing things for the sake of 
> maintaining BC compatibility with existing, insecure, code.

Including a malicious composer package already allows for arbitrary
code execution, do you really need more than that?

Ilija


Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-20 Thread Ilija Tovilo
Hi Levi

On Tue, Aug 20, 2024 at 5:14 PM Levi Morrison
 wrote:
>
> I have long been in favor of a larger BC break with better language
> consistency. Class lookup and function lookup with respect to
> namespaces should be treated the same. The difficulty is getting a
> majority of people to vote yes for this. Keep in mind that qualifying
> every global function is annoying but probably can be somewhat
> automated, and will bring better performance. So again, this improves
> the existing code even without upgrading.
>
> Yes, there would be complaints about it. Yes, there are probably some
> people or projects who wouldn't upgrade. I don't particularly care, as
> there are increasingly more operating systems and companies providing
> LTS support for long periods of time. Probably Zend.com will offer LTS
> support for the last PHP 8.X release, and possibly there will be some
> distro which also has it. I believe it's the right thing to do
> because:
>
>  1. It's faster.
>  2. It enables function autoloading in a similar manner to class autoloading.
>  3. It's more consistent, and simpler to teach and maintain.
>
> It's rare that you get all of these together, often you have to make
> tradeoffs within them.

The approach I originally proposed also solves 1. and 2. (mostly) with
very little backwards incompatibility. Consistency is absolutely
something to strive for, but not at the cost of breaking most PHP
code.

To clarify on 2.: The main issue with function autoloading today is
that the engine needs to trigger the autoloader for every unqualified
call to global functions, given that the autoloader might declare the
function in local scope. As most unqualified calls are global calls,
this adds a huge amount of overhead.

Gina solved this in part by aliasing the local function to the global
one after the first lookup. However, that still means that the
autoloader will trigger for every new namespace the function is called
in, and will also pollute the function table.

Reversing the lookup order once again avoids local lookup when calling
global functions in local scope, which also means dodging the
autoloader. The caveat is that calling local functions in local scope
triggers the autoloader on first encounter, but at least it can be
marked as undeclared in the symbol table once, instead of in every
namespace, which also means triggering the autoloader only once.

Ilija


Re: [PHP-DEV] Require C11 in PHP 8.4

2024-08-13 Thread Ilija Tovilo
Hi Levi

On Tue, Aug 13, 2024 at 5:00 PM Levi Morrison
 wrote:
>
> On Tue, Aug 13, 2024 at 8:17 AM Ilija Tovilo  wrote:
> >
> > Just to state it officially: You object to switching to C11 in 8.4? In
> > that case, we'll have to postpone.
>
> I don't object. I think it's smarter to wait, and do more than just
> redeclare a few typedefs. But I won't block it.

Thank you for the clarification. In that case, I'd say let's go
forward with the proposal. I agree that we should be making use of
more C11 features in the future, once the opportunity presents itself.

> If only documentation changes, then I think the risk is lower. We
> definitely should not be doing configure checks in 8.4 with how little
> time there is, except maybe _one_ that attempts to redeclare a typedef
> to see if it compiles, and give a nicer error message.

I agree. I will create a PR to provide a ./configure time error
message if the compiler doesn't support typedef redeclarations.
Adopting C11 does mean we can start using other C11 features too,
though that should naturally happen only on (future) master rather
than 8.4, given that ABI freeze is only a little over a month away.

Ilija


Re: [PHP-DEV] Require C11 in PHP 8.4

2024-08-13 Thread Ilija Tovilo
Hi Levi

On Mon, Aug 12, 2024 at 5:56 PM Levi Morrison
 wrote:
>
> Given the timetable, I wouldn't change the C std requirements for 8.4.

Just to state it officially: You object to switching to C11 in 8.4? In
that case, we'll have to postpone.

> I would stop relying on the typedef and forward declare only the
> struct, and that. Note that although Windows supports C11, it does not
> support all features including atomics. Someone chimed in to say that
> they are available, but this doesn't match the information I got from
> a coworker who did a similar test. Given conflicting information and
> the short timetable, I think we should lean towards being cautious. I
> hope for 8.5/9.0 we can move to C11/C17 which can improve the typedef
> situation, simplify our atomics handling, and more.

I'm a bit confused about the relevance of C11 atomics. As I'm sure
you're aware, they remain optional in C17/C23. So, we'll need to
support a fallback, my suggestion was not to remove the fallback.

Essentially, code-wise, nothing would change if we adopt C11, except
being allowed to redeclare typedefs. Apart from that, only the
documentation would change.

Ilija


Re: [PHP-DEV] [DISCUSSION] C++ Enhancements in Zend API

2024-08-12 Thread Ilija Tovilo
On Mon, Aug 12, 2024 at 9:00 PM Lanre  wrote:
>
> I didn’t realize this was an open mic for Rust devs to flaunt their 
> ignorance, but since you’ve decided to chime in, let me spell it out for you. 
> Rust has absolutely nothing to do with this discussion, so try to stay on 
> topic. Nowhere did I mention a move to C++; perhaps reading comprehension 
> isn’t your strong suit.

May I remind you again that we expect all participants on this list to
be polite. Ad-hominem attacks are not tolerated. Please freshen up on
the mailing list rules.

https://github.com/php/php-src/blob/master/docs/mailinglist-rules.md

Ilija


Re: [PHP-DEV] Require C11 in PHP 8.4

2024-08-11 Thread Ilija Tovilo
> Giovanni's remark that this would impact many people was challenged by
> Jakub [1] which didn't get a response. I believe it's safe to assume
> that this isn't the case.

I somehow forgot to link this reference, sorry about that.

[1] https://externals.io/message/124706#124717


Re: [PHP-DEV] Require C11 in PHP 8.4

2024-08-11 Thread Ilija Tovilo
Hi Christoph

On Sat, Aug 10, 2024 at 2:19 PM Christoph M. Becker  wrote:
>
> On 01.08.2024 at 23:57, Ilija Tovilo wrote:
>
> So skimming the whole discussion[1] it seems that most are generally
> fine with bumping the requirements to C11, except for Giovanni Giacobbi
> (whose draft PR[2] had no further discussion so far), and maybe for some
> uncertainties regarding some less used compilers.

Giovanni's remark that this would impact many people was challenged by
Jakub [1] which didn't get a response. I believe it's safe to assume
that this isn't the case.

> And if we're going with C11, figuring out the details (which
> configuration check to use, or only documenting the requirement) can
> still be decided somehwat later, in my opinion.

The consensus seems to be that bumping to C11 is ok. While keeping C99
compatibility wouldn't be a big hassle, it doesn't seem necessary
anymore.

I would agree with Christoph that documentation is enough for now. If
your compiler happens to support all the C11 features we require,
without full C11 compliance, then that's ok. If we want to reduce bug
reports, a configure check for typedef redeclarations, nudging people
in the right direction with an appropriate message might help.

Verifying C11 support might be tricky, given that the compiler flags
are not standardized. But I'm not a build system expert, there might
be an approach I'm not aware of.

Ilija


Re: [PHP-DEV] [RFC] Add Directive to Make All Namespaced Function Calls Global

2024-08-05 Thread Ilija Tovilo
On Mon, Aug 5, 2024 at 3:33 PM Nick Lockheart  wrote:
>
> This is a different problem that could be solved by a sandbox API.

Not sure which case we were talking about then. ClockMock is what I've
been referencing all along.

> > Well, ok. But then we're back to prefixing global calls, which
> > defeats the purpose of the proposal.
>
> Global functions would only need a prefix `\` in the *very rare* cases
> where local functions are set as the default. For most people, the \
> would be omitted, as globals would be set as default for unqualified
> function names.

Right. But apart from mocking, what are these cases? If performance
were no longer an issue, "using global functions" just makes the
language harder to use, removing the local fallback. "using local
functions" may be useful in namespaced code making many calls to
functions within the same namespace. In that case, it would probably
be more useful to switch the lookup order back instead. If you want to
pay zero performance penalty, you can prefix global calls with \.
You'd need to do that with "using local functions" anyway.

As for mocking: If the code needs to change either way, why not make
it testable in the first place, e.g. through dependency injection for
time()? At least this only requires changing the calls that are
mocked, instead of all the calls that aren't.

The main benefit of the approach from ClockMock is that your code
(probably) doesn't need to change. I do think that the entire approach
is hacky, and probably worth solving on a language-level, at least if
possible without adding limitations to the engine. A good first start
would be to know what functions are commonly mocked with this
approach.

Ilija


Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-05 Thread Ilija Tovilo
On Mon, Aug 5, 2024 at 1:23 PM Derick Rethans  wrote:
>
> On Fri, 2 Aug 2024, Ilija Tovilo wrote:
>
> > As for providing a migration path: One approach might be to introduce
> > an INI setting that performs the function lookup in both local and
> > global scope at run-time, and informs the user about the behavioral
> > change in the future.
>
> That INI setting would control the *warning*, and not the
> *functionlity*, right?

Yes, that was my suggestion. First, in a future minor version, an INI
option could be added that would warn when finding both a local and
global function when performing an unqualified function call. The only
reason to hide this behind a setting is to avoid the cost of a double
lookup in production code when one isn't necessary, i.e. when calling
local functions in some namespace.

> I am surprised that it is that much of a performance benefit as well,
> but I am also concerned about the BC impact. But if that isn't too much,
> then I guess we need to consider this, but only for a major version. Not
> something I believe we can change in a 8.x version.

Sure, waiting for 9.0 sounds reasonable if we were to choose this approach.

Ilija


Re: [PHP-DEV] [RFC] Add Directive to Make All Namespaced Function Calls Global

2024-08-05 Thread Ilija Tovilo
On Mon, Aug 5, 2024 at 2:27 PM Nick Lockheart  wrote:
>
>
> > I'm not sure your proposal solves the mocking problem. If the engine
> > is to interpret all non-fq calls as global or local, how would a
> > library include your file while switching this configuration, when it
> > is implemented as some directive in the file?
>
> I'm not sure I understand this question.

Consider this example:

 > Also, how would only singular functions be mocked when there is no
> > fallback to the global scope for the rest of the functions used
> > within the file? That would necessitate mocking all functions, even
> > the unmodified ones.
>
> If a developer needed to override built-in functions (in a specific
> file) with local ones, and still use some built-in functions, they
> would then need to use the fully-qualified `\function();` to call the
> built-in.

Well, ok. But then we're back to prefixing global calls, which defeats
the purpose of the proposal. I think it's much preferable to look for
a different, more robust mocking solution that also works for
unnamespaced code and fully qualified calls.

Ilija


Re: [PHP-DEV] [RFC] Add Directive to Make All Namespaced Function Calls Global

2024-08-05 Thread Ilija Tovilo
On Mon, Aug 5, 2024 at 10:22 AM Nick Lockheart  wrote:
>
> > Sorry, my language was not precise enough. Your proposal suggests
> > making unqualified calls global when the directive is present,
> > whereas my proposal suggests keeping local scope as a fallback, hence
> > the two not being compatible.
>
> Keeping local scope as a fallback would still require an NS lookup
> opcode.

In the general sense: Yes, we would need to keep the NS lookup opcode.
However, functions that are known to exist in global scope would not
compile to it. The same goes for your solution, as code that doesn't
opt-into disambiguated lookups will still require the
ZEND_INIT_NS_FCALL_BY_NAME opcode.

> My proposal is that there still be an NS lookup as the default
> behavior, but allow an optional override.
>
> That is, users would have three options:
>
> (1) resolve all unqualified function names to global (no NS lookup)
> (2) resolve all unqualified function names to local (no NS lookup)
> (3) Use the default behavior, which at present, is local first, then
> global, with a namespace lookup. (no change to user code, keeps BC)
>
> Your proposal reverses the order of the default NS lookup from local
> first to global first.
>
> That's not mutually exclusive.

Sure. But it's questionable whether the motivation of introducing one
when the other already exists is still there.

> If my proposal *and* yours were *both* accepted, users would have these
> options:
>
> (1) resolve all unqualified function names to global (no NS lookup)
> (2) resolve all unqualified function names to local (no NS lookup)
> (3) Use the default behavior, which would now be global first, then
> local, with a namespace lookup. (no change to user code, keeps BC
> working, but changes the default automatic lookup order)
>
> I believe that my propsal also helps yours be more viable, too. Because
> if your proposal was accepted alone, and the NS lookup order changed,
> then it would no longer be possible to do unit testing with stubs that
> replace built-in functions, UNLESS there was also an option for
> developers to instruct the compiler to use local functions over global
> ones.

I'm not sure your proposal solves the mocking problem. If the engine
is to interpret all non-fq calls as global or local, how would a
library include your file while switching this configuration, when it
is implemented as some directive in the file? Also, how would only
singular functions be mocked when there is no fallback to the global
scope for the rest of the functions used within the file? That would
necessitate mocking all functions, even the unmodified ones.

> > What I'm saying is that:
> >
> > 1. If the vote fails, it might have been because some people don't
> > want opt-in behavior. Niels just voiced this opinion.
>
> Every token added to a PHP file is opt-in behavior that changes how the
> parser works.

I don't understand what you mean. What Niels was suggesting is that
adding more context dependency is undesired. I.e. language semantics
should not be dependent on configuration.

> I think something needs to be at least as formal as an RFC or people
> won't take it seriously enough to contribute feedback.
>
> I'm trying to start a discussion on this and get some momentum so that
> we can hopefully implement something in a near future version.

Fair enough. But it should be made clearer what is being voted on.
>From reading the RFC, it is not clear that there even are alternative
options, which ones have been proposed before, what points have been
raised in the discussion, etc.

Ilija


Re: [PHP-DEV] [RFC] Add Directive to Make All Namespaced Function Calls Global

2024-08-04 Thread Ilija Tovilo
On Sun, Aug 4, 2024 at 8:41 PM Nick Lockheart  wrote:
>
>
> >
> > But that's not quite what the RFC says:
> >
> > > I am asking that we discuss and vote on the following question:
> > >
> > > “Should there be some way for developers to signal to the parser at
> > > compile time that all unqualified function names found in a
> > > namespace context are global, without a namespace lookup?”
> >
> > Which implies:
> >
> > 1. There is _some_ change to syntax.
> > 2. All unqualified calls become global.
>
> I did not intend for all unqualified calls to become global, unless the
> new directive is present.

Sorry, my language was not precise enough. Your proposal suggests
making unqualified calls global when the directive is present, whereas
my proposal suggests keeping local scope as a fallback, hence the two
not being compatible.

What I'm saying is that:

1. If the vote fails, it might have been because some people don't
want opt-in behavior. Niels just voiced this opinion.
2. If the vote is accepted, it would void my suggestion, because it is
not compatible with the conditions laid out in your proposal. If this
were a straw poll, rather than an RFC vote, it would be clearer that
there is no mandatory approach or policy being accepted.

A vote with multiple options to pick an approach (opt-in through some
directive, hard BC break, flipping lookup order, nothing at all) might
be more appropriate.

Ilija


Re: [PHP-DEV] [RFC] Add Directive to Make All Namespaced Function Calls Global

2024-08-04 Thread Ilija Tovilo
Hi Nick

On Sun, Aug 4, 2024 at 7:29 PM Nick Lockheart  wrote:
>
> So I wanted to get a "yes" from the people who need to say yes, then
> discuss all of those things you mentioned:
>
> 1. File level vs global
> 2. Syntax
> 3. Alternative options
>
> But I think we need a "yes" for the concept first. Otherwise the vote
> will fail on the details, as they have in the past.

But that's not quite what the RFC says:

> I am asking that we discuss and vote on the following question:
>
> “Should there be some way for developers to signal to the parser at compile 
> time that all unqualified function names found in a namespace context are 
> global, without a namespace lookup?”

Which implies:

1. There is _some_ change to syntax.
2. All unqualified calls become global.

So, it doesn't seem like there's room for alternative approaches
within the definition of your RFC. Both of these points are not
compatible with my proposal.

Ilija


Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-04 Thread Ilija Tovilo
Hi Claude

On Fri, Aug 2, 2024 at 9:02 PM Claude Pache  wrote:
>
> I propose the following alternative approach:
>
> * establish a restricted whitelist of global functions for which the 
> performance gain would be noteworthy if there wasn’t any need to look at 
> local scope first;
>
> * for those functions, disallow to define a function of same name in any 
> namespace, e.g.: https://3v4l.org/RKnZt
>
> That way, those functions could be optimised, but the current semantics of 
> namespace lookup would remain unchanged.

That would be an improvement over the status quo. However, if you look
at the bullet points in my original email, while some of the
optimizations apply only to some functions (CTE, custom opcodes and
frameless calls), others apply to all internal, global functions
(double-lookup, lookup by offset, specialized argument passing).
Hence, we may only get a fraction of the benefits by restricting the
optimization to a handful of functions. I also wonder if the impact is
actually bigger, as then there's no workaround for redeclaring the
function, requiring much bigger refactoring.

Ilija


Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-04 Thread Ilija Tovilo
Hi Rob

On Fri, Aug 2, 2024 at 7:10 PM Rob Landers  wrote:
>
> So, what you’re saying is that symfony and laravel can get a performance 
> increase by simply adding a \ in the right places? Why don’t they do that 
> instead of changing the language?

Nothing, of course. However, a Symfony maintainer has expressed
uninterest in prefixing all internal function calls, including
automated use statements at the top of the file. Even if they did,
most users will not.

> For functions/classes in the same exact namespace, you don’t need a use 
> statement. But after this change, you do in certain cases?
>
> namespace Foo;
>
> function array_sum($bar) {}
>
> function baz($bar) {
>   return array_sum($bar);
> }
>
> So, how do you use that function in the same file?

Yes. But I'm not sure how that's different from today? If there's a
local and global function declared with the same name, and you intend
to call the global one, you'll already need to disambiguate the call
with a \.

With this change, your two options would be to:

* Prefix your calls with namespace\. That's quite ugly, but is the
syntax we currently offer.
* Add a `use array_sum;` to the top of the file.

An explicit use has upsides too. It makes it much more obvious that
the global function is shadowed.

> We can only see open source code when doing impact analysis. This means 
> picking even a slightly “popular” name could go very poorly.

Yes, and there are many more than 10 000 composer repositories. An
impact analysis can give you an approximation for breakage, not
absolute numbers.

Ilija


Re: [PHP-DEV] [RFC] Add Directive to Make All Namespaced Function Calls Global

2024-08-04 Thread Ilija Tovilo
Hi Nick

I find it a bit unfortunate that you gave my thread barely any time to
be discussed.

On Sun, Aug 4, 2024 at 1:42 PM Nick Lockheart  wrote:
>
> I would like to discuss and then vote on this proposal as a feature,
> without getting into any specifics of syntax.
>
> I propose that we vote yes/no on if there should be some way, whatever
> that way ends up being, to tell the parser to always treat unqualified
> function names as global.

This can be achieved in various ways. For example:

* Per-file, via declare(), new use syntax, or whatnot.
* Globally, through an INI setting.
* Via hard cut in a new PHP version, where a relative calls just stop
looking up local scope.
* By flipping lookup order, as proposed in my last thread.

One might be in favor of one approach but not others. Of course,
everybody will be in favor of a "free" 2-4% speedup. Hence, a yes vote
won't mean much.

Ilija


[PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-02 Thread Ilija Tovilo
Hi everyone

As you probably know, a common performance optimization in PHP is to
prefix global function calls in namespaced code with a `\`. In
namespaced code, relative function calls (meaning, not prefixed with
`\`, not imported and not containing multiple namespace components)
will be looked up in the current namespace before falling back to the
global namespace. Prefixing the function name with `\` disambiguates
the called function by always picking the global function.

Not knowing exactly which function is called at compile time has a
couple of downsides to this:

* It leads to the aforementioned double-lookup.
* It prevents compile-time-evaluation of pure internal functions.
* It prevents compiling to specialized  opcodes for specialized
internal functions (e.g. strlen()).
* It requires branching for frameless functions [1].
* It prevents an optimization that looks up internal functions by
offset rather than by name [2].
* It prevents compiling to more specialized argument sending opcodes
because of unknown by-value/by-reference passing.

All of these are enabled by disambiguating the call. Unfortunately,
prefixing all calls with `\`, or adding a `use function` at the top of
every file is annoying and noisy. We recently got a feature request to
change how functions are looked up [3]. The approach that appears to
cause the smallest backwards incompatibility is to flip the order in
which functions are looked up: Check in global scope first, and only
then in local scope. With this approach, if we can find a global
function at compile-time, we know this is the function that will be
picked at run-time, hence automatically enabling the optimizations
above. I created a PoC implementing this approach [4].

Máté has kindly benchmarked the patch, measuring an improvement of
~3.9% for Laravel, and ~2.1% for Symfony
(https://gist.github.com/kocsismate/75be09bf6011630ebd40a478682d6c17).
This seems quite significant, given that no changes were required in
either of these two codebases.

There are a few noteworthy downsides:

* Unqualified calls to functions in the same namespace would be
slightly slower, because they now involve checking global scope first.
I believe that unqualified, global calls are much more common, so this
change should still result in a net positive. It's also possible to
avoid this cost by adding a `use function` to the top of the file.
* Introducing new functions in the global namespace could cause a BC
break for unqualified calls, if the function happens to have the same
name. This is unfortunate, but likely rare. Since new functions are
only introduced in minor/major versions, this should be manageable,
but must be considered for every PHP upgrade.
* Some mocking libraries (e.g. Symfony's ClockMock [5]) intentionally
declare functions called from some file in the files namespace to
intercept these calls. This use-case would break. That said, it is
somewhat of a fragile approach to begin with, given that it wouldn't
work for fully qualified calls, or unnamespaced code.

I performed a small impact analysis [6]. There are 484 namespaced
functions shadowing global, internal functions in the top 1000
composer packages. However, the vast majority (464) of these functions
come from thecodingmachine/safe, whose entire purpose is offering
safer wrappers around internal functions. Excluding this library,
there are only 20 shadowing functions, which is surprisingly little.
Furthermore, the patch would have no impact on users of
thecodingmachine/safe, only on the library code itself.

As for providing a migration path: One approach might be to introduce
an INI setting that performs the function lookup in both local and
global scope at run-time, and informs the user about the behavioral
change in the future. To mitigate it, an explicit `use function` would
need to be added to the top of the file, or the call would need to be
prefixed with `namespace\`. The impact analysis [6] also provides a
script that looks for shadowing functions in your project. It does not
identify uses of these functions (yet), just their declarations.

Lastly, I've already raised this idea in the PHP Foundations internal
chat but did not receive much positive feedback, mostly due to fear of
the potential BC impact. I'm not particularly convinced this is an
issue, given the impact analysis. Given the surprisingly large
performance benefits, I was inclined to raise it here anyway. It also
sparked some related ideas, like providing modules that lock
namespaces and optimize multiple files as a singular unit. That said,
such approaches would likely be significantly more complex than the
approach proposed here (~30 lines of C code).

Anyway, please let me know about possible concerns, broken use-cases,
or any alternative approaches that may come to mind. I'm looking
forward to your feedback.

Ilija

[1] https://github.com/php/php-src/pull/12461
[2] https://github.com/php/php-src/pull/13634
[3] https://github.com/php/php-src/issues/

[PHP-DEV] Require C11 in PHP 8.4

2024-08-01 Thread Ilija Tovilo
Hi everyone

We've gotten a bug report about compile errors in the PHP 8.4 alpha on
old C99 compilers (and on some modern compilers when passing the
-std=c99 flag). Specifically, C99 does not support typedef
redeclarations, which is a C11 feature.

```c
// some_header.h
typedef struct _zval_struct zval;
void some_func(zval *zv);

// zend_types.h
struct _zval_struct { ... };
typedef struct _zval_struct zval;
```

Some headers might want to forward declare zval so that zend_types.h
doesn't have to be included. The two primary reasons you might want to
avoid that are 1. to reduce compile times if the header is very large
and 2. to prevent recursive header dependencies.

However, in C99 this code is actually illegal if both of these headers
end up being included, because redeclarations of typedefs are not
allowed. To fix it, the typedef must be removed and the signatures
must refer to the struct directly.

```c
// some_header.h
struct _zval_struct;
void some_func(struct _zval_struct *zv);
```

I started fixing these in a PR [1] which required more changes than
expected. After a short discourse, we were wondering whether it might
be better to switch to a newer C standard instead. Our coding
standards [2] currently specify that compiling php-src requires C99.
The Unix installation page on php.net [3] claims it is ANSI C, which
is certainly outdated. There have been suggestions to require C11 for
a while, which should be well supported by all compilers shipped with
maintained distributions.

GCC gained support for C11 in 4.7 [4], LLVM Clang in 3.1, both
released in 2012. For context, Debian Bullseye comes with GCC 10.2
[5], Ubuntu Focal Fossa comes with GCC 9.3 [6], RHEL 8 comes with GCC
8.x [7]. The biggest blocker in the past was MSVC, which has finally
added C11 support in Visual Studio 2019 [8]. Technically, Visual
Studio 2015 and 2017 are still supported by Microsoft [9], but
according to Christoph they are no longer used for PHP builds since
version 8.

Hence, it seems like it would be ok to bump our C compiler requirement
to C11. We'd like to make this change before beta 1 if there are no
objections. There are no immediate plans to make non-optional use of
other C11 features, although that is conceivable at some point.

Another benefit brought up by Christoph is that C11 relegated the
requirement for the compiler to support VLAs (variable-length arrays)
which was never implemented by MSVC, technically not complying with
the currently required standard.

If you do have any concerns, please let me know.

Ilija

[1] https://github.com/php/php-src/pull/15079
[2] 
[https://github.com/php/php-src/blob/8b6f14a9a221599b076e4bc3570f013a7e65aa21/CODING_STANDARDS.md?plain=1#L12](https://github.com/php/php-src/blob/8b6f14a9a221599b076e4bc3570f013a7e65aa21/CODING_STANDARDS.md?plain=1#L12)
[3] https://www.php.net/manual/en/install.unix.php
[4] https://stackoverflow.com/a/16256596
[5] https://packages.debian.org/bullseye/gcc
[6] https://packages.ubuntu.com/focal/gcc
[7] https://access.redhat.com/solutions/19458
[8] 
https://devblogs.microsoft.com/cppblog/c11-and-c17-standard-support-arriving-in-msvc/
[9] https://learn.microsoft.com/en-us/visualstudio/productinfo/vs-servicing


Re: [PHP-DEV] Request for opinions: bug vs feature - change intokenization of yield from

2024-07-21 Thread Ilija Tovilo
Hi Juliette

On Sat, Jul 20, 2024 at 6:40 PM Juliette Reinders Folmer
 wrote:
>
> On 20-7-2024 18:04, Tim Düsterhus wrote:
>
> As I've said: I agree that the current situation is unfortunate. But the 
> correct solution is not "disallow comments", but "split the T_YIELD_FROM into 
> T_YIELD T_WHITESPACE T_FROM_FOR_YIELD_FROM".
>
> Tim, you're making my point for me. This is *exactly* why the current change 
> should be reverted.
>
> I'm not against changing the tokenization of "yield from" and the GH ticket 
> thread also contains an alternative proposal for this from Bob [1], but like 
> Matthew also said [2]: if that's something we want to do, let's have a proper 
> discussion about it and let it go through an RFC and be a documented change.
>
> And not, like it is now, an undocumented, random change creating an 
> inconsistency in the Tokenizer.

Thank you for raising this issue. Let me share my viewpoint as the
person who has made the change.

First of all, while allowing comments between yield and from was
obviously intentional, the side-effect of embedding the comment in the
yield_from token was something I did not sufficiently consider. That's
my bad.

I agree that it's not very likely that many people started placing
comments between yield and from, but certainly not impossible. The
main problem with reverting this change is that such code will
completely stop working. Patch updates are routinely installed on
servers without additional testing, so it's very much possible that we
would brick peoples production environments.

To compare the impact on PHP_CodeSniffer: From what I understand
(please correct me if I'm wrong), there are really only two possible
scenarios when encountering comments between yield and from:

* PHP_CodeSniffer doesn't see the comment and acts as if it weren't
there, and potentially removes it when fixing errors automatically.
* PHP_CodeSniffer thinks this is a syntax error and thus reports it.

While that's not optimal, it is limited to a subset of the already
small set of people using this new feature, namely the ones also using
PHP_CodeSniffer. It is also limited to development environments, where
PHP_CodeSniffer is executed.

What I'd suggest instead is fixing this for master in a way that makes
it simple for PHP_CodeSniffer to add support, or to just revert it
there. If it is indeed true that nobody uses this feature after 8
months, then waiting another 4 for a proper fix doesn't seem like an
issue.

In the likely case of not finding consensus on this issue: Setting up
a vote doesn't seem straight-forward to me either. Would we consider
8.3 the baseline, with reverting the change requiring the 2/3
majority, or do we consider 8.2 the baseline?

Ilija


Re: [PHP-DEV] [RFC] Asymmetric Visibility, v2

2024-07-20 Thread Ilija Tovilo
Hi Rob

On Sat, Jul 20, 2024 at 3:47 PM Rob Landers  wrote:
>
> On Sat, Jul 20, 2024, at 03:14, Larry Garfield wrote:
>
> On Wed, May 29, 2024, at 2:15 PM, Larry Garfield wrote:
> >
> > https://wiki.php.net/rfc/asymmetric-visibility-v2
>
> Hi folks.  After a side quest to polish off hooks, we're nearly ready to 
> bring aviz to a vote.
>
> We've made one change since we last discussed it:  Specifically, Ilija 
> realized that __set's behavior is already inconsistent, so supporting it for 
> aviz properties with invisible set would make it even more inconsistent, not 
> less.  For that reason, we've changed the __set behavior such that a 
> non-readonly aviz property will not trigger __set.  Further details are in 
> the RFC, but in short, all of the use cases for that behavior now have better 
> alternatives, such as property types, hooks, and aviz itself.  So there's 
> really no point to falling back to __set in edge cases.
>
> https://wiki.php.net/rfc/asymmetric-visibility-v2#interaction_with_set_and_unset
>
> Hmm,
>
> There’s a lot of existing code that relies on this behavior, it’s pretty much 
> the best way to create proxies without requiring code generation.
>
> This is a pretty massive breaking change.

There was a miscommunication between Larry and me. The change is not
to any existing behavior. __get/__set are currently called under two
circumstances:

* The properties _full_ visibility is not met.
* The property was explicitly unset.

We're not changing this behavior. Instead, we decided not calling
__set for asymmetric visibility, when only the set visibility isn't
met. Before making this decision, implicitly implying protected(set)
for a readonly property would have led to __set being called (because
the scope protection now comes from asymmetric visibility, rather than
readonly itself), which would have been a change to the current
behavior.

So, in short: If only the set visibility isn't met, we're now throwing
an error. This is consistent with readonly today, and with get-only
hooks. If somebody wants to change this in the future, they can do so
without any BC breaks, and most likely should make the behavior
consistent across all of these three cases.

Ilija


Re: [PHP-DEV] Sync CODEOWNERS

2024-07-18 Thread Ilija Tovilo
Hi Christoph

On Thu, Jul 18, 2024 at 2:09 PM Christoph M. Becker  wrote:
>
> So I suggest to sync CODEOWNERS for all active branches (and maybe even
> security branches).
>
> What do you think?

I think back when it was introduced it wasn't clear (at least to me)
that PRs would request reviews based on the CODEOWNERS file of the
target branch, rather than the default branch. Given that's how it
works, syncing it in all active branches makes sense to me.

Ilija


Re: [PHP-DEV] [RFC] Property Hook improvements

2024-07-10 Thread Ilija Tovilo
Hi Derick

On Wed, Jul 10, 2024 at 11:32 AM Derick Rethans  wrote:
>
> On Mon, 1 Jul 2024, Larry Garfield wrote:
>
> >
> > https://wiki.php.net/rfc/hook_improvements
>
> 1. Remove the proactive guard against recursive backing value access
>
> "which would eventually trigger a stack overflow"
>
> Would it though? PHP 8.4 has an actual protection against this now,
> instead of getting the white-screen-of-death.
>
> Not that it matters much, but it's probably good to be precise in the
> language.

Yes, this excerpt is referring to the PHP stack overflow detection,
which will throw an exception rather than segfault. That is, on all
the architectures that already support stack overflow detection. We're
just reusing the same mechanism already used for normal function
calls.

---

Hi Claude

> 1. Removing the guard against recursion is sensible to me, as the sort of bug 
> it is supposed to prevent is not specific to property accessors. IMO, a 
> better solution to the issue is to implement a global upper limit on the call 
> stack size. Currently, I am able to generate a call stack of more than 10 
> millions items before an OOM error occurs; PHP should have thrown a stack 
> overflow error long before that. But this is entirely orthogonal to property 
> hooks.

PHP allows using recursion for algorithms on arbitrarily large data
because function calls do not cause the C stack to grow. Limiting the
PHP stack size would thus require these algorithms to be rewritten to
iterative ones, which can make them more complicated. What we could do
is check for stack size on OOM and indicate that the OOM was most
likely caused by infinite recursion.

Anyway, This does not really apply to hooks, because it will (usually)
cause VM reentry for recursion, growing the stack size and thus
triggering the "fast" stack overflow detection error.

Ilija


Re: [PHP-DEV] Re: VCS Account Request: youkidearitai

2024-07-01 Thread Ilija Tovilo
Hi Yuya!

On Sat, Jun 29, 2024 at 11:11 PM youkidearitai  wrote:
>
> 2024年4月9日(火) 7:55 youkidearitai :
> >
> > Mainly review and approve pull request to mbstring extension.
> > (probably everything related to Unicode and other character encoding)
> > Alex Dowad (alexdowad) suggested that give to me.
> > https://github.com/php/php-src/pull/13906#issuecomment-2041585626
>
> Hello, Internals
> I want to continue to contribute to mbstring extension.
> And also I want become to code owner to mbstring extension.
> Could I become grant write access?
>
> Please check below:
> https://github.com/php/php-src/commit/0bcc1e613bc8b62859fc75afeeb83651555fb44a

Sorry for the delay. I granted you access to the php-src repository.
Please familiarize yourself with our merging process [1]. Note that I
unfortunately cannot (and wouldn't know how to) create main.php.net
accounts. They are pretty much unused nowadays except for the wiki. I
can ask Derick to create one once he's back from vacation.

Feel free to ping me if you have any questions. Thank you for
contributing to PHP!

Ilija

[1] https://wiki.php.net/vcs/gitworkflow


Re: [PHP-DEV] [Early Feedback] Pattern matching

2024-06-26 Thread Ilija Tovilo
Hi Rob

On Wed, Jun 26, 2024 at 10:36 PM Rob Landers  wrote:
>
>> On Wed, Jun 26, 2024, at 21:50, Morgan wrote:
>>
>> So the issue has nothing to do with this hypothetical infinity of
>> unobservable nulls, and comes entirely down to the fact that with this
>> pattern a variable may pass
>> a) because it does not have a key named 'foo', or
>> b) because it has a key named 'foo' with a string value.
>>
>> In other words, "this key is optional, but if it is defined it must
> match this pattern".
>
> Seriously, write out using it both ways. I asked in the beginning for someone 
> to give a realistic example showing a practical difference in the final 
> implementation and I haven’t seen one. I will gracefully eat my hat. The main 
> issue is that key-existence-check is logically inconsistent with itself (if 
> you say the key shouldn’t exist or be a string, you’d be surprised to get 
> null from that key!)

function test($value) {
if ($value is ['foo' => ?string]) {
$foo = $value['foo'];
}
}
test([]);

With your approach, the example above would emit a warning, even
though the context within test() doesn't look like it should. If
?string means that the index might or might not exist, all code that
accesses them must check for existence, even when not needing to
handle null itself. That doesn't seem desirable to me.

I also think the issue goes further. If anything|null means that the
offset might not exist, does that include mixed? That makes ['foo' =>
mixed] essentially useless.

Ilija


Re: [PHP-DEV] [Early Feedback] Pattern matching

2024-06-25 Thread Ilija Tovilo
Hi Rob

On Tue, Jun 25, 2024 at 9:05 AM Rob Landers  wrote:
>
> On Tue, Jun 25, 2024, at 01:20, Ilija Tovilo wrote:
>
> On Mon, Jun 24, 2024 at 9:54 PM Robert Landers  
> wrote:
> >
> > To be honest, this is one of the smaller concerns I have with the new
> > syntax. There might be some misunderstanding here, though. A
> > non-existent key is NULL, always has been, and always will be.
>
> This is just not accurate. Inexistent indexes are not null in PHP,
> they are undefined. PHP implicitly coerces undefined to null, because
> undefined is not a value accessible to users. The same occurs when
> accessing $undefinedVariable. For arrays, this fact is observable
> through `foreach`, warnings when accessing the index, and likely
> others.
>
> This is a bit like telling someone who fell off a ladder that they didn’t 
> “technically” fall, instead the Earth and them pulled at each other until 
> they collided and the ground + body absorbed the energy.
>
> While yes, you are “technically” correct, what you describe is essentially 
> unobservable from the context of the running code (unless you turn the 
> warning into an error/exception). For all direct accesses of array values 
> ($arr['key']) an array is infinitely full of nulls (I have actually depended 
> on this property at one point for a bloom filter).

If null array values were indeed unobservable, then [] would be === to
[null] (or at least ==), and a foreach over [null] would result in 0
iterations. But neither of those are the case.

> So yes, `[?'foo' => string]` and `['foo' => ?string]` are indeed
> different. The former accepts `[]`, while the latter accepts `['foo'
> => null]`.
>
> Are they actually different in practice though? That was my point. After the 
> “is” in both cases, you’ll have to use null-coalescence to retrieve the 
> value. For all intents, they are the same resulting code. If you can show a 
> difference in the resulting code and how it is an improvement, I may be 
> inclined to agree, but I can’t think of one.

Sure. If a null value were to mean "not set", then ['foo' => string]
should accept ['foo' => 'foo', 'bar' => null], which is absolutely
observable if the code assumes it won't see any additional indexes.

Ilija


Re: [PHP-DEV] [Early Feedback] Pattern matching

2024-06-24 Thread Ilija Tovilo
On Mon, Jun 24, 2024 at 9:54 PM Robert Landers  wrote:
>
> > The first means b is an optional key, but if it’s there, can only be a 
> > string. The second says b is a required key, but it may be a string or 
> > null. If there were a binding involved, that determines the type of the 
> > binding in incompatible ways.  I’m fine with requiring bindings to be 
> > nullable for optional keys, but it strikes me as strictly less flexible and 
> > not consistent with the rest of PHP’s behavior, at least not under E_ALL.
> >
>
> To be honest, this is one of the smaller concerns I have with the new
> syntax. There might be some misunderstanding here, though. A
> non-existent key is NULL, always has been, and always will be.

This is just not accurate. Inexistent indexes are not null in PHP,
they are undefined. PHP implicitly coerces undefined to null, because
undefined is not a value accessible to users. The same occurs when
accessing $undefinedVariable. For arrays, this fact is observable
through `foreach`, warnings when accessing the index, and likely
others.

So yes, `[?'foo' => string]` and `['foo' => ?string]` are indeed
different. The former accepts `[]`, while the latter accepts `['foo'
=> null]`.

> $arr = ['a' => 'a string'];
> $arr is ['a' => string, ?'b' => $value, ...];
>
> This syntax implies that a non-existent key is a special case, and if
> it passes as-is, it will be. If there is a binding and the key is
> missing, what happens to that binding?

This is the same problem as `|`. Variable bindings within optional
keys must be forbidden. I already mentioned that to Larry when we
thought about this idea.

Ilija


Re: [PHP-DEV] Re: RFC karma request

2024-06-20 Thread Ilija Tovilo
Hi Bilage

On Thu, Jun 20, 2024 at 7:55 PM Bilge  wrote:
> >
> > Re: Static class, I would like to propose an RFC for this feature. My
> > wiki account is: bilge.
> >
> > Cheers,
> > Bilge
> >
> If there is some reason why I should not be able to do this, please let
> it be known.

Sorry for the delay. I granted you RFC karma. Good luck!

Ilija


Re: [PHP-DEV] [RFC] Static Constructor

2024-06-19 Thread Ilija Tovilo
Hi Erick

On Wed, Jun 19, 2024 at 2:34 PM Erick de Azevedo Lima
 wrote:
>
> You can read the RFC here:
> https://wiki.php.net/rfc/static_constructor

I see that you're using zend_class_init_statics() as a hook to call
__static_construct(). This makes the initialization order
unpredictable, because static properties are initialized lazily when
the class is first used (when instantiated, when accessing constants,
etc.). Essentially, this recreates the same problem described in the
"new in initializer" RFC:

https://wiki.php.net/rfc/new_in_initializers#unsupported_positions

> New expressions continue to not be supported in (static and non-static) 
> property initializers and class constant initializers. The reasons for this 
> are twofold:
> [snip]
> For static property initializers and class constant initializers a different 
> evaluation order issue arises. Currently, these initializers are evaluated 
> lazily the first time a class is used in a certain way (e.g. instantiated). 
> Once initializers can contain potentially side-effecting expressions, it 
> would be preferable to have a more well-defined evaluation order. However, 
> the straightforward approach of evaluating initilizers when the class is 
> declared would break certain existing code patterns. In particular, 
> referencing a class that is declared later in the same file would no longer 
> work.

Lazy evaluation might be ok if order is clearly defined. Making the
order undefined makes it hard (or impossible) to understand which
symbols declared in the current file may be used from
__static_construct(). The alternative mentioned above (calling
__static_construct() at class-declaration-time) likely breaks too much
existing code (because it would also require calling static
initializers just before that, which may reference symbols declared
later on), and is further complicated by early-binding. I'm not sure
what the best approach is here.

Ilija


Re: [PHP-DEV] Wiki registration and request RFC karma

2024-06-18 Thread Ilija Tovilo
Hi Erick

On Mon, Jun 17, 2024 at 11:46 PM Erick de Azevedo Lima
 wrote:
>
> I've been reading the internals mailing list for some time and giving my 2 
> cents here and there.
> Now I have created a new wiki account (erickcomp) and I'd like to request RFC 
> karma for a new RFC I worked on during my recent studies of the php-src.

I have granted you RFC karma. Good luck!

Note that the RFC process suggests first introducing the concept on
the list first, before writing an RFC. You might save yourself some
time if the idea turns out not to be desired.

Ilija


Re: [PHP-DEV] Property hooks, and &get-only write behavior

2024-06-13 Thread Ilija Tovilo
Hi Gina

On Thu, Jun 13, 2024 at 3:24 AM Gina P. Banyard  wrote:
>
> On Wednesday, 12 June 2024 at 16:06, Ilija Tovilo  
> wrote:
>
> > Hi everyone
> >
> > [...]
> >
> > I hope that these semantics are acceptable for everyone.
> >
> > Ilija
>
> Hello Ilija,
>
> I might know what sort of technical issues you've been running into, as I 
> have some similar issues correctly handling references with my 
> Container/Offset RFC implementation.
> The way I'm handling this is by splitting the "get"/read handler/hook to 
> *just* be a by-value return, and for fetching creating a new "fetch" handler 
> that is called in the respective situations.
> Therefore, I am wondering if it wouldn't possibly make more sense to add a 
> "fetch" hook, and ban returning by-ref for the get hook.

I don't know exactly what issue you encountered, but it sounds
unrelated to mine. The issue was not related to the design of &get,
but the way object handlers work for properties. In particular,
get_property_ptr_ptr does not provide zval space for temporary values.

There are three object handlers relevant:

* get_property_ptr_ptr, which returns a direct pointer to the property
zval. This is usually used for read+write operations such as ++, or
when assigning by-ref, like =&.
* read_property
* write_property

read_property and write_property are also used as fallbacks when
get_property_ptr_ptr returns NULL. This is the case for hooks.
Essentially, for hooks, we want to make sure the engine does not
bypass the set hook by writing to the zval directly.

Pure assignments aren't really an issue, I was able to implement them
using write_property. When a value is assigned to a &get-only
property, we call &get and assign the value to the resulting
reference. However, for read+write operations, the flow works like
this:

* get_property_ptr_ptr returns NULL, indicating that a direct
modification of the zval is not possible.
* The engine calls read_property instead.
* It modifies the resulting value.
* It calls write_property with the new value.

With this workflow, &get gets called twice, once for read_property and
once for write_property, which is the unexpected part. I would expect
&get only to be called once, and for this value to be modified
in-place. To implement this properly, get_property_ptr_ptr should
return the reference instead.

However, the way get_property_ptr_ptr is built doesn't allow for this,
because unlike read_property, it doesn't have a `zval *rv` parameter
the caller must supply that the callee may use in the case of
temporary values (such as &__get, or here, &get).

So, rather than causing a big internal API change to add the `zend
*rv` parameter (this API is very old), I opted to go with the original
plan.

I hope that helps.

Ilija


[PHP-DEV] Property hooks, and &get-only write behavior

2024-06-12 Thread Ilija Tovilo
Hi everyone

While reviewing the property hooks patch, Bob found a discrepancy in
the implementation and the specification. There is an unfortunate
mistake in one of the examples that suggests specific by-reference
behavior that we did not intend to support.

Specifically, from https://wiki.php.net/rfc/property-hooks#references
(you'll have to scroll down a bit):

```php
class C {
public string $a {
&get {
$b = $this->a;
return $b;
}
}
}
$c = new C();
$c->a = 'beep';
// $c is unchanged.
```

This example implies that the assignment `$c->a = 'beep';` will invoke
`&get` to write to the contents of the reference. However, this is not
the case. Direct assignments will, in the absence of `set`, fall back
to the builtin write operation for backed properties, or throw for
virtual properties.

Instead of `$c->a = 'beep';`, the example _should_ have said:

```php
$ref = &$c->a;
$ref = 'beep';
```

Here, `$ref = &$c->a;` _will_ call `&get`. What the example intended
to show is that the reference returned by `&get` is detached from
`$c->a` itself.

That said, I did give this idea some thought. I think it would not
have been unreasonable to automatically call `&get` and assign to the
returned reference when there is a `&get` hook but _no_ `set` hook.
However, when trying to implement this, I encountered some technical
issues that make this currently unfeasible. So, we'd like to move
forward with the original semantics: If there is no `set` hook, then
the assignment will trigger a builtin write for backed properties, or
an error for virtual properties.

If you wish to achieve the behavior described above (using `&get` to
write to the property), you can implement `set` to perform the write
to the reference yourself.

I hope that these semantics are acceptable for everyone.

Ilija


Re: [PHP-DEV] [RFC] Asymmetric Visibility, v2

2024-06-05 Thread Ilija Tovilo
Hi Tim

On Tue, Jun 4, 2024 at 7:54 PM Tim Düsterhus  wrote:
>
> One thing that would get pretty wonky would be private-read properties:
> Private property names are currently internally "mangled" to include the
> class name. This allows to define the same private property in multiple
> classes of an inheritance chain, without those classes needing to know
> about the private properties of each other and making the addition and
> removal of a private property not a BC break. For all intents and
> purposes those private properties to not exist, unless you are the class
> itself.
>
> I have no idea what the semantics of a public-write, private-read
> property should be - and this problem is pretty similar to the
> sibling-discussion about making private-set properties implicitly final,
> because otherwise the semantics get wonky.
>
> I believe that the case of making a property public-write, private-read
> is best left to a virtual set-only hook.

Indeed. A private property with a more permissible set operation is
the wrong approach. What we'd want here is a public property with a
restricted get operation. This is not quite expressible with the
current syntax. We'd need something like `public private(get)`, or
`public $prop { private get; }` with the C# equivalent. However, this
is quite an edge case, and since it requires additional syntax I don't
think it's something we should support without a specific use-case.
You can emulate it with a set-only virtual property, if you really
want to.

Ilija


Re: [PHP-DEV] Fwd: Request for RFC Karma to Propose any_empty and all_empty Methods

2024-06-03 Thread Ilija Tovilo
Hi Elminson!

On Mon, May 27, 2024 at 6:51 PM Elminson De Oleo Baez  wrote:
>
> I hope this message finds you well. I am writing to request RFC karma for my 
> wiki account in order to propose a new RFC.
>
> My proposal involves the introduction of two new methods, any_empty and 
> all_empty, for working with arrays. These methods are designed to provide 
> boolean outputs indicating whether any of the elements in an array are empty, 
> or if all elements are empty, respectively. I believe these methods will be 
> valuable additions to PHP’s array manipulation functionalities.
>
> Below is a brief overview of the proposed methods:
>
> any_empty(array $array): bool - This method will return true if any element 
> in the provided array is empty, and false otherwise.
> all_empty(array $array): bool - This method will return true if all elements 
> in the provided array are empty, and false otherwise.
> These methods aim to simplify common array checks and improve code 
> readability and efficiency.
>
> I look forward to your approval and any guidance you can provide on moving 
> forward with this proposal.

I'm skeptical personally about these functions. empty() doesn't have
the best semantics, and it's going to be rare that all your input
types exactly follow these semantics. I think a deeper dive into
common validation requirements might be good, to see whether they can
be abstracted in some way. It's also worth noting that validation
entails much more than just yielding true or false, e.g. coercion or
graceful errors.

Anyway, I granted you RFC karma. Good luck!

Ilija


Re: [PHP-DEV] [RFC] [Vote] #[\Deprecated] attribute

2024-06-03 Thread Ilija Tovilo
Hi Matthew

On Mon, Jun 3, 2024 at 3:15 PM Matthew Weier O'Phinney
 wrote:
>
> On Wed, May 22, 2024 at 2:24 AM Benjamin Außenhofer  
> wrote:
>>
>> The vote for the RFC #[\Deprecated] attribute is now open:
>>
>> https://wiki.php.net/rfc/deprecated_attribute
>>
>> Voting will close on Wednesday 5th June, 08:00 GMT.
>
> I have voted no for a few reasons:
>
> - Ideally, I'd like to be able to mark _anything_ as deprecated. In 
> particular, not being able to mark a _class/interface/enum/etc_ as deprecated 
> makes this far less useful.

While it's true that extending #[Deprecated] to classes would be
useful, deprecation already exists as a language concept, and it can
be extended to class-like structures without a BC break.

> - The "since" parameter is basically worthless to me. It's very easy to find 
> out the last version that wasn't deprecated. What would be far more useful to 
> a consumer is an argument indicating when something will be removed (e.g. 
> $toRemoveInVersion, $versionForRemoval, etc.). This helps me as a user plan 
> for the future.

Did you vote yes in the secondary vote by accident? I voted no on the
$since parameter for the same reason:

* "How long have I not fixed this?" is not a particularly useful
question to ask. "When do I have to fix this?" is more relevant.
* The format of $since is intentionally left unstandardized, and it's
unclear (to me?) what it refers to. For example, some packages are
split into multiple, smaller ones (e.g. Doctrine) with diverging
version numbers. The sub-package version number may not be useful to
the end-user, who never requires it directly. Similarly, referencing
the main package version may be confusing, especially if the ranges of
recent main and sub-package versions overlap.

Ilija

> - The "since" parameter is basically worthless to me. It's very easy to find 
> out the last version that wasn't deprecated. What would be far more useful to 
> a consumer is an argument indicating when something will be removed (e.g. 
> $toRemoveInVersion, $versionForRemoval, etc.). This helps me as a user plan 
> for the future.
>
> --
> Matthew Weier O'Phinney
> mweierophin...@gmail.com
> https://mwop.net/
> he/him


Re: [PHP-DEV] [RFC] Transform exit() from a language construct into a standard function

2024-05-28 Thread Ilija Tovilo
On Tue, May 28, 2024 at 2:10 PM Gina P. Banyard  wrote:
>
> On Monday, 27 May 2024 at 02:31, Ilija Tovilo  wrote:
>
> > > On Wednesday, 8 May 2024 at 14:40, Gina P. Banyard intern...@gpb.moe 
> > > wrote:
> > >
> > > > I would like to formally propose my idea for exit() as a function 
> > > > brought up to the list on 2024-02-24 [1] with the following RFC:
> > > > https://wiki.php.net/rfc/exit-as-function
> >
> >
> > As mentioned early on in private, I don't see a convincing reason to
> > remove tokenizer/parser support for exit/die. I'd rather see this
> > handled in the parser directly, by converting the standalone keywords
> > to function calls. This avoids any backwards incompatibility, and
> > avoids special handling in zend_compile_const.
>
> I must be honest, I don't really understand how parsers work, so I went with 
> what I could understand and implement.
> But I'm struggling to see a reason for keeping the token in the first place, 
> and what issues hooking into zend_compile_const() has.

Mostly because I think handling exit and die as constants is
misleading. exit; isn't a constant any more than yield; is. Instead,
you could turn exit; into the same AST as exit(); (i.e. a function
call), which will make the compiler handle it automatically. If you
wish, I can have a quick look.

> > Another thing that's probably not too important: The PR likely breaks
> > dead code elimination for exit() and die(). This could be re-added by
> > checking for the never return type instead.
>
> Checking for a never return type seems more robust if it wasn't already 
> supported by DCE.
> I will see if I can do this.

That would be great! I agree that this can be done in retrospect.

Ilija


Re: [PHP-DEV] [RFC] Transform exit() from a language construct into a standard function

2024-05-26 Thread Ilija Tovilo
Hi Gina

On Sun, May 26, 2024 at 11:47 PM Gina P. Banyard  wrote:
>
> On Wednesday, 8 May 2024 at 14:40, Gina P. Banyard  wrote:
>
> >
> > I would like to formally propose my idea for exit() as a function brought 
> > up to the list on 2024-02-24 [1] with the following RFC:
> > https://wiki.php.net/rfc/exit-as-function
>
> As there haven't been any comments for nearly two weeks, I'm planning on 
> opening the vote for the RFC on Tuesday.

As mentioned early on in private, I don't see a convincing reason to
remove tokenizer/parser support for exit/die. I'd rather see this
handled in the parser directly, by converting the standalone keywords
to function calls. This avoids any backwards incompatibility, and
avoids special handling in zend_compile_const.

Another thing that's probably not too important: The PR likely breaks
dead code elimination for exit() and die(). This could be re-added by
checking for the never return type instead. You'd also need to
special-case the lookup of exit/die in namespaced code, where it will
always refer to the global function (as they cannot be declared in
userland).

Ilija


Re: [PHP-DEV] [DISCUSSION] Checking uninitialized class properties

2024-05-20 Thread Ilija Tovilo
On Sat, May 18, 2024 at 4:41 PM Rowan Tommins [IMSoP]
 wrote:
>
> On 18/05/2024 11:52, Luigi Cardamone wrote:
> > Are there any downsides in adding a
> > specific syntax to check if a property
> > is initialized with any value?
>
> In my opinion - and I stress that others may not share this opinion -
> the entire concept of "uninitialized properties" is a wart on the
> language, which we should be doing our best to eliminate, not adding
> more features around it.

I fully agree. The reason NULL was deemed the "billion-dollar mistake"
is not because NULL isn't a useful value, it's because it is
implicitly part of types that are commonly never NULL. For example, in
C, there's no way through the type system to convey whether a pointer
returned from a function may or may not be NULL (ignoring hints
through attributes). As such, the user must look at documentation
(which may be inaccurate), or even guess whether NULL must be handled.
A failure to do so will not result in any compilation errors, but
crashes at runtime. Luckily, PHP does not suffer from this issue. null
is only allowed when the type says so.

However, if you leave your class properties uninitialized, you're
essentially recreating this issue. How do you know whether it is safe
to access class properties? May they be uninitialized under certain
conditions? The type system has no way of conveying this information.
There's no way to know, without looking at the implementation. Static
analysis won't be able to catch it for you either.

So, as Rowan suggested, it is better to hint at this "uninitialized"
value that must be handled through the type system, currently most
conveniently through single-case enums and union types.

Ilija


Re: [PHP-DEV] [Discussion] Why can't I do "{$a::class}"?

2024-05-19 Thread Ilija Tovilo
On Sun, May 19, 2024 at 1:15 PM Ilija Tovilo  wrote:
>
> Hi Peter
>
> On Sun, May 19, 2024 at 10:30 AM Peter Stalman  wrote:
> >
> > echo " {A::$static_property} \n"; // doesn't work (unless $static_property 
> > is a variable)
> > echo " {$a::$static_property} \n"; // works
> >
> > echo " {A::static_method()} \n"; // doesn't work (just text)
> > echo " {$a::static_method()} \n"; // works
> >
> > echo " {A::constant} \n"; // doesn't work
> > echo " {$a::constant} \n"; // doesn't work either, but why?
>
> There were also suggestions to extend strings in a more generic way,
> akin to JavaScripts template strings [3], but I didn't have any use
> for this myself.
>
> [3] 
> https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals

What I actually wanted to reference were "tagged templates":

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#tagged_templates

Sorry for the confusion.

Ilija


Re: [PHP-DEV] [Discussion] Why can't I do "{$a::class}"?

2024-05-19 Thread Ilija Tovilo
Hi Peter

On Sun, May 19, 2024 at 10:30 AM Peter Stalman  wrote:
>
> echo " {A::$static_property} \n"; // doesn't work (unless $static_property is 
> a variable)
> echo " {$a::$static_property} \n"; // works
>
> echo " {A::static_method()} \n"; // doesn't work (just text)
> echo " {$a::static_method()} \n"; // works
>
> echo " {A::constant} \n"; // doesn't work
> echo " {$a::constant} \n"; // doesn't work either, but why?

It would be straightforward to allow all expressions that start with a
`$` in string interpolation, as I've noticed a couple of years ago
[1]. This restriction seems rather arbitrary.

I think I held off proposing this because I was planning on proposing
a more complete form of string interpolation [2]. However, the
backwards-compatible syntax was largely disliked, so I withdrew the
RFC.

I wasn't particularly fond of introducing more forms of strings (e.g.
$"", f"", etc.) to avoid the BC break, because there are already
plentiful:

* ''
* ""
* `` (yes, these allow interpolation)
* <

Re: [PHP-DEV] [DISCUSSION] Checking uninitialized class properties

2024-05-17 Thread Ilija Tovilo
Hi Luigi

On Fri, May 17, 2024 at 11:40 PM Luigi Cardamone
 wrote:
>
> Here is an example to describe my problem. Imagine a simple
> DTO like this:
>
> class MyDTO{
> public ?int $propA;
> public ?int $propB;
> }
>
> Imagine that a Form processor or a generic mapper fill some of
> these fields with a null value:
>
> $dto = new MyDTO();
> $dto->propA = null;
>
> Sometimes we use DTOs to handle PATCH requests and not all
> the properties are mapped with a value. In a scenario like this,
> "null" is often a valid value.
>
> At this point, I need a way to find if a property was initialized or
> not but unfortunately "isset" is not a solution. When I write:
>
> echo isset($dto->propA) ? 'init' : 'not-init';

IMO, "uninitialized" should not be treated as a value. Code outside of
the constructor should generally not have to deal with uninitialized
properties.

Instead, make sure the constructor leaves your object fully
initialized, in this case likely by setting it to null. If you need
some additional "undefined" state, I think that's better modeled in
other ways, maybe by wrapping it in an object. ADTs [1] should help
with that in the future.

enum Age {
case Unknown;
case Unborn;
case Years(int $years);
}
public Age $age;

Or:

enum Age {
case Unborn;
case Years(int $years);
}
public ?Age $age;

Silly example, but you get the point.

Ilija

[1] https://wiki.php.net/rfc/tagged_unions


Re: [PHP-DEV] [RFC] [Vote] Type Guards for Classes

2024-05-16 Thread Ilija Tovilo
Hi Patrik

On Thu, May 16, 2024 at 10:31 PM Patrik Václavek  wrote:
>
> Introduce a new type guard syntax for classes:
>
> ```php
> (Foo) $variable;
> ```
>
> This syntax will internally perform the following operations:
>
> 1. Check if `$variable` is an instance of `Foo`.
> 2. If the check fails, throw a `TypeError` with a message indicating the 
> expected and actual types.

Note that this feature is covered under the pattern matching RFC with
the "Throwing alternative" extension.
https://wiki.php.net/rfc/pattern-matching#throwing_alternative

In addition to just class types, it would support all other patterns,
including scalar types, union types, array shapes, etc.

Ilija


[PHP-DEV] Introduce ReflectionConstant

2024-04-15 Thread Ilija Tovilo
Hi everyone

We recently received a feature request to allow reflection of global
constants, primarily to check whether they are deprecated.
https://github.com/php/php-src/issues/13570

I created a simple PR to introduce a minimal `ReflectionConstant` class.
https://github.com/php/php-src/pull/13669

If there are no objections, I'd like to merge this PR in a few days.

Ilija


Re: [PHP-DEV] [RFC][Vote announcement] Property hooks

2024-04-14 Thread Ilija Tovilo
Hi Matthew

We're going to skip over the reiterations of Juliettes arguments, as
they have already been responded to.

On Thu, Apr 11, 2024 at 12:08 AM Matthew Weier O'Phinney
 wrote:
>
> On Mon, Apr 8, 2024 at 4:41 PM Ilija Tovilo  wrote:
>>
>> https://externals.io/message/122445#122667
>>
>
> 2. I'm not a huge fan of the short syntax, but the improvements in the most 
> recent draft are _mostly_ ones I can live with. The part that's still unclear 
> is when and where hooks need a `;` termination, and _why_ the `;` is used, 
> instead of `,`. When using `match()` expressions, you use `,` to separate the 
> expressions, but for some reason, the proposal uses `;` ... but only when 
> using short expressions. And if you have a full-form mixed with a short form, 
> the `;` is only needed for the short-form expression. This feels arbitrary, 
> and it will be easy to get it wrong for people comfortable with `match()` 
> statements.

I agree that the distinction of `,` and `;` isn't clear-cut. I would
categorize hooks as declarations, because they are really just
functions attached to the property. Declarations are `;`-terminated,
or have a body (`{}`) (properties, (non-)abstract methods, class
consts, etc). `,`, on the other hand, is used for lists of expressions
and other things (array elements, match cases, function arguments).

The other argument for the syntax we chose is that it's already used in C#.

> (Larry tells me that it's `match()` being weird here, but considering that 
> for many developers, their only point of reference for this sort of syntax IS 
> `match()`, making it feel like the language is ignoring its own syntax when 
> creating new syntax.)

Given my description from above, I don't believe this is true either.
Match arms most closely relate to the array element syntax, which
already uses `,`.

> 3. While I'd likely prefer Marco's approach to references (just don't allow 
> them), the fact that they mirror how `__get()` and `__set()` _currently_ work 
> gives a migration path for users who are familiar with that paradigm's 
> gotchas. In other words, it's consistent with the current language, and will 
> make migrating from `__set/__get` to hooks easier. It's a lot of complexity, 
> but the table you created helps with that. That table MUST make it to the 
> docs for the feature!

Precisely. Our decision to support references (as much as possible)
comes from the desire to maintain compatibility (as much as possible),
not only with `__get`/`__set` but also plain props.

Ilija


Re: [PHP-DEV] [RFC][Vote announcement] Property hooks

2024-04-11 Thread Ilija Tovilo
Hi Juliette

On 9-4-2024 16:03, Juliette Reinders Folmer wrote:
> On 8-4-2024 23:39, Ilija Tovilo wrote:
>>
>> https://wiki.php.net/rfc/property-hooks
>>
>
> I realize it is late in the discussion period to speak up, but for months 
> I've been trying to find the words to express my concerns in a polite and 
> constructive way and have failed.

First of all, thank you for taking the time to analyze the RFC and
form your thoughts. I know how much time and effort it takes.
Nonetheless, I hope you understand that receiving feedback of this
sort literally minutes before a vote is planned is close to the worst
way to go about it, both from a practical standpoint, but also RFC
author and reviewer morale.

Many of the points you raise have been there for many months to a
year, and many have been raised and discussed many times over. I will
try to address each, and give people some time to respond.

> And not just one syntax, but five and the differences in the semantics of the 
> function syntaxes are significant.

I think this is somewhat of a misrepresentation. For reference, you
can find the grammar here. It's only about <50 lines.

https://github.com/php/php-src/blob/bf390b47c7522fd4d130be26b8ee97520f985275/Zend/zend_language_parser.y#L1088-L1131

Notably, `get` and `set` hooks don't actually have a different
grammar. Instead, hooks have names, optional parameter lists, and an
optional short (`=>`) or long (`{}`) body. Whether all possible
combinations are considered different syntax is open to
interpretation. It's true that additional rules apply to each hook,
e.g. whether they are allowed to declare a parameter list, and how the
return value is used. I could see an argument being made for allowing
an empty parameter list (`()`) for `get`, for consistency. Let us know
if you have any other ideas to make them more congruent.

In any case, I don't believe PHP_CodeSniffer would need to handle each
combination separately.

What would your optimal solution look like? I feel this discussion
cannot progress unless we have more concrete discussion points.

> * The implicit "set" parameter, which does not have a explicit parameter, 
> does not have parentheses and has the magically created $value variable.

To be a bit more exact, the parameter list is simply inferred to
`( $value)` so that the user does not need to spell it
out. This is not akin to other magic variables we had previously, like
`$http_response_header`.

> TL;DR: this RFC tries to do too much in one go and introduces a huge amount 
> of cognitive complexity with all the exceptions and the differences in 
> behaviour between virtual and backed properties. This cognitive complexity is 
> so high that I expect that the feature will catch most developers out a lot 
> of the time.

Many people still say that this RFC doesn't do enough because it
doesn't support x/y/z. This is including you:

> * The arrow function variant for `set` with all the above differences + the 
> differences inherent to arrow functions with the above mentioned exceptions + 
> the implicit assignment, which breaks the expected behaviour of arrow 
> functions by assigning the result of the expression instead of returning it 
> (totally understandable, but still a difference).
> * Properties _may_ or _may not_ have a default value anymore, depending on 
> whether the hooks cause it to be a backed or a virtual property.
> * Properties with hooks can _be_ readonly, but cannot be declared as readonly.
> * Only available for object properties, static properties are excluded.
> * Readonly properties which are not denoted as readonly, but still are due to 
> their virtual nature (get without access to $this->prop), which can only be 
> figured out by, again, studying the contents of the hook function.

There are more below that I will be commenting on in more detail.

I understand that, in a perfect world, each language feature would
automatically work with every other. In reality, this is either not
possible, or complex and time consuming. We opted for the middle
ground, allowing things that were workable and omitting things that
weren't, or required unrelated changes. If this is not acceptable,
then I genuinely don't know how to approach non-trivial RFCs.

> * Properties can now be declared as abstract, but only with explicit hook 
> requirements, otherwise the abstract keyword is not allowed.
> * Properties can now be declared on interfaces, but only with explicit hook 
> requirements, otherwise they are not allowed.

The semantics of plain interface properties are not clear.
Intuitively, `public $prop;` looks equivalent to `public $prop { get;
set; }`, but due to reference semantics it is not. It's not even
equivalent to `{ &get; set; }`, because by-reference hooked properties
still have some inherent limita

Re: [PHP-DEV] [RFC][Vote announcement] Property hooks

2024-04-09 Thread Ilija Tovilo
Hi Robert

On Tue, Apr 9, 2024 at 9:34 PM Robert Landers  wrote:
>
> On Tue, Apr 9, 2024 at 8:56 PM Larry Garfield  wrote:
> >
> > The Aviz RFC was put to a vote last year but didn't pass.
>
> It would be really nice if votes weren't just a yes/no vote, but
> yes/needs-more-work/no vote, where needs-more-work and no are
> effectively the same in terms of passing the RFC, but needs-more-work
> just means there is more to do (either addressing ugly syntax or the
> idea is sound, but as it says, it needs more work), and can thus be
> simply revoted on after concerns are addressed -- instead of creating
> a whole new RFC that needs to pass the "not too similar to other RFCs
> rule."

The asymmetric visibility RFC did include a poll for no votes.
https://wiki.php.net/rfc/asymmetric-visibility#proposed_voting_choices

> I got the impression from the Aviz discussions that most people were
> against Aviz due to the syntax, not the feature itself. It would be
> absolutely tragic if this failed to pass simply because people
> expected Aviz here.

According to the poll, syntax was one, but not the primary reason for
its rejection. The primary reason was that some people don't believe
the feature is necessary at all.

IIRC, people were arguing that readonly covers 80% of use-cases,
because it protects against writes to the property both publicly and
privately. I don't agree with this viewpoint, because I think readonly
is bad for ergonomics. In fact, we already had an RFC that attempted
to fix clone for readonly
(https://wiki.php.net/rfc/readonly_amendments) but this fix was not
complete (because it's still not possible to pass values from clone to
__clone). "Clone with" is another thing needed to fix this, and at
this point it just feels like applying more band-aids.

For DTOs, I believe value types (i.e. data classes,
https://externals.io/message/122845) solve the problem of "spooky
actions at a distance" in a cleaner and more ergonomic way.

For services and other intentional reference types, readonly often
isn't the right choice either, just to make the property not publicly
writable. Asymmetric visibility would be a much more fitting choice.

Anyway, we didn't include asymmetric visibility in this RFC because:

* We wanted to avoid getting rejected by people who fundamentally
dislike asymmetric visibility.
* We didn't feel it was fair to "sneak" the feature back in through
some other RFC, when it was explicitly rejected.

Instead, we are planning to re-propose asymmetric visibility once
property hooks are merged, as it may become more apparent why it is
useful.

Ilija


[PHP-DEV] [RFC][Vote announcement] Property hooks

2024-04-08 Thread Ilija Tovilo
Hi everyone

Heads-up: Larry and I would like to start the vote of the property
hooks RFC tomorrow:
https://wiki.php.net/rfc/property-hooks

We have worked long and hard on this RFC, and hope that we have found
some middle-ground that works for the majority. One last concern we
have not officially clarified on the list:

https://externals.io/message/122445#122667

>> I personally do not feel strongly about whether asymmetric types make it 
>> into the initial implementation. Larry does, however, and I think it is not 
>> fair to exclude them without providing any concrete reasons not to. [snip]
>
> My concern is more about the external impact of what is effectively a change 
> to the type system of the language: [snip] will tools like PhpStan and Psalm 
> require complex changes to analyse code using such properties?

In particular, this paragraph is referencing the ability to widen the
accepted $value parameter type of the set hook, described at the
bottom of https://wiki.php.net/rfc/property-hooks#set. I have talked
to Ondřej Mirtes, the maintainer of PHPStan, and he confirmed that
this should not be complex to implement in PHPStan. In fact, PHPStan
already offers the @property-read and @property-write class
annotations, which can be used to describe "virtual" properties
handled within __get/__set, already providing asymmetric types of
sorts. Hence, this concern should be a non-issue.

Thank you to everybody who has contributed to the discussion!

Ilija


Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-06 Thread Ilija Tovilo
Hi Rowan

On Fri, Apr 5, 2024 at 12:28 AM Rowan Tommins [IMSoP]
 wrote:
>
> On 03/04/2024 00:01, Ilija Tovilo wrote:
>
> Regardless of the implementation, there are a lot of interactions we will 
> want to consider; and we will have to keep considering new ones as we add to 
> the language. For instance, the Property Hooks RFC would probably have needed 
> a section on "Interaction with Data Classes".

That remark was implying that data classes really are just classes
with some additional tweaks. That gives us the ability to handle them
differently when desired. However, they will otherwise behave just
like classes, which makes it not so different from your suggestion.

> On a practical note, a few things I've already thought of to consider:
>
> - Can a data class have readonly properties (or be marked "readonly data 
> class")? If so, how will they behave?

Yes. The CoW semantics become irrelevant, given that nothing may
trigger a separation. However, data classes also include value
equality, and hashing in the future. These may still be useful for
immutable data.

> - Can you explicitly use the "clone" keyword with an instance of a data 
> class? Does it make any difference?

Manual cloning is not useful, but it's also not harmful. So I'm
leaning towards allowing this. This way, data classes may be handled
generically, along with other non-data classes.

> - Tied into that: can you implement __clone(), and when will it be called?

Yes. `__clone` will be called when the object is separated, as you would expect.

> - If you implement __set(), will copy-on-write be triggered before it's 
> called?

Yes. Separation happens as part of the property fetching, rather than
the assignment itself. Hence, for `$foo->bar->baz = 'baz';`, once
`Bar::__set('baz', 'baz')` is called, `$foo` and `$foo->bar` will
already have been separated.

> - Can you implement __destruct()? Will it ever be called?

Yes. As with any other object, this will be called once the last
reference to the object goes away. There's nothing special going on.

It's worth noting that CoW makes `__clone` and `__destruct` somewhat
nondeterministic, or at least non-obvious.

> > Consider this example, which would > work with the current approach: > > 
> > $shapes[0]->position->zero!();
>
> I find this concise example confusing, and I think there's a few things to 
> unpack here...

I think you're putting too much focus on CoW. CoW should really be
considered an implementation detail. It's not _fully_ transparent,
given that it is observable through `__clone` and `__destruct` as
mentioned above. But it is _mostly_ transparent.

Conceptually, the copy happens not when the method is called, but when
the variable is assigned. For your example:

```php
$shape = new Shape(new Position(42,42));
$copy = $shape; // Conceptually, a recursive copy happens here.
$copy->position->zero!(); // $shape is already detached from $copy.
The ! merely indicates that the value is modified.
```

> The array access doesn't need any special marker, because there's no 
> ambiguity.

This is only true if you ignore ArrayAccess. `$foo['bar']` does not
necessarily indicate that `$foo` is an array. If it were a `Vector`,
then we would absolutely need an indication to separate it.

It's true that `$foo->bar` currently indicates that `$foo` is a
reference type. This assumption would break with this RFC, but that's
also kind of the whole point.

> What is going to be CoW cloned, and what is going to be modified in place? I 
> can't actually know without knowing the definition behind both $item and 
> $item->shape. It might even vary depending on input.

For the most part, data classes should consist of other value types,
or immutable reference types (e.g. DateTimeImmutable). This actually
makes the rules quite simple: If you assign a value type, the entire
data structure is copied recursively. The fact that PHP delays this
step for performance is unimportant. The fact that immutable reference
types aren't cloned is also unimportant, given that they don't change.

Ilija


Re: [PHP-DEV] Proposal: retrieve line, filename and if user defined for ReflectionAttribute

2024-04-05 Thread Ilija Tovilo
Hi Joel

On Fri, Apr 5, 2024 at 3:10 PM Joel Wurtz  wrote:
>
> Like a lot of libraries, we offer the possibility to configure behaviors with 
> Attributes. However in some cases it's wrongly configured by the user and 
> this wrong configuration cannot be detected on the attribute constructor but 
> afterwards.
>
> In this case we may want to pinpoint which attribute (in which file and at 
> which line) cause this bad configuration. Since there was no method to 
> retrieve those information in the ReflectionAttribute I proposed a PR 
> https://github.com/php/php-src/pull/13889 to add those informations.
>
> I do believe this will allow better DX for end user when correctly used,

I would propose negating and renaming isUserDefined() to isInternal(),
since we already have several such methods. As hinted in the PR, I
believe the implementation is wrong, or rather doesn't do what you
would expect it to do.

I'm also wondering whether it may be more useful to expose the
attribute class as getClass(), without instantiating the attribute
itself. This would allow you to check for isInternal() there, along
with all the other class reflection information.

Ilija


Re: [PHP-DEV] RFC idea: using the void type to control maximum arity of user-defined functions

2024-04-04 Thread Ilija Tovilo
On Thu, Apr 4, 2024 at 5:58 PM Tim Düsterhus  wrote:
>
> On 4/4/24 16:36, Pablo Rauzy wrote:
> > I strongly agree in theory, but this could break existing code, and
> > moreover such a proposal was already rejected:
> > https://wiki.php.net/rfc/strict_argcount
>
> The RFC is 9 years old by now. My gut feeling is be that using an actual
> variadic parameter for functions that are variadic is what people do,
> because it makes the function signature much clearer. Actually variadic
> parameters are available since PHP 5.6, which at the time of the
> previous RFC was the newest version. Since then we had two major
> releases, one of which (7.x) is already out of support.
>
> I think it would be reasonable to consider deprecating passing extra
> arguments to a non-variadic function.

IIRC one of the bigger downsides of this change are closure calls that
may provide arguments that the callee does not care about.

https://3v4l.org/0QdoS

```
function filter($array, callable $c) {
$result = [];
foreach ($array as $key => $value) {
if ($c($value, $key)) {
$result[$key] = $value;
}
}
return $result;
}

var_dump(filter(['foo', '', 'bar'], function ($value) {
return strlen($value);
}));

// Internal functions already throw on superfluous args
var_dump(filter(['foo', '', 'bar'], 'strlen'));
```

The user may currently choose to omit the $key parameter of the
closure, as it is never used. In the future, this would throw. We may
decide to create an exemption for such calls, but I'm not sure
replacing one inconsistency with another is a good choice.

Ilija


Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-03 Thread Ilija Tovilo
Hi Larry

On Wed, Apr 3, 2024 at 12:03 AM Larry Garfield  wrote:
>
> On Tue, Apr 2, 2024, at 6:04 PM, Ilija Tovilo wrote:
>
> > I think you misunderstood. The intention is to mark both call-site and
> > declaration. Call-site is marked with ->method!(), while declaration
> > is marked with "public mutating function". Call-site is required to
> > avoid the engine complexity, as previously mentioned. But
> > declaration-site is required so that the user (and IDEs) even know
> > that you need to use the special syntax at the call-site.
>
> Ah, OK.  That's... unfortunate, but I defer to you on the implementation 
> complexity.

As I've argued, I believe the different syntax is a positive. This
way, data classes are known to stay unmodified unless:

1. You're explicitly modifying it yourself.
2. You're calling a mutating method, with its associated syntax.
3. You're creating a reference from the value, either explicitly or by
passing it to a by-reference parameter.

By-reference argument passing is the only way that mutations of data
classes can be hidden (given that they look exactly like normal
by-value arguments), and its arguably a flaw of by-reference passing
itself. In all other cases, you can expect your value _not_ to
unexpectedly change. For this reason, I consider it as an alternative
approach to readonly classes.

> > Disallowing ordinary by-ref objects is not trivial without additional
> > performance penalties, and I don't see a good reason for it. Can you
> > provide an example on when that would be problematic?
>
> There's two aspects to it, that I see.
>
> data class A {
>   public function __construct(public string $name) {}
> }
>
> data class B {
>   public function __construct(
> public A $a,
> public PDO $conn,
>   ) {}
> }
>
> $b = new B(new A(), $pdoConnection);
>
> function stuff(B $b2) {
>   $b2->a->name = 'Larry';
>   // This triggers a CoW on $b2, separating it from $b, and also creating a 
> new instance of A.  What about $conn?
>   // Does it get cloned?  That would be bad.  Does it not get cloned?  That 
> seems weird that it's still the same on
>   // a data object.
>
>   $b2->conn->beginTransaction();
>   // This I would say is technically a modification, since the state of the 
> connection is changing.  But then
>   // should this trigger $b2 cloning from $b1?  Neither answer is obvious to 
> me.
> }

IMO, the answer is relatively straight-forward: PDO is a reference
type. For all intents and purposes, when you're passing B to stuff(),
B is copied. Since B::$conn is a "reference" (read pointer), copying B
doesn't copy the connection, only the reference to it. B::$a, however,
is a value type, so copying B also copies A. The fact that this isn't
_exactly_ what happens under the hood due to CoW is an implementation
detail, it doesn't need to change how you think about it. From the
users standpoint, $b and $b2 can already separate values once stuff()
is called.

This is really no different from arrays:

```php
$b = ['a' => ['name' => 'Larry'], 'conn' => $pdoConnection];
$b2 = $b; // $b is detached from $b2, $b['conn'] remains a shared object.
```

> The other aspect is, eg, serialization.  People will come to expect 
> (reasonably) that a data class will have certain properties (in the abstract 
> sense, not lexical sense).  For instance, most classes are serializable, but 
> a few are not.  (Eg, if they have a reference to PDO or a file handle or 
> something unserializable.)  Data classes seem like they should be safe to 
> serialize always, as they're "just data".  If data classes are limited to 
> primitives and data classes internally, that means we can effectively 
> guarantee that they will be serializable, always.  If one of the properties 
> could be a non-serializable object, that assumption breaks.

I'm not sure that's a convincing argument to fully disallow reference
types, especially since it would prevent you from storing
DateTimeImmutables and other immutable values in data classes and thus
break many valid use-cases. That would arguably be very limiting.

> There's probably other similar examples besides serialization where "think of 
> this as data" and "think of this as logic" is how you'd want to think, which 
> leads to different assumptions, which we shouldn't stealthily break.

I think your assumption here is that non-data classes cannot contain
data. This doesn't hold, and especially will not until data classes
become more common. Readonly classes can be considered strict versions
of data classes in terms of mutability, minus some of the other
semantic changes (e.g. identity).

Ilija


Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-02 Thread Ilija Tovilo
Hi Rowan

On Tue, Apr 2, 2024 at 10:10 PM Rowan Tommins [IMSoP]
 wrote:
>
> On 02/04/2024 01:17, Ilija Tovilo wrote:
>
> I'd like to introduce an idea I've played around with for a couple of
> weeks: Data classes, sometimes called structs in other languages (e.g.
> Swift and C#).
>
> I'm not sure if you've considered it already, but mutating methods should 
> probably be constrained to be void (or maybe "mutating" could occupy the 
> return type slot). Otherwise, someone is bound to write this:
>
> $start = new Location('Here');
> $end = $start->move!('There');
>
> Expecting it to mean this:
>
> $start = new Location('Here');
> $end = $start;
> $end->move!('There');
>
> When it would actually mean this:
>
> $start = new Location('Here');
> $start->move!('There');
> $end = $start;

I think there are some valid patterns for mutating methods with a
return value. For example, Set::add() might return a bool to indicate
whether the value was already present in the set.

> I seem to remember when this was discussed before, the argument being made 
> that separating value objects completely means you have to spend time 
> deciding how they interact with every feature of the language.

Data classes are classes with a single additional
zend_class_entry.ce_flags flag. So unless customized, they behave as
classes. This way, we have the option to tweak any behavior we would
like, but we don't need to.

Of course, this will still require an analysis of what behavior we
might want to tweak.

> Does the copy-on-write optimisation actually require the entire class to be 
> special, or could it be triggered by a mutating method on any object? To 
> allow direct modification of properties as well, we could move the call-site 
> marker slightly to a ->! operator:
>
> $foo->!mutate();
> $foo->!bar = 42;

I suppose this is possible, but it puts the burden for figuring out
what to separate onto the user. Consider this example, which would
work with the current approach:

$shapes[0]->position->zero!();

The left-hand-side of the mutating method call is fetched by
"read+write". Essentially, this ensures that any array or data class
is separated (copied if RC >1).

Without such a class-wide marker, you'll need to remember to add the
special syntax exactly where applicable.

$shapes![0]!->position!->zero();

In this case, $shapes, $shapes[0], and $shapes[0]->position must all
be separated. This seems very easy to mess up, especially since only
zero() is actually known to be separating and can thus be verified at
runtime.

> The main drawback I can see (outside of the implementation, which I can't 
> comment on) is that we couldn't overload the === operator to use value 
> semantics. In exchange, a lot of decisions would simply be made for us: they 
> would just be objects, with all the same behaviour around inheritance, 
> serialization, and so on.

Right, this would either require some other marker that switches to
this mode of comparison, or operator overloading.

Ilija


Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-02 Thread Ilija Tovilo
Hi Niels

On Tue, Apr 2, 2024 at 8:16 PM Niels Dossche  wrote:
>
> On 02/04/2024 02:17, Ilija Tovilo wrote:
> > Hi everyone!
> >
> > I'd like to introduce an idea I've played around with for a couple of
> > weeks: Data classes, sometimes called structs in other languages (e.g.
> > Swift and C#).
>
> As already hinted in the thread, I also think inheritance may be dangerous in 
> a first version.
> I want to add to that: if you extend a data-class with a non-data-class, the 
> data-class behaviour gets lost, which is logical in a sense but also 
> surprised me in a way.

Yes, that's definitely not intended. I haven't implemented any
inheritance checks yet. But if inheritance is allowed, then it should
be restricted to classes of the same kind (by-ref or by-val).

> Also, FWIW, I'm not sure about the name "data" class, perhaps "value" class 
> or something alike is what people may be more familiar with wrt semantics, 
> although dataclass is also a known term.

I'm happy with value class, struct, record, data class, what have you.
I'll accept whatever the majority prefers.

> I do have a question about iterator behaviour. Consider this code:
> ```
> data class Test {
> public $a = 1;
> public $b = 2;
> }
>
> $test = new Test;
> foreach ($test as $k => &$v) {
> if ($k === "b")
> $test->a = $test;
> var_dump($k);
> }
> ```
>
> This will reset the iterator of the object on separation, so we will get an 
> infinite loop.
> Is this intended?
> If so, is it because the right hand side is the original object while the 
> left hand side gets the clone?
> Is this consistent with how arrays separate?

That's a good question. I have not really thought about iterators yet.
Modification of an array iterated by-reference does not restart the
iterator. Actually, by-reference capturing of the value also captures
the array by-reference, which is not completely intuitive.

My initial gut feeling is to handle data classes the same, i.e.
capture them by-reference when iterating the value by reference, so
that iteration is not restarted.

Ilija


Re: [PHP-DEV] Requiring GPG Commit Signing

2024-04-02 Thread Ilija Tovilo
On Tue, Apr 2, 2024 at 9:43 PM Rowan Tommins [IMSoP]
 wrote:
>
> Similarly, if you discover a compromised key or signing account, you can look 
> for uses of that key or account, which might be a tiny number from a non-core 
> contributor; if you discover a compromised account pushing unsigned commits, 
> you have to audit every commit in the repository.

Right, that and what Jakub mentioned are fair arguments.

> I agree it's not a complete solution, but no security measure is; it's always 
> about reducing the attack surface or limiting the damage.

Right. That was the original intention of my e-mail: To point out that
we might also want to consider other mitigations. Not that we
shouldn't do commit signing.

Ilija


Re: [PHP-DEV] Requiring GPG Commit Signing

2024-04-02 Thread Ilija Tovilo
Hi Rowan

On Tue, Apr 2, 2024 at 8:48 PM Rowan Tommins [IMSoP]
 wrote:
>
> In fact, you don't need to compromise anybody's key: you could socially 
> engineer a situation where you have push access to the repository, or break 
> the security in some other way. As I understand it, this is exactly what 
> happened 3 years ago: someone gained direct write access to the git.php.net 
> server, and added commits "authored by" Nikita and others to the history in 
> the repository.

Right, but I would like to believe that attaining push access _without
gaining access to a maintainers account_ should be substantially
harder on GitHub than our self-hosted git server. :)

> If all commits are signed, a compromised key or account can only be used to 
> sign commits with that specific identity: your GitHub account can't be used 
> to sign commits as Derick or Nikita, only as you. The impact is limited to 
> one identity, not the integrity of the entire repository.

But, does it matter? I'm not sure we look at some commits closer than
others, based on its author. It's true that it might be easier to
identify malicious commits if they all come from the same user, but it
wouldn't prevent them.

To be clear: I'm not against commit signing, I've been doing it for
years. I'm just unsure if it's a sufficient solution (apart from
releases, which are a whole different can of worms).

Ilija


Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-02 Thread Ilija Tovilo
Hi Larry

On Tue, Apr 2, 2024 at 5:31 PM Larry Garfield  wrote:
>
> On Tue, Apr 2, 2024, at 12:17 AM, Ilija Tovilo wrote:
> > Hi everyone!
> >
> > I'd like to introduce an idea I've played around with for a couple of
> > weeks: Data classes, sometimes called structs in other languages (e.g.
> > Swift and C#).
> >
> > * Data classes are ordinary classes, and as such may implement
> > interfaces, methods and more. I have not decided whether they should
> > support inheritance.
>
> What would be the reason not to?  As you indicated in another reply, the main 
> reason some languages don't is to avoid large stack copies, but PHP doesn't 
> have large stack copies for objects anyway so that's a non-issue.
>
> I've long argued that the fewer differences there are between service classes 
> and data classes, the better, so I'm not sure what advantage this would have 
> other than "ugh, inheritance is such a mess" (which is true, but that ship 
> sailed long ago).

One issue that just came to mind is object identity. For example:

class Person {
public function __construct(
public string $firstname,
public string $lastname,
) {}
}

class Manager extends Person {
public function bossAround() {}
}

$person = new Person('Boss', 'Man');
$manager = new Manager('Boss', 'Man');
var_dump($person === $manager); // ???

Equality for data objects is based on data, rather than the object
handle. How does this interact with inheritance? Technically, Person
and Manager represent the same data. Manager contains additional
behavior, but does that change identity?

I'm not sure what the answer is. That's just the first thing that came
to mind. I'm confident we'll discover more such edge cases. Of course,
I can invest the time to find the questions before deciding to
disallow inheritance.

> > * Mutating method calls on data classes use a slightly different
> > syntax: `$vector->append!(42)`. All methods mutating `$this` must be
> > marked as `mutating`. The reason for this is twofold: 1. It signals to
> > the caller that the value is modified. 2. It allows `$vector` to be
> > cloned before knowing whether the method `append` is modifying, which
> > hugely reduces implementation complexity in the engine.
>
> As discussed in R11, it would be very beneficial if this marker could be on 
> the method definition, not the method invocation.  You indicated that would 
> be Hard(tm), but I think it's worth some effort to see if it's surmountably 
> hard.  (Or at least less hard than just auto-detecting it, which you 
> indicated is Extremely Hard(tm).)

I think you misunderstood. The intention is to mark both call-site and
declaration. Call-site is marked with ->method!(), while declaration
is marked with "public mutating function". Call-site is required to
avoid the engine complexity, as previously mentioned. But
declaration-site is required so that the user (and IDEs) even know
that you need to use the special syntax at the call-site.

> So to the extent there is a consensus, equality, stringifying, and a hashcode 
> (which we don't have yet, but will need in the future for some things I 
> suspect) seem to be the rough expected defaults.

I'm just skeptical whether the default __toString() is ever useful. I
can see an argument for it for quick debugging in languages that don't
provide something like var_dump(). In PHP this seems much less useful.
It's impossible to provide a default implementation that works
everywhere (or pretty much anywhere, even).

Equality is already included. Hashing should be added separately, and
probably not just to data classes.

> > * In the future, it should be possible to allow using data classes in
> > `SplObjectStorage`. However, because hashing is complex, this will be
> > postponed to a separate RFC.
>
> Would data class properties only be allowed to be other data classes, or 
> could they hold a non-data class?  My knee jerk response is they should be 
> data classes all the way down; the only counter-argument I can think of it 
> would be how much existing code is out there that is a "data class" in all 
> but name.  I still fear someone adding a DB connection object to a data class 
> and everything going to hell, though. :-)

Disallowing ordinary by-ref objects is not trivial without additional
performance penalties, and I don't see a good reason for it. Can you
provide an example on when that would be problematic?

Ilija


Re: [PHP-DEV] Requiring GPG Commit Signing

2024-04-02 Thread Ilija Tovilo
Hi Derick

On Tue, Apr 2, 2024 at 4:15 PM Derick Rethans  wrote:
>
> What do y'all think about requiring GPG signed commits for the php-src
> repository?

Let me repost my internal response for visibility.

I'm currently struggling to understand what kind of attack signing
commits prevents.

If your GitHub account is compromised, GitHub allows the attacker to
commit via web interface and will happily sign their commits with a
gpg key auto-generated for your account.

See: 
https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification

> GitHub will automatically use GPG to sign commits you make using the web 
> interface. Commits signed by GitHub will have a verified status. You can 
> verify the signature locally using the public key available at 
> https://github.com/web-flow.gpg.

Even if this wasn't the case, the attacker may simply register their
own gpg key in your account, with the commits appearing as verified.

If your ssh key is compromised instead, and you use ssh to sign your
commits, the attacker may sign their malicious commits with that same
key they may use to push.

The only thing this really seems to prevent is pushing commits via a
compromised ssh key, while commits need to be signed with gpg. If
that's the intention, we should require using gpg rather than ssh for
signing (or using a different ssh key, I suppose). Additionally, it
may help for people who push via HTTP+auth token, but that's probably
not advisable in the first place.

Something that may also help is restricting pushes to patch branches
(PHP-x.y.z) to release managers. These branches are not commonly
looked at by the public, and so it may be easier to sneak malicious
commits into them.

In addition, we should keep GitHub privileges narrow, especially
branch protection configuration.

As mentioned by others, this does not prevent the xz issue. But paired
with an auto-deployment solution, it could definitely help. It would
be even better if release managers cannot change CI, and CI
maintainers cannot create releases, as this essentially enforces the
4-eyes principle. The former may be hard to enforce, as CI lives in
the same repository.

Another solution might be to require PRs, and PR verifications. But
this will inevitably create overhead for maintainers.

Ilija


Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-02 Thread Ilija Tovilo
Hi Alexander

On Tue, Apr 2, 2024 at 4:53 AM Alexander Pravdin  wrote:
>
> On Tue, Apr 2, 2024 at 9:18 AM Ilija Tovilo  wrote:
> >
> > I'd like to introduce an idea I've played around with for a couple of
> > weeks: Data classes, sometimes called structs in other languages (e.g.
> > Swift and C#).
>
> While I like the idea, I would like to suggest something else in
> addition or as a separate feature. As an active user of readonly
> classes with all promoted properties for data-holding purposes, I
> would be happy to see the possibility of cloning them with passing
> some properties to modify:
>
> readonly class Data {
> function __construct(
> public string $foo,
> public string $bar,
> public string $baz,
> ) {}
> }
>
> $data = new Data(foo: 'A', bar: 'B', baz: 'C');
>
> $data2 = clone $data with (bar: 'X', baz: 'Y');

What you're asking for is part of the "Clone with" RFC:
https://wiki.php.net/rfc/clone_with

This issue is valid and the RFC would improve the ergonomics of
readonly classes.

However, note that it really only addresses a small part of what this
RFC tries achieve:

> Some APIs further exacerbate the issue by
requiring multiple copies for multiple modifications (e.g.
`$response->withStatus(200)->withHeader('X-foo', 'foo');`).

Readonly works fine for compact data structures, even if it is copied
more than it needs. For large data structures, like large lists, a
copy for each modification would be detrimental.

https://3v4l.org/GR6On

See how the performance of an insert into an array tanks if a copy of
the array is performed in each iteration (due to an additional
reference to it). Readonly is just not viable for data structures such
as lists, maps, sets, etc.

Ilija


Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-02 Thread Ilija Tovilo
Hi Marco

On Tue, Apr 2, 2024 at 2:56 AM Deleu  wrote:
>
>
>
> On Mon, Apr 1, 2024 at 9:20 PM Ilija Tovilo  wrote:
>>
>> I'd like to introduce an idea I've played around with for a couple of
>> weeks: Data classes, sometimes called structs in other languages (e.g.
>> Swift and C#).
>>
>> snip
>>
>> Some other things to note about data classes:
>>
>> * Data classes are ordinary classes, and as such may implement
>> interfaces, methods and more. I have not decided whether they should
>> support inheritance.
>
> I'd argue in favor of not including inheritance in the first version. Taking 
> inheritance out is an impossible BC Break. Not introducing it in the first 
> stable release gives users a chance to evaluate whether it's something we 
> will drastically miss.

I would probably agree. I believe the reasoning some languages don't
support inheritance for value types is because they are stored on the
stack. Inheritance encourages large structures, but copying very large
structures over and over on the stack may be slow.

In PHP, objects always live on the heap, and due to CoW we don't have
this problem. Still, it may be beneficial to disallow inheritance
first, and relax this restriction if it is necessary.

>> * Mutating method calls on data classes use a slightly different
>> syntax: `$vector->append!(42)`. All methods mutating `$this` must be
>> marked as `mutating`. The reason for this is twofold: 1. It signals to
>> the caller that the value is modified. 2. It allows `$vector` to be
>> cloned before knowing whether the method `append` is modifying, which
>> hugely reduces implementation complexity in the engine.
>
> I'm not sure if I understood this one. Do you mean that the `!` modifier here 
> (at call-site) is helping the engine clone the variable before even diving 
> into whether `append()` has been tagged as mutating?

Precisely. The issue comes from deeper nested values:

$circle->position->zero();

Imagine that Circle is a data class with a Position, which is also a
data class. Position::zero() is a mutating method that sets the
coordinates to 0:0. For this to work, not only the position needs to
be copied, but also $circle. However, the engine doesn't yet know
ahead of time whether zero() is mutating, and as such needs to perform
a copy.

One idea was to evaluate the left-hand-side of the method call, and
repeat it with a copy if the method is mutating. However, this is not
trivially possible, because opcodes consume their operands. So, for an
expression like `getCircle()->position->zero()`, the return value of
`getCircle()` is already gone. `!` explicitly distinguishes the call
from non-mutating calls, and knows that a copy will be needed.

But as mentioned previously, I think a different syntax offers
additional benefits for readability.

> From outside it looks odd that a clone would happen ahead-of-time while 
> talking about copy-on-write. Would this syntax break for non-mutating methods?

If by break you mean the engine would error, then yes. Only mutating
methods may (and must) be called with the $foo->bar!() syntax.

Ilija


[PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-01 Thread Ilija Tovilo
Hi everyone!

I'd like to introduce an idea I've played around with for a couple of
weeks: Data classes, sometimes called structs in other languages (e.g.
Swift and C#).

In a nutshell, data classes are classes with value semantics.
Instances of data classes are implicitly copied when assigned to a
variable, or when passed to a function. When the new instance is
modified, the original instance remains untouched. This might sound
familiar: It's exactly how arrays work in PHP.

```php
$a = [1, 2, 3];
$b = $a;
$b[] = 4;
var_dump($a); // [1, 2, 3]
var_dump($b); // [1, 2, 3, 4]
```

You may think that copying the array on each assignment is expensive,
and you would be right. PHP uses a trick called copy-on-write, or CoW
for short. `$a` and `$b` actually share the same array until `$b[] =
4;` modifies it. It's only at this point that the array is copied and
replaced in `$b`, so that the modification doesn't affect `$a`. As
long as a variable is the sole owner of a value, or none of the
variables modify the value, no copy is needed. Data classes use the
same mechanism.

But why value semantics in the first place? There are two major flaws
with by-reference semantics for data structures:

1. It's very easy to forget cloning data that is referenced somewhere
else before modifying it. This will lead to "spooky actions at a
distance". Having recently used JavaScript (where all data structures
have by-reference semantics) for an educational IR optimizer,
accidental mutations of shared arrays/maps/sets were my primary source
of bugs.
2. Defensive cloning (to avoid issue 1) will lead to useless work when
the value is not referenced anywhere else.

PHP offers readonly properties and classes to address issue 1.
However, they further promote issue 2 by making it impossible to
modify values without cloning them first, even if we know they are not
referenced anywhere else. Some APIs further exacerbate the issue by
requiring multiple copies for multiple modifications (e.g.
`$response->withStatus(200)->withHeader('X-foo', 'foo');`).

As you may have noticed, arrays already solve both of these issues
through CoW. Data classes allow implementing arbitrary data structures
with the same value semantics in core, extensions or userland. For
example, a `Vector` data class may look something like the following:

```php
data class Vector {
private $values;

public function __construct(...$values) {
$this->values = $values;
}

public mutating function append($value) {
$this->values[] = $value;
}
}

$a = new Vector(1, 2, 3);
$b = $a;
$b->append!(4);
var_dump($a); // Vector(1, 2, 3)
var_dump($b); // Vector(1, 2, 3, 4)
```

An internal Vector implementation might offer a faster and stricter
alternative to arrays (e.g. Vector from php-ds).

Some other things to note about data classes:

* Data classes are ordinary classes, and as such may implement
interfaces, methods and more. I have not decided whether they should
support inheritance.
* Mutating method calls on data classes use a slightly different
syntax: `$vector->append!(42)`. All methods mutating `$this` must be
marked as `mutating`. The reason for this is twofold: 1. It signals to
the caller that the value is modified. 2. It allows `$vector` to be
cloned before knowing whether the method `append` is modifying, which
hugely reduces implementation complexity in the engine.
* Data classes customize identity (`===`) comparison, in the same way
arrays do. Two data objects are identical if all their properties are
identical (including order for dynamic properties).
* Sharing data classes by-reference is possible using references, as
you would for arrays.
* We may decide to auto-implement `__toString` for data classes,
amongst other things. I am still undecided whether this is useful for
PHP.
* Data classes protect from interior mutability. More concretely,
mutating nested data objects stored in a `readonly` property is not
legal, whereas it would be if they were ordinary objects.
* In the future, it should be possible to allow using data classes in
`SplObjectStorage`. However, because hashing is complex, this will be
postponed to a separate RFC.

One known gotcha is that we cannot trivially enforce placement of
`modfying` on methods without a performance hit. It is the
responsibility of the user to correctly mark such methods.

Here's a fully functional PoC, excluding JIT:
https://github.com/php/php-src/pull/13800

Let me know what you think. I will start working on an RFC draft once
work on property hooks concludes.

Ilija


Re: [PHP-DEV] GitHub milestones

2024-03-27 Thread Ilija Tovilo
Hi Jakub

On Wed, Mar 27, 2024 at 4:55 PM Jakub Zelenka  wrote:
>
> We actually decided not to do it for 8.3 because it would be just waste of 
> time to set all PR's with that milestone. The thing is that PR should just 
> get merged when it's ready and we won't be delaying release because some PR's 
> in that milestone are not ready so it does not have any meaning.

Thank you, I was unaware that this was a conscious decision. I agree
that it's not particularly useful for the next minor version. If they
are ready, nothing is blocking a merge to master.

> I'm not really sure if there's any point to have non-draft PR's targeting 
> next major version because they cannot be merged to master until it is 
> decided the next version will be the major one. So those PR's should be draft 
> but it might make sense to create milestone for them to show quickly why they 
> are in draft.

Draft PRs that target the next major version can make sense if they
are part of an RFC. I generally believe that every RFC should have at
least a proof-of-concept. In my experience, the implementation is the
only reliable way to reveal conceptual issues.

But then again, such RFCs should be in the RFC listing, as mentioned
in my previous message. So I agree that there's not a big need to
milestones.

> So I don't think there is much point in adding 8.4 milestone but 9.0 might be 
> useful.

That sounds reasonable to me, as it shouldn't cost much. Milestones
don't need to be complete either, it can be added where it makes
sense, for things that might be forgotten otherwise.

Ilija


Re: [PHP-DEV] Request for RFC karma

2024-03-27 Thread Ilija Tovilo
Hi!

On Wed, Mar 27, 2024 at 12:59 PM 하늘아부지  wrote:
>
> I request RFC karma to discuss the issue at this link.
> https://github.com/php/php-src/issues/13813
>
> Wiki account : daddyofsky

Either somebody beat me to it, or you already had RFC privileges.

Note that step 1 in the RFC process suggests introducing the RFC idea
to the mailing list first. This way, you can get an initial feel
whether your idea is desired by other developers, and thus whether
it's worth your time.

Good luck with the RFC!

Ilija


Re: [PHP-DEV] GitHub milestones

2024-03-27 Thread Ilija Tovilo
Hi Peter

On Wed, Mar 27, 2024 at 9:44 AM Peter Kokot  wrote:
>
> I was wondering if it would be useful to add GitHub milestones for the 
> PHP-8.4 and PHP-9.0 (or PHP-next or something like this)?
> https://github.com/php/php-src/milestones
>
> Because some pull requests might target versions after the PHP-8.4 and it 
> might be useful to have them additionally sorted to not forget about them. 
> Not to tag all PRs of course but only those which are meant to go into some 
> of the future branches.

Milestones have already been used in the past
(https://github.com/php/php-src/milestones?state=closed), so there's
no reason not to do the same for 8.4 and 9.0. Most likely, we just
forgot. It's probably not documented as part of the release process.

RFCs that have actionable tasks for the next major version should be
listed under "Pending Implementation / Landing" on the RFC page:
https://wiki.php.net/rfc#pending_implementationlanding

Adding them to a GitHub Milestone would make it a bit more explicit.
As for PRs that don't have an RFC, adding them to a milestone
definitely makes sense. However, PRs that may only be merged in the
next major version, and _don't_ require an RFC are extremely rare.

Ilija


Re: [PHP-DEV] Proposal: AS assertions

2024-03-19 Thread Ilija Tovilo
Hi Rowan

On Tue, Mar 19, 2024 at 8:39 PM Rowan Tommins [IMSoP]
 wrote:
>
> As well pattern matching, which Ilija mentioned, another adjacent feature is 
> a richer set of casting operators. Currently, we can assert that something is 
> an int; or we can force it to be an int; but we can't easily say "make this 
> an int if safe, but throw otherwise" or "make this an int if safe, but 
> substitute null/$someValue otherwise".
>
> I've been considering how we can improve that for a while, but not settled on 
> a firm proposal - there's a lot of different versions we *could* support, so 
> choosing a minimal set is hard.

I've thought about this in the context of pattern matching a while
back. I was thinking about something like `$x is ~int`, where the
pattern match is successful iff `$x` is coercible to `int` without
loss of information. Given that patterns may be nested, `array<~int>`
could check that all elements of an array are coercible to `int`. The
same could work for literal patterns, e.g. `~5`, where `5`, `5.0` and
`'5'` are all accepted.

This can potentially be combined with the variable binding pattern,
`$var @ pattern`. The syntax looks a bit confusing at first, but it
basically makes sure that the matched value conforms to `pattern`, and
then binds it to `$var`. Hence, something like `$foo as Foo { $bar @
~int }` would 1. make sure `$foo` is an instance of `Foo`, 2. make
sure `$foo->bar` is coercible to `int`, and then assigned the coerced
value to `$bar`. (It gets more complicated, because the assignment
must be delayed until the entire pattern matches.) If the pattern
matching fails at any point, it throws.

This is just an idea, neither the `as` operator nor the `~` pattern
have been implemented. I don't know whether they are feasible.

Anyway, we're probably going off-topic. :)

Ilija


Re: [PHP-DEV] Proposal: AS assertions

2024-03-19 Thread Ilija Tovilo
Hi Marco

On Tue, Mar 19, 2024 at 7:04 PM Marco Aurélio Deleu  wrote:
>
> > On 19 Mar 2024, at 14:51, Ilija Tovilo  wrote:
> >
> > Hi Robert
> >
> >> On Tue, Mar 19, 2024 at 5:24 PM Robert Landers  
> >> wrote:
> >>
> > See https://wiki.php.net/rfc/pattern-matching#throwing_alternative. I
> > believe this idea would combine nicely with pattern matching. It has
> > many more uses there than just simple class type matching, and could
> > even be used for things like destructuring.
>
> That looks like a PHP dream. Has there been any work regarding that?

https://github.com/iluuu1994/php-src/pull/102/files

The implementation is mostly complete (it might slightly diverge from
the current specification. Bob has called for a different
implementation approach that might be more complex but potentially
easier to optimize, I'll have to play around with it. There are also
still some design decisions that we aren't completely sure about. For
now, Larry and I are just trying to get property hooks over the finish
line.

Ilija


Re: [PHP-DEV] Proposal: AS assertions

2024-03-19 Thread Ilija Tovilo
Hi Robert

On Tue, Mar 19, 2024 at 5:24 PM Robert Landers  wrote:
>
> I've been thinking about this as an RFC for awhile, but with generics
> being far off (if at all), I'd like to propose a useful idea: reusing
> the AS keyword in a different context.
>
> Example:
>
> $x = $attributeReflection->newInstance() as MyAttribute;
>
> This would essentially perform the following code:
>
> assert(($x = $attributeReflection->newInstance()) instanceof MyAttribute);

See https://wiki.php.net/rfc/pattern-matching#throwing_alternative. I
believe this idea would combine nicely with pattern matching. It has
many more uses there than just simple class type matching, and could
even be used for things like destructuring.

Ilija


Re: [PHP-DEV] [RFC[ Property accessor hooks, take 2

2024-03-17 Thread Ilija Tovilo
Hi Rowan

On Sun, Mar 17, 2024 at 3:41 PM Rowan Tommins [IMSoP]
 wrote:
>
> The remaining difference I can see in the current RFC which seems to be
> unnecessary is that combining &get with set is only allowed on virtual
> properties. Although it may be "virtual" in the strict sense, any &get
> hook must actually be referring to some value stored somewhere - that
> might be a backed property, another field on the current class, a
> property of some other object, etc:
>
> public int $foo { &get => $this->foo; set { $this->foo = $value; } }
>
> public int $bar { &get => $this->_bar; set { $this->_bar = $value; } }
>
> public int $baz { &get => $this->delegatedObj->baz; set {
> $this->delegatedObj->baz = $value; } }
>
> This sentence from the RFC applies equally to all three of these examples:
>
>  > That is because any attempted modification of the value by reference
> would bypass a |set| hook, if one is defined.
>
> I suggest that we either trust the user to understand that that will
> happen, and allow combining &get and set on any property; or we do not
> trust them, and forbid it on any property.

I'm indeed afraid that people will blindly make their array properties
by-reference, without understanding the implications. Allowing
by-reference behavior for virtual read/write properties is a tradeoff,
for cases where it  may be necessary. Exposing private properties
by-reference is already possible outside of hooks
(https://3v4l.org/VNhf7), that's not something we can prevent for
secondary backing properties. However, we can at least make sure that
a reference to the baking value of a hooked property doesn't escape.

I realize this is somewhat inconsistent, but I believe it is
reasonable. If you want to expose the underlying property
by-reference, you need to jump through some additional hoops.

> > Apart from the things already mentioned, it's unclear to me whether,
> > with such `set;` declarations, a `get`-only backed property should
> > even be legal. With the complete absence of a write operation, the
> > assignment within the `set` itself would fail. To make this work, the
> > absence of `set;` would need to mean something like "writable, but
> > only within another hook", which introduces yet another form of
> > asymmetric visibility.
>
> Any write inside the get hook already by-passes the set hook and refers
> to the underlying property, so there would be no need for any default
> set behaviour other than throwing an error.
>
> It's not likely to be a common scenario, but the below works with the
> current implementation https://3v4l.org/t7qhR/rfc#vrfc.property-hooks
>
> class Example {
>  public int $nextNumber {
>  get {
>  $this->nextNumber ??= 0;
>  return $this->nextNumber++;
>  }
>  // Mimic the current behaviour of a virtual property:
> https://3v4l.org/cAfAI/rfc#vrfc.property-hooks
>  set => throw new Error('Property Example::$nextNumber is
> read-only');
>  }
> }

Again, it depends on how you think about it. As you have argued, for a
get-only property, the backing value should not be writable without an
explicit `set;` declaration. You can interpret `set;` as an
auto-generated hook, or as a marker that indicates that the backing
value is accessible without a hook. As mentioned in my previous
e-mail, auto-generated hooks is something we'd really like to avoid.
So, if the absence of `set;` means that the backing value is not
writable, the hook itself must be exempt from this rule.

Another thing to consider: The current implementation inherits the
backing value and all hooks from its parent. If the suggestion is to
add an explicit `set;` declaration to make it more obvious that the
property is writable, how does this help overridden properties?

```php
class P {
public $prop {
get => strtolower($this->prop);
set;
}
}

class C extends P {
public $prop {
get => strtoupper(parent::$prop::get());
}
}
```

Even though `P::$prop` signals that it is writable, there is no such
indication in `C::$prop`. You may suggest to also add `set;` to the
child, but then what if the parent adds a custom implementation for
`set;`?

```php
class P {
public $prop {
get => strtolower($this->prop);
set {
echo $value, "\n";
$this->prop = $value;
}
}
}

class C extends P {
public $prop {
get => strtoupper(parent::$prop::get());
set;
}
}
```

The meaning for `set;` is no longer clear. Does it mean that there's a
generated hook that accesses the backing field? Does it mean that the
backing field is accessible without a hook? Or does it mean that it
accesses the parent hook? The truth is, with inheritance there's no
way to look at the property declaration and fully understand what's
going on, unless all hooks must be spelled out for the sake of clarity
(e.g. `get => parent::$prop::get()`).

> We are already allowing more than Kotlin by lett

Re: [PHP-DEV] [RFC[ Property accessor hooks, take 2

2024-03-16 Thread Ilija Tovilo
Hi Rowan

On Sat, Mar 16, 2024 at 8:23 PM Rowan Tommins [IMSoP]
 wrote:
>
> I still think there will be a lot of users coming from other languages, or 
> from using __get and __set, who will look at virtual properties first. Making 
> things less surprising for those people seems worth some effort, but I'm not 
> asking for a complete redesign.

For clarity, you are asking for a way to make the "virtualness" of
properties more explicit, correct? We touch on a keyword and why we
think it's suboptimal in the FAQ section. Unfortunately, I cannot
think of many alternatives. The `$field` variable made it a bit more
obvious, but only marginally.

I do believe that, for the most part, the user should not have to
think much about whether the property is backed or virtual. The
behavioral differences are mostly intuitive. For example:

```php
class Test {
// This property has a set hook that writes to the backing value. Since
// we're using the backing value, it makes sense for there to be a way to
// retrieve it. Without that, it wouldn't be useful.
public $prop {
set {
$this->prop = strtoupper($value);
}
}

// Similarly, a property with only a get hook that accesses the backing
// value would need a way to write to the property for the get to be useful.
public $prop {
get => strtoupper($this->prop);
}

// A property with a get hook that does not use the backing value does not
// need an implicit set operation, as writing to the backing value would be
// useless, given that nobody will read it.
public $prop {
get => 42;
}

// Similarly, in the esoteric write-only case that does not use the backing
// value, having an implicit get operation would always lead to a
// "uninitialized property" error, and is not useful as such.
public $prop {
set {
echo "Prop set\n";
}
}
}
```

Furthermore, `serialize`, `var_dump` and the other functions operating
on raw property values will include the property only if it is backed.
This also seems intuitive to me: If you never use the backing value,
the backing value would always be uninitialized, so there's no reason
to include it.

One case that is not completely obvious is lazy-initialized properties.

```php
class Test {
public $prop {
get => $this->prop ??= expensiveOperation();
}
}
```

It's not immediately obvious that there is a public set operation
here. The correct way to fix this would be with asymmetric visibility,
which was previously declined. Either way, I don't consider this case
alone enough to completely switch our approach. Please let me know if
you are aware of any other potentially non-intuitive cases.

I will admit that it is unfortunate that a user of the property has to
look through the hook implementation to understand whether a property
is writable. As you have previously suggested, one option might be to
add an explicit `set;` declaration. Maybe it's a bit more obvious now,
after my previous e-mail, why we are trying to avoid this.

Apart from the things already mentioned, it's unclear to me whether,
with such `set;` declarations, a `get`-only backed property should
even be legal. With the complete absence of a write operation, the
assignment within the `set` itself would fail. To make this work, the
absence of `set;` would need to mean something like "writable, but
only within another hook", which introduces yet another form of
asymmetric visibility.

> > Dynamic properties are not particularly relevant today. The point was
> > not to show how similar these two cases are, but to explain that
> > there's an existing mechanism in place that works very well for hooks.
> > We may invent some new mechanism to access the backing value, like
> > `field = 'value'`, but for what reason? This would only make sense if
> > the syntax we use is useful for something else. However, given that
> > without guards it just leads to recursion, which I really can't see
> > any use for, I don't see the point.
>
> I can think of several reasons we *could* explore other syntax:
>
> 1) To make it clearer in code whether a particular line is accessing via the 
> hooks, or by-passing them 2) To make the code in the hooks shorter (e.g. 
> `$field` is significantly shorter than `$this->someDescriptiveName`) 3) To 
> allow code to by-pass the hooks at will, rather than only when called from 
> the hooks (e.g. having a single method that resets the state of several 
> lazy-loaded properties)
>
> Those reasons are probably not enough to rule out the current syntax; but 
> they show there are trade-offs being made.

Fair enough. 1 and 2 are reasons why we added the `$field` macro as an
alternative syntax in the original draft. I don't quite understand
point 3. In Kotlin, `field` is only usable within its associated hook.
Other languages I'm aware of do not provide a way to access the
backing value directly, neither inside nor ou

Re: [PHP-DEV] [RFC[ Property accessor hooks, take 2

2024-03-16 Thread Ilija Tovilo
Hi Rowan

On Sat, Mar 16, 2024 at 9:32 AM Rowan Tommins [IMSoP]
 wrote:
>
> On 16 March 2024 00:19:57 GMT, Larry Garfield  wrote:
>
> >Well, reading/writing from within a set/get hook is an obvious use case to 
> >support.  We cannot do cached properties easily otherwise:
> >
> >public string $expensive {
> >  get => $this->expensive ??= $this->compute();
> >  set {
> >if (strlen($value) < 50) throw new Exception();
> >$this->expensive = $value;
> >  }
> >}
>
>
> To play devil's advocate, in an implementation with only virtual properties, 
> this is still perfectly possible, just one declaration longer:
>
> private string $_expensive;
> public string $expensive {
>   get => $this->_expensive ??= $this->compute();
>   set {
> if (strlen($value) < 50) throw new Exception();
> $this->_expensive = $value;
>   }
> }
>
> Note that in this version there is an unambiguous way to refer to the raw 
> value from anywhere else in the class, if you wanted a clearAll() method for 
> instance.
>
> I can't stress enough that this is where a lot of my thinking comes from: 
> that backed properties are really the special case, not the default. Anything 
> you can do with a backed property you can do with a virtual one, but the 
> opposite will never be true.
>
>
> The minimum version of backed properties is basically just sugar for that - 
> the property is still essentially virtual, but the language declares the 
> backing property for you, leading to:
>
> public string $expensive {
>   get => $field ??= $this->compute();
>   set {
> if (strlen($value) < 50) throw new Exception();
> $field = $value;
>   }
> }
>
> I realise now that this isn't actually how the current implementation works, 
> but again I wanted to illustrate where I'm coming from: that backed 
> properties are just a convenience, not a different type of property with its 
> own rules.

That's not really how we think about it. Our design decisions have
been guided by a few factors:

1. The RFC intentionally makes plain properties and properties with
hooks as fully compatible as possible.

A subclass can override a plain property by adding hooks to it. Many
other languages only allow doing so if the parent property already has
generated accessors (`{ get; set; }`). For many of them, switching
from a plain property to one with accessors is actually an ABI break.
One requires generating assembly/IR instructions that access a field
in some structure, the other one is a method call. This is not
relevant in our case.

In most languages, a consequence of `{ get; set; }` is that such
properties cannot be passed by reference. This part _is_ relevant to
PHP, because PHP makes heavy use of explicit by-reference passing for
arrays, but not much else. However, as outlined in the RFC, arrays are
not a good use-case for hooks to begin with. So instead of fragmenting
the entirety of all PHP code bases into plain and `{ get; set; }`
properties where it doesn't actually make a semantic difference, and
then not even using them when it would matter (arrays), we have
decided to avoid generated hooks altogether.

The approach of making plain and hooked properties compatible also
immediately means that a property can have both a "backing value"
(inherited from the parent property) and hooks (from the child
property). This goes against your model that backed properties are
really just two properties, one for the backing value and a virtual
one for the hooks.

Our approach has the nice side effect of properties only containing
hooks when they actually do something. We don't need to deal with
optimizations like "the hook is auto-generated, revert to accessing
the property directly to make it faster", or even just having the
generated hook taking up unnecessary memory. You can think of our
properties this way:

```php
class Property {
public ?Data $storage;
public ?callable $getHook;
public ?callable $setHook;

public function get() {
if ($hook = $this->getHook) {
return $hook();
} else if ($storage) {
return $storage->get();
} else {
throw new Error('Property is write-only');
}
}

public function set($value) {
if ($hook = $this->setHook) {
$hook($value);
} else if ($storage) {
$storage->set($value);
} else {
throw new Error('Property is read-only');
}
}
}
```

Properties can inherit both storage and hooks from their parent.
Hopefully, that helps with the mental model. Of course, in reality it
is a bit more complicated due to guards and references.

2. Although you say backed properties are just syntactic, they really
are not. For example, renaming a public property, making it private
and replacing it with a new passthrough virtual property breaks
serialization, as serialization works on the object's raw values. On
the other hand, adding a hook to an existing property doesn't
influence its backing va

Re: [PHP-DEV] automatic formatting checks for pull requests?

2024-02-18 Thread Ilija Tovilo
On Sun, Feb 18, 2024 at 4:11 PM Gina P. Banyard  wrote:
>
> On Saturday, 17 February 2024 at 22:18, Ilija Tovilo  
> wrote:
>
> > * The new code style should be applied only to newly added sections or
> > changed code, not entire files. Otherwise, we'll have many changes in
> > large files, with endless merge conflicts when merging up from lower
> > branches.
>
> Surely the best way is to apply the formatting tool on all branches that are 
> supported (even in security support).
> Have them be merged upwards, and then add the revisions of the commits to a 
> .git-blame-ignore-revs file so that git blame doesn't care about them.
>
> This should resolve the issue of making future merges difficult.

Presumably, this would lead to merge conflicts in every open pull
request. Maybe a resolution strategy could be automated, not ideal
nonetheless.

Additionally, given that the PR has discovered a clang-format bug that
changes behavior
(https://github.com/php/php-src/pull/13417#issuecomment-1950920114),
I'd be wary of applying the formatting blindly to our stable branches.


Re: [PHP-DEV] automatic formatting checks for pull requests?

2024-02-17 Thread Ilija Tovilo
Hi Hans

On Sat, Feb 17, 2024 at 3:31 PM Gina P. Banyard  wrote:
>
> On Saturday, 17 February 2024 at 11:24, Hans Henrik Bergan  
> wrote:
>
> > Can we add automatic formatting checks for pull requests?
> > Made a PR: https://github.com/php/php-src/pull/13417
>
> It would be nice to have some formatting rules to harmonize the codebase as 
> it is somewhat the wild west,
> but as far as my understanding goes is that Clang format struggles to 
> understand our codebase (namely macros) and is difficult to set-up for 
> php-src.

Right. Consistent code style is nice, but what we have now is really
not that bad. There are a couple things I'd want if we enforce code
style:

* Fixing the style should be easy, running a single command without
first pushing to CI.
* It should be fast too, so that I can easily run it for every commit,
preferably even on-save in my editor.
* The new code style should be applied only to newly added sections or
changed code, not entire files. Otherwise, we'll have many changes in
large files, with endless merge conflicts when merging up from lower
branches.
* The formatting tool should work for all php-src code, not just plain
C code. We don't want to be forced to refactor old macros just because
we need to add a single line to some long-standing code. Last time I
tried clang-format, it utterly failed with our macros.

I haven't looked at your PR in detail, so I'm not sure which of these
points it satisfies. It would be great if you could quickly describe
how it works, and what the goals are.

Essentially, I'm just sceptical that this isn't more trouble than it's worth.

Ilija


Re: [PHP-DEV] RE: Testing new list server

2024-02-17 Thread Ilija Tovilo
Hi

On Sat, Feb 17, 2024 at 12:17 AM Jorg Sowa  wrote:
>
> Hello Derick,
> there is something wrong. I don't get all of the emails from the new setup, 
> only part. Examples of emails I didn't receive:
> - https://externals.io/message/122391
> - https://externals.io/message/122390
> - https://externals.io/message/122388
>
> I'm using Gmail and Spam doesn't contain any of them.

Same here. I'm using Gmail and have not received various e-mails.

Ilija


Re: [PHP-DEV] Requesting RFC karma

2024-02-17 Thread Ilija Tovilo

Hi Hans

On 16.02.24 13:05, Hans Henrik Bergan wrote:

My name is "Hans Henrik Bergan", usually go by the nickname
"divinity76", I've contributed to OSS (including PHP) for years, and
am currently involved in 3 things that might require an RFC, and
requesting RFC karma for wiki account "divinity76".


RFC karma was granted, good luck!

Ilija


Re: [PHP-DEV] Registration apply for php wiki

2024-02-17 Thread Ilija Tovilo
Hi JaeHan!

On Fri, Feb 16, 2024 at 10:53 AM 하늘아부지  wrote:
>
> Hi, My name is JaeHan Seo. (wiki username is daddyofsky)
>
> I hope I can make a proposal by registering on the php wiki.

I can give you karma. Usually, RFC karma (which is what I'm guessing
you're asking for) is granted for specific RFC ideas. Could you
quickly summarize what your idea is? See step 1 in the RFC howto:

https://wiki.php.net/rfc/howto

Thank you!
Ilija


Re: [PHP-DEV] [Discussion] Thoughts on casting to null

2024-02-14 Thread Ilija Tovilo
Hi Robert

On Wed, Feb 14, 2024 at 1:29 AM Robert Landers  wrote:
>
> I won't be the first to say this, at first glance, casting to null
> sounds silly, but short arrow functions must always return something,
> by design. That's when casting to null makes any sense at all (that I
> can think of): you want to write a succinct, short function but
> guarantee the result is discarded. Instead, if you really must use a
> short array function, you have to do something even weirder:
>
> EventLoop::repeat($pingInterval, fn() => $client->ping() ? null : null);

While (void) $client->ping() would solve your problem, it's not very
useful outside this scenario.

I think there are two issues you're implicitly referring to.

1. Arrow functions cannot be void, because they always return something.
2. Arrow functions cannot contain multiple statements.

As for the former, PHP actually had a similar issue for never closures
that was solved a while ago [1]. In the report I suggested that the
same could be done for void, by making void arrow functions evaluate
and drop the right hand side of =>, and always return nothing (i.e.
null). This would solve your issue, although probably mostly by
accident (because void functions return null, and your caller expects
exactly null). Regardless, I think this change would be useful, if
just to signal that the return value of an arrow function is not
intended to be used.

The latter would require some sort of block or grouped expression.
Short closures have been discussed extensively in the past, so I won't
get into that. There's also the comma operator in some languages like
C and JavaScript that evaluates a list of expressions and returns the
result of the last one, although probably not universally liked (i.e.
fn () => ($client->ping(), null)).

[1] 
https://github.com/php/php-src/commit/f957e3e7f17eac6e9195557f3fa79934f823fd38

Ilija

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] php-src docs

2024-02-12 Thread Ilija Tovilo
Hi Yuya

It seems you accidentally sent your response to me instead of the list.

On Sun, Feb 11, 2024 at 5:10 PM youkidearitai  wrote:
>
> 2024年2月11日(日) 21:18 Ilija Tovilo :
> >
> > Hi everyone.
> >
> > I would like to start an initiative to centralize documentation of the
> > PHP internals.
> > https://github.com/php/php-src/pull/13338
> > https://iluuu1994.github.io/php-src/ (will be moved to php.github.io
> > once merged)
> >
> > Let me know of any thoughts and suggestions you might have.
>
> Hi, Ilija.
> Thank you for your great suggestion.
>
> It seems make sense to have a set of documents about the structure of
> php-src in php-src.
> Easily create pull requests to them.
>
> Although I have to learn reStructuredText, It is not seems major problem.

For some context, I initially planned to go with the mdBook from the
Rust project (https://github.com/rust-lang/mdBook/) as Markdown is a
bit more approachable. After writing the sample zval chapter, I
noticed some pain points in terms of formatting, most significantly
tables. That said, reStructuredText is far from perfect itself.

As mentioned previously, the other reason for choosing Sphinx was that
it is quite extensible.

Ilija

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] php-src docs

2024-02-12 Thread Ilija Tovilo
Hi Mönôme

On Sun, Feb 11, 2024 at 2:20 PM Mönôme Epson  wrote:
>
> > centralize documentation of the PHP internals
> I'm glad to hear that you're planning to centralize the documentation for PHP 
> internals.
>
> > Let me know of any thoughts and suggestions you might have.
> I have a preference for devel-docs/ instead of docs/ . This would make the 
> doc-en repository a PHP subproject.

Can you clarify? Do you mean a separate repository called devel-docs?
I don't hear the term "devel" much outside of package managers. The
suggestion was to put it directly in the php-src repository, where
there's not much confusion about its content (precisely the php-src
repository).

Ilija

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



[PHP-DEV] php-src docs

2024-02-11 Thread Ilija Tovilo
Hi everyone.

I would like to start an initiative to centralize documentation of the
PHP internals. As mentioned in a recent e-mail, there's a lot of
useful documentation scattered across the internet, like the PHP
internals book, various blogs of contributors, wiki.php.net, to name a
few. Information is currently hard to discover over multiple mediums.
Some of these mediums are also prone to go stale and can't be updated
by internals (e.g. blog posts).

After a brief discussion at the foundation, we think it's best to
incorporate the documentation directly into the php-src repository.
This makes it very easy to discover, contribute to, and allows
updating documentation right alongside its technical changes. To make
browsing the documentation easier, the documentation is built with
Sphinx and published to GitHub Pages.

I've prepared a PR here:
https://github.com/php/php-src/pull/13338
https://iluuu1994.github.io/php-src/ (will be moved to php.github.io
once merged)

Let me know of any thoughts and suggestions you might have.

Ilija

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] Why are serialized strings wrapped in double quotes? (s::"")

2024-02-07 Thread Ilija Tovilo
Hi Sandy

On Tue, Feb 6, 2024 at 9:19 PM Sanford Whiteman  wrote:
>
> I'd like a little background on something we've long accepted: why
> does the serialization format need double quotes around a string, even
> though the byte length is explicit?
>
> Example:
>
>   s:5:"hello";
>
> All else being equal I would think we could have just
>
>   s:5:hello;
>
> Was this just to make strings look more 'stringy', even though the
> format isn't meant to be human-readable?

I don't have the historical context, but I'm assuming that's it. PHPs
serialization format is not efficient, and I don't think that was ever
the primary focus. If you need something more efficient, you can try
https://github.com/igbinary/igbinary which is aimed to be a drop-in
replacement.

Ilija

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



[PHP-DEV] Re: [RFC][Vote] RFC1867 for non-POST HTTP verbs

2024-02-05 Thread Ilija Tovilo
Hi everyone

On Mon, Jan 22, 2024 at 10:23 AM Ilija Tovilo  wrote:
>
> I started the vote on the "RFC1867 for non-POST HTTP verbs" RFC.
> https://wiki.php.net/rfc/rfc1867-non-post

The RFC has been accepted with 23 yes and 1 no vote. As promised to
Sara, I will be setting up a poll for the suggested, alternative
function name in the coming days.

Ilija

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] Discussion: making continue and break into an expression

2024-01-25 Thread Ilija Tovilo
Hi Larry

On Thu, Jan 25, 2024 at 6:38 PM Larry Garfield  wrote:
>
> On Thu, Jan 25, 2024, at 11:28 AM, Ilija Tovilo wrote:
>
>
> > This leads to very similar issues as break/continue inside blocks. See:
> > https://wiki.php.net/rfc/match_blocks#technical_implications_of_control_statements
> >
> I'm curious, how did `throw` expressions manage to avoid these issues?  Or 
> was it just "Ilija did the hard work of tracking down the weirdness?"

Can't really take the credit for this. This issue went over my head,
as this was my first RFC.

Exceptions work a bit differently, in that they use something called
live-ranges. Essentially, we look at the generated opcodes and figure
out which variables are "live" (i.e. valid and unfreed) during which
opcodes. For something like echo foo() + bar():

Pseudo opcodes:

 V1 = CALL foo
0001 V2 = CALL bar
0003 V3 = ADD V1 V2
0004 ECHO V3

V1 would be live for -0003, V2 for 0001-0003, V3 for 0003-0004. If
an exception is thrown (or rethrown across function boundaries) the VM
checks which temporary variables are currently live and frees them. So
if CALL bar were to throw, we'd see that V1 is currently live and
needs to be freed. For something like foo() + throw new Exception(),
if you replace the second CALL with a throw, you'll see that the
live-range for V1 doesn't change, and so this "just works".

There was, however, a related issue with the optimizer.

echo foo() + throw new Exception();

 V1 = CALL foo
0001 THROW
0003 V3 = ADD V1 false
0004 ECHO V3

Where the optimizer would remove the dead instructions after the
throw, breaking live-range analysis.

 V1 = CALL foo
0001 THROW

V1 no longer had a consuming opcode, and as such the algorithm could
no longer determine the live-range of V1. This would cause V1 to leak.
The solution was simply to disable dead code elimination for this
case. The solution was suggested by Tyson Andre and implemented by
Nikita.

In theory, break/continue expressions might try to re-use live-ranges.
I recall thinking about this, but I can't seem to remember if there
was a reason not to do it.

Ilija

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] Discussion: making continue and break into an expression

2024-01-25 Thread Ilija Tovilo
Hi Robert

On Thu, Jan 25, 2024 at 10:16 AM Robert Landers
 wrote:
>
> Now that throwing is an expression, it allows for some very concise
> programming. What are your thoughts on making a break/continue into an
> expression as well?
>
> Instead of:
>
> while(true) {
> ...
> if(is_null($arr['var'])) continue;
> if($something) continue; else break;
> ...
> }
>
> You could write
>
> while(true) {
> ...
> $arr['var'] ?? continue;
> $something ? continue : break;
> ...
> }

This leads to very similar issues as break/continue inside blocks. See:
https://wiki.php.net/rfc/match_blocks#technical_implications_of_control_statements

I'll try to explain.

The VM works with temporary variables. For the expression foo() +
bar() two temporary variables for the result of foo() and bar() will
be created, which are then used for the + operation. Normally, + will
consume both operands, i.e. use and then free them. However, with
break/continue etc. being expressions, it would become possible to
skip over the consuming instructions.

do {
echo foo() + break;
} while (true);
echo 'Done';

Pseudo opcodes:

 V1 = CALL foo
0001 JMP 0005
0002 V2 = ADD V1 false ; false is here represents a bottom value
that will never actually be used
0003 ECHO V2
0004 JMP 
0005 ECHO 'Done'

Since JMP will skip over the ADD instruction, V1 remains unused. A
similar problem already exists for break/continue in foreach itself.

foreach ($foos as $foo) {
   foreach ($bars as $bar) {
   break 2;
   }
}

foreach holds a copy of $bars (in case it gets modified) that normally
gets cleaned up when the loop ends. With break over multiple
loop-boundaries, we can completely skip over this freeing mechanism.
PHP solves this by inserting an explicit FE_FREE instruction before
the break 2, which itself is essentially just a JMP to the end of the
outer loop.

Hopefully it's now more evident why this is a problem:

while (true) {
   foo() && break;
}

foo() returns a value that would normally be consumed by the &&
operation. However, with break, we may skip over the && operation
entirely. As such, the break itself becomes responsible for freeing
these values. This requires significant changes in the compiler to
track variables that are currently "live" (i.e. haven't been consumed
yet), and emitting FREE opcodes for them as needed. I've implemented
this for match blocks here:

https://github.com/php/php-src/compare/master...iluuu1994:php-src:match-blocks-var-tracking

However, note that due to complexity, I've decided to disallow using
break/continue and the likes in such contexts to avoid this issue
completely, which isn't possible for what you are suggesting.

There's another related issue.

foo(bar(), break);

Function calls in PHP consist of multiple instructions, namely an
INIT_CALL, 0-n SEND and a DO_CALL opcode. INIT_CALL creates a stack
frame, SEND pushes arguments onto the stack frame, and DO_CALL starts
the execution of the function and frees both arguments and stack frame
when the function ends. If prior to a SEND opcode we break, we skip
over the DO_CALL, so the stack frame needs to be freed manually.

The patch linked above solves this by inserting CLEAN_UNFINISHED_CALLS
opcodes that do as the name suggests. This mechanism is already used
for exceptions. This should work for you, but was insufficient for
match blocks, for reasons I won't get into here.

All this to say: Don't expect the implementation here to be trivial.

Regards,
Ilija

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] Re: Wiki Access request

2024-01-24 Thread Ilija Tovilo
On Wed, Jan 24, 2024 at 1:22 PM Ilija Tovilo  wrote:
>
> Thank you for the list. It looks more digestible, and most of it is
> already congruent with CONTRIBUTING.md.
>
> * https://www.zend.com/resources/writing-php-extensions requires
> sharing your contact infomation to obtain. I'm not sure how other
> people feel about this.

Oh, nevermind. That sidebar is clickable. I missed that there's an
online version.

Ilija

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] Re: Wiki Access request

2024-01-24 Thread Ilija Tovilo
Hi Carlos

On Wed, Jan 24, 2024 at 12:29 PM Barel  wrote:
>
> > Feel free to share this list here, on GitHub or otherwise. I'm
> > skeptical whether throwing partially outdated resources at people is
> > actually helpful.
>
> I trimmed the list that I collected so that it only included the most
> significant items which have a lot of information. This list would be:
> - https://www.phpinternalsbook.com/ PHP Internals book
> - https://phpinternals.net/ PHP Internals web site with documentation about
> a lot of the structures and macros used in the code
> - https://www.npopov.com/ Nikita Popov's blog
> - http://blog.jpauli.tech/ Julien Pauli's blog
> - https://phpinternals.news/ Derick Rethans' podcast
> - https://www.zend.com/resources/writing-php-extensions Zend's guide about
> writing PHP extensions
> - https://wiki.php.net/internals The internals page in the wiki
> - https://www.informit.com/store/extending-and-embedding-php-9780672327049
> Sara Golemon's printed book
>
> Do you think this sounds good? If you do, I will create the PR to update
> the contributing doc

Thank you for the list. It looks more digestible, and most of it is
already congruent with CONTRIBUTING.md.

* Juliens blog is a bit large, but contains some very detailed posts
that should still be relevant. It might make sense to list them
explicitly.
* https://www.zend.com/resources/writing-php-extensions requires
sharing your contact infomation to obtain. I'm not sure how other
people feel about this.
* https://www.informit.com/store/extending-and-embedding-php-9780672327049
might be outdated, as it was published back in the PHP 5.1 era. I'll
leave it up to Sara whether she thinks the book should be recommended
at this time.

The rest seems to be already listed.

Regards,
Ilija

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC][Vote] RFC1867 for non-POST HTTP verbs

2024-01-24 Thread Ilija Tovilo
Hi Sara

Thank you for your feedback.

On Tue, Jan 23, 2024 at 8:41 PM Sara Golemon  wrote:
>
> On Mon, Jan 22, 2024 at 1:24 AM Ilija Tovilo  wrote:
>
> > I started the vote on the "RFC1867 for non-POST HTTP verbs" RFC.
> > https://wiki.php.net/rfc/rfc1867-non-post
> >
>
> 1/ This function reaches into the SAPI to pull out the "special" body
> data.  That's great, but what about uses where providing an input string
> makes sense.  For that, and for point 2, I'd suggest
> `http_parse_query(string $query, ?array $options = null): array|object`.

The RFC previously included support for the $input_stream variable
(string is not very appropriate for multipart because the input may be
arbitrarily large). The implementation wasn't complex, but it required
duplication of all the reads to support both a direct read from the
SAPI and a read from the stream, duplication of some limit checks and
special passing of the streams to avoid SAPI API breakage.

As for actual use cases, I found limited evidence that this function
would be useful for worker-based services _right now_. Most services
handle request parsing in some other layer. For example, RoadRunner
has a Go server that stores the file to disk, and then just passes the
appropriate path to PHP in the $_FILES array. It seems to me that a
custom input would be useful exclusively for a web server written in
PHP. The one that was pointed out to me (AdapterMan) handles requests
as strings, which would not scale for multipart requests.

I don't mind getting back to this if AdapterMan rewrites request
handling to use streams. Adding back the $input_stream parameter can
be done with no BC breaks. But for the time being, I don't think the
motivation is big enough to justify the added complexity.

Additionally, because multipart is used exclusively as a request
content type, it isn't useful in a general sense either, because a PHP
request will typically only receive one request (but potentially
multiple responses, in case it communicates with other servers).

> 2/ `request_` represents a new psuedo-namespace, functions are easier to
> find and associate if we keep them grouped.  I recommend 'http_` because it
> compliments the very related function `http_build_query()`, and for the
> version of this function which refers directly to the request:
> `http_parse_request(?array $options = null) : array|object`.

That's fair. If the name bothers you I can create an amendment RFC. I
think http_parse_body() would be a bit more appropriate, because
request implies more than just the body.

Ilija

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] Re: Wiki Access request

2024-01-22 Thread Ilija Tovilo
Hi Carlos

You should now have access to the /internals sub-pages.

On Mon, Jan 22, 2024 at 11:38 AM Barel  wrote:
>
> I didn't know that there was a list of tech resources listed in the
> "CONTRIBUTING.md" file. I have been researching resources that have
> information about PHP internals and have found a lot, more than twenty. I
> can think of three possibilities:
> - List them all in the contributing document but this can be a bit
> overwhelming as, like I said, the number of resources available is big

Feel free to share this list here, on GitHub or otherwise. I'm
skeptical whether throwing partially outdated resources at people is
actually helpful.

> - Add a new page in the Github repo with all these links and link to that
> page from the contributing page
> - Keep the list in the wiki and link to that page from the contributing page

One of the reasons I'd like to move off of the wiki for these things
is that there is no review process. We highly value volunteer work,
but trusting new contributors to blindly make changes to official
guides is obviously somewhat problematic.

I'd prefer to make CONTRIBUTING.md the "official list" for now. This
file is allowed to be long. And as mentioned, I suspect that the list
of 20+ items may be trimmed.

Ilija

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



[PHP-DEV] [RFC][Vote] RFC1867 for non-POST HTTP verbs

2024-01-22 Thread Ilija Tovilo
Hi everyone

I started the vote on the "RFC1867 for non-POST HTTP verbs" RFC.
https://wiki.php.net/rfc/rfc1867-non-post

Ilija

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] Newly Created Wiki Account - Quick Introduction

2024-01-22 Thread Ilija Tovilo
Hi Jair!

On Mon, Jan 22, 2024 at 5:14 AM Jair Humberto  wrote:
>
> My name is Jair Humberto, I am brazilian and have been working with PHP
> since 2006. I am super excited about the new things PHP has launched lately
> and finally realised that I can contribute as well. I think this is a new
> fase of my career, I want to learn a bit more about the php source code and
> processes and I plan to start contributing with small things, but my first
> main goal (long term) is to make possible generics in PHP, one day!

Welcome! Note that you do not need a wiki account for contributing to
PHP. Source code is managed on GitHub (https://github.com/php/php-src)
and changes are made via pull requests. Nowadays, you'll only really
need a wiki account to create RFCs.

Check the CONTRIBUTING.md file for some good resources on the internals of PHP.
https://github.com/php/php-src/blob/master/CONTRIBUTING.md#technical-resources

If you need any guidance, feel free to contact me.

Ilija

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] Wiki Access request

2024-01-21 Thread Ilija Tovilo
Hi Carlos!

On Fri, Jan 19, 2024 at 4:56 PM Barel  wrote:
>
> This page in the Wiki https://wiki.php.net/internals/references has a lot
> of links which are outdated and should be removed or changed. If you can
> provide wiki edit access for me I can work on updating them and also try to
> find more recent links to add to that reference page (suggestions welcome!!)
>
> My username for the wiki site is "barelon" in case this is needed

I can absolutely give you write access to these pages. Updating this
list to reflect more up-to-date resources certainly makes sense.

As you probably know, there are a number of different places where php
internals are documented, and I think that, long-term, it makes sense
to try to consolidate these efforts. We briefly spoke about this in
the last foundation meeting.

We have some documentation in the php-src repository itself, a
significant amount in the php internals book
(https://www.phpinternalsbook.com/), some in the wiki, some on blogs
of current or previous contributors, etc. There are a number of things
that are important when it comes to documentation, like convenience,
access, history, discoverability, etc.

While not the worst, I don't think the wiki is the best place for this
work. Handling documentation directly through Git in PHPs main
repository (or at least a repository in the PHP organization) would
likely tick the most boxes. Providing documentation with PRs might
also improve the understanding of intention of the changes for
reviewers.

As for links to other references, I believe the CONTRIBUTING.md file
is currently the most up-to-date.

https://github.com/php/php-src/blob/master/CONTRIBUTING.md#technical-resources

Rather than duplicating this list on the wiki, it might make sense to
reference the CONTRIBUTING.md file, and extend it as necessary.

Ilija

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



  1   2   3   4   >