Re: [PHP-DEV] function autoloading v4 RFC

2024-09-10 Thread Rob Landers
On Tue, Sep 10, 2024, at 15:28, Gina P. Banyard wrote:
> On Tuesday, 3 September 2024 at 23:44, Rob Landers  wrote:
>> On Thu, Aug 15, 2024, at 17:22, Rob Landers wrote:
>>> Hello internals,
>>> 
>>> I've decided to attempt an RFC for function autoloading. After reading 
>>> hundreds of ancient (and recent) emails relating to the topic along with 
>>> several abandoned RFCs from the past, and after much review, I've decided 
>>> to put forth a variation of a previous RFC, as it seemed the least 
>>> ambitious and the most likely to work:
>>> 
>>> https://wiki.php.net/rfc/function_autoloading4
>>> 
>>> Please let me know what you think. An implementation should be along before 
>>> opening it for a vote (now that I realize how important that is).
>>> 
>>> — Rob
>> 
>> Hello internals,
>> 
>> I've updated the RFC text and clarified some language (hopefully) while also 
>> addressing a number of concerns.
>>  1.  I've removed the BC break—the 'type' of the autoloadee will not be 
>> passed to the autoloader. This can allow someone to use spl_autoload for 
>> function autoloading if they so desire.
>>  2. I've removed artificial restrictions on the constants in which all 
>> functions that take them can accept both at the same time and behave 
>> appropriately.
>>  3. I've performed multiple benchmarks on several projects (WordPress, 
>> Symfony, and others from techempower) and couldn't detect a statistically 
>> significant performance difference with this feature in multiple 
>> configurations.
>>  4. Some simplified aspects of Gina's implementation for core autoloading 
>> were integrated into the patch. However, I see that RFC as a general 
>> refactoring of the API more than function autoloading. This RFC is strictly 
>> about function autoloading while being very conservative with the 
>> autoloading API.
>> While minor issues in the PR are still being worked on, I plan to open the 
>> vote next Wednesday, September 11th 2024, barring any significant pushback 
>> or objections
> 
> What is the rush in wanting to put this RFC to a vote *now*?
> You know that I was on holidays for 2 weeks and that I am open to 
> collaborate, as I replied to you off list.
> What benefit does opening the vote on the RFC give?
> You are giving busy work for RFC voters when there is another related RFC and 
> there is still a whole year to work on it.
> I'd expect this behaviour if this were June, not September
> 
> I can understand wanting to get stuff moving quickly, but this is not how 
> things work, adding support for something in a jank way is not better than 
> not adding it at all.
> I still do not believe that shoehorning more stuff into the current SPL 
> mechanism is good, I do not think using a bitflag for setting the autoloaders 
> is a good nor sensible API (because if your class and function autoloaders 
> behave the same I have questions).
> Reading the RFC I have no idea how you are tackling the global namespace 
> fallback, nor how you are going to prevent the lookup happening multiple 
> times.
> 
> As such in the current state I will vote against it, and be annoyed that you 
> are making me do busy work.
> 
> 
> Best regards,
> 
> Gina P. Banyard

Hi Gina & Jordi,

I believe there has been some confusion, so I'm postponing the vote 
indefinitely.

From the beginning, I have stated that I don't see these two RFCs as competing 
or mutually exclusive. The core autoloading RFC (which I'd still love to 
collaborate on) seems to be about

> The proposal consists of adding a better designed class autoloading mechanism 
> and a new function autoloading mechanism, and *aliasing the existing 
> autoload functions to the new versions*.

(emphasis mine) From my perspective, function autoloading is a secondary aspect 
of the core autoloading RFC, even if it is the main motivation. The new core 
autoloading API includes function autoloading, but separating it wouldn't 
change the proposed API much, if at all, and would remove the need to introduce 
function autoloading as a "new" feature.

Here's a summary of potential outcomes with what we have today:

| core autoload | autoload v4 | function autoloading |
| success       | success     | success              |
| fail          | success     | success              |
| success       | fail        | success              |
| fail          | fail        | fail                 |

If this RFC isn't put towards a vote, there's only a 50% chance of having 
function autoloading in some form. However, i

Re: [PHP-DEV] bikeshed: Typed Aliases

2024-09-10 Thread Rob Landers
On Tue, Sep 10, 2024, at 10:00, Mike Schinkel wrote:
>> On Sep 9, 2024, at 5:35 PM, Rowan Tommins [IMSoP]  
>> wrote:
>> 
>> On 09/09/2024 19:41, Mike Schinkel wrote:
>>> In Go you cannot add or subtract on a typedef without casting to the 
>>> underlying type.  I would definitely prefer that to be relaxed, but only
>>>  if it is relaxed via an explicit opt-in, e.g. something maybe like 
>>> this:
>>> 
>>> typedef UserId: int operations: +, -, *, /;
>>> typedef UserName: string operations: .;
>> I think this would stray into some of the same complexity as operator 
>> overloads on objects, in terms of the types and values allowed. For instance:
>> 
> I tend to agree that allowing operations may be too much for an initial scope 
> given that it is unlike anything else in the current language and with no 
> other languages offering an equivalent AFAIK.
> 
> I would however make the distinction that it is unlike operator overloading 
> because the big concern was what constituted an operation for any given type 
> could be too subjective.  In your example of `Metres` it is pretty obvious, 
> but not at all obvious for a `User`, for example.  (BTW, thank you for not 
> calling out my nonsensical example of operations on a `UserId`; when I wrote 
> that I clearly was not thinking about if they were relevant, doh!)
> 
> However, given the suggestion regarding operations with a typedef, the only 
> operations that I suggested would be valid would be the ones already defined 
> on the underlying type (when I mentioned other operations I was thinking of 
> methods — see the following example with round — not operators, so that is 
> not the same as operator overloading). For example:
> 
> /**
>  * Currency is an int, so for example in USD 1 
>  * unit of currency is not a dollar but a cent.
>  */
> typedef Currency: int operations: +,-,*,/,round;
> function CalcTotal(Currency $subTotal, Currency $shipping, float $tax): Currency {
>     return round($subTotal*(1+$tax/100),0) + $shipping;
> }

This is very similar (behaviorally) to what I wanted to do with GMP. 
Overloading GMP would have given you int-like powers in your type. The biggest 
negative feedback I got was that people would abuse it still; so we need some 
way to prevent abuse. If you read Jordan's operator overloads RFC, and my GMP 
overloading RFC, you can see that users basically need a way to define how to 
operate over even just an integer.

For example, Dollar(1) + Euro(3) is what? Or even Dollar(1) + 1? How does a 
developer prevent someone from doing something nonsensical? The language needs 
to enforce some rules and/or the developer needs to be able to define them. 
These rules need to be intuitive and well reasoned, IMHO.
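
For comparison, here is a minimal sketch of how a developer can enforce such rules in userland today, without any typedef or operator support. The `Money` class and its `add()` method are hypothetical names chosen for illustration; they do not come from any RFC mentioned in this thread.

```php
<?php
// Hypothetical value object: Money and add() are illustrative names only.
final class Money
{
    public function __construct(
        public readonly int $cents,
        public readonly string $currency,
    ) {}

    // Addition is only defined for two amounts in the same currency.
    public function add(Money $other): Money
    {
        if ($other->currency !== $this->currency) {
            throw new InvalidArgumentException(
                "Cannot add {$other->currency} to {$this->currency}"
            );
        }
        return new Money($this->cents + $other->cents, $this->currency);
    }
}

$total = (new Money(100, 'USD'))->add(new Money(250, 'USD'));
echo $total->cents, ' ', $total->currency, PHP_EOL; // 350 USD

try {
    // The Dollar(1) + Euro(3) case: rejected by the rule in add().
    (new Money(100, 'USD'))->add(new Money(300, 'EUR'));
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL; // Cannot add EUR to USD
}
```

The Dollar(1) + 1 case is rejected here by the parameter type: passing a bare int to `add()` raises a TypeError. The cost is that every rule has to be spelled out as a method rather than an operator.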

>> typedef Metres: int;
>> 
>> assert( Metres(2) +  Metres(1) === Metres(3) ); // most obvious
>> assert( Metres(2) + 1 === Metres(3) ); // seems pretty clear
> 
> Both of those are in line with what I was suggesting.
>> 
>> 
>> $_GET['input'] = '1';
>> assert( Metres(2) + $_GET['input'] === Metres(3) ); // might be more 
>> controversial
>> 
>> 
> I would not consider this appropriate as it has two levels of conversion and 
> could thus end up with unintended edge cases. To do the above I think you 
> would have to either convert or typecast:
> 
> assert( Metres(2) + intval($_GET['input']) === Metres(3) ); 
> assert( Metres(2) + (int)$_GET['input'] === Metres(3) ); 
>> 
>> 
>> typedef Feet: int;
>> assert( Metres(2) + Feet(1) === Metres(3) ); // almost certainly a bad idea
>> 
>> 
> This would be operator overloading where knowledge of the conversion between 
> meters and feet would be required, and that is not in any way in scope with 
> what I was suggesting.  
> 
> As an aside, I am against userland operator overloading as I have seen in 
> other languages that operator overloading gets abused and results in code 
> that is a nightmare to maintain. OTOH, I would support operator overloading 
> in specific cases, e.g. a `Measurement` class in PHP core could allow adding 
> meters to feet, assuming such a proposal were made and all other aspects of 
> the RFC were of the nature to be voted in.
> 
> To reiterate on typedefs, what I was suggesting was that if an operation was 
> explicitly allowed — e.g. + — then anything that would work with the 
> underlying type — such as adding an int 1 would work without typecasting and 
> yet still result in the typedef type, e.g. Meters(2) + 1 results in a value 
> of type Meters. (note that I corrected your spelling of 'Meters' here. ;-) 
> 
> But I agree, this is probably a bridge too far for a first RFC for typedefs. 
> 
>>> type MyNewType: Foo
>>> type MyAlias = Foo
>> I know this was only an example, but as a general point, I think we should 
>> avoid concise but cryptic differences like this. PHP is generally 
>> keyword-heavy, rather than punctuation-heavy, and I think that's a valid 
>> style which we should keep to.
>> 
> Here, I also tend to agree WRT PHP.  Was just pointing out for

Re: [PHP-DEV] bikeshed: Typed Aliases

2024-09-07 Thread Rob Landers


On Sat, Sep 7, 2024, at 15:28, Larry Garfield wrote:
> On Fri, Sep 6, 2024, at 7:46 PM, Davey Shafik wrote:
> 
> > My main struggle with this is readability. As much as I want custom 
> > types (and type aliases is a good chunk of the way there), the main 
> > issue I have is understanding what the valid inputs are:
> >
> > function foo(Status $string): void { }
> >
> > How do I know that Status is a) not a class, b) that I can fulfill the 
> > requirement with a string, and/or maybe any object with __toString(), 
> > or maybe it’s ints? Or objects or enums?
> >
> > Even with file-local aliases (which I would definitely prefer to avoid) 
> > we will most likely rely on developer tooling (e.g. IDEs and static 
> > analyzers) to inform the developer what the right input types are.
> >
> > I would very much prefer to either go all in with an Enum-like (which 
> > means that we can hang methods on to the value) or we need to 
> > distinguish between type hints for class-likes and type hints for 
> > not-class-likes (*Bar anyone?).
> >
> > Expanding on type-class-likes: within the type methods, $this->value 
> > would refer to the original value, any operators would follow the 
> > _same_ rules as either the original value's type (e.g. $int = 4; $string 
> >  = "foo"; $int . $string == "4foo", or call __toString() in all the 
> > normal places for strings if defined).
> >
> >
> > type Stringable: string|int {
> >  public function __toString(): string
> >  {
> >   return (string) $this->value; // original value
> >  }
> >
> >  // Add Stringable methods here
> > }.
> 
> Methods on typedefs was the sort of "other stuff classes do" that I was 
> trying to avoid getting into right now. :-)  Mainly because I can totally see 
> how it's tempting, but also have no idea what it would mean from a 
> type-theoretic perspective.  It would only make sense if we're talking type 
> DEFs, not type ALIASes.  I'm not against that, and it could be fun to try and 
> think through the type theoretical implications, but I don't think that's 
> what Rob was going for so I didn't want to take that side quest just yet.  
> (Though if he's OK with it, I'm OK with it.)

To be fair, I should probably mention that I've already explored it some (which 
I alluded to in another thread a couple of weeks ago). 😉 So... I guess it is 
'on-topic' for no other reason than it is interesting.

Here are my notes thus far:

__Sugary Generics__

Since we could attach behaviors (at least at the engine level) we could use 
this to implement generics. Imagine we have a generic Box<T> and want to 
instantiate it with a concrete type argument. To do this, when we define 
Box<T>, we actually define an alias internally.

This is what the definition of our generic box class in PHP might look like:

class Box<T> { public T $var; }

It would get compiled to something like this (though it would probably be 
invalid php, it perhaps illustrates my meaning):

class Box {
  public BoxT $var;
  public function __construct(private alias BoxT) {}
}

Then, to instantiate a Box, the engine compiles the constructor, 
passing the types as arguments (still probably not ever valid php) and stealing 
a bit from python's "self":

$box = new Box<int|float>; // compiles to $box = new Box(alias int|float);

The beauty of this is that it automatically becomes an error if you forget the 
type argument (and if we made it the last argument, would allow setting default 
values for BC reasons like Collection)

Even constraints on T could be expressed similarly:

class Box { public T $var; }

which may get compiled into this not ever valid php:

class Box {
  public BoxT $var;
  private alias BoxTConstraint => int|float;
  public function __construct(private alias BoxT) {
    if (BoxT is not a BoxTConstraint) throw new RuntimeException();
  }
}

Basically, it just needs to check that the BoxT alias is of the right type 
during construction, essentially making generics just a sugary layer over 
aliases.
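
To make the idea concrete in PHP that runs today, here is a rough userland emulation of "the type argument is a constructor argument that gets checked". `CheckedBox` is a hypothetical name, and passing type names as strings is only an assumption standing in for the engine-level alias described above.

```php
<?php
// Hypothetical userland emulation: CheckedBox is an illustrative name.
final class CheckedBox
{
    private mixed $var = null;

    /** @param list<string> $allowed type names as reported by get_debug_type() */
    public function __construct(private array $allowed)
    {
        if ($allowed === []) {
            throw new InvalidArgumentException('A type argument is required');
        }
    }

    // Mirrors the typed property: only values of an allowed type are stored.
    public function set(mixed $value): void
    {
        $type = get_debug_type($value);
        if (!in_array($type, $this->allowed, true)) {
            throw new TypeError(
                'Expected ' . implode('|', $this->allowed) . ", got $type"
            );
        }
        $this->var = $value;
    }

    public function get(): mixed
    {
        return $this->var;
    }
}

$box = new CheckedBox(['int', 'float']); // stands in for the type argument
$box->set(42);
$box->set(1.5);

try {
    $box->set('nope');
} catch (TypeError $e) {
    echo $e->getMessage(), PHP_EOL; // Expected int|float, got string
}
```

As in the notes above, forgetting the "type argument" is an error for free: calling `new CheckedBox()` without arguments raises an ArgumentCountError.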

__Ambitious Type System Replacement__

I've also explored it in the case of types, in general, by replacing the entire 
type system with this way of representing types (where a special class 
represents a real type of value and its behavior). This would allow for 
defining casting rules, operators, passing types as values (for pattern 
matching), etc on types themselves. I have no idea what that would look like 
"at scale", but it is interesting to think about because primitive types would 
have the same way of working as classes and everything else. It would also 
separate types from their implementation—whether we want to expose this to 
user-land is a different story. 

I suspect this could be a 'technical-only' change and not affect user-land at 
all. zvals would probably get a lot simpler though... 

I generally stop myself from thinking too much about it, because while 
interesting, it is a TON of work. Not that I'm afraid of doing that work, I 
just don't want to do it by myself. So, I'm cautiously optimistic as a

Re: [PHP-DEV] bikeshed: Typed Aliases

2024-09-07 Thread Rob Landers
On Sat, Sep 7, 2024, at 14:42, Mike Schinkel wrote:
> > On Sep 6, 2024, at 4:45 PM, Larry Garfield  wrote:
> > Aliases can then be used only in parameter, return, property, and 
> > instanceof types.  Extends and implements are out of scope entirely.
> 
> Is there a strong technical reason why extends and implements should be out 
> of scope? 
> 
> There is definite utility for this, to create a local alias in a namespace 
> that can be used throughout the namespace rather than having to refer to the 
> external namespace in many different places.
> 
> > On Sep 6, 2024, at 8:46 PM, Davey Shafik  wrote:
> > I would very much prefer to either go all in with an Enum-like (which means 
> > that we can hang methods on to the value) or we need to distinguish between 
> > type hints for class-likes and type hints for not-class-likes (*Bar 
> > anyone?).
> 
> Allowing methods also has definite value as most use-cases I have seen in 
> other languages alias in order to add methods, especially for enabling 
> support of interfaces.
> 
> Which, however, brings up an important distinction that other languages have 
> made and which I think PHP would benefit from addressing:
> 
> 1. Type Alias => Different Name for Same Type
> 2. Type Def => New Type which has all the same properties and methods of 
> other type
> 
> e.g. (being hypothetical with the syntax; bikeshed away):
> 
> typealias  LocalWidget: Widget
> 
> typedef  MyWidget: Widget {
>     function foo() {...}
> }
> 
> function doSomething(Widget $widget) {...}
> 
> $w = new LocalWidget;
> doSomething($w);   // This works, no problem as LocalWidget === Widget
> 
> $w = new MyWidget;
> doSomething($w);   // This throws an error as MyWidget !== Widget
> 
> -Mike
> 
> 

Hey Mike,

Keep in mind there is already single-class aliasing with well-known behavior 
for both local and project-wide aliases. Local aliases are done through 'use' 
statements, and project-wide aliases can be created by using the 
`class_alias()` function.
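
As a reminder of that existing behavior, a small sketch follows. The `Vendor\Ui\Widget` class and the `App\SharedWidget` alias are hypothetical names used only for this example.

```php
<?php
namespace Vendor\Ui;

class Widget {}

namespace App;

// Local alias: resolved at compile time, visible only to the code covered
// by this import.
use Vendor\Ui\Widget as LocalWidget;

// Project-wide alias: registered at runtime and usable from any file
// once this call has executed.
\class_alias(\Vendor\Ui\Widget::class, 'App\SharedWidget');

$a = new LocalWidget();
$b = new SharedWidget();

var_dump($a instanceof SharedWidget); // bool(true)
var_dump($b instanceof LocalWidget);  // bool(true)
echo get_class($b), PHP_EOL;          // Vendor\Ui\Widget
```

Both aliases name the same class, so objects created through either one pass the same type checks.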

I feel like any aliasing for primitive/intersection/union types should feel 
like an extension of that for local aliases. For 'project-wide' aliases, I'm 
open to much more different syntax, like typealias or even 'alias'.

As far as extends/implements go, I plan to keep it the same for simple class 
aliases as there is utility there and the RFC already covers this.

— Rob

Re: [PHP-DEV] bikeshed: Typed Aliases

2024-09-07 Thread Rob Landers
On Sat, Sep 7, 2024, at 05:17, Juliette Reinders Folmer wrote:
> On 6-9-2024 20:41, Rob Landers wrote:
>> - This RFC expands the "use ... as ..." syntax to allow any type expression 
>> on the left side. These aliases are PER FILE and expire once the file is 
>> compiled.
>> 
> 
> Explicitly without expressing any opinion about the RFC, I'd just like to 
> remind you that `use...` import statements for classes and such are not 
> actually per file, but per namespace and making the `use ...` statements for 
> types behave differently would be very inconsistent and surprising behaviour.
> 
> These are the rules (as far as I know and based on extensive testing from my 
> side):
> * No namespace - `use` imports apply to whole file.
> * Curly brace scoped namespace - `use` imports apply only to the code within 
> the current namespace scope.
> * Single unscoped namespace - `use` imports apply to whole file (as the whole 
> file is within the single unscoped namespace)
> * Multiple unscoped namespaces - `use` imports apply only until the next 
> namespace declaration is encountered.
> 
> Having type `use` behave differently to other import `use` statements would, 
> I imagine, also bring up problems around "file contains 2 scoped namespaces, 
> type use is for the file, but - as things currently are - no code is allowed 
> in the file outside the scoped namespaces, so where to place the import for 
> the type ? and what would it then apply to ?"
> 
> Smile,
> Juliette

Thanks Juliette,

I meant "as they are now" and not literally "per file" as described in the RFC. 
I will make that more clear in the RFC. When I wrote it, I was thinking in 
terms of how I usually write namespaced code and thought of it as "per file" 
but that is probably the wrong mental model in any case. Thanks again for 
pointing this out.
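
For anyone following along, the "multiple unscoped namespaces" rule Juliette lists can be seen in a short sketch. The namespace names `First` and `Second` and the alias `Bag` are made up for the example.

```php
<?php
namespace First;

use ArrayObject as Bag; // applies only until the next namespace declaration

function bagName(): string
{
    return Bag::class; // resolved through the import above
}

namespace Second;

// The import no longer applies here, so Bag resolves relative to Second.
function bagName(): string
{
    return Bag::class;
}

echo \First\bagName(), PHP_EOL;  // ArrayObject
echo \Second\bagName(), PHP_EOL; // Second\Bag
```

`::class` on a plain name is resolved at compile time, so the second function returns a name even though no `Second\Bag` class exists.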

— Rob

Re: [PHP-DEV] bikeshed: Typed Aliases

2024-09-06 Thread Rob Landers


On Sat, Sep 7, 2024, at 01:34, Larry Garfield wrote:
> On Fri, Sep 6, 2024, at 6:27 PM, Rob Landers wrote:
> 
> >> I suspect there's also other edge case bits to worry about, particularly 
> >> if trying to combine a complex alias with a complex type, which could lead 
> >> to violating the DNF rule.  For example:
> >
> > Oh, DNF is the bane of my existence with this RFC—I don't want to mess 
> > this up. I'll see you at the end of the example, though.
> >
> >> 
> >> typealias Foo: (Bar&Baz)|Beep;
> >> 
> >> use (Bar&Baz)|Beep as Foo;
> >> 
> >> function narf(Foo&Stringable $s) {}
> >> 
> >> With the compile time approach, that would expand to 
> >> `(Bar&Baz)|Beep&Stringable`, which is not a valid type def.
> >
> > I can see how you arrived at this, but I think you may have missed a 
> > step, since the entirety of Foo will be &'d with Stringable.
> >
> > Foo = (Bar & Baz) | Beep
> >
> > want: (Foo) & Stringable
> >
> > expand Foo: ((Bar & Baz) | Beep) & Stringable
> >
> > Which can be reduced to the following in proper DNF (at least, it 
> > compiles—https://3v4l.org/0bMlP):
> >
> > (Beep & Stringable) | (Bar & Baz & Stringable)
> >
> > It's probably a good idea to update the RFC explaining how expansion works.
> 
> Woof.  We're not "fixingup" anyone's DNF elsewhere.  I cannot speak for 
> everyone, but I'd be perfectly fine not doing any magic fixing for now, and 
> then debating separately if we should do it more generally.  Just doing it 
> for aliases doesn't seem like the best plan.
> 
> --Larry Garfield
> 

Oh, we're definitely not "fixingup" the expression to DNF... more like spending 
some time in the RFC showing how the expansion is the same execution as with a 
DNF expression to prove that it is a valid type expression.

— Rob

Re: [PHP-DEV] bikeshed: Typed Aliases

2024-09-06 Thread Rob Landers
On Fri, Sep 6, 2024, at 22:45, Larry Garfield wrote:
> Hi Rob.
> 
> First of all, I'm very much in favor of type aliases generally, so thank you 
> for taking a swing at this.
> 
> Second, it looks like you've run into the main design issue that has always 
> prevented them in the past: Should aliases be file-local and thus not 
> reusable, or global and thus we need to figure out autoloading for them?  It 
> looks like your answer to that question at the moment is "yes". :-)  While I 
> can see the appeal, I don't think that's the best approach.  Or rather, if we 
> go that route, they shouldn't be quite so similar syntactically.
> 
> There seem to be two different implementations living in the same RFC, 
> uncomfortably.  In one, it's a compiler-time replacement.  In the other, it's 
> a special class-like.  But the RFC seems to go back and forth on what happens 
> in which case, and I'm not sure which is which.
> 
> However, you have demonstrated a working class-like for it, which is frankly 
> the biggest hurdle.  So I think the direction has promise, but should be 
> adjusted to go all-in on that approach.
> 
> To wit:
> 
> typealias Stringy: string|Stringable;
> typealias UserID: Int;
> typealias Time: Hour|Minute|Second;
> typealias FilterCallback: callable(mixed $a): bool;  (eventually...)
> 
> (etc.)
> 
> Each of those produces a class-like, which can therefore be autoloaded like a 
> class.  The syntax is also a bit closer to a class (or an Enum, I suppose), 
> so it's much more self-evident that they are defining a reusable thing 
> (whereas "use" does not do that currently).  And the syntax is not stringy, 
> like the proposed type_alias(), so it's easier to write.  I wouldn't even 
> include type_alias() at that point.  It exists at runtime, so reflection is 
> meaningful.
> 
> Aliases can then be used only in parameter, return, property, and instanceof 
> types.  Extends and implements are out of scope entirely.
> 
> (Whether the keyword is typealias or typedef, uses : or =, or whatever, is 
> its own bikeshed I won't dive into at the moment.)
> 
> Then, as a separate, entirely optional, maybe even separate RFC (or second 
> vote, or whatever), we have a `use string|Stringable as Stringy` syntax.  
> Like all other `use` declarations, these are compile-time only, single-file 
> only, and do not exist at runtime, so no reflection.  They compile away just 
> like all other use-statements now.
> 
> I'm not personally convinced the second is really necessary if we do a good 
> enough job on the first, but I'd probably not stand in the way of having both.

That's a really good point and would clear up quite a bit of confusion and 
complexity.

> 
> Having typealias/typedef as a class-like also opens up some interesting 
> potential in the future, because classes have all sorts of other things they 
> do, but that is probably too complex scope creepy to get into here so I will 
> not go further than that mention.
> 
> I suspect there's also other edge case bits to worry about, particularly if 
> trying to combine a complex alias with a complex type, which could lead to 
> violating the DNF rule.  For example:

Oh, DNF is the bane of my existence with this RFC—I don't want to mess this up. 
I'll see you at the end of the example, though.

> 
> typealias Foo: (Bar&Baz)|Beep;
> 
> use (Bar&Baz)|Beep as Foo;
> 
> function narf(Foo&Stringable $s) {}
> 
> With the compile time approach, that would expand to 
> `(Bar&Baz)|Beep&Stringable`, which is not a valid type def.

I can see how you arrived at this, but I think you may have missed a step, 
since the entirety of Foo will be &'d with Stringable.

Foo = (Bar & Baz) | Beep

want: (Foo) & Stringable

expand Foo: ((Bar & Baz) | Beep) & Stringable

Which can be reduced to the following in proper DNF (at least, it 
compiles—https://3v4l.org/0bMlP):

(Beep & Stringable) | (Bar & Baz & Stringable)

It's probably a good idea to update the RFC explaining how expansion works.
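
To illustrate the hand-expanded form, here is a sketch that declares the placeholder types as empty interfaces and uses the DNF signature directly. It assumes PHP 8.2 or later (where DNF types are available); `BeepText` and `BarOnly` are hypothetical classes added only to exercise the signature.

```php
<?php
// Bar, Baz and Beep are the placeholder names from the example above,
// declared as empty interfaces so the signature can be compiled.
interface Bar {}
interface Baz {}
interface Beep {}

// ((Bar & Baz) | Beep) & Stringable, distributed by hand into DNF.
function narf((Beep&Stringable)|(Bar&Baz&Stringable) $s): string
{
    return (string) $s;
}

final class BeepText implements Beep, Stringable
{
    public function __toString(): string
    {
        return 'beep';
    }
}

final class BarOnly implements Bar, Stringable
{
    public function __toString(): string
    {
        return 'bar';
    }
}

echo narf(new BeepText()), PHP_EOL; // beep

try {
    narf(new BarOnly()); // Bar without Baz satisfies neither branch
} catch (TypeError $e) {
    echo 'rejected', PHP_EOL; // rejected
}
```

This only shows that the distributed form is a valid type today; whether an alias expansion should be treated as equivalent to it is the open question for the RFC.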

> 
> With the runtime approach, I don't know if that could be handled gracefully 
> or if it would still cause an error.
> 
> I'm not sure what the right solution is on this one, just pointing it out as 
> a thing to resolve.
> 
> --Larry Garfield
> 

— Rob

Re: [PHP-DEV] RFC: Deprecate json_encode() on classes marked as non-serializable

2024-09-06 Thread Rob Landers
On Fri, Sep 6, 2024, at 20:07, Claude Pache wrote:
> 
> 
>> Le 5 sept. 2024 à 18:03, John Coggeshall  a écrit :
>> 
>> 
>>> As per my previous email to the list, I have now created the official RFC 
>>> to deprecate calling json_encode() on instances of classes marked with 
>>> ZEND_ACC_NOT_SERIALIZABLE.
>> 
>> I would suggest we take a step back from this and look at it with a bit more 
>> of a wider lens. It seems to me that this would be a good place to have an 
>> attribute (e.g. `#[NotSerializable]` )  that could be defined for any class 
>> (with `ZEND_ACC_NOT_SERIALIZABLE`  being automatically given this 
>> attribute)? It just seems to be a more holistic approach that makes sense, 
>> rather than basing it on internal engine stuff and/or limiting it to 
>> internal objects.
>> 
>> Coogle
>> 
> Hi,
> 
> An attribute adds very little value. It doesn’t add new capability, because 
> you can achieve the same effect with a serialiser that throws 
> unconditionally; it is just a nicer syntax. People generally don’t bother 
> making their classes unserialisable unless they have a good reason; having an 
> attribute won’t really change that.

You also need to ensure that you make it final, as someone could extend your 
class and make your borked serializer work.

> The core problem is that objects are json-serialisable by default, although 
> most of them are not supposed to be json-serialised. Taking a second step 
> back, if we really want to solve the issue, one should:
> 
> * for internal classes, determine which ones could be json-serialisable (at 
> least, stdClass); for all other classes, `json_encode(...)` shall be disabled 
> (after a deprecation period);
> * for user-defined classes, the user shall opt into json-serialisability, 
> either by extending a json-serialisable class, or by using an {attribute / 
> magic method / interface} (choose your bikeshed colour).
> 
> —Claude

I also agree with this to a point: there is the 
https://www.php.net/manual/en/class.jsonserializable.php interface, after all.
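
A short sketch of both sides of this, using the existing interface: one class opts in with an explicit representation, and one refuses to be encoded by throwing. `Point` and `ApiToken` are hypothetical classes made up for the example.

```php
<?php
// Point and ApiToken are hypothetical classes for this sketch.
final class Point implements JsonSerializable
{
    public function __construct(private int $x, private int $y) {}

    // Explicitly chooses what json_encode() sees.
    public function jsonSerialize(): mixed
    {
        return ['x' => $this->x, 'y' => $this->y];
    }
}

// Declared final so a subclass cannot override jsonSerialize() and
// re-enable encoding.
final class ApiToken implements JsonSerializable
{
    public function __construct(private string $secret) {}

    public function jsonSerialize(): mixed
    {
        throw new LogicException('ApiToken must not be JSON-encoded');
    }
}

echo json_encode(new Point(1, 2)), PHP_EOL; // {"x":1,"y":2}

try {
    json_encode(new ApiToken('s3cr3t'));
} catch (LogicException $e) {
    echo $e->getMessage(), PHP_EOL; // ApiToken must not be JSON-encoded
}
```

The throwing variant is the "serializer that throws unconditionally" Claude mentions; it works, but only for classes whose authors remember to write it.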

— Rob

[PHP-DEV] bikeshed: Typed Aliases

2024-09-06 Thread Rob Landers
Hello Internals,

I'm going to try something new. I've been working on another RFC called "Typed 
Aliases" (https://wiki.php.net/rfc/typed-aliases). It is very much a draft and 
in-flux, and I've already worked out the technical mechanics of it ... however, 
I am very unhappy with the syntax, and while I have a few ideas about that, I 
assume this list will be much better at it than me. So, please bring your 
brushes and paint; I brought a bikeshed.

If you haven't read it already, here's the TL;DR:

- This RFC expands the "use ... as ..." syntax to allow any type expression on 
the left side. These aliases are PER FILE and expire once the file is compiled.

- This RFC also adds the ability for an alias to survive a file (currently 
using the "as alias" syntax that I don't like) which actually just creates a 
special kind of class. When this special class is encountered during 
type-checking, the alias is expanded and checked. It also allows this via a 
"type_alias()" function instead of the "use ... as alias ..." syntax.

How it works:

use string as alias MyString

gets virtually compiled into a special class that would look something like 
this to ReflectionClass (as it is currently):

class MyString extends PrimitiveAlias { 
  const PrimitiveType aliasOf = PrimitiveType::string;
}

- Reflection is a bit weird here, and I'm not exactly happy with it; but I'm 
curious what the list thinks. I'm open to virtually anything that makes sense 
here; including not allowing ReflectionClass on the type aliases at all.

- Since these are "technically" classes, I went with just "use"-ing them like 
normal classes. Originally, I had something different: "use alias ..." (like 
"use function ...") to make it more clear. I will probably go back to this, but 
I'm curious what others think.

I'm going to take a step back and listen/answer questions. But please, grab a 
brush and paint.

— Rob

Re: [PHP-DEV] function autoloading v4 RFC

2024-09-05 Thread Rob Landers
On Thu, Sep 5, 2024, at 21:03, Jakob Givoni wrote:
> 
> 
> On Wed, Sep 4, 2024 at 9:18 PM Rob Landers  wrote:
>> On Wed, Sep 4, 2024, at 17:16, Jakob Givoni wrote:
>>>  
>>>> 2. I've removed artificial restrictions on the constants in which all 
>>>> functions that take them can accept both at the same time and behave 
>>>> appropriately.
>>> 
>>> I'm not a big fan of passing flags and using binary operations to combine 
>>> options into a single parameter. 
>>> It works, but it's opaque and old-school.
>>> We have both named parameters and enums now, can't we just use those going 
>>> forward, making each option a separate parameter or using enums with 3 
>>> cases, FUNCTION, CLASS or BOTH? 
>> 
>> Thank you for your opinion, but for cases like this, enums are probably one 
>> of the worst choices IMHO. As mentioned towards the end of the RFC, I'd like 
>> to add further support for other things, such as constants and stream 
>> filters. Further, it appears that enums cannot be used in SPL (at least, I 
>> couldn't get it to link) due to SPL having a recursive dependency on itself. 
>> This is what Gina's RFC seeks to rectify by breaking autoloading out of SPL. 
>> This RFC focuses purely on function autoloading.
> 
> I see that under "Future scope" you put:
>> Potentially, constants and stream wrappers can be added in a similar fashion.
> Trying to figure out if you are referring to the possibility of autoloading 
> stream wrappers and constants? Is that something there's a need/desire for?

Hey Jakob,

I'm replying to this in a separate thread because it is more 'meta' than 
my other reply. There have been discussions about this off-and-on for a while, 
but the gist is that there are people that would find it useful. For example, 
in my experimental "time" library there is a SECOND and MINUTE constant that 
must be required if you have the library, even if your code execution path 
never uses them. So, I think there might be some usefulness to constant 
autoloading. Further, there are some great async libraries that use stream 
wrappers to hijack/use file_get_contents and friends in an async way, so having 
autoloading there may also be useful.
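
For context, here is a sketch of the status quo as PHP behaves at the time of writing: a registered autoloader is consulted for unknown classes, but not for unknown functions or constants. The `Demo\...` names are hypothetical and intentionally undefined.

```php
<?php
// Record every name the autoloader is asked for.
$asked = [];
spl_autoload_register(function (string $name) use (&$asked): void {
    $asked[] = $name;
});

// Unknown class: the autoloader is consulted.
class_exists('Demo\MissingClass');

// Unknown function: an Error is thrown without consulting the autoloader.
try {
    \Demo\missing_function();
} catch (Error $e) {
    echo get_class($e), PHP_EOL; // Error
}

// Unknown constant: simply reported as undefined.
var_dump(defined('Demo\MISSING_CONSTANT')); // bool(false)

echo implode(', ', $asked), PHP_EOL; // Demo\MissingClass
```

This is why a file of functions or constants currently has to be required up front, whether or not the code path ever uses them.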

That being said, I wanted to constrain the scope initially as this appeared to 
be a hot topic (autoloading functions) every time it came up. Thus I was 
preparing myself for a long and drawn-out discussion and kept the scope minimal.

I hope that helps explain why it is under future scope.

— Rob

Re: [PHP-DEV] function autoloading v4 RFC

2024-09-05 Thread Rob Landers
On Thu, Sep 5, 2024, at 21:03, Jakob Givoni wrote:
> 
> 
> On Wed, Sep 4, 2024 at 9:18 PM Rob Landers  wrote:
>> On Wed, Sep 4, 2024, at 17:16, Jakob Givoni wrote:
>>>  
>>>> 2. I've removed artificial restrictions on the constants in which all 
>>>> functions that take them can accept both at the same time and behave 
>>>> appropriately.
>>> 
>>> I'm not a big fan of passing flags and using binary operations to combine 
>>> options into a single parameter. 
>>> It works, but it's opaque and old-school.
>>> We have both named parameters and enums now, can't we just use those going 
>>> forward, making each option a separate parameter or using enums with 3 
>>> cases, FUNCTION, CLASS or BOTH? 
>> 
>> Thank you for your opinion, but for cases like this, enums are probably one 
>> of the worst choices IMHO. As mentioned towards the end of the RFC, I'd like 
>> to add further support for other things, such as constants and stream 
>> filters. Further, it appears that enums cannot be used in SPL (at least, I 
>> couldn't get it to link) due to SPL having a recursive dependency on itself. 
>> This is what Gina's RFC seeks to rectify by breaking autoloading out of SPL. 
>> This RFC focuses purely on function autoloading.
> 
> I see that under "Future scope" you put:
>> Potentially, constants and stream wrappers can be added in a similar fashion.
> Trying to figure out if you are referring to the possibility of autoloading 
> stream wrappers and constants? Is that something there's a need/desire for?
> 
> In any case, it seems less than ideal to introduce a change to existing 
> functions now only to change them again after Gina's RFC. 
> 
> Which means we'll be locked into binary flags even if enums (not even sure 
> what you have against their use in this case) will be possible later.
> 
> I also mentioned named parameters as an alternative, any thoughts on that?
> F.ex. changing one of your examples to:
> 
> spl_autoload_call('func', function: true, class: true); // Calls both 
> autoloaders with the name 'func'

Hey Jakob,

Thank you for sharing your thoughts on this. I see where you’re coming from 
regarding enums and named parameters as a more modern approach, and I 
appreciate your suggestion to consider them.

For the current RFC, I leaned towards using binary flags because they provide a 
level of consistency with existing PHP functions. There are also some technical 
challenges with using enums here, especially since enums seem incompatible with 
SPL due to its recursive dependency. This is something I hope we can address in 
the future, potentially through Gina's RFC.

I think named parameters could be a good middle ground and might make the 
function calls clearer. If you don't mind me asking, what do you have against 
binary flags?
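
To make the comparison concrete, here is a minimal sketch of the two call 
styles side by side. The constants and functions (DEMO_AUTOLOAD_CLASS, 
DEMO_AUTOLOAD_FUNCTION, demo_load_with_flags, demo_load_with_named) are 
hypothetical names made up for illustration; they are not the RFC's API and 
they do not call any real autoloader.

```php
<?php
// Hypothetical constants and functions for illustration only; they are not
// the names used by the RFC or by spl_autoload_call().
const DEMO_AUTOLOAD_CLASS = 1;
const DEMO_AUTOLOAD_FUNCTION = 2;

// Flag style: options are combined into a single integer with bitwise OR.
function demo_load_with_flags(string $name, int $types = DEMO_AUTOLOAD_CLASS): array
{
    $loaders = [];
    if ($types & DEMO_AUTOLOAD_CLASS) {
        $loaders[] = "class:$name";
    }
    if ($types & DEMO_AUTOLOAD_FUNCTION) {
        $loaders[] = "function:$name";
    }
    return $loaders;
}

// Named-parameter style: one boolean per kind of autoloader.
function demo_load_with_named(string $name, bool $classes = true, bool $functions = false): array
{
    $loaders = [];
    if ($classes) {
        $loaders[] = "class:$name";
    }
    if ($functions) {
        $loaders[] = "function:$name";
    }
    return $loaders;
}

// Both calls select the same two kinds of loader.
$viaFlags = demo_load_with_flags('func', DEMO_AUTOLOAD_CLASS | DEMO_AUTOLOAD_FUNCTION);
$viaNamed = demo_load_with_named('func', classes: true, functions: true);
```

In this sketch, supporting another kind of symbol later would mean one more 
constant in the flag style and one more parameter in the named style.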

I also understand your concern about making adjustments to the API twice and 
how it might have to change again if Gina's RFC passes. My thinking was that by 
moving forward with this proposal, we at least make some progress on function 
autoloading, which has been debated, off-and-on, for a long time.

I'm not opposed to holding off on merging the implementation if it makes sense 
to wait and see how Gina's proposal develops, but I also don't want to risk 
withdrawing my RFC and Gina's not passing for unrelated reasons; I hope that 
makes sense.

— Rob



Re: [PHP-DEV] RFC: Deprecate json_encode() on classes marked as non-serializable

2024-09-05 Thread Rob Landers


On Thu, Sep 5, 2024, at 10:55, Alexandru Pătrănescu wrote:
> 
> On Tue, Sep 3, 2024 at 2:27 PM Philip Hofstetter  
> wrote:
>> Hello,
>> 
>> As per my previous email to the list, I have now created the official RFC to 
>> deprecate calling json_encode() on instances of classes marked with 
>> ZEND_ACC_NOT_SERIALIZABLE.
>> 
>> https://wiki.php.net/rfc/deprecate-json_encode-nonserializable
>> 
>> 
> 
> Sharing my experience, I never use json_encode on objects, only on arrays 
> (which become either JSON objects or JSON arrays).
> When I see an object implementing JsonSerializable, I think it is the wrong 
> approach, and I am usually able to find a better way.
> Maybe if we could go back in time, we would not allow json_encode on an 
> object, except if it implemented JsonSerializable, but that ship sailed long 
> ago.
> 
> To your proposal, I think the BC break would be pretty big and I don't see a 
> way it would pass.
> On https://www.php.net/json_encode we can read:
> > If a value to be serialized is an object, then by default only publicly 
> > visible properties will be included.
> So that's the expected behaviour.
> 
> Yes, you can say that encoding as JSON is just "another serialization 
> method", but the default method in PHP, using json_encode()/json_decode(), is 
> not symmetrical when using objects.
> And here lies the difference with serialize()/unserialize(), as this one aims 
> to be symmetrical, and where it can't be, it has a way to stop the 
> serialization.
> 
> I am happy with the current way it works, getting an empty JSON object if 
> there are no public properties for a Generator or Closure.
> And I don't think having an error for them would improve the language in a 
> meaningful way, and the BC break is not worth it.
> 
> Regards,
> Alex
> 

To add to this, we apparently use json_encode at work to serialize custom 
exceptions, which appears to work. This RFC would break that, I think. 
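
As a rough sketch of what that looks like, assuming a hypothetical exception 
class (DemoOrderException and its orderId property are invented here, not 
taken from any real code base), json_encode() only sees the public property. 
Whether a class like this would actually be affected by the RFC depends on 
whether it is marked non-serializable, which this sketch does not check.

```php
<?php
// Hypothetical exception class; the name and property are illustrative only.
final class DemoOrderException extends RuntimeException
{
    public function __construct(public string $orderId, string $message)
    {
        parent::__construct($message);
    }
}

// Only the public property is encoded; message, code, file, and line are
// protected on the base Exception class and are left out.
$json = json_encode(new DemoOrderException('A-17', 'payment failed'));
```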

— Rob

Re: [PHP-DEV] Local constants

2024-09-04 Thread Rob Landers


On Wed, Sep 4, 2024, at 23:11, John Bafford wrote:
> 
> > On Sep 4, 2024, at 16:45, Rob Landers  wrote:
> > 
> > I think, conceptually, this makes sense, but maybe I'm the only one who 
> > writes
> > 
> > $arr = doSomething();
> > $arr = array_map(fn($x) => $x->prop, $arr);
> > $arr = array_filter($arr, $filterFunc);
> > 
> > instead of
> > 
> > $arr = array_filter(array_map(fn($x) => $x->prop, doSomething()), 
> > $filterFunc);
> 
> IMO, that's a failing of PHP not supporting object syntax (and reasonable 
> api) for non-object values. In other languages, I can write code such as:
> 
> let arr = doSomething()
> .map { $0.prop }
> .filter(filterFunc)
> 
> Which is more readable than both PHP versions.

I think that sidesteps the reality we are currently in and is orthogonal to the 
actual problem I perceive. To be more clear, I think with the current APIs in 
PHP, having local constants would incentivize people to write less readable 
code for the reasons you mentioned. Could the APIs be improved? Probably, if 
not certainly, but that is not what is on the table. 

> 
> > And I feel like having constant variables would result in more of the 
> > latter, which I feel is more unreadable. Though I do note that my opinion 
> > of what is readable might be different from other people's.
> > 
> > That being said, I would much rather see block-scoped variables than local 
> > variables via let or var or something:
> > 
> > var $aNumber = 12;
> > 
> > foreach($arr as var $item) {
> >   echo $item; // item exists here
> > }
> > 
> > echo $item; // item doesn't exist here
> > 
> > PHP is simply too verbose to really benefit from local constants, but would 
> > benefit from block-scope far more. For example, with local constants, you 
> > couldn't write that foreach because that variable exists in the scope of 
> > the function and can only be defined once.
> > 
> > — Rob
> 
> I'd suggest that the foreach could be written with either of let/var, both 
> to locally scope $item and to explicitly disallow/allow $item to be 
> changed while in the loop.

Maybe, but I only mention that block-scope would be more valuable than local 
constants. That they work well together is also interesting.

> 
> Another benefit of local constants is preventing accidentally reusing a prior 
> variable. For example, if I set a temporary variable to something, and then 
> later in the function, reuse that same name for a new temporary but don't set 
> the value on all flow paths, then the value from earlier in the function 
> leaks through, whereas either using a constant, or a second explicit variable 
> definition would have exposed and/or prevented the issue.
> 
> -John

This is a problem that I have run into only a handful of times in my entire 
career, across all languages, except in the JavaScript files of yesteryear, 
where it happened a lot when we had to deal with 10k loc handrolled files. I’ve 
seen this happen maybe 2-3 times in php (where it has been a bug), and a couple 
of times in C. I don’t think I’ve ever run into it in C#, Scala, or Python. 
Maybe I’m just lucky, but I don’t feel like this is a valid reason for the 
feature.

Const and let were added to JavaScript (according to my cobwebbed memories) to 
introduce block scoping and optimizations around that. In PHP, we also have 
function scope, like JavaScript, but a variable only exists from the point 
where it is assigned. JavaScript's var did not work that way: declarations are 
hoisted to the top of the function, so the following runs without an error and 
logs undefined:

function test() {
  console.log(hello)
  var hello = 'world'
}

test();

With let or const, it throws a ReferenceError instead, and the equivalent php 
raises an undefined variable warning. The problems that const/let set out to 
solve in JavaScript do not apply here, IMHO.

— Rob

Re: [PHP-DEV] Local constants

2024-09-04 Thread Rob Landers
On Wed, Sep 4, 2024, at 22:17, John Bafford wrote:
> On Sep 4, 2024, at 15:23, Rob Landers  wrote:
> > 
> > On Tue, Sep 3, 2024, at 05:20, HwapX wrote:
> >> Hello internals!
> >> 
> >> I was wondering, has there been any discussion about supporting local 
> >> constants (variables that cannot be reassigned, perhaps even function 
> >> parameters)?
> > 
> > Out of curiosity, what value would this bring to PHP? In my experience, 
> > modern php methods and functions tend to fit on a single screen, at most 
> > being a few hundred lines for complex logic.
> > 
> > — Rob
> 
> This is about semantic clarity. If you define a variable as a constant, then, 
> not only do you know for certain that it cannot change, you are also stating 
> "it is my intent to not change this variable after setting it", and so if 
> someone later tries to do so, they (should) have to answer the question, 
> "*why* do I need to make this variable change?".
> 
> If it's useful to be able to annotate class members as being "readonly", it 
> is likewise useful to do that on a local scope.
> 
> With sufficient compiler support, *typed* constant variable declarations 
> might also allow for this:
> 
> let SomeType $foo; //or readonly, or writeonce, or whatever
> 
> if($something) {
> $foo = ...;
> } else if($somethingElse) {
> $foo = getSomeString(); //Error, type mismatch; possibly catchable at compile 
> time, definitely with static analysis
> } else if($thirdCondition) {
> $foom = ...; //oops, typo
> } else {
> throw new Exception(...)
> }
> 
> doSomething($foo); //Compiler error: $foo not initialized on all call paths.
> }
> 
> You might also tighten scoping with such variables:
> 
> foreach(...) {
> let $foo = //local working variable; possibly even shadowing a variable in 
> the parent scope
> }
> 
> //$foo is undefined; you can't access the last-set value from the loop 
> outside of the loop
> 
> Static analysis can catch all these errors, but it'd be nicer to be able to 
> do it without requiring docblocks. The language providing tools for being 
> more explicit about intent in code is never bad.
> 
> -John

I think, conceptually, this makes sense, but maybe I'm the only one who writes

$arr = doSomething();
$arr = array_map(fn($x) => $x->prop, $arr);
$arr = array_filter($arr, $filterFunc);

instead of

$arr = array_filter(array_map(fn($x) => $x->prop, doSomething()), $filterFunc);

And I feel like having constant variables would result in more of the latter, 
which I feel is more unreadable. Though I do note that my opinion of what is 
readable might be different from other people's.

That being said, I would much rather see block-scoped variables than local 
variables via let or var or something:

var $aNumber = 12;

foreach($arr as var $item) {
  echo $item; // item exists here
}

echo $item; // item doesn't exist here

PHP is simply too verbose to really benefit from local constants, but would 
benefit from block-scope far more. For example, with local constants, you 
couldn't write that foreach because that variable exists in the scope of the 
function and can only be defined once.

— Rob

Re: [PHP-DEV] weird error when saving RFCs

2024-09-04 Thread Rob Landers
On Wed, Sep 4, 2024, at 22:26, Derick Rethans wrote:
> On 4 September 2024 21:15:55 BST, Rob Landers  wrote:
> >Hello Internals,
> >
> >I receive the following error when saving an RFC:
> >
> >There was an unexpected problem communicating with SMTP: Unexpected return 
> >code - Expected: 250, Got: 554 | 554 5.5.2 <[i:p:v:6::addr]>: Helo command 
> >rejected: invalid ip address
> >
> >From what I can gather, whatever SMTP server it is connecting to doesn't 
> >understand ipv6 addresses.
> >
> >— Rob
> It's best to email systems@ with this kind of issue. To diagnose this, do 
> you get this every time you save something?

Thanks! I'm not sure if I am a member of that list or not; one way to find out.

> And it really helps to have really accurate timings of when it happens as the 
> log is so noisy.
> 
> cheers
> Derick
> 

It looks like it was at 19:07 GMT, today. Thankfully, the HTML source includes 
the current time of the last update. It only appears when saving and leaving 
the "minor changes" box unchecked.

— Rob

[PHP-DEV] weird error when saving RFCs

2024-09-04 Thread Rob Landers
Hello Internals,

I receive the following error when saving an RFC:

There was an unexpected problem communicating with SMTP: Unexpected return code 
- Expected: 250, Got: 554 | 554 5.5.2 <[i:p:v:6::addr]>: Helo command rejected: 
invalid ip address

From what I can gather, whatever SMTP server it is connecting to doesn't 
understand ipv6 addresses.

— Rob

Re: [PHP-DEV] RFC: Deprecate json_encode() on classes marked as non-serializable

2024-09-04 Thread Rob Landers
On Tue, Sep 3, 2024, at 13:24, Philip Hofstetter wrote:
> Hello,
> 
> As per my previous email to the list, I have now created the official RFC to 
> deprecate calling json_encode() on instances of classes marked with 
> ZEND_ACC_NOT_SERIALIZABLE.
> 
> https://wiki.php.net/rfc/deprecate-json_encode-nonserializable
> 
> I have also created a PR with the implementation here:
> 
> https://github.com/php/php-src/pull/15724
> 
> I have considered other options, both constraining the implementation to just 
> Generator and/or to add special cases for Generator (and maybe Iterator), but 
> they either continue to keep the asymmetry between serialize() and 
> json_encode() and/or are making things even more inconsistent.
> 
> Please tell me what you think, especially, if you agree that 
> blanket-deprecating all of ZEND_ACC_NOT_SERIALIZABLE classes is acceptable 
> BC-wise (after a bit of deliberation over the weekend, I think it is and most 
> json-serializations of such marked classes are probably unintentional).
> 
> Thanks in advance for all comments
> 
> Philip
> 
> 

Hello Philip,

I think it would be good to list these non-serializable objects in the RFC. It 
doesn't have to be exhaustive, but the list includes throwables, weak-maps, 
weak-references, closures, fibers, etc. Some people may be relying on this 
behavior (and I'd be curious to know that use case myself).

I just grepped for @not-serializable in php-src stubs, which appears to be the 
only use of ZEND_ACC_NOT_SERIALIZABLE.
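
For anyone who wants to see the asymmetry under discussion, here is a small 
sketch using a Closure, one of the classes mentioned above. It only shows the 
behaviour of current PHP 8 releases and assumes nothing about how the RFC's 
deprecation would be worded.

```php
<?php
$closure = fn (int $x): int => $x + 1;

// json_encode() silently produces an empty JSON object for a Closure.
$json = json_encode($closure);

// serialize() refuses the same value by throwing.
$serializeError = null;
try {
    serialize($closure);
} catch (Throwable $e) {
    $serializeError = $e->getMessage();
}
```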

— Rob

Re: [PHP-DEV] Local constants

2024-09-04 Thread Rob Landers
On Tue, Sep 3, 2024, at 05:20, HwapX wrote:
> Hello internals!
> 
> I was wondering, has there been any discussion about supporting local 
> constants (variables that cannot be reassigned, perhaps even function 
> parameters)?

Out of curiosity, what value would this bring to PHP? In my experience, modern 
php methods and functions tend to fit on a single screen, at most being a few 
hundred lines for complex logic.

— Rob

Re: [PHP-DEV] function autoloading v4 RFC

2024-09-04 Thread Rob Landers
On Wed, Sep 4, 2024, at 17:16, Jakob Givoni wrote:
> 
> 
> On Tue, Sep 3, 2024 at 11:49 PM Rob Landers  wrote:
>> 1. I've removed the BC break—the 'type' of the autoloadee will not be passed 
>> to the autoloader. This can allow someone to use spl_autoload for function 
>> autoloading if they so desire.
> 
> Unless I'm missing something, the main example in the RFC still shows a 
> function which expects the $type as the second parameter. Is that intentional?

Nope! Thanks for the heads-up; despite reading it four or five times, I missed 
this! I've now fixed it.

>  
>> 2. I've removed artificial restrictions on the constants in which all 
>> functions that take them can accept both at the same time and behave 
>> appropriately.
> 
> I'm not a big fan of passing flags and using binary operations to combine 
> options into a single parameter. 
> It works, but it's opaque and old-school.
> We have both named parameters and enums now, can't we just use those going 
> forward, making each option a separate parameter or using enums with 3 cases, 
> FUNCTION, CLASS or BOTH? 

Thank you for your opinion, but for cases like this, enums are probably one of 
the worst choices IMHO. As mentioned towards the end of the RFC, I'd like to 
add further support for other things, such as constants and stream filters. 
Further, it appears that enums cannot be used in SPL (at least, I couldn't get 
it to link) due to SPL having a recursive dependency on itself. This is what 
Gina's RFC seeks to rectify by breaking autoloading out of SPL. This RFC 
focuses purely on function autoloading.

— Rob

Re: [PHP-DEV] function autoloading v4 RFC

2024-09-03 Thread Rob Landers
On Thu, Aug 15, 2024, at 17:22, Rob Landers wrote:
> Hello internals,
> 
> I've decided to attempt an RFC for function autoloading. After reading 
> hundreds of ancient (and recent) emails relating to the topic along with 
> several abandoned RFCs from the past, and after much review, I've decided to 
> put forth a variation of a previous RFC, as it seemed the least ambitious and 
> the most likely to work:
> 
> https://wiki.php.net/rfc/function_autoloading4
> 
> Please let me know what you think. An implementation should be along before 
> opening it for a vote (now that I realize how important that is).
> 
> — Rob

Hello internals,

I've updated the RFC text and clarified some language (hopefully) while also 
addressing a number of concerns.
 1.  I've removed the BC break—the 'type' of the autoloadee will not be passed 
to the autoloader. This can allow someone to use spl_autoload for function 
autoloading if they so desire.
 2. I've removed artificial restrictions on the constants in which all 
functions that take them can accept both at the same time and behave 
appropriately.
 3. I've performed multiple benchmarks on several projects (WordPress, Symfony, 
and others from techempower) and couldn't detect a statistically significant 
performance difference with this feature in multiple configurations.
 4. Some simplified aspects of Gina's implementation for core autoloading were 
integrated into the patch. However, I see that RFC as a general refactoring of 
the API more than function autoloading. This RFC is strictly about function 
autoloading while being very conservative with the autoloading API.
While minor issues in the PR are still being worked on, I plan to open the vote 
next Wednesday, September 11th 2024, barring any significant pushback or 
objections.

— Rob

Re: [PHP-DEV] Pre-RFC Discussion: Support for String Literals as Object Properties and Named Parameters in PHP

2024-09-02 Thread Rob Landers
On Tue, Sep 3, 2024, at 02:00, Hammed Ajao wrote:
> 
> 
> On Mon, Sep 2, 2024 at 1:50 PM Rob Landers  wrote:
>> On Sun, Sep 1, 2024, at 23:47, Hammed Ajao wrote:
>>> Dear PHP internals community,
>>> I hope this email finds you all well. I'd like to propose an idea that I 
>>> believe could enhance PHP's flexibility and consistency, especially when 
>>> working with string literals. I'm looking forward to hearing your thoughts 
>>> and feedback on this proposal.
>>> Introduction
>>> I'm suggesting two enhancements to PHP that I think could make our lives as 
>>> developers a bit easier:
>>> 
>>> Support for String Literals as Object Properties
>>> Support for String Literals as Named Parameters in Function Calls
>>> 
>>> The main goal here is to reduce our reliance on arrays and provide more 
>>> intuitive ways to define and access data, particularly in scenarios like 
>>> working with HTTP headers where we often deal with non-standard characters 
>>> and strings.
>>> 1. String Literals as Object Properties
>>> Current Situation
>>> As we all know, we typically define and access object properties using 
>>> standard identifiers:
>>> ```php
>>> class Foo {
>>> public string $contentType = "application/json";
>>> }
>>> 
>>> $obj = new Foo();
>>> $obj->contentType = "text/html";
>>> ```
>>> 
>>> But when we're dealing with data that includes non-standard characters or 
>>> strings (think HTTP headers), we often end up using associative arrays:
>>> ```php
>>> $headers = [
>>> "Content-Type" => "application/json",
>>> "X-Custom-Header" => "value"
>>> ];
>>> ```
>>> 
>>> I think we can all agree that this reliance on arrays can make our code 
>>> less intuitive, especially when we're managing complex data structures.
>>> Proposed Enhancement
>>> What if we could use string literals as object property names? Here's what 
>>> I'm thinking:
>>> ```php
>>> class MyHeaders {
>>> 
>>> public function __construct(
>>> public string "Content-Type" = "application/json",
>>> public string "Cache-Control" = "no-cache, no-store, 
>>> must-revalidate",
>>> public string "Pragma" = "no-cache",
>>> public string "Expires" = "0",
>>> public string "X-Frame-Options" = "SAMEORIGIN",
>>> public string "X-XSS-Protection" = "1; mode=block",
>>> public string "X-Content-Type-Options" = "nosniff",
>>> public string "Referrer-Policy" = "strict-origin-when-cross-origin",
>>> public string "Access-Control-Allow-Origin" = "*",
>>> public string "X-Custom-Header" = "value",
>>> ) {}
>>> 
>>> public static function create(string ...$headers): self {
>>> return new self(...$headers); // Throws an error if an unknown 
>>> named parameter is passed
>>> }
>>> 
>>> public function dispatch(): void {
>>> foreach ((array) $this as $name => $value) {
>>> header("$name: $value");
>>> }
>>> }
>>> }
>>> 
>>> $headers = new MyHeaders("Content-Type": "application/json", 
>>> "X-Custom-Header": "value");
>>> // or
>>> $headers = MyHeaders::create("Content-Type": "text/html; charset=utf-8", 
>>> "X-Custom-Header": "value");
>>> $headers->dispatch();
>>> ```
>>> This would allow us to include characters in property names that aren't 
>>> typically allowed in PHP identifiers, like hyphens or spaces. I think this 
>>> could make our code more readable and aligned with natural data 
>>> representation.
>>> Benefits
>>> 
>>> Greater Flexibility: We could create more natural and direct 
>>> representations of data within objects.
>>> Enhanced Consistency: This aligns with the proposed support for string 
>>> literals as named parameters, creating a more uniform language experience.
>

Re: [PHP-DEV] Pre-RFC Discussion: Support for String Literals as Object Properties and Named Parameters in PHP

2024-09-02 Thread Rob Landers
On Sun, Sep 1, 2024, at 23:47, Hammed Ajao wrote:
> Dear PHP internals community,
> I hope this email finds you all well. I'd like to propose an idea that I 
> believe could enhance PHP's flexibility and consistency, especially when 
> working with string literals. I'm looking forward to hearing your thoughts 
> and feedback on this proposal.
> Introduction
> I'm suggesting two enhancements to PHP that I think could make our lives as 
> developers a bit easier:
> 
> Support for String Literals as Object Properties
> Support for String Literals as Named Parameters in Function Calls
> 
> The main goal here is to reduce our reliance on arrays and provide more 
> intuitive ways to define and access data, particularly in scenarios like 
> working with HTTP headers where we often deal with non-standard characters 
> and strings.
> 1. String Literals as Object Properties
> Current Situation
> As we all know, we typically define and access object properties using 
> standard identifiers:
> ```php
> class Foo {
> public string $contentType = "application/json";
> }
> 
> $obj = new Foo();
> $obj->contentType = "text/html";
> ```
> 
> But when we're dealing with data that includes non-standard characters or 
> strings (think HTTP headers), we often end up using associative arrays:
> ```php
> $headers = [
> "Content-Type" => "application/json",
> "X-Custom-Header" => "value"
> ];
> ```
> 
> I think we can all agree that this reliance on arrays can make our code less 
> intuitive, especially when we're managing complex data structures.
> Proposed Enhancement
> What if we could use string literals as object property names? Here's what 
> I'm thinking:
> ```php
> class MyHeaders {
> 
> public function __construct(
> public string "Content-Type" = "application/json",
> public string "Cache-Control" = "no-cache, no-store, must-revalidate",
> public string "Pragma" = "no-cache",
> public string "Expires" = "0",
> public string "X-Frame-Options" = "SAMEORIGIN",
> public string "X-XSS-Protection" = "1; mode=block",
> public string "X-Content-Type-Options" = "nosniff",
> public string "Referrer-Policy" = "strict-origin-when-cross-origin",
> public string "Access-Control-Allow-Origin" = "*",
> public string "X-Custom-Header" = "value",
> ) {}
> 
> public static function create(string ...$headers): self {
> return new self(...$headers); // Throws an error if an unknown named 
> parameter is passed
> }
> 
> public function dispatch(): void {
> foreach ((array) $this as $name => $value) {
> header("$name: $value");
> }
> }
> }
> 
> $headers = new MyHeaders("Content-Type": "application/json", 
> "X-Custom-Header": "value");
> // or
> $headers = MyHeaders::create("Content-Type": "text/html; charset=utf-8", 
> "X-Custom-Header": "value");
> $headers->dispatch();
> ```
> This would allow us to include characters in property names that aren't 
> typically allowed in PHP identifiers, like hyphens or spaces. I think this 
> could make our code more readable and aligned with natural data 
> representation.
> Benefits
> 
> Greater Flexibility: We could create more natural and direct representations 
> of data within objects.
> Enhanced Consistency: This aligns with the proposed support for string 
> literals as named parameters, creating a more uniform language experience.
> Simplification: It could reduce our need for associative arrays, which can be 
> more error-prone and less intuitive.
> 
> 2. String Literals as Named Parameters in Function Calls
> If we're going to use string literals as object properties, it makes sense to 
> also use them as named parameters, especially in constructors with promoted 
> properties. And why stop at constructors? This leads to the second part of my 
> proposal.
> Current Situation
> We can use named parameters in function calls, but only with standard 
> identifiers:
> ```php
> function myHeaders(...$args) {
> foreach ($args as $key => $value) header("$key: $value");
> }
> ```
> 
> To use string literals with special characters, we have to use associative 
> arrays:
> ```php
> myHeaders(...["Content-Type" => "application/json"]);
> ```
> 
> This can be a bit cumbersome and less readable, especially for complex data 
> structures.
> Proposed Enhancement
> What if we could use string literals as named parameters? It might look 
> something like this:
> ```php
> foo("Content-Type": "application/json");
> ```
> 
> I think this syntax could offer several advantages:
> 
> Improved Readability: Our code could become clearer and more aligned with 
> natural data representation.
> Automatic Parameter Mapping: We could map string literals to corresponding 
> parameters without requiring manual intervention.
> Simplified Constructor Usage: This could be especially beneficial for 
> constructors where we need to pass complex data structures directly.
> 
> Impl

Re: [PHP-DEV] Re: [RFC] Default expression

2024-09-01 Thread Rob Landers
On Sun, Sep 1, 2024, at 14:39, Rowan Tommins [IMSoP] wrote:
> On 29/08/2024 22:52, Bilge wrote:
> > On 24/08/2024 17:49, Bilge wrote:
> >>
> >> New RFC just dropped: https://wiki.php.net/rfc/default_expression. I 
> >> think some of you might enjoy this one. Hit me with any feedback.
> >>
> > Now the dust has settled, I've updated the RFC to version 1.1. The 
> > premise of the RFC is unchanged, but the proposal has been expanded 
> > and a discussion section added to summarise the ~100 message thread to 
> > capture the major concerns raised in a condensed format. I hope I've 
> > done a good job of fairly and accurately representing your concerns, 
> > but if not please correct me.
> 
> 
> As promised, I have written up a full explanation of the type safety 
> issues here: https://wiki.php.net/rfc/default_expression/type_safety
> 
> I have tried to write this as a neutral description of the problem and 
> the possible approaches we could take, to be inserted directly into the 
> current RFC, rather than as a counter-opinion or a narrative of who said 
> what.
> 
> I have included the 4 options which I believe are the only ones we have; 
> it is then a matter of opinion which we think is best. For the record, 
> my opinion remains that option 3 (limit to conditional expressions) is 
> preferable, but I have assumed the RFC will continue to advocate for 
> option 1 (allow any expression and assume problems will be rare).
> 
> I hope I have explained it clearly enough this time to overcome the 
> previous misunderstandings of where the issue lies.
> 
> Regards,
> 
> -- 
> Rowan Tommins
> [IMSoP]
> 

Thank you Rowan,

I wasn't following the discussion closely and didn't realize this was the 
issue. Thank you for taking the time to describe it.

For option 1: 

Is manually copying the default also not type-safe? Is php a type-safe 
language? I think a lot of the arguments I saw suggested that people don't 
review libraries and their implementations when upgrading or installing them. 
This is just a shorthand for manually copy-pasting the default from other code, 
and this argument really only makes sense to me if there are no reviews before 
upgrading/using a library.
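
To illustrate what "manually copying the default" means today, here is a 
minimal sketch. The function demo_pad() and its default of 80 are hypothetical 
and stand in for any library function; the reflection calls are the existing 
ReflectionFunction and ReflectionParameter APIs, not the syntax proposed in the 
RFC.

```php
<?php
// Hypothetical library function; the name and default are illustrative only.
function demo_pad(string $text, int $width = 80): string
{
    return str_pad($text, $width, '.');
}

// Copying by hand: the literal 80 is duplicated and can drift from the library.
$copied = demo_pad('hi', 80 - 10);

// Reading the declared default at runtime through reflection instead.
$default = (new ReflectionFunction('demo_pad'))->getParameters()[1]->getDefaultValue();
$viaReflection = demo_pad('hi', $default - 10);
```

Either way, the caller's arithmetic assumes the default is a number; neither 
form protects against the library later changing the default's type.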

For option 3:

That being said, this is obviously playing with fire, and there will be people 
who (ab)use this and get burned, especially if they don't do due diligence 
before using libraries. Thus a restriction may make a lot of sense; at least 
keeping it to the most obvious use cases should prevent the worst case 
scenarios imagined in this thread.

Realistically, I think we should only consider option (1) or (3). Option (3) -- 
if it can be done -- is the more conservative approach, and we can observe how 
it is used. We can always relax the feature in the future, based on feedback.

— Rob

Re: [PHP-DEV] What to do with ext/snmp?

2024-08-31 Thread Rob Landers
On Fri, Aug 30, 2024, at 20:13, Christoph M. Becker wrote:
> On 30.08.2024 at 19:05, Jim Winstead wrote:
> 
> [snip]
> 
> And generally, while there are many well maintained extensions on PECL,
> most (i.e. way more than half of the extension there) are outright
> abandoned, dead or half-dead, a lot of the latter barely kept alive by
> Remi Collet.  A next generation PECL (installer) will not change this;
> only people who actively care about these extensions could, if these
> people have knowledge of PHP extension development.
> 
> I'm not saying that all PECL extensions deserve to be kept alive; there
> are good reasons for many to have been abandoned, e.g. because they were
> built on no longer supported libraries, are generally not useful
> anymore, or would be written in PHP nowadays (e.g. ext/dbase).
> 
> Instead I'm saying that we should be careful to unbundle extensions.
> This should probably be seen as a last resort if we absolutely can't
> maintain the extension any longer, or it doesn't make sense to do that.
> I'm not sure yet that ext/snmp falls into this category.
> 
> It's easy to vote "yes, unbundle this extension" if you've never used
> the extension and are not planning to do so in the future.  It may be a
> death sentence, though.
> 
> Christoph
> 

I went over to pecl to see how hard it was to create a new extension (after 
being prompted by Gina to submit my GMP operator stuff as a pecl extension). It 
appears to be very involved, and the form includes a checkbox that reads:

"I have already discussed the topic of maintaining and/or adding a PECL 
extension on the pecl-...@lists.php.net mailing list, and we determined it's 
time for me to have a PECL account"

I, personally, can't imagine going through such a process. Not only do you have 
to convince gatekeepers you don't know to let you share your extension (and, 
higher up on the page, it says your code should already be complete), but you 
also have to convince end users to use it. The barrier to entry is high, when 
it would be much easier to just create a repository with instructions; the 
result is that discovering interesting PHP extensions is nearly impossible.

If you are using something like Docker containers, there is 
https://github.com/mlocati/docker-php-extension-installer which goes so far as to 
install extensions from github (and apply patches) if they aren't 
updated/available in pecl (example: memcached + php 8 had an issue that was 
fixed on github but didn't get an update on pecl for nearly a year, IIRC).

I'm pretty excited about pecl's replacement (how is that going anyway?) and 
hope it will be easier to create, maintain, and distribute extensions with.

In other words, I emphatically agree that moving extensions out of core and 
into pecl would be a death sentence.

— Rob

Re: [PHP-DEV] [RFC] [Discussion] Using and Mentioning Third-party Packages in PHP Documentation and Web Projects

2024-08-28 Thread Rob Landers
On Wed, Aug 28, 2024, at 09:51, John Coggeshall wrote:
> 
>> And that is how you will find that the "new" languages will "win". If we
>> don't promote how modern PHP Development works, then there will be more
>> "PHP, a fractal of bad design" articles to come for decades.
>> 
>> We *must* do better than this. It probably doesn't all need to be in the
>> documentation (doc-en), but it absolutely belongs on our website.
> 
> Hear Hear Derick!!
> 
> I am not advocating that php.net put its finger on the scale in favor of 
> Laravel over others with this comment, but why php.net does not have a 
> documentation analog similar to how Laravel's documentation is set up is 
> beyond me. Useful installation instructions, sections on "How do I do 
> database stuff", "Security", "Filtering Data", "Installing third party 
> packages" etc... there are too many people who have embedded in their brains 
> that PHP is a badly designed language because we don't teach or even 
> advertise to people how to write good PHP code... as others have mentioned as 
> an example, the lack of even a mention of composer on php.net is mind-blowing.
> 
> As Derick said, back 20+ years ago PHP had amazing documentation for the 
> times -- miles ahead IMO than most open source projects. But the world has 
> moved on, developers want and need higher level documentation that is more 
> opinionated on not just the dry APIs available you might use to connect to a 
> database (for example), but how to properly connect to a database. Back 20 
> years ago we had companies like Zend around who devoted considerable 
> resources to filling that gap for the community (along with O'Reilly, etc.) 
> but those entities are gone now and it is up to the project to pick up the 
> slack.
> 
> I also think it's a mistake to get too caught up with the concept of 
> "endorsements" and people worrying that "oh gosh if php.net talks about 
> Laravel and Zend Framework then that means something bad for XYZ framework" 
> (pick your favorite techs here). It's easily solved by having a section on 
> "Popular PHP Frameworks" that explains the concept that PHP as a language 
> doesn't embrace any particular framework, the importance that you do 
> generally want to embrace a framework to do anything serious, and provide a 
> list of popular ones that people commonly turn to when building their apps. 
> As for using a framework or any other PHP-related tech in the project's 
> codebases... I think we're grossly overestimating how much weight that 
> decision would carry with the PHP community at large. Short of the PHP 
> Project stating "X is the official framework of PHP" (and especially if we 
> say "We don't have an official framework but here are good options that are 
> very popular" instead), the concern over the appearance of endorsements at 
> this point is really an invented issue rooted at least in part by historic 
> concerns that simply don't exist anymore.
> 
> Coogle

I agree with this to a point. What if I want my newish framework listed on the 
page? What are the qualifications for being listed (or unlisted) there? Can 
anyone add their own framework?

If anyone can add something to the list, then it eventually will become as 
overwhelming as https://github.com/uhub/awesome-php and if there are strict 
qualification requirements, the list needs to be reviewed periodically to 
remove projects that no longer meet those criteria.

— Rob

Re: [PHP-DEV] [RFC] [Discussion] Using and Mentioning Third-party Packages in PHP Documentation and Web Projects

2024-08-28 Thread Rob Landers


On Wed, Aug 28, 2024, at 03:46, Jim Winstead wrote:
> On Tue, Aug 27, 2024, at 4:46 AM, Christoph M. Becker wrote:
> > On 27.08.2024 at 07:03, Andreas Heigl wrote:
> >
> >> I see this a bit differently to be honest. While I understand that using
> >> third party packages in our internal tools might make things easier in
> >> the short term it will cause a lot of additional work in the long term.
> >>
> >> Currently we have a lot of small scripts that do one thing. And they do
> >> it for a long time with very little maintenance effort. Blowing these
> >> scripts up with third-party libraries will mean that we will have to put
> >> in much more maintenance effort for either keeping the dependencies up
> >> to date or mostly rewriting the script the moment a change needs to
> >> happen as the libraries will be outdated.
> >>
> >> There are though some actual console applications like Pdt where it
> >> might be valid to use third party dependencies. But also here I'd ask
> >> whether the maintainability will be increased. A totally different
> >> question though is whether we actually need to maintain a special tool
> >> for building the docs or whether we can use a pre-existing tool for
> >> that. I am mainly thinking about either phpDocumentor or a default
> >> docbook renderer. But that is a totally different topic IMO.
> >>
> >> So I'd see this exactly the other way around:
> >>
> >> usage for infra needs very careful consideration to not increase the
> >> maintenance-burden on those that actually 'do' the maintenance.
> >
> > Well, the RFC is not about that projects *should* use some tools or
> > libraries or frameworks, but rather that they *can* choose to do so if
> > they deem it valuable.  Of course, the projects should not only look at
> > the short term benefit, but also on the long term maintenance burden.
> 
> This is the nut of it, really. The truth is that we are using and referring 
> to third-party PHP projects within the websites and documentation as has been 
> noted elsewhere in the thread. To say that this RFC should be a choice 
> between what I have proposed and its opposite is to potentially create a lot 
> of work in ripping out those dependencies that have crept in over the years 
> if we really want the policy to be the opposite.
> 
> By saying that the web and documentation projects "can use or document the 
> existence of third-party PHP projects" it's not saying they will, but that 
> the decision making about that will be returned to those actually working on 
> those parts of the project and not subject to the current quasi-policy that 
> "we don't do that except when we do".
> 
> If someone wants to contribute a chapter to the documentation that says 
> "Here's what a framework is in PHP and here are a few examples of popular 
> ones", the people writing and translating the documentation should hash that 
> out. It shouldn't require an RFC, an argument on this list, and a vote among 
> people who aren't writing and translating the documentation. (Especially 
> because there are people whose sole or main contribution to the PHP project 
> is that documentation work and not to php-src and they don't even get to 
> vote!)
> 
> If someone wants to build an RFC tool for the project using some Composer 
> libraries or even a framework, the people who have taken on the 
> responsibility for maintaining the project websites and infrastructure should 
> hash that out. It shouldn't require an RFC, an argument on this list, and a 
> vote among people who aren't going to be working on it. (Especially because 
> there are people whose sole or main contribution to the PHP project is to 
> maintaining the websites and/or infrastructure and not to php-src and they 
> don't even get to vote!)
> 
> Jim
> 

As far as RFC tools go, if you are more familiar with GitHub-flavored Markdown 
(gfmd) and find the RFC Markdown somewhat hard to use, I have 
https://github.com/bottledcode/RFC which you can fork and write an RFC draft in 
gfmd which it will convert to the wiki format. I find it much easier this way 
because I have other tooling dedicated to working with gfmd (layout previewers, 
synced editors between my phone/computer, grammar highlighting, etc).

It would be interesting to have this in the php-src org as a more accessible 
way to draft RFCs and have it synced to the wiki. That could also be a place 
for technical reviews (not feature reviews, which would continue to exist here).

— Rob

Re: [PHP-DEV] State of Generics and Collections

2024-08-27 Thread Rob Landers
On Tue, Aug 27, 2024, at 22:15, Bilge wrote:
> On 26/08/2024 23:24, Rob Landers wrote:
>> On Sun, Aug 25, 2024, at 22:28, Gina P. Banyard wrote:
>> 
>>> For the past 2–3 months, you have sent the vast majority of emails on this 
>>> list, this is not what I would consider normal
>> 
>> To understand just how badly I was breaking this rule, I created 
>> https://email.catcounter.guru/ for anyone on the list to see where they 
>> currently stand with their post-ratio in comparison to others. It is updated 
>> every two hours, and you can enter an email address in the top-right to 
>> unmask an email address, otherwise the email addresses are anonymous.
> lol, nice. I somehow only managed to guess the top two. I think it's bugged. 
> No way I can't figure out another in the top 5 😏
> 
> At least you found something to keep yourself busy!
> 
> 
> 
> Cheers,
> Bilge
> 
> 

Hey Bilge,

You are in the top 10 for the day ;) but not yet past a 10% posting ratio at 
some point in the range, which is how to end up on the graph. Otherwise it gets 
too noisy with very many people only sending in one or two emails per day. If 
you change the window to 7 days, you should show up since you've been pretty 
active in the last week. 

If you get involved more than 14 days straight, you are almost guaranteed to 
appear on that graph. If you have an RFC, you are almost guaranteed to end up 
on that graph (due to replying to many people). If you get into an argument at 
some point, you are almost guaranteed to end up on that graph. If you do all 
three ... well, due to my GMP RFC 
(https://wiki.php.net/rfc/operator_overrides_lite), I am the top poster on the 
list for the last 3 months, by a huge margin. It was very unpopular and I 
fought hard. (https://email.catcounter.guru/?range=52&window=80)

> At least you found something to keep yourself busy!

I have more than enough to do; just not enough time. :D Pretty standard 
problems. The last week of the month is pretty hectic, and I usually get little 
time to read emails and reply to them until the next month.

— Rob

Re: [PHP-DEV] State of Generics and Collections

2024-08-26 Thread Rob Landers
On Sun, Aug 25, 2024, at 22:28, Gina P. Banyard wrote:
> On Friday, 23 August 2024 at 23:55, Rob Landers  wrote:
>> On Fri, Aug 23, 2024, at 23:06, Larry Garfield wrote:
>>> 
>>> With generics, the syntax isn't the hard part.  The hard part is type 
>>> inference, or accepting that generic-using code will just be 
>>> extraordinarily verbose and clumsy.  There is (as I understand from Arnaud, 
>>> who again can correct me if I'm wrong) not a huge amount of difference in 
>>> effort between supporting only Foo and supporting Foo>.  The 
>>> nesting isn't the hard part.  The hard part is not having to type Foo 
>>> 4 times across 2 files every time you do something with generics.  If that 
>>> can be resolved satisfactorily (and performantly), then the road map to 
>>> reified generics is reasonably visible.
>> 
>> Ok. But wasn't there something about nesting causing super-linear 
>> performance issues? So, disable nesting and don't worry about inference.
>> [...]
>> Ah, this is what I was thinking of. Thank you. Yeah, instead of "nesting" 
>> prior, I was referring to union types.
> 
> Rob, with all the kindness I can give, please condense your emails to have a 
> semblance of sense.
> This is not a bar where you are having a one on one conversation.
> You are sending emails to thousands of people on a mailing list that can read 
> you.
> It would be appreciated if you could go over everything you read, digest the 
> content, and then form a reply.
> Or at the minimum, if you realize that a previous remark you made does not 
> apply, redraft the email.
> And possibly even sit on it for a bit before sending it, as you routinely 
> come up with a point you forgot to include in your email.
> 
> Reading the mailing list is an exhausting task, especially when the volume is 
> excessive.
> As a reminder to everyone, we have rules: 
> https://github.com/php/php-src/blob/master/docs/mailinglist-rules.md
> 
> However, in your case, please note the following rule:
> 
>> If you notice that your posting ratio is much higher than that of other 
>> people, double-check the above rules. Try to wait a bit longer before 
>> sending your replies to give other people more time to digest your answers 
>> and more importantly give you the opportunity to make sure that you 
>> aggregate your current position into a single mail instead of multiple ones.
> 
> For the past 2–3 months, you have sent the vast majority of emails on this 
> list, this is not what I would consider normal nor expected for your level of 
> "seniority" (for the lack of better word) on the project.
> This is not to say to stop posting and replying, just to do it in a more 
> conscious manner for the rest of us reading you.
> 
> Best regards,
> 
> Gina P. Banyard
> 
>> 

Hi Gina!

I hope this email finds you well. Sincerely, thank you for your feedback; it's 
clear that you are addressing this issue with the best intentions.

I want to say that I understand the importance of this rule and keeping the 
mailing list conversations relevant, especially given the large audience. I 
want to also acknowledge that I have occasionally responded quickly without 
fully considering the impact on readability. Moving forward, I will make a 
conscious effort to ensure my emails are more thoroughly reviewed.

Regarding your point about condensing emails, I see where you are coming from. 
However, my approach has been to respond within the same thread to maintain 
context, which I believe helps keep the discussion more organized for threaded 
readers. I understand that there is probably a balance there and will be more 
mindful in the future.

> For the past 2–3 months, you have sent the vast majority of emails on this 
> list, this is not what I would consider normal

To understand just how badly I was breaking this rule, I created 
https://email.catcounter.guru/ for anyone on the list to see where they 
currently stand with their post-ratio in comparison to others. It is updated 
every two hours, and you can enter an email address in the top-right to unmask 
an email address, otherwise the email addresses are anonymous.

Best regards,

Rob

Re: [PHP-DEV] [RFC] Default expression

2024-08-25 Thread Rob Landers
On Sun, Aug 25, 2024, at 20:46, Rowan Tommins [IMSoP] wrote:
> On 25/08/2024 18:44, John Bafford wrote:
> 
>> Although I'm not sold on the idea of using default as part of an 
>> expression, I would argue that a default function parameter value is 
>> fair game to be read and manipulated by callers. If the default value 
>> was intended to be private, it shouldn't be in the function declaration.
> 
> 
> There's an easy argument against this interpretation: child classes can 
> freely change the default value for a parameter, as long as they do not make 
> it mandatory. https://3v4l.org/SEsRm
> 
> That matches my intuition: that the public API, as a contract, states that 
> the parameter is optional; the specification of what happens when it is not 
> provided is an implementation detail.
> 
> For comparison, consider constructor property promotion; the caller shouldn't 
> know or care whether a class is defined as:
> 
> public function __construct(private int $bar) {}
> 
> or:
> 
> 
> private int $my_bar;
> public function __construct(int $bar) { $this->my_bar = $bar; }
> 
> The syntax sits in the function signature because it's convenient, not 
> because it's part of the API.
> 
> 
> 
>> One important case where reading the default value could be important is
>>  in interoperability with different library versions. For example, a 
>> library might change a default parameter value between versions. If 
>> you're using the library, and want to support both versions, you might 
>> both not want to set the value, and yet also care what the default value
>>  is from the standpoint of knowing what to expect out of the function.
> 
> 
> This seems contradictory to me. If you use the default, you're telling the 
> library that you don't care about that parameter, and trust it to provide a 
> default.
> 
> If you want to know what the library did with its arguments, reflecting the 
> signature will never be enough anyway. For example, it's quite common to 
> write code like this:
> 
> 
> function foo(?SomethingInterface $blah = null) {
> if ( $blah === null ) {
> $blah = self::_setup_default_blah();
> }
> // ...
> }
> 
> A caller can't tell by looking at the signature that a new version of the 
> library has changed what _setup_default_blah() returns. If the library 
> doesn't provide an API to get $blah out later, then it's a private detail 
> that the caller has no business inspecting.
> 
> 
> 
> Regards,
> 
> -- 
> Rowan Tommins
> [IMSoP]

I think you've hit an interesting point here, but probably not what you 
intended.

For example, let's consider this function:

json_encode(mixed $value, int $flags = 0, int $depth = 512): string|false

Already, you have to look up the default value of depth or set it to something 
that makes sense, as well as $flags. So you do this:

json_encode($value, JSON_THROW_ON_ERROR, 512);

You are doing this even when you omit the default. If you set it to a variable 
to spell it out:

$default_flags = 0 | JSON_THROW_ON_ERROR;
$default_depth = 512; // according to docs on DATE

json_encode($value, $default_flags, $default_depth);

This can now be rewritten:

json_encode($value, $default_flags = default | JSON_THROW_ON_ERROR, 
$default_depth = default);

This isn't just reflection, this is saving me from having to look up the 
docs/implementation and hardcode values. The implementation is free to change 
them, and my code will "just work."
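
As a rough sketch of the boilerplate this would replace today: the snippet 
below reads a declared default through Reflection and ORs a flag into it. 
encode_report() and default_of() are hypothetical names invented for this 
illustration; only the Reflection calls are existing API.

```php
<?php
declare(strict_types=1);

// Hypothetical library function whose default flags may change between versions.
function encode_report(array $data, int $flags = JSON_UNESCAPED_SLASHES, int $depth = 512): string
{
    return json_encode($data, $flags | JSON_THROW_ON_ERROR, $depth);
}

// Hypothetical helper: look up the declared default of a named parameter.
function default_of(string $function, string $parameter): mixed
{
    foreach ((new ReflectionFunction($function))->getParameters() as $param) {
        if ($param->getName() === $parameter && $param->isDefaultValueAvailable()) {
            return $param->getDefaultValue();
        }
    }
    throw new InvalidArgumentException("No default for \${$parameter} in {$function}()");
}

// Roughly what `default | JSON_PRETTY_PRINT` would express under the RFC.
$flags = default_of('encode_report', 'flags') | JSON_PRETTY_PRINT;
echo encode_report(['url' => 'https://example.org/a'], $flags), PHP_EOL;
```

If the library later changes the default of $flags, the caller picks it up 
without edits, which is the same property the proposed syntax is after.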

Now, let's look at a less trivial case from some real-life use-cases, in 
the form of a plausible story:


public function __construct(
private LoggerInterface|null $logger = null,
private string|null $name = null,
Level|null $level = null,
)

This code constructs a new logger composed from an already existing logger. 
When constructing it, I may look up what the default values are and decide if I 
want to override them or not. Otherwise, I will leave it as null.

A coworker and I got to talking about this interface. It kind of sucks, and we 
don't like it. It's been around for ages, so we are worried about changing it. 
Specifically, we are wondering if we should use SuperNullLogger as the default 
instead of null (which happens to just create a NullLogger a few lines later). 
We are pretty sure making this change won't cause any issues, but to be extra 
safe, we will do it only on a single code path; further, we are 100% sure we 
are going to change this signature, so we need to do it in a forward-compatible 
way. Thus, we will set it to SuperNullLogger if-and-only-if the default value 
is null:

default ?? new SuperNullLogger()

Now, we can run this in production and see how well it performs. Incidentally, 
we discover that the NullLogger implementation is superior and we can now 
change the default:

public function __construct(
private LoggerInterface $logger = new NullLogger(),
private string|null $name = null,
Level|null $level = null,
)

That one code path "magically" updates as soon as the library is 

Re: [PHP-DEV] [RFC] Default expression

2024-08-25 Thread Rob Landers


On Sun, Aug 25, 2024, at 18:21, Rowan Tommins [IMSoP] wrote:
> On 25/08/2024 16:54, Rob Landers wrote:
> > Hi Rowan, you went through a lot of trouble to write this out, and the 
> > reasoning makes sense to me. However, all the nonsensical things you 
> > say shouldn’t be allowed are already perfectly allowed today, you just 
> > have to type a bunch of boilerplate reflection code. There is no new 
> > behavior here, just new syntax. 
> 
> 
> Firstly, your response to John was essentially "please give more 
> details" [https://externals.io/message/125183#125214], and your response 
> to me is "thanks for the details, but I'm not going to engage with 
> them". That's a bit frustrating.

Oh, my apologies! That wasn’t my intention! With John and yourself, I do agree 
with you. I’m just trying to understand the logic in limiting it. As in, “I 
intuitively feel the same way but I don’t know why but maybe you do.” Intuition 
sucks sometimes. 

> 
> Secondly, I don't think "it's possible with half a dozen lines of 
> reflection, so it's fine for it to be a first-class feature of the 
> language syntax" is a strong argument. The Reflection API is a bit like 
> the Advanced Settings panel in a piece of software, it comes with a big 
> "Proceed with Caution" warning. You only move something from that 
> Advanced Settings panel to the main UI when it's going to be commonly 
> used, and generally safe to use. I don't think allowing arbitrary 
> operations on a value that's declared as the default of some other 
> function passes that test.
> 
> Regards,
> 
> -- 
> Rowan Tommins
> [IMSoP]
> 

That makes sense, but is it uncommon because it is hard and slow, or because it 
is genuinely not a common need?

— Rob

Re: [PHP-DEV] [RFC] Default expression

2024-08-25 Thread Rob Landers


On Sun, Aug 25, 2024, at 17:31, Rowan Tommins [IMSoP] wrote:
> On 25/08/2024 14:35, Larry Garfield wrote:
>> My other concern is the list of supported expression types.  I 
>> understand how the implementation would naturally make all of those 
>> syntactically valid, but it seems many of them, if not most, are 
>> semantically nonsensical.
> 
> 
> I tend to agree with Larry and John that the list of operators should be 
> restricted - we can always allow more in future, but restricting later is 
> much harder.
> 
> A few rules that seem logical to me:
> 
> 1) The expression should be reasonably guaranteed to produce the same type as 
> the actual default.
> 
> 
> - No casts
> - No comparison operators, because they produce booleans from non-boolean 
> input
> - No "<=>". Technically, it has an integer result, but it's rare to use it as 
> one, rather than a kind of three-value boolean
> - No "instanceof"
> - No "empty"
> 
> 2) The expression should not have side effects (outside of exotic operator 
> overloads).
> 
> 
> - No "include", "require", etc
> - No "throw"
> - No "print"
> - Borderline, but I would also say no "clone"
> 
> 3) The expression should be passing additional information into the function, 
> not pulling information out of it. The syntax shouldn't be a way to write 
> obfuscated reflection, or invert data flow from callee to caller.
> 
> 
> - No assignments.
> - No ternaries with "default" on the left-hand side - "$foo ? $bar : default" 
> is acting on local knowledge, but "default ? $foo : $bar" is acting on 
> information the caller shouldn't know
> - Same for "?:" and "??"
> - No "match" with "default" as the condition or branch, for the same reason. 
> "match($foo) { $bar => default }" is fine, "match(default) { ... }" or 
> "match($foo) { default => ... }" are not.
> 
> Note that these can be seen as aspects of the same rule: the aim of the 
> expression should be to transform the default value into another value of the 
> same type, not to pull it out and perform arbitrary operations based on it.
> 
> 
> 
> I believe that leaves us with:
> 
> 
> - Arithmetic operators: binary + - * / % **, unary + -
> - Bitwise operators: & | ^ << >>  ~
> - Boolean operators: && || and or xor !
> - Conditions with default on the RHS: $foo ? $bar : default, $foo ?: default, 
> $foo ?? default, match($foo) { $bar => default }
> - Parentheses: (((default)))
> 
> 
> 
> Even then, I look at that list and see more problems than use cases. As the 
> RFC points out, library authors already worry about the maintenance burden of 
> named argument support, will they now also need to question whether someone 
> is relying on "default + 1" having some specific effect?
> 
> Maybe we should instead require justification for each addition:
> 
> 
> - Bitwise | is nicely demonstrated in the RFC
> - Bitwise & could probably be justified on similar grounds
> - "$foo ? $bar : default" is discussed in the RFC
> - The other "conditions with default on the RHS" in my shortlist above fit 
> the same basic use case
> 
> Beyond that, I'm struggling to think of meaningful uses: "whatever the 
> function sets as its default, do the opposite"; "whatever number the function 
> sets as default, raise it to the power of 3"; etc. Again, they can easily be 
> added in later versions, if a use case is pointed out.
> 
> 
> 
> Regards,
> 
> -- 
> Rowan Tommins
> [IMSoP]

Hi Rowan, you went through a lot of trouble to write this out, and the 
reasoning makes sense to me. However, all the nonsensical things you say 
shouldn’t be allowed are already perfectly allowed today, you just have to type 
a bunch of boilerplate reflection code. There is no new behavior here, just new 
syntax. 
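
To make that concrete, here is a minimal sketch of the kind of reflection 
boilerplate I mean. Paginator is a hypothetical class invented for this 
example, not something from the RFC; the Reflection calls are the existing API.

```php
<?php
declare(strict_types=1);

// Hypothetical library class, invented for this sketch.
final class Paginator
{
    public function page(int $number = 1, int $perPage = 25): string
    {
        return "page {$number}, {$perPage} per page";
    }
}

// Pull the declared default of $perPage out of the signature.
$perPage = (new ReflectionMethod(Paginator::class, 'page'))
    ->getParameters()[1]
    ->getDefaultValue();

$paginator = new Paginator();

// The equivalent of `default * 2`.
echo $paginator->page(perPage: $perPage * 2), PHP_EOL;

// The equivalent of `default > 1`: it yields a bool, so under strict_types
// the call is rejected with a TypeError, with or without new syntax.
try {
    echo $paginator->page(perPage: $perPage > 1), PHP_EOL;
} catch (TypeError $e) {
    echo 'TypeError: ', $e->getMessage(), PHP_EOL;
}
```

Both the sensible and the nonsensical version are expressible this way 
already; the existing type checks decide what happens at the call site.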

— Rob

Re: [PHP-DEV] [RFC] Default expression

2024-08-25 Thread Rob Landers


On Sun, Aug 25, 2024, at 16:58, John Coggeshall wrote:
> 
> 
>> If the underlying API changes the argument type, consumers will have an 
>> issue regardless. For those cases where the expression is simply `default`, 
>> you'd actually be protected from the API change, which is a net benefit 
>> already. 
>> 
>> This also protects the user from changes in the argument names. 
> 
> As I said, I don't have a particular problem with `default`  as a keyword to 
> express "whatever the default value might be in the function declaration", 
> but I do have some real concerns about its use as an operand in an 
> expression. The RFC provides for a single valid use case of operators (i.e. 
> things like `default | JSON_PRETTY_PRINT` ), yet calls for a huge array of 
> valid operations,  many of which the RFC itself notes don't make much / any 
> sense. I'd personally like to see this RFC dramatically reduce the scope of 
> operations supported with `default`  as an operand initially (e.g. perhaps 
> only bitwise ops), and revisit additional operations as needed down the road. 
> IMO there is a very small subset of all PHP operators that make any sense at 
> all in this context, and even fewer that I think are a good idea to allow 
> even if they might make some sort of sense.

Which operators don’t make sense?

— Rob

Re: [PHP-DEV] [RFC] Default expression

2024-08-25 Thread Rob Landers


On Sun, Aug 25, 2024, at 15:35, Larry Garfield wrote:
> On Sat, Aug 24, 2024, at 11:49 AM, Bilge wrote:
> > Hi gang,
> >
> > New RFC just dropped: https://wiki.php.net/rfc/default_expression. I 
> > think some of you might enjoy this one. Hit me with any feedback.
> >
> > This one already comes complete with working implementation that I've 
> > been cooking for a little while. Considering I don't know C or PHP 
> > internals, one might think implementing this feature would be 
> > prohibitively difficult, but considering the amount of help and guidance 
> > I received from Ilija, Bob and others, it would be truer to say it would 
> > have been more difficult to fail! Huge thanks to them.
> >
> > Cheers,
> > Bilge
> 
> I am still not fully sold on this, but I like it a lot better than the 
> previous attempt at a default keyword.  It's good that you mention named 
> arguments, as those do replace like 95% of the use cases for "put default 
> here" in potential function calls, and the ones it doesn't, you call out 
> explicitly as the justification for this RFC.
> 
> The approach here seems reasonable overall.  The mental model I have from the 
> RFC is "yoink the default value out of the function, drop it into this 
> expression embedded in the function call, and let the chips fall where they 
> may."  Is that about accurate?
> 
> My main holdup is the need.  I... can't recall ever having a situation where 
> this is something I needed.  Some of the examples show valid use cases (eg, 
> the "default plus this binary flag" example), but again, I've never actually 
> run into that myself in practice.

Potentially the most useful place would be in attributes. Take crell\serde (:p) 
for instance:

#[SequenceField(implodeOn: default . ' ', joinOn: ' ' . default . ' ')]

Where you may just want it to be a little more readable, but aren't interested 
in the default implosion. In attributes, it has to be a static expression and I 
think this passes that test? At least that is one place I would find most 
useful.

Then there are things like the example I gave before, where you need to call 
some library code as library code and pass through the intentions. It also gets 
us one step closer to something like these shenanigans:

function configureSerializer(Serde $serializer = new SerdeCommon(formatters: 
default as $formatters));

Where we can call configureSerializer(formatters: new JsonStreamFormatter()).

Some pretty interesting stuff.

> 
> My other concern is the list of supported expression types.  I understand how 
> the implementation would naturally make all of those syntactically valid, but 
> it seems many of them, if not most, are semantically nonsensical.  Eg, 
> `default > 1` would take a presumably numeric default value and output a 
> boolean, which should really never be type compatible with the function being 
> called.  (A param type of int|bool is a code smell at best, and a fatal 
> waiting to happen at worst.)  In practice, I think a majority of those 
> expressions would be logically nonsensical, so I wonder if it would be better 
> to only allow a few reasonable ones and block the others, to keep people from 
> thinking nonsensical code would do something useful.

I'm reasonably certain you can write nonsensical PHP without this feature. I 
don't think we should be the nanny of developers.

— Rob

Re: [PHP-DEV] [RFC] Default expression

2024-08-25 Thread Rob Landers


On Sun, Aug 25, 2024, at 12:01, Bilge wrote:
> On 25/08/2024 10:49, Juliette Reinders Folmer wrote:
> > (resending as I accidentally sent a private reply originally instead 
> > of sending the below to the list)
> >
> > On 24-8-2024 18:49, Bilge wrote:
> >> Hi gang,
> >>
> >> New RFC just dropped: https://wiki.php.net/rfc/default_expression. I 
> >> think some of you might enjoy this one. Hit me with any feedback.
> >>
> >> This one already comes complete with working implementation that I've 
> >> been cooking for a little while. Considering I don't know C or PHP 
> >> internals, one might think implementing this feature would be 
> >> prohibitively difficult, but considering the amount of help and 
> >> guidance I received from Ilija, Bob and others, it would be truer to 
> >> say it would have been more difficult to fail! Huge thanks to them.
> >>
> >> Cheers,
> >> Bilge
> >>
> >
> > Hi Bilge,
> Hi :)
> > I like the idea, but see some potential for issues with ambiguity, 
> > which I don't see mentioned in the RFC as "solved".
> >
> > Example 1:
> > ```php
> > function foo($paramA, $default = false) {}
> > foo( default: default ); // <= Will this be handled correctly ?
> > ```
> 
> No, but not because of my RFC, but because $paramA is a required 
> parameter that was not specified. Assuming that was just a typo, the 
> following works as expected:
> 
> function foo($paramA = 1, $default = false) {
> var_dump($default);
> }
> foo(default: default); // bool(false)
> 
> > Example 2:
> > ```php
> > callme(
> > match($a) {
> > 10 => $a * 10,
> > 20 => $a * 20,
> > default => $a * default, // <= Based on a test in the PR this 
> > should work. Could you confirm ?
> > }
> > );
> > ```
> Yes.
> > Example 3:
> > ```php
> > switch($a) {
> > case 'foo':
> > return callMe($a, default); // I presume this shouldn't be a 
> > problem, but might still be good to have a test for this ?
> > default:
> > return callMe(10, default); // I presume this shouldn't be a 
> > problem, but might still be good to have a test for this ?
> > }
> > ```
> Yes.
> > On that note, might it be an idea to introduce a separate token for 
> > the `default` keyword when used as a default expression in a function 
> > call to reduce ambiguity ?
> Considering the Bison grammar compiles, I believe there can be no 
> ambiguity. I specifically picked `default` because I think it is the 
> most intuitive keyword to use for this, and it's conveniently already a 
> reserved word.

Other tools parse the tokens directly (for example, I have a tool that takes PHP 
classes and converts them to GraphQL specifications; it parses the tokens 
emitted by the tokenizer extension), and having "default" tokens in unexpected 
places presents an ambiguity and a BC break for those tools. Having a separate 
token (DEFAULT_PARAM_VALUE or something) to disambiguate might be better. I had 
assumed it was a separate token when I first read it, so this is a good point.
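
To illustrate the concern, here is a minimal sketch of what a token-based tool sees today. It only uses PhpToken::tokenize() and the existing T_DEFAULT constant; the input snippet is made up for illustration, and the call-argument form is only lexed, not parsed:

```php
<?php
// Both uses of `default` below are lexed as the same T_DEFAULT token.
// tokenize() only runs the lexer, so the call-argument form does not
// need to be valid syntax for this to work.
$code = '<?php switch ($a) { default: foo(default); }';

$defaults = array_values(array_filter(
    PhpToken::tokenize($code),
    fn (PhpToken $token): bool => $token->is(T_DEFAULT),
));

foreach ($defaults as $token) {
    // A tool keying on T_DEFAULT alone cannot tell these two apart.
    echo $token->getTokenName(), ' at offset ', $token->pos, PHP_EOL;
}
```

A tool that treats every T_DEFAULT as the start of a switch/match arm would need extra context tracking to handle the second occurrence.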

— Rob

Re: [PHP-DEV] [RFC] Default expression

2024-08-25 Thread Rob Landers


On Sun, Aug 25, 2024, at 04:41, Mike Schinkel wrote:
> 
> 
>> On Aug 24, 2024 at 5:16 PM, rob@bottled.codes wrote:
>> I'm not sure what you mean here. I use this method all the time :) much to 
>> the chagrin of some of my coworkers.
>> 
>> function stuff($foo = 'bar', $baz = 'world') {}
>> 
>> stuff(...[ ...($foo ? ['foo' => $foo] : []), ...($baz ? ['baz' => $baz] : 
>> [])]);
> 
> And you are one who complains about gotos!  😲
> 
> -Mike

Haha, there is a difference between production/professional code and internal 
tools. Internal tools are a place to experiment and have a little fun, IMHO.

— Rob

Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-24 Thread Rob Landers


On Sun, Aug 25, 2024, at 00:58, Rowan Tommins [IMSoP] wrote:
> 
> 
> On 24 August 2024 19:16:13 BST, Stephen Reay  wrote:
> >
> >> On 25 Aug 2024, at 00:01, Ilija Tovilo  wrote:
> >> 
> >> 1. Flipping lookup order: ~a few dozens of changes
> >> 2. Global only: ~3 000 changes
> >> 3. Local only: ~139 000 changes
> 
> >There's also an impact on internals development/RFC with either (1) or (2): 
> >*any* proposed new global function in the standard library now has a BC 
> >barrier to pass if it *might* conflict with one defined by anyone in 
> >userland, in any namespace. 
> 
> Just to correct this point, this is only a problem for option 1. With option 
> 2, if the function exists only in a namespace, you have to unambiguously 
> refer to that namespaced name (via whatever syntax), and there is no chance 
> of collision with a global name in future. For me, that's one of the biggest 
> advantages of that option. 
> 
> And to repeat myself: I agree that option 3 is the ideal in theory, but in 
> practice the short/medium term impact is so big, I'm not convinced it's worth 
> it for the long term gain.
> 
> Rowan Tommins
> [IMSoP]
> 

It may also be worth looking at it a different way: if functions didn’t exist 
today (such as a 100% OOP language) and we wanted to implement functions: how 
would we implement it?

If it is different than what we have today, the short term impact is almost 
always worth it; whatever it is. And just to toss out another random aphorism: 
the short-term pain of change is often better than the long term agony of 
stagnation. 

— Rob

Re: [PHP-DEV] [RFC] Default expression

2024-08-24 Thread Rob Landers
On Sat, Aug 24, 2024, at 18:49, Bilge wrote:
> Hi gang,
> 
> New RFC just dropped: https://wiki.php.net/rfc/default_expression. I 
> think some of you might enjoy this one. Hit me with any feedback.
> 
> This one already comes complete with working implementation that I've 
> been cooking for a little while. Considering I don't know C or PHP 
> internals, one might think implementing this feature would be 
> prohibitively difficult, but considering the amount of help and guidance 
> I received from Ilija, Bob and others, it would be truer to say it would 
> have been more difficult to fail! Huge thanks to them.
> 
> Cheers,
> Bilge
> 

This is pretty awesome! I see this as some syntax sugar, to be honest:

> as soon as two or more nullable arguments are involved

I'm not sure what you mean here. I use this method all the time :) much to the 
chagrin of some of my coworkers.

function stuff($foo = 'bar', $baz = 'world') {}

stuff(...[ ...($foo ? ['foo' => $foo] : []), ...($baz ? ['baz' => $baz] : [])]);

Having this would be a lot less verbose.
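
For reference, a runnable sketch of the workaround above. It assumes PHP 8.1 or later (string-keyed arrays are unpacked and then spread as named arguments); the callStuff() wrapper and the return value are hypothetical, added only so the behaviour can be checked:

```php
<?php
function stuff(string $foo = 'bar', string $baz = 'world'): string
{
    return "$foo $baz";
}

// Hypothetical wrapper: forwards only the arguments that were provided,
// so omitted ones fall back to stuff()'s own defaults.
function callStuff(?string $foo, ?string $baz): string
{
    return stuff(...[
        ...($foo ? ['foo' => $foo] : []),
        ...($baz ? ['baz' => $baz] : []),
    ]);
}

echo callStuff(null, 'there'), PHP_EOL; // $foo falls back to its default
```

Each optional argument needs its own conditional spread, which is the verbosity being discussed.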

— Rob

Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-24 Thread Rob Landers


On Sat, Aug 24, 2024, at 20:16, Stephen Reay wrote:
> 
> 
> > On 25 Aug 2024, at 00:01, Ilija Tovilo  wrote:
> > 
> > Hi Stephen
> > 
> > On Sat, Aug 24, 2024 at 1:54 PM Stephen Reay  
> > wrote:
> >> 
> >> Thanks for clarifying. Out of curiosity, how much optimisation do you 
> >> imagine would be possible if the lookups were done the same way as classes 
> >> (ie no fallback, names must be local, qualified or imported with `use`)?
> > 
> > I haven't measured this case specifically, but if unqualified calls to
> > local functions are indeed rare (which the last analysis seems to
> > indicate), then it should make barely any difference. Of course, if
> > your code makes lots of use of them, then the story might be
> > different. That said, the penalty of an ambiguous internal call is
> > much higher than that of a user, local call, given that internal calls
> > sometimes have special optimizations or can even be entirely executed
> > at compile time. For local calls, it will simply lead to a double
> > lookup on first execution.
> > 
> >> I am aware this is a BC break. But if it's kosher to discuss introducing a 
> >> never ending BC break I don't see why this isn't a valid discussion 
> >> either. It would give *everyone* that elusive 2-4% performance boost, 
> >> would resolve any ambiguity about which function a person intended to call 
> >> (the claimed security issue) and would bring consistency with the way 
> >> classes/etc are referenced.
> > 
> >> From my analysis, there were 2 967 unqualified calls to local
> > functions in the top 1 000 repositories. (Disclaimer: There might be a
> > "use function" at the top for some of these, the analysis isn't that
> > sophisticated.)
> > 
> > I also ran the script to check for unqualified calls to global
> > functions (or at least functions that weren't statically visible in
> > that scope in any of the repositories files), and there were ~139 000
> > of them. It seems like this is quite a different beast. To summarize:
> > 
> > 1. Flipping lookup order: ~a few dozens of changes
> > 2. Global only: ~3 000 changes
> > 3. Local only: ~139 000 changes
> > 
> > While much of this can be automated, huge diffs still require
> > reviewing time, and can lead to many merge conflicts which also take
> > time to resolve. I would definitely prefer to go with 1. or
> > potentially 2.
> > 
> > Ilija
> > 
> 
> 
> Hi Ilija,
> 
> I understand that a change like (3) is a huge BC break, and as I said 
> earlier, I wasn't actually suggesting that is the action to take, because I 
> don't think there is sufficient reason to take *any* action. But given that 
> some people in this thread seem convinced that *a* change to functionality is 
> apparently required, I do think every potential change, and its pros and 
> cons, should be discussed.
> 
> 
> As I've said numerous times, and been either outright dismissed or ignored: 
> there has been a consistent push from a non-trivial number of internals 
> members that userland developers should make better use of regular functions, 
> rather than using classes as fancy namespaces. There was a recent RFC vote 
> that implicitly endorsed this opinion.
> 
> Right now, the lookup rules make namespaced regular functions a consistent 
> experience for developers, but the lack of autoload makes it unpopular, and 
> the lack of visibility for such symbols can be problematic. 
> 
> With the change you're proposing, there will be *another* hurdle that makes 
> the use of regular namespaced functions harder/less intuitive, or potentially 
> (with option 1) unpredictable over PHP versions, due to the constant threat 
> of BC breaks due to new builtin functions - right when we have not one but 
> two RFCs for function autoloading (arguably the biggest barrier to their 
> increased usage in userland).
> 
> 
> 
> So the whole reason I asked about (3) is because it would
> - (a) bring consistency with class/interface/trait symbols;
> - (b) inherently bring the much desired 2% performance boost for function 
> calls, because people would be forced to qualify the names;
> - (c) have zero risk of a future WTF BC break from a new global function 
> interrupting local function lookups;
> - (d) have no need for a new "simpler" qualifying syntax (you can't get 
> shorter than 1 character);
> - (e) presumably simplify function autoloading, because there's no longer any 
> "fallback" step to worry about before triggering an autoloader;
> - (f) even solve the "security" concerns John raised, because the developer 
> would be forced to qualify their usage if they wanted to use the builtin 
> function - their intent is always explicit, never guessed.
> 
> 
> 
> Yes, it is a huge BC break in terms of the amount of code that's affected. 
> But it's almost certainly one of the simplest BC breaks to "fix" in the 
> history of PHP BC breaks.
> 
> 
> How much code was affected when register globals was removed? Or when magic 
> quotes was removed? Or when short cod

Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-24 Thread Rob Landers


On Sat, Aug 24, 2024, at 18:34, Ilija Tovilo wrote:
> Stephen
> 
> On Sat, Aug 24, 2024 at 2:00 PM Stephen Reay  wrote:
> >
> > When I said this thread reads like an April fools joke that wasn't a 
> > challenge you know.
> 
> We *just* had somebody temporarily banned for ad-hominem attacks like
> a week ago. Please familiarize yourself with the mailing list rules.
> They apply to everyone.
> 
> https://github.com/php/php-src/blob/master/docs/mailinglist-rules.md
> 
> Most significantly:
> 
> > a. Make everybody happier, ...
> 
> and
> 
> > 1. Do not post when you are angry. Any post can wait a few hours. Review 
> > your post after a good breather, or a good nights sleep.
> 
> Ilija
> 

For what it’s worth, I specifically was not offended and took it as 
humour/sarcasm/exasperation.

I actually did lol. 

— Rob

Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-24 Thread Rob Landers


On Sat, Aug 24, 2024, at 13:59, Stephen Reay wrote:
> 
> 
> > On 24 Aug 2024, at 16:24, Rob Landers  wrote:
> > 
> > In other words, if you want to autoload a global function, you need to call 
> > it fully qualified.
> 
> When I said this thread reads like an April fools joke that wasn't a 
> challenge you know.
> 
> 
> Are you seriously suggesting that unqualified function lookups should be 
> global first, then local, except if it's to be autoloaded and then the global 
> ones *have* to be fully qualified?

More like it only supports autoloading locally namespaced functions when 
unqualified. So, everything else works exactly the same.

Here's a table that might help (note, just typed it up off the top of my head 
so may have errors) for a global-first behavior:

| defined | qualified | type   | from   | autoload | name      | example             |
|---------|-----------|--------|--------|----------|-----------|---------------------|
| true    | true      | global | N/A    | false    | N/A       | \strlen('hello')    |
| false   | true      | global | N/A    | true     | \myfunc   | \myfunc('hello')    |
| true    | false     | global | global | false    | N/A       | strlen('hello')     |
| true    | false     | global | ns     | false    | N/A       | strlen('hello')     |
| false   | false     | global | ns     | true     | ns\strlen | strlen('hello')     |
| false   | false     | global | global | true     | \strlen   | \strlen('hello')    |
| true    | true      | ns     | N/A    | false    | N/A       | \ns\myfunc('hello') |
| false   | true      | ns     | N/A    | true     | ns\myfunc | \ns\myfunc('hello') |
| true    | false     | ns     | ns     | false    | N/A       | myfunc('hello')     |
| false   | false     | ns     | ns     | true     | ns\myfunc | myfunc('hello')     |

With "local-first", if your autoloader receives the name "ns\strlen", it 
should look for ns/strlen. An optimized autoloader will have a function map 
(similar to a class map) that can quickly determine whether that function 
exists in the project and where to load it from. For example, in my tests, I 
have a function map that breaks up the map into a specialized trie that appears 
to be faster than an array for an arbitrary number of functions. In this case, 
it would know to give up after about 1-2 steps into the prefix tree, return, 
and let the engine look up the global function.
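
A minimal sketch of such a function map, assuming a trie keyed by namespace segment. The FunctionTrie class, its method names, and the file paths are hypothetical and are not the implementation from the tests mentioned above:

```php
<?php
// Hypothetical function map: each namespace segment is one trie level.
final class FunctionTrie
{
    private array $root = [];

    public function add(string $name, string $file): void
    {
        $node = &$this->root;
        // Function names are case-insensitive, so normalise to lowercase.
        foreach (explode('\\', strtolower(ltrim($name, '\\'))) as $segment) {
            $node['children'][$segment] ??= [];
            $node = &$node['children'][$segment];
        }
        $node['file'] = $file;
    }

    // Returns the file to load, or null as soon as a segment is unknown.
    public function find(string $name): ?string
    {
        $node = $this->root;
        foreach (explode('\\', strtolower(ltrim($name, '\\'))) as $segment) {
            if (!isset($node['children'][$segment])) {
                return null; // bail out early; nothing to autoload
            }
            $node = $node['children'][$segment];
        }
        return $node['file'] ?? null;
    }
}

$map = new FunctionTrie();
$map->add('ns\myfunc', 'src/ns/myfunc.php');

var_dump($map->find('ns\myfunc')); // known function: returns its file
var_dump($map->find('ns\strlen')); // unknown: stops at the second segment
```

For "ns\strlen" the lookup matches "ns", fails on "strlen", and returns without touching the filesystem.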

With global-first, the autoloader never even gets called for something like 
strlen; instead it will be resolved in the global scope.

Now let’s look at the case where you have written a function called "myfunc()" 
in the global namespace and you want it to be autoloaded. In some namespace 
("ns"), the developer calls "myfunc()" unqualified, and it is not yet defined. 
The autoloader will be called (in both implementations) with the name 
"ns\myfunc" and it will be up to the autoloader implementation what to do about 
this. It can first walk the trie and decide there is nothing to do here, which 
is the most performant option. Alternatively, it can get the basename of 
"ns\myfunc" (which would be "myfunc") and walk the trie again. Say it does that 
and finds your function. Now when we return from the autoloader, we have to 
check the function table for both names again.

If we only allow autoloading from the current namespace for unqualified calls, 
we simplify autoloading implementations and speed up things for everyone. 
Someone can come along and amend this with an RFC in the future, but it would 
be much harder to go the other way around.

Further, you can always call your global function, like "\myfunc()" and it 
would "just work."

— Rob

Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-24 Thread Rob Landers


On Sat, Aug 24, 2024, at 11:00, Rob Landers wrote:
> On Fri, Aug 23, 2024, at 23:57, Ilija Tovilo wrote:
>> On Fri, Aug 23, 2024 at 9:41 PM Rowan Tommins [IMSoP]
>>  wrote:
>> >
>> > On 23 August 2024 18:32:41 BST, Ilija Tovilo  
>> > wrote:
>> > >IMO, 1. is too drastic. As people have mentioned, there are tools to
>> > >automate disambiguation. But unless we gain some other benefit from
>> > >dropping the lookup entirely, why do it?
>> >
>> > I can think of a few disadvantages of "global first":
>> >
>> > - Fewer code bases will be affected, but working out which ones is harder. 
>> > The easiest migration will probably be to make sure all calls to 
>> > namespaced functions are fully qualified, as though it was "global only".
>> 
>> To talk about more concrete numbers, I now also analyzed how many
>> relative calls to local functions there are in the top 1000 composer
>> packages.
>> 
>> https://gist.github.com/iluuu1994/9d4bbbcd5f378d221851efa4e82b1f63
>> 
>> There were 4229 calls to local functions that were statically visible.
>> Of those, 1534 came from thecodingmachine/safe, which I'm excluding
>> again for a fair comparison. The remaining 2695 calls were split
>> across 210 files and 27 repositories, which is less than I expected.
>> 
>> The calls that need to be fixed by swapping the lookup order are a
>> subset of these calls, namely only the ones also clashing with some
>> global function. Hence, the process of identifying them doesn't seem
>> fundamentally different. Whether the above are "few enough" to justify
>> the BC break, I don't know.
>> 
>> > - The engine won't be able to optimise calls where the name exists locally 
>> > but not globally, because a userland global function could be defined at 
>> > any time.
>> 
>> When relying on the lookup, the lookup will be slower. But if the
>> hypothesis is that there are few people relying on this in the first
>> place, it shouldn't be an issue. It's also worth noting that many of
>> the optimizations don't apply anyway, because the global function is
>> also unknown and hence a user function, with an unknown signature.
>> 
>> > - Unlike with the current way around, there's unlikely to be a use case 
>> > for shadowing a namespaced name with a global one; it will just be a 
>> > gotcha that trips people up occasionally.
>> 
>> Indeed. But this is a downside of both these approaches.
>> 
>> > None of these seem like showstoppers to me, but since we can so easily go 
>> > one step further to "global only", and avoid them, why wouldn't we?
>> >
>> > Your answer to that seems to be that you think "global only" is a bigger 
>> > BC break, but I wonder how much difference it really makes. As in, how 
>> > many codebases are using unqualified calls to reference a namespaced 
>> > function, but *not* shadowing a global name?
>> 
>> I hope this provides some additional insight. Looking at the analysis,
>> I'm not completely opposed to your approach. There are some open
>> questions. For example, how do we handle functions declared and called
>> in the same file?
>> 
>> namespace Foo;
>> function bar() {}
>> bar();
>> 
>> Without a local fallback, it seems odd for this call to fail. An
>> option might be to auto-use Foo\bar when it is declared, although that
>> will require a separate pass over the top functions so that functions
>> don't become order-dependent.
>> 
>> Ilija
>> 
> 
> Hey Ilija,
> 
> I'm actually coming around to global first, then local second. I haven't 
> gotten statistically significant results yet though, but preliminary results 
> show that global first gives symfony/laravel their speed boost and function 
> autoloading gives things like wordpress their speed boost. Everyone wins.
> 
> For function autoloading, it is only called on the local check. So, it looks 
> kinda like this:
> 
>  1. does it exist in global namespace?
>1. yes: load the function; done.
>2. no: continue
>  2. does it exist in local namespace?
>1. yes: load the function; done.
>2. no: continue
>  3. call the autoloader for local namespace.
>  4. does it exist in local namespace?
>1. yes: load the function; done.
>2. no: continue
>  5. does it exist in the global namespace?
>1. yes: load the function; done.
>2. no: continue
> 

Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-24 Thread Rob Landers
On Fri, Aug 23, 2024, at 23:57, Ilija Tovilo wrote:
> On Fri, Aug 23, 2024 at 9:41 PM Rowan Tommins [IMSoP]
>  wrote:
> >
> > On 23 August 2024 18:32:41 BST, Ilija Tovilo  wrote:
> > >IMO, 1. is too drastic. As people have mentioned, there are tools to
> > >automate disambiguation. But unless we gain some other benefit from
> > >dropping the lookup entirely, why do it?
> >
> > I can think of a few disadvantages of "global first":
> >
> > - Fewer code bases will be affected, but working out which ones is harder. 
> > The easiest migration will probably be to make sure all calls to namespaced 
> > functions are fully qualified, as though it was "global only".
> 
> To talk about more concrete numbers, I now also analyzed how many
> relative calls to local functions there are in the top 1000 composer
> packages.
> 
> https://gist.github.com/iluuu1994/9d4bbbcd5f378d221851efa4e82b1f63
> 
> There were 4229 calls to local functions that were statically visible.
> Of those, 1534 came from thecodingmachine/safe, which I'm excluding
> again for a fair comparison. The remaining 2695 calls were split
> across 210 files and 27 repositories, which is less than I expected.
> 
> The calls that need to be fixed by swapping the lookup order are a
> subset of these calls, namely only the ones also clashing with some
> global function. Hence, the process of identifying them doesn't seem
> fundamentally different. Whether the above are "few enough" to justify
> the BC break, I don't know.
> 
> > - The engine won't be able to optimise calls where the name exists locally 
> > but not globally, because a userland global function could be defined at 
> > any time.
> 
> When relying on the lookup, the lookup will be slower. But if the
> hypothesis is that there are few people relying on this in the first
> place, it shouldn't be an issue. It's also worth noting that many of
> the optimizations don't apply anyway, because the global function is
> also unknown and hence a user function, with an unknown signature.
> 
> > - Unlike with the current way around, there's unlikely to be a use case for 
> > shadowing a namespaced name with a global one; it will just be a gotcha 
> > that trips people up occasionally.
> 
> Indeed. But this is a downside of both these approaches.
> 
> > None of these seem like showstoppers to me, but since we can so easily go 
> > one step further to "global only", and avoid them, why wouldn't we?
> >
> > Your answer to that seems to be that you think "global only" is a bigger BC 
> > break, but I wonder how much difference it really makes. As in, how many 
> > codebases are using unqualified calls to reference a namespaced function, 
> > but *not* shadowing a global name?
> 
> I hope this provides some additional insight. Looking at the analysis,
> I'm not completely opposed to your approach. There are some open
> questions. For example, how do we handle functions declared and called
> in the same file?
> 
> namespace Foo;
> function bar() {}
> bar();
> 
> Without a local fallback, it seems odd for this call to fail. An
> option might be to auto-use Foo\bar when it is declared, although that
> will require a separate pass over the top functions so that functions
> don't become order-dependent.
> 
> Ilija
> 

Hey Ilija,

I'm actually coming around to global first, then local second. I haven't gotten 
statistically significant results yet though, but preliminary results show that 
global first gives symfony/laravel their speed boost and function autoloading 
gives things like wordpress their speed boost. Everyone wins.

For function autoloading, it is only called on the local check. So, it looks 
kinda like this:

 1. does it exist in global namespace?
   1. yes: load the function; done.
   2. no: continue
 2. does it exist in local namespace?
   1. yes: load the function; done.
   2. no: continue
 3. call the autoloader for local namespace.
 4. does it exist in local namespace?
   1. yes: load the function; done.
   2. no: continue
 5. does it exist in the global namespace?
   1. yes: load the function; done.
   2. no: continue
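
The steps above can be sketched as a small userland model. This is not engine code: resolveUnqualified(), the $table array standing in for the function table, and the loader callback are all hypothetical names used for illustration:

```php
<?php
// $table models the function table: keys are lowercase names without a
// leading backslash. $autoload may add entries to it by reference.
function resolveUnqualified(string $name, string $ns, array &$table, callable $autoload): ?string
{
    $global = strtolower($name);
    $local = strtolower($ns . '\\' . $name);

    if (isset($table[$global])) { return $global; } // step 1: global first
    if (isset($table[$local])) { return $local; }   // step 2: then local

    $autoload($local, $table);                      // step 3: local name only

    if (isset($table[$local])) { return $local; }   // step 4: local first now
    if (isset($table[$global])) { return $global; } // step 5: global last

    return null; // would be an "undefined function" error
}

$table = ['strlen' => true];
$calls = [];
$loader = function (string $name, array &$table) use (&$calls): void {
    $calls[] = $name;
    if ($name === 'ns\myfunc') {
        $table[$name] = true; // pretend the file defining it was included
    }
};

var_dump(resolveUnqualified('strlen', 'ns', $table, $loader)); // never autoloads
var_dump(resolveUnqualified('myfunc', 'ns', $table, $loader)); // autoloads ns\myfunc
```
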

It checks the scopes in reverse order after autoloading because it is more 
likely that the autoloader loaded a local scope function than a global one. 
This adds a small inconsistency (if the autoloader were to load both a global 
and non-global function of the same name), but keeps autoloading fast for 
unqualified function calls. By checking global first, for OOP-centric codebases 
like Symfony and Laravel that call unqualified global functions, they never hit 
the autoloader. For things that do call qualified local-namespace functions, 
they hit the autoloader and immediately start loading them. The worst 
performance then becomes autoloading global functions that are called 
unqualified. Not only do you have to strip out the current namespace in the 
autoloader, but you have to deal with being the absolute last check in the 
function table. However, (and I'm still trying to figure 

Re: [PHP-DEV] State of Generics and Collections

2024-08-23 Thread Rob Landers
On Fri, Aug 23, 2024, at 23:06, Larry Garfield wrote:
> On Fri, Aug 23, 2024, at 1:38 PM, Rob Landers wrote:
> > On Fri, Aug 23, 2024, at 20:27, Bruce Weirdan wrote:
> >> On Fri, Aug 23, 2024 at 4:27 PM Larry Garfield  
> >> wrote:
> >>> Moving those definitions to attributes is certainly possible, though 
> >>> AFAIK both the PHPStan and Psalm devs have expressed zero interest in it.
> >>> Part of the challenge is that such an approach will either still involve 
> >>> string parsing,
> >> 
> >> That's not really a challenge and would help somewhat with the current 
> >> status quo where we have to guess where the type ends and the textual part 
> >> of the comment begins. But it gets ugly for any type that has to include 
> >> quotes (literal strings, array keys, etc). Technically one can use 
> >> nowdocs, but it's not much better:  https://3v4l.org/4hpte
> >>  
> >>> or will involve a lot of deeply nested attribute classes. 
> >> 
> >> Yeah, that would look like Lisp's S-exprs, but much worse - which, in my 
> >> opinion, would harm adoption.
> >> 
> >> All in all, in my opinion attribute-based solutions are less ergonomic 
> >> than what we already have now in docblocks.
> >> 
> >> --
> >>   Best regards,
> >>   Bruce Weirdan 
> >> mailto:weir...@gmail.com
> >
> > Thank you Larry for expressing some of the problems. Is there any 
> > reason nesting has to be supported out of the gate? Think about type 
> > hints. It started with some basic functionality and then grew over 
> > time. There is no reason we have to have a new kitchen sink, oven, 
> > dishwasher and stove when all we want is a new refrigerator. 
> >
> > — Rob
> 
> While I understand the temptation to "just do part of it", which comes up 
> very often, I must reiterate once again that can backfire badly.  That is 
> only sensible when:
> 
> 1. There's a very clear picture to get from A->Z.
> 2. The implementation of C and D cannot interfere with the design or 
> implementation of J or K.
> 3. The steps along the way offer clear self-contained benefits, such that if 
> nothing else happens, it's still a "complete" system and a win.
> 4. The part being put off to later isn't just putting off the "hard part".
> 
> In practice, the level at which you get all four is quite coarse, much 
> coarser than it seems most people on this list think.

I wasn't intending to just say "just do it," but rather, is it "good enough." 
As I mentioned in another email on this topic, right now there is only one 
person in the world who can work on the problem. Sure, we can leave drive-by 
comments and our own experiences/opinions here and on github, but ultimately, 
the knowledge of how it works and how it can be improved exists solely within 
one (or thereabouts) person's brain; on the entire planet.

This is sort-of how when we write software, we try to keep small PRs. Small PRs 
can be reviewed quickly and merged. Other developers on the team can start 
interacting with the code, even if the feature that it pertains to is 
incomplete. From that point forward, other developers can improve that code, 
separately from the person who is working on the feature. The knowledge of how 
it works is shared and people with different perspectives and experiences can 
make it better.

> 
> Examples of where we have done that:
> 
> * Enums.  The initial Enum RFC is part one of at least 3 steps.  Step 2 is 
> pattern matching, Step 3 is ADTs/tagged unions.  Those are still coming, but 
> all three were spec'ed out in advance (1), we're fairly confident that the 
> enum design will play nice with tagged unions (2), and enums step 1 has very 
> clearly been hugely positive for the language (3, 4).
> 
> * Property hooks and aviz.  These were designed together.  They were 
> originally a single planning document, way back in Nikita's original RFC.  
> After effectively doing all the design work of both together, we split up the 
> implementations to make them easier.  Hooks was still a large RFC, but that 
> was after we split things up.  That meant we had a clear picture of how the 
> two would fit together (1, 2), either RFC on its own would have been 
> beneficial to the language even if they're better together (2, 3), and both 
> were substantial tasks in themselves (4).
> 
> * Gina's ongoing campaign to make PHP's type juggling have some passing 
> resemblance to logic.
> 
> With generics, the syntax isn't th

Re: [PHP-DEV] [RFC] On the need of a `is_int_string` ?

2024-08-23 Thread Rob Landers


On Fri, Aug 23, 2024, at 23:10, Vincent Langlet wrote:
> I found a simpler implementation later which relies on array_keys
> ```
> function is_int_string(string $s): bool { return \is_int(array_keys([$s => null])[0]); }
> ```
> 
> I considered that `is_int_string` fit better in the same family as
> `is_object`, `is_array`, `is_int`, `is_numeric`, ... but maybe there is 
> something better than `int_string` to describe this category of string, since 
> my English is not good (integish? integable? integerable?).
> But indeed it could be interesting to relate this method to the array 
> namespace...

Don't forget to bottom post!

I'm curious which one is faster/more efficient. :) 

> Anyway, this topic does not seem to interest many developers so far ^^'

I think it would be worth writing an RFC for. While super-niche feeling, I was 
literally bitten by this the other day when storing what I thought were numbers 
as keys, but it turned out they got stored as strings (2-digit numbers, and the 
zero-prefix bit me). That being said, I'm not sure how this function would have 
helped me (other than tossing in a few asserts to make sure things were sane). 
So, writing an RFC will force you to think about use cases like that and some 
examples to highlight the issue it helps solve.
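
A short sketch of the zero-prefix behaviour, using the array_keys() approach quoted above written as a regular function. The sample keys are made up for illustration:

```php
<?php
function is_int_string(string $s): bool
{
    // PHP casts a string key to int only if it is a canonical decimal
    // integer, so the type of the resulting key answers the question.
    return \is_int(array_keys([$s => null])[0]);
}

$byDay = ['7' => 'a', '07' => 'b', '10' => 'c'];

var_dump(array_keys($byDay));   // '7' and '10' become ints, '07' stays a string
var_dump(is_int_string('7'));   // bool(true)
var_dump(is_int_string('07'));  // bool(false)
```
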

— Rob

Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-23 Thread Rob Landers
On Fri, Aug 23, 2024, at 21:41, Rowan Tommins [IMSoP] wrote:
> 
> 
> On 23 August 2024 18:32:41 BST, Ilija Tovilo  wrote:
> >IMO, 1. is too drastic. As people have mentioned, there are tools to
> >automate disambiguation. But unless we gain some other benefit from
> >dropping the lookup entirely, why do it?
> 
> I can think of a few disadvantages of "global first":
> 
> - Fewer code bases will be affected, but working out which ones is harder. 
> The easiest migration will probably be to make sure all calls to namespaced 
> functions are fully qualified, as though it was "global only".
> - Even after the initial migration, users will have to watch out for new 
> conflicting global functions. Again, this can be avoided by just pretending 
> it's "global only". 
> - The engine won't be able to optimise calls where the name exists locally 
> but not globally, because a userland global function could be defined at any 
> time.
> - Unlike with the current way around, there's unlikely to be a use case for 
> shadowing a namespaced name with a global one; it will just be a gotcha that 
> trips people up occasionally.
> 
> None of these seem like showstoppers to me, but since we can so easily go one 
> step further to "global only", and avoid them, why wouldn't we? 
> 
> Your answer to that seems to be that you think "global only" is a bigger BC 
> break, but I wonder how much difference it really makes. As in, how many 
> codebases are using unqualified calls to reference a namespaced function, but 
> *not* shadowing a global name?

I can think of more than one one-off script where I have written something like 
this:

namespace blah;

function read_and_process_file(): array {
    return []; // placeholder body
}

function do_something(array $file): void { }

$file = read_and_process_file();
var_dump($file);
// die(); // debug
do_something($file);

If it were global only, then how would I call those functions? 
namespace\read_and_process_file()?

That seems like worse ergonomics, not better, for very little gain.
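
For anyone unfamiliar with the relative form, here is a small sketch of the three ways such a function can be called today. The function body is placeholder data, not the real script.

```php
<?php
namespace blah;

function read_and_process_file(): array {
    return ['line 1', 'line 2']; // placeholder data for illustration
}

// Unqualified: the current namespace is checked first.
$a = read_and_process_file();

// Explicitly relative to the current namespace via the `namespace` keyword.
$b = namespace\read_and_process_file();

// Fully qualified from the global root.
$c = \blah\read_and_process_file();

var_dump($a === $b && $b === $c);
```

Under a "global only" rule, only the second and third forms would keep working for namespaced functions.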


> 
> Regards,
> Rowan Tommins
> [IMSoP]
> 

— Rob

Re: [PHP-DEV] State of Generics and Collections

2024-08-23 Thread Rob Landers
On Mon, Aug 19, 2024, at 19:08, Derick Rethans wrote:
> Hi!
> 
> Arnaud, Larry, and I have been working on an article describing the 
> state of generics and collections, and related "experiments".
> 
> You can find this article on the PHP Foundation's Blog:
> https://thephp.foundation/blog/2024/08/19/state-of-generics-and-collections/
> 
> cheers,
> Derick
> 

Hello,
I had an idea I wanted to share:

Since PHP Opcache now has an Intermediate Representation (IR), could we 
potentially store this IR in a file and recompile it on demand? This approach 
could make PHP ahead-of-time compiled (though I believe this already happens 
in-memory, to some extent, via opcache).

I'm curious to hear your thoughts on the potential benefits and drawbacks of 
this idea. Could this solve some existing issues (especially with regard to 
generics, etc.), or might it introduce new ones? Additionally, would it make 
sense to consider breaking PHP into separate components: the runtime, the 
libraries/extensions, and the compiler?

Regards,

— Rob

Re: [PHP-DEV] State of Generics and Collections

2024-08-23 Thread Rob Landers


On Fri, Aug 23, 2024, at 20:27, Bruce Weirdan wrote:
> On Fri, Aug 23, 2024 at 4:27 PM Larry Garfield  wrote:
>> Moving those definitions to attributes is certainly possible, though AFAIK 
>> both the PHPStan and Psalm devs have expressed zero interest in it.
>> Part of the challenge is that such an approach will either still involve 
>> string parsing,
> 
> That's not really a challenge and would help somewhat with the current status 
> quo where we have to guess where the type ends and the textual part of the 
> comment begins. But it gets ugly for any type that has to include quotes 
> (literal strings, array keys, etc). Technically one can use nowdocs, but it's 
> not much better:  https://3v4l.org/4hpte
>  
>> or will involve a lot of deeply nested attribute classes. 
> 
> Yeah, that would look like Lisp's S-exprs, but much worse - which, in my 
> opinion, would harm adoption.
> 
> All in all, in my opinion attribute-based solutions are less ergonomic than 
> what we already have now in docblocks.
> 
> --
>   Best regards,
>   Bruce Weirdan 
> mailto:weir...@gmail.com

Thank you Larry for expressing some of the problems. Is there any reason 
nesting has to be supported out of the gate? Think about type hints. It started 
with some basic functionality and then grew over time. There is no reason we 
have to have a new kitchen sink, oven, dishwasher and stove when all we want is 
a new refrigerator. 

— Rob

Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-23 Thread Rob Landers


On Fri, Aug 23, 2024, at 19:32, Ilija Tovilo wrote:
> On Fri, Aug 23, 2024 at 5:49 PM Rowan Tommins [IMSoP]
>  wrote:
> >
> > Other proposals aim to shift that balance - leaving some inconsistency, but 
> > less compatibility break.
> >
> > And most users don't object to using a leading backslash, they just (quite 
> > reasonably) have no idea what impact it has on the ability of the engine to 
> > optimise their code.
> 
> For some context: Before proposing this change, I asked Symfony if
> they were interested in disambiguating calls to improve performance,
> given I did some work to make fully qualified calls faster. But they
> were not, stating that the change would be too verbose for their
> liking.

I think if someone values code beauty more than speed, they can do that. 
Although, I find that rather hilarious when their code base is littered with 
goto, “for speed.”

What it really sounds like is that they realized you would just change the 
language and they wouldn’t have to review those changes… /s

> 
> Making unqualified calls to mean local would force Symfony into making
> the change, which is not the approach I'm interested in taking. Making
> them global would likely reduce breakage by a lot, but not nearly as
> much as keeping the fallback.

I don’t think changing the language for a specific framework(s) is a good idea.

> 
> From reading the responses, it seems we have three primary camps:
> 
> 1. People who don't think BC is a problem, and would like to drop
> either the global or local lookup entirely, requiring disambiguation.
> 2. People who do think BC is a problem, and would like some opt-in
> mechanism to explicitly pick global or local scope.
> 3. People who aren't convinced that the performance improvements are
> worth it to begin with, or that the developers themselves are
> responsible for disambiguation.
> 
> IMO, 1. is too drastic. As people have mentioned, there are tools to
> automate disambiguation. But unless we gain some other benefit from
> dropping the lookup entirely, why do it? Consistency with class
> lookups is a factor, but is it enough to break a large portion of
> codebases? The summed up time of every maintainer installing and
> running a tool that modifies a large portion of the codebase, and then
> dealing with conflicts in existing branches is not minuscule. Fixing
> local calls will also require context from other files to correctly
> disambiguate. I don't know whether any tools actually consider context, or
> just take the naive approach of making known, internal calls global,
> and leaving the rest.

Aren’t we doing that anyway with your proposal? Sure, maybe that doesn’t 
require (many) changes right now. But there is an RFC being discussed right 
now which introduces a new function called “parse_html”.

This seems like a super generic name that did not take global-first into 
account. I know for a fact I have seen code with that exact function name 
several times in my career.

> 
> 2. misses the point of the immediate performance gains without
> modifications to the codebase. Even if the disambiguation itself is a
> one-liner, it still needs to be added to every codebase and every
> file, and still requires fixing actual local calls that may be made
> within the same file.
> 
> I obviously also disagree with 3. as I wouldn't have sent this
> proposal otherwise. :) Performance improvements are hard to come by
> nowadays. It was measured on real codebases (Symfony and Laravel).
> 
> Ilija
> 

Applications are more likely to get better performance gains in Symfony by 
uninstalling Doctrine and writing optimized queries, to be completely honest.

— Rob

Re: [PHP-DEV] Re: Negatively Voted Notes

2024-08-23 Thread Rob Landers


On Fri, Aug 23, 2024, at 15:05, Bilge wrote:
> On 23/08/2024 11:34, Derick Rethans wrote:
>> On Wed, 10 Jul 2024, Derick Rethans wrote:
>> 
>> 
>>> We discussed this during one of our foundation meetings, and we propose:
>>> 
>>> - to delete all notes with a rating less than -5 that are older than a 
>>>   year.
>>> 
>> As general consensus was that this is correct, I will be creating a script 
>> for this.
> What is to stop a malicious actor committing a distributed downvote and 
> wiping the entire notes database?
> 
> Cheers, Bilge
> 

I imagine they have a snapshot from before it was discussed. 

— Rob

Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-23 Thread Rob Landers


On Fri, Aug 23, 2024, at 14:56, Christian Schneider wrote:
> Am 23.08.2024 um 12:27 schrieb Rob Landers :
> > On Fri, Aug 23, 2024, at 12:14, Christian Schneider wrote:
> >> Am 23.08.2024 um 11:34 schrieb Nick Lockheart :
> >> > I think we are all trying to achieve the same thing here.
> >> 
> >> I'm not sure who "we" and what "same thing" here exactly is.
> > 
> > Nick was replying to me :p, judging by the quoted paragraph.
> 
> The "all" in his sentence suggested to me that he means more than him and you.
> But then again I might have misinterpreted this.
> 
> > As far as function overloading goes, I recommend checking out a draft RFC 
> > I've been working on for a very, very long time: 
> > https://wiki.php.net/rfc/records. In some off-list discussions, it was 
> > clear that if I wanted this syntax, I would need to pursue function 
> > autoloading.
> 
> Definitely an interesting read, thanks a lot for the work you put into it!
> 
> > Further, function autoloading is a clearly missing feature that would be 
> > useful in many situations.
> 
> The "clearly missing" and "many" part is where I disagree. But I was mainly 
> considering current PHP, not future PHP syntax like the Records stuff, agreed.
> 
> > If function autoloading doesn't work out, I will need to take a different 
> > approach to that syntax (which is fine, but not something I want because I 
> > chose the syntax for a very good reason).
> 
> I know you do not want to discuss this here as it is off-topic but it kind of 
> feels like the only advantage is to get rid of "new" in the usage of Records. 
> But I'll leave it at that as per your request; we can revisit that once the 
> RFC hits the discussion stage.
> 
> > That being said, I'm not ready to discuss records here, so this is the 
> > first and last time I'll mention it on the thread. There is a Reddit post 
> > in r/php and a GitHub repo if you are interested in discussing records. 
> > There are very many things to work out still, and it is very much 
> > work-in-progress.
> 
> Also a bit off-topic but I still have to mention it, maybe worth another 
> thread:
> I understand where you are coming from but at the same time it feels a bit 
> worrying to me to use another medium (reddit) for a discussion about future 
> language features when we have this mailing list.

Don't be worried about it too much. Many RFCs start somewhere else first before 
they end up here. First as an idea, then a draft, then they ask 
friends/coworkers to read them over, etc. By the time it ends up on the list, a 
lot of work has been done (in some cases). Sometimes, they are simple-ish RFCs 
that need little work and are pretty straightforward, but for more complex 
ones, there are usually several cycles before it will end up on the mailing 
list. Further, it has come to my attention that an implementation is basically 
an unwritten requirement, so spending time on that is also a delay there. At 
least, that has been my experience so far with that one.

> 
> I hope this won't mean that questions/suggestions/concerns on this mailing 
> list will be discredited because of discussions which happened elsewhere. 
> I'm sorry if I sound a bit paranoid here but I've been in this situation 
> before in other (not software related) aspects of my life, where I was 
> told that something was already decided and people were not willing to go 
> back on certain issues because of that.

The way I look at it, nothing is set in stone until everyone has seen it and 
had a chance to respond. Yes, there are good reasons that things are the way 
they are in that RFC, and during discussion, I expect those reasons will come 
up. I have no idea if those reasons stand up under scrutiny, and I won't find 
out until then. These are all known unknowns. 

There are some people on the list who believe once it is on the list, it is 
unchangeable unless you are a voter and act accordingly, such as ignoring 
non-voter concerns. I, personally, feel that shouldn't be how things work. It 
is "our language" (voter or not) and not "Rob's language." I guess we will see 
how that plays out in the coming months.

> 
> Regards,
> - Chris
> 

— Rob

Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-23 Thread Rob Landers
On Fri, Aug 23, 2024, at 14:16, Rob Landers wrote:
> On Fri, Aug 23, 2024, at 11:27, Nick Lockheart wrote:
>> On Fri, 2024-08-23 at 09:16 +0100, Rowan Tommins [IMSoP] wrote:
>> > 
>> > 
>> > On 23 August 2024 01:42:38 BST, Nick Lockheart 
>> > wrote:
>> > > 
>> > > > 
>> > > > BUT, if people already complain about "\" being ugly, having to
>> > > > write
>> > > > "namespace\" is going to make them REALLY grumpy...
>> > > > So maybe at the same time (or, probably, in advance) we need to
>> > > > come
>> > > > up with a nicer syntax for explicitly referencing the current
>> > > > namespace.
>> > > 
>> > >namespace foo using global functions;
>> > > 
>> > > - or - 
>> > > 
>> > >namespace foo using local functions;
>> > > 
>> > > 
>> > > Tell PHP what you want at the per-file level.
>> > 
>> > 
>> > This doesn't seem mutually exclusive to me. If you have a file where
>> > you've opted for "using global functions", you might want a way to
>> > reference a function in the current namespace. 
>> 
>> Correct, so if you use the example:
>> 
>> namespace foo using global functions;
>> 
>> you can write:
>> 
>> array_key_exists();
>> 
>> and it will be resolved as global without a namespace lookup and will
>> use the dedicated opcode.
>> 
>> But if you need to use a local function you can do:
>> 
>> \foo\sort();
>> 
>> 
>> The proposed global/local declaration as part of the namespace
>> declaration just turns off namespace lookups and sets the default
>> resolution for **unqualified** names.
>> 
>> Fully qualified names are not affected.
>> 
>> 
>> > It also doesn't address my other point, that having global as the
>> > default mode (even if we provide an option for local) is much less
>> > disruptive to existing code.
>> 
>> 
>> They are compatible, but related decisions.
>> 
>> I think it would be easier for people to accept a new PHP version where
>> unqualified names were always global, if we also had an option to make
>> local/namespaced the default resolution for *unqualified* names, on a
>> per-file basis, for those who need that.
>> 
>> 
>> Thus, there are multiple decision points:
>> 
>> 1. Should we do namespace lookups on unqualified function calls at all?
>> 
>> 2. If yes to 1, should we lookup in global first or local first?
>> 
>> 3. Regardless of 1 or 2, should we let developers explicitly specify a
>> behavior for unqualified calls in the namespace declaration?
>> 
>> 4. If yes to 1, should the behavior of namespace lookups change for
>> user-defined functions vs PHP built-in function names?
>> 
>> 
>> These aren't mutually exclusive, but they all work together to create a
>> complete behavior.
>> 
>> There are several ways that the above options could be combined:
>> 
>> 
>> 
>> ### OPTION ONE ###
>> 
>> Using a regular namespace declaration still does an NS lookup, in the
>> same order, just like it normally works now.
>> 
>> That means that code that uses:
>> 
>> namespace foo;
>> 
>> will behave exactly the same as today, with no BC breaks.
>> 
>> Developers using the new PHP version could opt-in to explicit namespace
>> behavior with:
>> 
>> namespace foo using global functions;
>> 
>> or
>> 
>> namespace foo using local functions;
>> 
>> In both cases, *fully-qualified* names still work the same.
>> 
>> Only *unqualified* names are affected by this directive, and they use
>> local only or global only, depending on the declaration.
>> 
>> 
>> 
>> ### OPTION TWO ###
>> 
>> Namespace lookup is removed from a future version of PHP.
>> 
>> Code that uses the current namespace declaration: 
>> 
>> namespace foo;
>> 
>> will assume that all unqualified function calls are global scope.
>> 
>> To use a function in the local namespace, it can be fully qualified
>> with:
>> 
>> \foo\MyFunction();
>> 
>> 
>> But, developers could also write:
>> 
>>  namespace foo using local functions;
>> 
>> And all unqualified function names would be resolved to local at
>>

Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-23 Thread Rob Landers
On Fri, Aug 23, 2024, at 11:27, Nick Lockheart wrote:
> On Fri, 2024-08-23 at 09:16 +0100, Rowan Tommins [IMSoP] wrote:
> > 
> > 
> > On 23 August 2024 01:42:38 BST, Nick Lockheart 
> > wrote:
> > > 
> > > > 
> > > > BUT, if people already complain about "\" being ugly, having to
> > > > write
> > > > "namespace\" is going to make them REALLY grumpy...
> > > > So maybe at the same time (or, probably, in advance) we need to
> > > > come
> > > > up with a nicer syntax for explicitly referencing the current
> > > > namespace.
> > > 
> > >namespace foo using global functions;
> > > 
> > > - or - 
> > > 
> > >namespace foo using local functions;
> > > 
> > > 
> > > Tell PHP what you want at the per-file level.
> > 
> > 
> > This doesn't seem mutually exclusive to me. If you have a file where
> > you've opted for "using global functions", you might want a way to
> > reference a function in the current namespace. 
> 
> Correct, so if you use the example:
> 
> namespace foo using global functions;
> 
> you can write:
> 
> array_key_exists();
> 
> and it will be resolved as global without a namespace lookup and will
> use the dedicated opcode.
> 
> But if you need to use a local function you can do:
> 
> \foo\sort();
> 
> 
> The proposed global/local declaration as part of the namespace
> declaration just turns off namespace lookups and sets the default
> resolution for **unqualified** names.
> 
> Fully qualified names are not affected.
> 
> 
> > It also doesn't address my other point, that having global as the
> > default mode (even if we provide an option for local) is much less
> > disruptive to existing code.
> 
> 
> They are compatible, but related decisions.
> 
> I think it would be easier for people to accept a new PHP version where
> unqualified names were always global, if we also had an option to make
> local/namespaced the default resolution for *unqualified* names, on a
> per-file basis, for those who need that.
> 
> 
> Thus, there are multiple decision points:
> 
> 1. Should we do namespace lookups on unqualified function calls at all?
> 
> 2. If yes to 1, should we lookup in global first or local first?
> 
> 3. Regardless of 1 or 2, should we let developers explicitly specify a
> behavior for unqualified calls in the namespace declaration?
> 
> 4. If yes to 1, should the behavior of namespace lookups change for
> user-defined functions vs PHP built-in function names?
> 
> 
> These aren't mutually exclusive, but they all work together to create a
> complete behavior.
> 
> There are several ways that the above options could be combined:
> 
> 
> 
> ### OPTION ONE ###
> 
> Using a regular namespace declaration still does an NS lookup, in the
> same order, just like it normally works now.
> 
> That means that code that uses:
> 
> namespace foo;
> 
> will behave exactly the same as today, with no BC breaks.
> 
> Developers using the new PHP version could opt-in to explicit namespace
> behavior with:
> 
> namespace foo using global functions;
> 
> or
> 
> namespace foo using local functions;
> 
> In both cases, *fully-qualified* names still work the same.
> 
> Only *unqualified* names are affected by this directive, and they use
> local only or global only, depending on the declaration.
> 
> 
> 
> ### OPTION TWO ###
> 
> Namespace lookup is removed from a future version of PHP.
> 
> Code that uses the current namespace declaration: 
> 
> namespace foo;
> 
> will assume that all unqualified function calls are global scope.
> 
> To use a function in the local namespace, it can be fully qualified
> with:
> 
> \foo\MyFunction();
> 
> 
> But, developers could also write:
> 
>  namespace foo using local functions;
> 
> And all unqualified function names would be resolved to local at
> compile time. Global functions could still be accessed with a `\` if
> this directive was used:
> 
> \array_key_exists();
> 
> 
> 
> ### OPTION THREE ###
> 
> Namespace lookup is removed from a future version of PHP.
> 
> Code that uses the current namespace declaration:
> 
> namespace foo;
> 
> ...will assume that an *unqualified* function name is a global function
> *IF* it is a PHP built-in function.
> 
> Otherwise, *unqualified* function names that are *not* PHP built-in
> functions will be presumed to be local to the namespace.
> 
> With Option Three, developers can still fully-qualify their functions:
> 
> \foo\array_key_exists();
> 
> ...to override a built-in name with a user function in the current
> namespace.
> 
> Likewise, a fully-qualified:
> 
> \MyFunction();
> 
> called from inside a namespace will still call the global function.
> 
> Only unqualified names are affected.
> 
> As an additional optional feature of Option Three, developers can
> change this behavior with:
> 
> namespace foo using global functions;
> 
> or
> 
> namespace foo using local functions;
> 
> 
> Only *unqualified* names are affected by this directive, and they use
> local only or

Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-23 Thread Rob Landers
On Fri, Aug 23, 2024, at 12:14, Christian Schneider wrote:
> Am 23.08.2024 um 11:34 schrieb Nick Lockheart :
> > I think we are all trying to achieve the same thing here.
> 
> I'm not sure who "we" and what "same thing" here exactly is.
> 
> I recall the following arguments for changing the current situation about 
> function look ups:
> - Performance
> - Function autoloading
> - Consistency
> 
> Did I miss something big?

Nick was replying to me :p, judging by the quoted paragraph.

> 
> First of all I don't think the performance argument holds enough weight as 
> I'm very doubtful this impacts performance of a real world application in a 
> significant way. And for people *really* hitting this problem there is a 
> solution already.
> Secondly I am a bit confused about the whole function autoloading discussion: 
> There is already a good-enough mechanism (putting them as static functions 
> inside a tool class). I just don't consider the hoops we have to jump through 
> to get a more "pure" or fine-grained solution for a special problem not worth 
> it. As for the "don't use classes for static functions" I've yet to see a 
> good argument apart from personal preference.
> As far as consistency goes I've yet to encounter someone being confused about 
> function resolution. But then again I'm not reaching namespaces for PHP 
> classes.

As far as function overloading goes, I recommend checking out a draft RFC I've 
been working on for a very, very long time: https://wiki.php.net/rfc/records. In 
some off-list discussions, it was clear that if I wanted this syntax, I would 
need to pursue function autoloading. Further, function autoloading is a clearly 
missing feature that would be useful in many situations. If function 
autoloading doesn't work out, I will need to take a different approach to that 
syntax (which is fine, but not something I want because I chose the syntax for 
a very good reason). That being said, I'm not ready to discuss records here, so 
this is the first and last time I'll mention it on the thread. There is a 
Reddit post in r/php and a GitHub repo if you are interested in discussing 
records. There are very many things to work out still, and it is very much 
work-in-progress.

> 
> While modern tooling possibly can adapt source code to the new style 
> efficiently I have to maintain too many installations of PHP projects on 
> various hosters to look forward to that. And the argument that "you can 
> just stay on an old PHP version" is just not a feasible solution either.
> 
> Maybe we should take a step back and reevaluate the pros and cons. 
> 
> - Chris
> 

— Rob

Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-23 Thread Rob Landers
On Fri, Aug 23, 2024, at 10:08, Nick Lockheart wrote:
> 
> > 
> > If we were to go with any major change in the current lookup where it
> > is perf or nothing, this is what I would propose for php 9.0
> > (starting with an immediate deprecation):
> >1. any unqualified call simply calls the current namespace
> >2. >= php 9.0: no fallback to global
> >3. < php 9.0: emit deprecation notice if falls back to global
> > This is how classes work (pretty sure), so it would be consistent.
> > 
> > Going the other way (global first) doesn't really make sense because
> > it is inconsistent, IMHO. Will it suck? Probably. Will it be easy to
> > fix? Probably via Rector.
> > 
> > — Rob
> 
> 
> A third option, which I haven't seen come up on the list yet, is that
> unqualified functions that are PHP built-ins are treated as global, and
> using a function having the same name as a built-in, in a namespace
> scope, requires a fully qualified name to override the built-in.
> 
> It seems that if someone is writing `array_key_exists()` or similar
> they probably mean the built-in function, and in the rare cases where
> they do mean `\foo\array_key_exists()`, they can write it explicitly.
> 
> Functions that are *not* on the built-in function list could default to
> the local namespace.

I was actually thinking of doing something like this for function autoloading, 
where extensions could register global functions that bypass the autoloader and 
go straight to global if it isn't defined in the local namespace already. I 
decided not to even bring it up because it felt controversial (it would 
effectively be global first, except for user functions). Though, it might be a 
nice compromise?

— Rob

Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-23 Thread Rob Landers
On Fri, Aug 23, 2024, at 09:27, Nick Lockheart wrote:
> On Fri, 2024-08-23 at 07:39 +0100, Rowan Tommins [IMSoP] wrote:
> > 
> > 
> > On 23 August 2024 00:15:19 BST, Mike Schinkel 
> > wrote:
> > > Having to prefix with a name like Foo, e.g. Foo\strlen() is FAR
> > > PREFERABLE to _\strlen() because at least it provides satiating
> > > information rather than the empty calories of a cryptic shorthand. 
> > > #jmtcw, anyway.
> > 
> > I knew I'd regret keeping the example short. Realistically, it's not
> > a substitute for "\Foo\strlen", it's a substitute for
> > "\AcmeComponents\SplineReticulator\Utilities\Text\strlen".
> > 
> > Having a syntax for "relative to current" is incredibly common in
> > other path-like syntaxes. The most common marker is ".", and ".\foo"
> > is literally how you'd refer to something in the current directory
> > under DOS/Windows. But unfortunately, we don't have "." available, so
> > I wondered if "_" would feel similar enough.
> > 
> > Another option would be to find a shorter keyword than "namespace" to
> > put it in front. "ns\strlen(...)" is an obvious step from what we
> > have currently, but it's not very obvious what it means, so maybe
> > there's a different word we could use.
> > 
> > Rowan Tommins
> > [IMSoP]
> 
> Could be mistaken, but I think the way PHP handles namespaces
> internally is sort of the same as a long string, rather than as a
> tree/hierarchy.
> 
> ie. \AcmeComponents\SplineReticulator\Utilities\Text\strlen
> 
> is really like:
> 
> class AcmeComponentsSplineReticulatorUtilitiesTextstrlen {
> 
>public function __construct(){
> 
>}
> 
> }
> 
> And the "AcmeComponentsSplineReticulatorUtilitiesText" just kind of
> gets appended to the front when the class name is registered.
> 
> I haven't done work on the namespace code, but I recall reading this
> somewhere recently.

This is mostly correct; the only thing missing from your strings is the `\` 
character. I believe this even happens during compilation, meaning it sees your 
namespace/uses and then rewrites the function/class calls at compile time. 
Thus an unqualified call is prepended with the current namespace defined at the 
top of the file.

If we were to go with any major change in the current lookup where it is perf 
or nothing, this is what I would propose for php 9.0 (starting with an 
immediate deprecation):
 1. any unqualified call simply calls the current namespace
 2. >= php 9.0: no fallback to global
 3. < php 9.0: emit deprecation notice if falls back to global
This is how classes work (pretty sure), so it would be consistent.

Going the other way (global first) doesn't really make sense because it is 
inconsistent, IMHO. Will it suck? Probably. Will it be easy to fix? Probably 
via Rector.
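
To make the inconsistency concrete, here is a minimal sketch of how unqualified names behave today. The namespace `Demo` is made up for illustration. Functions fall back to the global scope, while classes are resolved against the current namespace only.

```php
<?php
namespace Demo;

// Unqualified function: Demo\strlen() is not defined, so PHP falls back
// to the global \strlen() at runtime.
$length = strlen('abc');

// Unqualified class: this resolves to Demo\ArrayObject and there is no
// global fallback, so instantiation fails with an Error.
try {
    $object = new ArrayObject();
    $classFellBack = true;
} catch (\Error $e) {
    $classFellBack = false;
}

var_dump($length, $classFellBack);
```

Dropping the function fallback would make the first call behave like the second: it would need `\strlen('abc')` or a `use function` import.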

— Rob

Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-22 Thread Rob Landers
On Wed, Aug 21, 2024, at 20:32, John Coggeshall wrote:
> 
> 
> On Aug 21 2024, at 2:10 pm, Ilija Tovilo  wrote:
>> 
>> Including a malicious composer package already allows for arbitrary
>> code execution, do you really need more than that?
> 
> Of course. We've seen many examples in the wild of 3rd party libraries 
> getting hijacked to inject malicious code (e.g. the whole `xz`  attack). This 
> behavior in PHP is not obvious, and provides a way to covertly target and 
> hijack specific highly sensitive functions without an obvious way to detect 
> it -- while otherwise behaving exactly as a developer would expect.
> 
> Why would we possibly want to make it easier to perform such an attack, which 
> as Ilija pointed out is actually making PHP slower, in the name of backward 
> compatibility? Defense in depth is a cornerstone of application security.
> 
> John

If you have the ability to inject arbitrary code, you've already lost. It 
doesn't matter whether they use this feature, or just register a shutdown 
function, autoloader, replace classes/functions/methods entirely, or whatever. 
Should we remove those features as well?

— Rob

Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-21 Thread Rob Landers


On Wed, Aug 21, 2024, at 10:23, John Coggeshall wrote:
> 
> 
> On Aug 2 2024, at 4:37 pm, Bilge  wrote:
>> My only concern is there needs to be an alternative way to do this: 
>> intercepting internal calls. Sometimes, whether due to poor architecture or 
>> otherwise, we just need to be able to replace an internal function call. One 
>> example I can think of recently is where I had to replace `header()` with a 
>> void function in tests, just to stop some legacy code emitting headers 
>> before the main framework kicked in, then unable to emit its own response 
>> because HTTP headers had already been sent. In a perfect world it shouldn't 
>> be necessary, but sometimes it is, so I think for this proposal to be 
>> palatable there must still be a way to achieve this.
> 
> Just a tangent thought to the above, but I've always been a little concerned 
> with the idea that a malicious composer package could potentially do nasty 
> things because PHP looks at the local namespace first for functions. For 
> example, if a Composer package focused on Laravel defines malicious 
> versions of internal functions for common namespaces like `App\Models` , 
> `App\Http\Controllers` , etc. it could do some nasty stuff -- and 
> supply-chain attacks aren't exactly uncommon.  Even worse is WordPress or any 
> other PHP-based software package that allows arbitrary plugins to be 
> installed by non-technical users who really would have no idea if the package 
> was safe even if they were looking at the code.
> 
>  // something.php
> namespace App\Models;
> 
> function password_hash(string $password, string|int|null $algo, array $options = []): string
> {
>     print("Hello");
>     return $password;
> }
> 
>  // my code
> namespace App\Models;
> 
> include "something.php";
> 
> password_hash('foobar', PASSWORD_DEFAULT);

If this is an attack vector for your application, then fully qualified names are 
the way to go (WordPress does this nearly everywhere, for example).
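
To make the difference concrete, here is a minimal sketch of how an unqualified 
call and a fully qualified call resolve differently. The namespaced 
strtoupper() below is only a harmless stand-in for a function planted by an 
untrusted package; it is an illustration, not something from the example above.

```php
<?php

namespace App\Models;

// Stand-in for a function injected by an untrusted package.
function strtoupper(string $s): string
{
    return 'hijacked';
}

// Unqualified: the engine looks for App\Models\strtoupper first and finds it.
echo strtoupper('abc'), "\n";

// Fully qualified: always the global function, the namespaced one is never consulted.
echo \strtoupper('abc'), "\n";
```

The first call prints "hijacked" and the second prints "ABC", which is why the 
leading backslash closes this particular hole.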

> 
> I don't recall why local namespace first won, but IMO it wasn't a great call 
> out the gate for that reason alone.  Yes, you can always use `\password_hash` 
> instead of `password_hash`, but making the default insecure and slower is 
> silly IMO -- and not fixing it because of BC seems like the weaker argument 
> here.
> 
> John

It's not (at least for me) the BC break. It's being able to override global 
functions. There are legitimate use cases outside of testing. For example, 
consider when a global function's signature changes. In your library, you have 
to check the PHP version. You can add that check at every one of 100 call 
sites, or you can wrap it once in a function that supports the old signature 
and proxies to the new one. In other words, it provides options that may be 
better than the alternative.
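
As a rough sketch of that wrapping approach (the Acme\Legacy namespace is 
hypothetical, and implode() is just one example I picked: its legacy 
array-first argument order stopped being accepted in PHP 8.0):

```php
<?php

namespace Acme\Legacy; // hypothetical library namespace

// Shadows the global implode() for unqualified calls in this namespace only.
// Accepts both the legacy (array, string) and current (string, array) orders.
function implode(string|array $a, string|array $b = ''): string
{
    if (is_array($a)) {
        // Legacy order: proxy to the global function with the arguments swapped.
        return \implode(is_string($b) ? $b : '', $a);
    }

    // Current order: pass straight through to the global function.
    return \implode($a, is_array($b) ? $b : []);
}

echo implode(['a', 'b'], ','), "\n"; // legacy order, resolves to Acme\Legacy\implode
echo implode(',', ['a', 'b']), "\n"; // current order, same wrapper
```

Both calls print "a,b", and none of the call sites in the library had to change.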

— Rob

Re: [PHP-DEV] function autoloading v4 RFC

2024-08-20 Thread Rob Landers
On Mon, Aug 19, 2024, at 20:28, Mel Dafert wrote:
> On August 15, 2024 5:22:51 PM GMT+02:00, Rob Landers  
> wrote:
> >Hello internals,
> >
> >I've decided to attempt an RFC for function autoloading. After reading 
> >hundreds of ancient (and recent) emails relating to the topic along with 
> >several abandoned RFCs from the past, and after much review, I've decided to 
> >put forth a variation of a previous RFC, as it seemed the least ambitious 
> >and the most likely to work:
> >
> >https://wiki.php.net/rfc/function_autoloading4
> >
> >Please let me know what you think. An implementation should be along before 
> >opening it for a vote (now that I realize how important that is).
> >
> >— Rob
> 
> Hello,
> 
> I've also noted that on another RFC, but I think it applies here too: 
> Any new constants that are shaped like an enum and used like an enum should 
> just be
> an actual enum. I think instead of `SPL_AUTOLOAD_CLASS` and 
> `SPL_AUTOLOAD_FUNCTION`,
> there should be an `enum AutoloadType` (to be bikeshed), and the new 
> parameters
> should be typed as such, instead of `int`.
> 
> Regards,
> Mel
> 

Hey Mel,

I spent a couple of hours trying to do this today. But due to the weird 
dependency hell that SPL lives in right now, I'm not sure it can be an enum (or 
at least, the linker kept messing up my version of PHP to the point where it 
couldn't even run—it was bizarre; I've only seen stuff like that with 
incompatible cgo shenanigans. I will have to try again later when I next rebase 
my branch). 

I've reached out to Gina about their RFC and my RFC joining forces. Gina's RFC 
introduces an entirely new API for autoloading and neatly side-steps the whole 
thing.

— Rob

Re: [PHP-DEV] function autoloading v4 RFC

2024-08-20 Thread Rob Landers


On Wed, Aug 21, 2024, at 01:53, Matthew Weier O'Phinney wrote:
> 
> 
> On Tue, Aug 20, 2024, 4:56 PM Rob Landers  wrote:
>> 
>> 
>> On Tue, Aug 20, 2024, at 18:07, Rob Landers wrote:
>>> On Tue, Aug 20, 2024, at 08:50, Rowan Tommins [IMSoP] wrote:
>>>> 
>>>> 
>>>> On 20 August 2024 00:21:22 BST, Rob Landers  wrote:
>>>> >
>>>> >I assume you are worried about something like this passing test?
>>>> >
>>>> >--TEST--
>>>> >show called only once
>>>> >--FILE--
>>>> ><?php
>>>> >
>>>> >namespace test;
>>>> >
>>>> >spl_autoload_register(function($name) {
>>>> >    echo "name=$name\n";
>>>> >}, true, false, SPL_AUTOLOAD_FUNCTION);
>>>> >
>>>> >echo strlen('foo');
>>>> >echo strlen('bar');
>>>> >echo strlen('baz');
>>>> >?>
>>>> >--EXPECT--
>>>> >name=test\strlen
>>>> >333
>>>> >
>>>> >In my RFC, I mention it is called exactly once.
>>>> 
>>>> 
>>>> I haven't looked at the PR, only the RFC, and I did not see this very 
>>>> important detail explained anywhere. The only thing I can see is this 
>>>> rather ambiguous sentence:
>>>> 
>>>> >  The function autoloader will not be called again. 
>>>> 
>>>> That could mean not called again for the current call (compared with 
>>>> proposals that call it a second time with the unqualified name); it could 
>>>> mean not called again for the current line of code (based on the current 
>>>> caching behaviour); or never called again for that combination of 
>>>> namespace and name; or possibly, never called again for that combination 
>>>> of namespace, name, and callback function.
>>>> 
>>>> That's not a small detail of the implementation, it's a really fundamental 
>>>> difference from previous proposals. 
>>>> 
>>>> So I would like to repeat my first response to your RFC: that it should 
>>>> spend more time explaining your approach to the multiple lookup problem.
>>>> 
>>>> Regards,
>>>> Rowan Tommins
>>>> [IMSoP]
>>>> 
>>> 
>>> Thanks Rowan,
>>> 
>>> That's a fair critique.
>>> 
>>> I expect some of the wording will be more clear once I write out the 
>>> documentation -- even if it isn't used directly, I tend to write out 
>>> documentation to force myself to reconcile the code with the plan, find 
>>> logic bugs, perform larger scale tests, and create tests to verify 
>>> assertions in the documentation. From there, I'll update the plan or code 
>>> to get everything to match and spend some time on clarity. It's the hardest 
>>> part, IMHO, as it requires diligently ensuring everything is correct. In 
>>> other words, writing the documentation makes it feel like a "real thing" 
>>> and it triggers what small amount of perfectionism I have.
>>> 
>>> — Rob
>> 
>> I have an experimental library that I use for testing these kinds of things. 
>> There are aspects of it that I could work with to make use of function 
>> autoloading. Thus, I did so and benchmarked the performance of unit tests. 
>> The unit testing library makes a ton of "unqualified function calls".
>> 
>> I spent some time working on two autoloaders:
>>  1. A naive autoloader: parses out the file to load, checks if it exists, 
>> and then requires the file.
>>  2. An optimized autoloader: only cares about the namespace it has 
>> registered. All others are an instant return.
> 
> Not to sound flippant, but you do realize that composer does a lot of 
> optimizations just like this, right?
> 
> PSR-4 exists in large part due to a realization that better optimizations 
> were possible when you mapped namespaces to source code directories. And 
> Composer takes it a step further when you have it create an optimized loader, 
> because then it maps classes directly to the files that provide them, 
> preventing I/O up to the point that a require is performed. 
> 
> My point is that this is why folks have been suggesting that this is a solved 
> problem. Globally qualify functions, and you get immediate performance 
> benefits due to removal of the need to look 

Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-20 Thread Rob Landers
On Tue, Aug 20, 2024, at 23:56, Faizan Akram Dar wrote:
> 
> 
> On Tue, Aug 20, 2024 at 11:34 PM Ilija Tovilo  wrote:
>> Hi Levi
>> 
>> On Tue, Aug 20, 2024 at 5:14 PM Levi Morrison
>>  wrote:
>> >
>> > I have long been in favor of a larger BC break with better language
>> > consistency. Class lookup and function lookup with respect to
>> > namespaces should be treated the same. The difficulty is getting a
>> > majority of people to vote yes for this. Keep in mind that qualifying
>> > every global function is annoying but probably can be somewhat
>> > automated, and will bring better performance. So again, this improves
>> > the existing code even without upgrading.
>> >
>> > Yes, there would be complaints about it. Yes, there are probably some
>> > people or projects who wouldn't upgrade. I don't particularly care, as
>> > there are increasingly more operating systems and companies providing
>> > LTS support for long periods of time. Probably Zend.com will offer LTS
>> > support for the last PHP 8.X release, and possibly there will be some
>> > distro which also has it. I believe it's the right thing to do
>> > because:
>> >
>> >  1. It's faster.
>> >  2. It enables function autoloading in a similar manner to class 
>> > autoloading.
>> >  3. It's more consistent, and simpler to teach and maintain.
>> >
>> > It's rare that you get all of these together, often you have to make
>> > tradeoffs within them.
>> 
>> The approach I originally proposed also solves 1. and 2. (mostly) with
>> very little backwards incompatibility. Consistency is absolutely
>> something to strive for, but not at the cost of breaking most PHP
>> code.
>> 
>> To clarify on 2.: The main issue with function autoloading today is
>> that the engine needs to trigger the autoloader for every unqualified
>> call to global functions, given that the autoloader might declare the
>> function in local scope. As most unqualified calls are global calls,
>> this adds a huge amount of overhead.
>> 
>> Gina solved this in part by aliasing the local function to the global
>> one after the first lookup. However, that still means that the
>> autoloader will trigger for every new namespace the function is called
>> in, and will also pollute the function table.
>> 
>> Reversing the lookup order once again avoids local lookup when calling
>> global functions in local scope, which also means dodging the
>> autoloader. The caveat is that calling local functions in local scope
>> triggers the autoloader on first encounter, but at least it can be
>> marked as undeclared in the symbol table once, instead of in every
>> namespace, which also means triggering the autoloader only once.
>> 
>> Ilija
> 
> Hi,
> 
> I completely agree with Levi's perspective, aligning class and function 
> lookup with respect 
> to namespaces seems a very sensible option.
> It will improve consistency and pave the road for autoloading functions 
> without quirks.
> 
> The impact of fixing function lookup is overstated. For instance, 
> PHP-CS-Fixer can add
>  "global namespace qualifiers" to all global functions in a matter of 
> minutes, it is not like
>  people have to go through code and change it manually.
> 
> 
> To ease the transition, PHP can ship a small fixer with the next PHP version 
> for changing
>  global function usage (prepending \ or adding use statements) and be done 
> with the 
> inconsistency once and for all.
> 
> 
> Kind regards,
> Faizan
> 
>  

I am currently working on benchmarks specifically related to my function 
autoloading RFC, and I'm not (yet) certain there will be any performance 
impact from function autoloading. I may end up eating my hat here, but 
in any case, there is only speculation at this point.

If this change improves performance, that's great. However, I don't think we 
should be changing things just for the sake of performance (or the 
opposite). It's great to be aware of how things affect performance, but I don't 
think we should make decisions purely based on it; otherwise we will never add 
any new features to PHP.

— Rob

Re: [PHP-DEV] function autoloading v4 RFC

2024-08-20 Thread Rob Landers


On Tue, Aug 20, 2024, at 18:07, Rob Landers wrote:
> On Tue, Aug 20, 2024, at 08:50, Rowan Tommins [IMSoP] wrote:
>> 
>> 
>> On 20 August 2024 00:21:22 BST, Rob Landers  wrote:
>> >
>> >I assume you are worried about something like this passing test?
>> >
>> >--TEST--
>> >show called only once
>> >--FILE--
>> ><?php
>> >
>> >namespace test;
>> >
>> >spl_autoload_register(function($name) {
>> >    echo "name=$name\n";
>> >}, true, false, SPL_AUTOLOAD_FUNCTION);
>> >
>> >echo strlen('foo');
>> >echo strlen('bar');
>> >echo strlen('baz');
>> >?>
>> >--EXPECT--
>> >name=test\strlen
>> >333
>> >
>> >In my RFC, I mention it is called exactly once.
>> 
>> 
>> I haven't looked at the PR, only the RFC, and I did not see this very 
>> important detail explained anywhere. The only thing I can see is this rather 
>> ambiguous sentence:
>> 
>> >  The function autoloader will not be called again. 
>> 
>> That could mean not called again for the current call (compared with 
>> proposals that call it a second time with the unqualified name); it could 
>> mean not called again for the current line of code (based on the current 
>> caching behaviour); or never called again for that combination of namespace 
>> and name; or possibly, never called again for that combination of namespace, 
>> name, and callback function.
>> 
>> That's not a small detail of the implementation, it's a really fundamental 
>> difference from previous proposals. 
>> 
>> So I would like to repeat my first response to your RFC: that it should 
>> spend more time explaining your approach to the multiple lookup problem.
>> 
>> Regards,
>> Rowan Tommins
>> [IMSoP]
>> 
> 
> Thanks Rowan,
> 
> That's a fair critique.
> 
> I expect some of the wording will be more clear once I write out the 
> documentation -- even if it isn't used directly, I tend to write out 
> documentation to force myself to reconcile the code with the plan, find logic 
> bugs, perform larger scale tests, and create tests to verify assertions in 
> the documentation. From there, I'll update the plan or code to get everything 
> to match and spend some time on clarity. It's the hardest part, IMHO, as it 
> requires diligently ensuring everything is correct. In other words, writing 
> the documentation makes it feel like a "real thing" and it triggers what 
> small amount of perfectionism I have.
> 
> — Rob

I have an experimental library that I use for testing these kinds of things. 
There are aspects of it that I could work with to make use of function 
autoloading. Thus, I did so and benchmarked the performance of unit tests. The 
unit testing library makes a ton of "unqualified function calls".

I spent some time working on two autoloaders:
 1. A naive autoloader: parses out the file to load, checks if it exists, and 
then requires the file.
 2. An optimized autoloader: only cares about the namespace it has registered. 
All others are an instant return.

In the "vanilla" case, I was mostly concerned with variation. I wanted a 
statistically significant result, so once I got my system into a stable state 
and I was no longer seeing any variance, I started benchmarking. 

For the naive autoloader, I saw a performance degradation of about 6% and lots 
of variability. This is probably due to the "file_exists" check being done 
every time an unqualified name was called.

However, for the optimized autoloader, I ended up with less variability (🤔) 
than the vanilla approach and absolutely no measurable performance degradation.
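
For anyone who wants to picture the two shapes, here is a simplified sketch. 
These are not the autoloaders I benchmarked: the Acme\ prefix, the src/ layout, 
and the $checks counter are assumptions made up for the illustration, and the 
callbacks are invoked by hand so the snippet runs on a stock PHP 8 without the 
RFC patch.

```php
<?php

$checks = 0; // counts filesystem probes, for illustration only

// Naive: derive a path from any name, probe the disk, then require the file.
$naive = function (string $name) use (&$checks): void {
    $file = __DIR__ . '/src/' . str_replace('\\', '/', $name) . '.php';
    $checks++;
    if (file_exists($file)) {
        require_once $file;
    }
};

// Optimized: return immediately unless the name is in the registered namespace.
$optimized = function (string $name) use ($naive): void {
    if (!str_starts_with($name, 'Acme\\')) {
        return; // e.g. "test\strlen" never touches the filesystem
    }
    $naive($name);
};

$naive('test\\strlen');     // probes the disk
$optimized('test\\strlen'); // returns before any I/O
echo $checks, "\n";         // only the naive call was counted
```

With the RFC patch applied, either closure could be passed to 
spl_autoload_register() with SPL_AUTOLOAD_FUNCTION as in the test quoted above.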

Now, time to try this on a larger scale... WordPress. It's pretty much the only 
large codebase I know of that makes use of tons of functions.

— Rob

Re: [PHP-DEV] function autoloading v4 RFC

2024-08-20 Thread Rob Landers
On Tue, Aug 20, 2024, at 08:50, Rowan Tommins [IMSoP] wrote:
> 
> 
> On 20 August 2024 00:21:22 BST, Rob Landers  wrote:
> >
> >I assume you are worried about something like this passing test?
> >
> >--TEST--
> >show called only once
> >--FILE--
> ><?php
> >
> >namespace test;
> >
> >spl_autoload_register(function($name) {
> >    echo "name=$name\n";
> >}, true, false, SPL_AUTOLOAD_FUNCTION);
> >
> >echo strlen('foo');
> >echo strlen('bar');
> >echo strlen('baz');
> >?>
> >--EXPECT--
> >name=test\strlen
> >333
> >
> >In my RFC, I mention it is called exactly once.
> 
> 
> I haven't looked at the PR, only the RFC, and I did not see this very 
> important detail explained anywhere. The only thing I can see is this rather 
> ambiguous sentence:
> 
> >  The function autoloader will not be called again. 
> 
> That could mean not called again for the current call (compared with 
> proposals that call it a second time with the unqualified name); it could mean 
> not called again for the current line of code (based on the current caching 
> behaviour); or never called again for that combination of namespace and name; 
> or possibly, never called again for that combination of namespace, name, and 
> callback function.
> 
> That's not a small detail of the implementation, it's a really fundamental 
> difference from previous proposals. 
> 
> So I would like to repeat my first response to your RFC: that it should spend 
> more time explaining your approach to the multiple lookup problem.
> 
> Regards,
> Rowan Tommins
> [IMSoP]
> 

Thanks Rowan,

That's a fair critique.

I expect some of the wording will be more clear once I write out the 
documentation -- even if it isn't used directly, I tend to write out 
documentation to force myself to reconcile the code with the plan, find logic 
bugs, perform larger scale tests, and create tests to verify assertions in the 
documentation. From there, I'll update the plan or code to get everything to 
match and spend some time on clarity. It's the hardest part, IMHO, as it 
requires diligently ensuring everything is correct. In other words, writing the 
documentation makes it feel like a "real thing" and it triggers what small 
amount of perfectionism I have.

— Rob

Re: [PHP-DEV] State of Generics and Collections

2024-08-20 Thread Rob Landers


On Tue, Aug 20, 2024, at 14:08, Arnaud Le Blanc wrote:
> Hi Rob,
> 
> On Mon, Aug 19, 2024 at 7:51 PM Rob Landers  wrote:
> 
> > > Invariance would make arrays very difficult to adopt, as a library can 
> > > not start type hinting generic arrays without breaking user code, and 
> > > users can not pass generic arrays to libraries until they start using 
> > > generic arrays type declarations.
> >
> > This seems like a strawman argument, to a degree. In other words, it seems 
> > like you could combine static arrays and fluid arrays to accomplish what 
> > you are seeking to do. In other words, use static arrays but allow casting 
> > to treat it as "fluid."
> >
> > In other words, simply cast to get your example to compile:
> >
> > function f(array $a) {}
> > function g(array $a) {}
> >
> > $a = (array) [1]; // array unless cast
> >
> > f($a); // ok
> > g((array)$a); // ok
> >
> > And the other way:
> >
> > function f(array $a) {}
> > function g(array $a) {}
> >
> > $a = [1];
> >
> > f((array)$a); // ok, type check done during cast
> > g($a); // ok
> 
> There is potential for breaking changes in both of your examples:
> 
> If f() is a library function that used to be declared as `f(array
> $a)`, then changing its declaration to `f(array $a)` is a
> breaking change in the Static Arrays flavour, as it would break
> library users until they change their code to add casts.

I don't think we should be scared of breaking changes; PHP 9.0 is coming 🔜 
anyway. You could also consider it as "an array might be array, but an 
array is always an array"

> 
> Similarly, the following code would break (when calling g()) if h()
> was changed to return an array:
> 
> function h(): array {}
> function g(array $a);
> 
> $a = h();
> g($a);
> 
> Casting would allow users to pass generic arrays to libraries that
> don't support generics yet, but that's expensive as it requires a
> copy.

Why does it require a copy? It should only require a copy if the contents are 
changed (CoW) and at that point, you can know what rules to apply based on the 
coerced/casted type. I'm doing a similar thing for the Literal Strings RFC, 
where it is a type that is also indistinguishable from a string until something 
happens to it and it is no longer a literal string.

So passing a array to a function that only accepts an array shouldn't 
matter. Once inside that function, all type-checking can be disabled for that 
array. One approach to that could be to just smack a "type-check strategy" 
function pointer on zvals, potentially, as that would give the most flexibility 
for casting, aliases, generics, etc. Don't get me started on the current type 
checking; it is a mess and inconsistent depending on what is doing the checking 
(constructor promoted props, properties, method args, function args). Then you 
can just copy the zval, change a function pointer, but point it to the same 
array (which will CoW) and change the strategy during casting.

In other words, you could cheaply cast an array to array by 
(essentially) changing a couple of function pointers, but array to 
array would be expensive. So I imagine there would be strategies for changing 
strategies... probably. I don't know, I literally just thought of this off the 
top of my head, so it probably needs more work.

> 
> Best Regards,
> Arnaud
> 


— Rob

Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-20 Thread Rob Landers


On Tue, Aug 20, 2024, at 10:41, Nick Lockheart wrote:
> 
> > We would upgrade that to a warning in PHP 9.2, and it would end up
> > being an error on PHP 10 and have a BC break.
> > 
> > I don't think adding a \ to each function call is ugly, that's what
> > we have for classes, and it works fine; or an use statement.
> > 
> > So, why do we think that after people get used to it, they would
> > still consider it ugly? Never heard the "ugliness" mentioned for
> > classes.
> 
> 
> Respectfully, I think `\` is ugly for both functions and classes.
> 
> 
> > Now, I know this would be a big BC break, but it brings consistency
> > to the language and forces everyone to improve their code
> > performance.
> 
> There should be a directive for this, like:
> 
> namespace foo using global functions;
> 
> ...which automatically acts as if all functions have a \ in front of
> them, unless they are fully qualified.
> 

Respectfully, I feel like this gets into the heart of a problem with RFCs, 
where if someone wants to implement something, they have to solve everyone’s 
problems.

In this case, there is a problem with performance issues due to multiple 
lookups (though I’m not fully convinced), so if someone wants to implement 
function autoloading, they also have to solve this problem (Gina and I have 
both independently solved it in various ways).

Personally, I’m of the opinion that if you want performance, you know what to 
do: fully qualify your names. If you don’t care (which is what I gather from 
the first email in this thread where maintainers were not willing to change 
their code), then “deal with it.”
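
For reference, a minimal sketch of the two spellings that skip the namespace 
fallback: a leading backslash, or a "use function" import. The App\Reports 
namespace and its strtolower() shadow are made up for the illustration.

```php
<?php

namespace App\Reports; // hypothetical namespace

use function strlen; // binds the unqualified name to the global function at compile time

// A namespaced shadow, reachable only through unqualified calls.
function strtolower(string $s): string
{
    return 'shadowed';
}

echo strlen('foo'), "\n";      // imported: global strlen, no namespace lookup
echo \strtolower('FOO'), "\n"; // leading backslash: global strtolower
echo strtolower('FOO'), "\n";  // unqualified: App\Reports\strtolower wins
```

This prints 3, "foo", and "shadowed": only the last call goes through the 
local-then-global lookup.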

The vast majority of performance issues won’t be caused by function lookups, 
but by databases and poorly written code. Maybe I am wrong, but I rather like 
what we currently have, whatever benchmarks have to say on the matter. 

— Rob

Re: [PHP-DEV] State of Generics and Collections

2024-08-20 Thread Rob Landers


On Tue, Aug 20, 2024, at 03:53, Bob Weinand wrote:
> On 20.8.2024 03:31:05, Larry Garfield wrote:
>> On Mon, Aug 19, 2024, at 5:16 PM, Bob Weinand wrote:
>> 
>> 
>>> Regarding the Collections PR, I personally really don't like it:
>>> 
>>>  • It implements something which would be trivial if we had reified 
>>> generics. If this ever gets merged, and generics happen later, it would 
>>> probably be outdated, a quirk the language has to carry around.
>>>  • It's not powerful, but rather a quite limited implementation. No 
>>> overrides of the built-in methods possible. No custom operations ("I 
>>> want a dict where a specific property on the key is the actual unique 
>>> key", "I want a custom callback be executed for each modification"). 
>>> It's okay as a PoC, but far from a complete enough implementation.
>>> 
>> I think we weren't that clear on that section, then.  The intent is that 
>> dedicated collection classes are, well, classes.  They can contain 
>> additional methods, and probably can override the parent methods; though the 
>> latter may have some trickiness if trying to access the internal data 
>> structure, which may or may not look array-ish.  (That's why it's just a PoC 
>> and we're asking for feedback if it's worth trying to investigate further.)
> I assumed so, as said "okay as a PoC" :-)
>>>  • It's a very specialized structure/syntax, not extensible for 
>>> userland at all. Some functionality like generic traits, where you'd 
>>> actually monomorphize the contained methods would be much more 
>>> flexible. E.g. class Articles { use Sequence; }. Much less 
>>> specialized syntax, much more extensible. And generic traits would be 
>>> doable, regardless of the rest of the generics investigation.
>>> In fact, generic traits (essentially statically replacing the generic 
>>> arguments at link-time) would be a useful feature which would remain 
>>> useful even if we had fully reified generics.
>>> I recognize that some functionality will need support of internal 
>>> zend_object_handlers. But that's not a blocker, we might provide some 
>>> default internal traits with PHP, enabling the internal class handlers.
>>> So to summarize, I would not continue on that path, but really invest 
>>> into monomorphizable generic traits instead.
>>> 
>> Interesting.  I have no idea why Arnaud has mainly been investigating 
>> reified generics rather than monomorphized, but a monomorphized trait has 
>> potential, I suppose.  That naturally leads to the question of whether 
>> monomorphized interfaces would be possible, and I have no idea there.  (I 
>> still hold out hope that Levi will take another swing at 
>> interface-default-methods.)
>> 
>> Though this still wouldn't be a path to full generics, as you couldn't 
>> declare the inner type of an object at creation time, only code time.  
>> Still, it sounds like an area worth considering.
>> 
>> --Larry Garfield
> Nikita did the investigation into monomorphized generics a long time ago 
> (https://github.com/PHPGenerics/php-generics-rfc/issues/44). So it was mostly 
> concluded that reified generics would be the way to go. The primary issue 
> Arnaud is currently investigating is propagation of generic information via 
> runtime behaviour, inference etc.
> 
> It would be solving large amounts of problems if you'd have to fully specify 
> the specific instance of a generic every time you instantiate one. But PHP is 
> at heart a dynamic language where typing is generally opt-in (also when 
> constructing new objects of generic classes for example). And we want to 
> avoid "new List, WeakReference>>()"-style 
> nesting where not necessary.
> 

I generally follow the philosophy:
 1. get it working
 2. get it working well
 3. get it working fast

And inference seems like a type (2) task. In other words, I think people would 
be fine with generics, even if they had to type it out every single time. At 
least for a start. From there, you'd have multiple people able to tackle the 
inference part, proposing RFCs to make it happen, etc. vs. now where basically 
only one person on the planet can attempt to tackle a very complex problem that 
doesn't exist yet. That isn't to say it isn't useful research, because you want 
to write things in such a way that you can implement inference when you get to 
(2), but an actual implementation shouldn't be sought out yet, just 
understanding the problem and solution space is likely enough to do (1) while 
taking into account (2) -- such as choosing algorithms, op-codes, data 
structures, etc.

For a feature like this, perfect is very much the enemy of good.

> "Monomorphization of interfaces" does not really make a lot of sense as a 
> concept. Ultimately in an interface, all you do is provide information for 
> classes to type check against, which happens at link time, once. (Unless you 
> mean interface-default-methods, but that would just be an implicitly 
> implemented trait implementation wise, real

Re: [PHP-DEV] State of Generics and Collections

2024-08-19 Thread Rob Landers
On Mon, Aug 19, 2024, at 19:08, Derick Rethans wrote:
> Hi!
> 
> Arnaud, Larry, and I have been working on an article describing the 
> state of generics and collections, and related "experiments".
> 
> You can find this article on the PHP Foundation's Blog:
> https://thephp.foundation/blog/2024/08/19/state-of-generics-and-collections/
> 
> cheers,
> Derick
> 

As an experiment, a while ago, I went a different route for reified generics by 
'hacking' type aliases (which I was also experimenting with), such that a 
generic is compiled into a concrete implementation with a dangling type 
alias:

class Box<T> {
  function __construct(T $thing) {}
}

is essentially compiled to

class Box {
  use alias __Box_T => ???;

  function __construct(__Box_T $thing) {}
}

This just gets a T type alias (empty-ish, with a mangled name) that gets filled 
in during runtime (every instance gets its own type alias table, and uses that 
along with the file alias table). There shouldn't be any performance impact 
this way (or at least, none worse than that of type aliases in general, 
which are also an oft-requested feature).

Thus, when you create a new Box<int>, it just fills in that type alias for T as 
int. Nesting still works too: Box<Box<int>> is just an int type alias on the 
inner Box, and the outer Box alias is just Box<int>. Type-checking basically 
works just like it does today (IIRC, Box<int> literally got stored as 
"Box<int>" for fast checking), and reflection just looks up the type aliases 
and unmangles them -- though I know for certain I never finished reflection 
and got bogged down in GC shenanigans.
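
To give a feel for the idea, here is a loose userland emulation of a 
per-instance alias table. It is not the engine patch: the explicit $type 
constructor argument, the typeName() helper, and the runtime check are all 
assumptions invented for this sketch (it needs PHP 8.1 for readonly).

```php
<?php

// Userland emulation only; the experiment itself lived in the engine.
final class Box
{
    /** @var array<string, string> per-instance alias table, e.g. ['__Box_T' => 'int'] */
    private array $aliases;

    public function __construct(string $type, public readonly mixed $thing)
    {
        // Fill in the dangling alias for this instance.
        $this->aliases = ['__Box_T' => $type];

        // Check the value against the filled-in alias.
        $actual = get_debug_type($thing);
        if ($actual !== $type && !($thing instanceof $type)) {
            throw new TypeError("Box<$type> cannot hold $actual");
        }
    }

    // Readable name for the filled-in type.
    public function typeName(): string
    {
        return 'Box<' . $this->aliases['__Box_T'] . '>';
    }
}

$box = new Box('int', 42);
echo $box->typeName(), "\n";
```

Constructing new Box('int', 'x') throws a TypeError here, which roughly mirrors 
the check the engine would do against the instance's alias.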

There were probably some serious cons in that approach, but I ran out of free 
time to investigate. If you are doing experiments, it is probably worth looking 
into.

FYI though, people seemed really turned off by file-level type aliases (at 
least exposed to user-land, so I never actually pursued it).

— Rob

Re: [PHP-DEV] function autoloading v4 RFC

2024-08-19 Thread Rob Landers


On Mon, Aug 19, 2024, at 23:17, Rowan Tommins [IMSoP] wrote:
> On 19/08/2024 17:23, Rob Landers wrote:
> 
> > As far as performance for ambiguous functions go, I was thinking of 
> > submitting an RFC, where ambiguous function calls are tagged during 
> > compilation and always resolve lexically, sorta like how it works now:
> >
> > echo strlen($x); // resolves to global, always
> > require_once "my-strlen";
> > echo strlen($x); // resolves to my strlen, always
> >
> > This works by basically rewriting the function name once resolved and 
> > may make the code more predictable. If I can pull it off, it can be 
> > relegated to a technical change that doesn’t need an RFC. Still 
> > working on it.
> 
> 
> I'm not entirely clear what you're suggesting, but I think it might be 
> either what already happens, or the same as Gina is proposing.

Yeah, that would be a separate RFC, so I didn't go into the weeds, but my 
point is that it would result in no change. But here we go :D

> Consider this code [https://3v4l.org/184k3]:
> 
> namespace Foo;
> 
> foreach ( [1,2,3] as $i ) {
>     echo strlen('hello'), ' ';
>     shadow_strlen();
>     echo strlen('hello'), '; ';
> }
> 
> function shadow_strlen() {
>     if ( ! function_exists('Foo\\strlen') ) {
>         function strlen($s) {
>             return 42;
>         }
>     }
> }
> 
> In PHP 5.3, this outputs '5 42; 42 42; 42 42;'  That's fairly 
> straight-forward: each time the function is called, the engine checks if 
> "Foo\strlen" is defined.
> 
> Since PHP 5.4, it outputs '5 42; 5 42; 5 42;'   The engine caches the 
> result of the lookup against each compiled opcode, so the first strlen() 
> is cached as a call to \strlen() and the second as a call to \Foo\strlen().
> 
> As I understand it, Gina is proposing that it would instead output '5 5; 
> 5 5; 5 5;' - the function would be "pinned" by making "\Foo\strlen" an 
> alias to "\strlen" for the rest of the program, and the function_exists 
> call would immediately return true.
> 
> Neither, as far as I can see, can happen at compile time, because the 
> compiler doesn't know if and when a definition of \Foo\strlen() will be 
> encountered.

To be fair, this isn't even really completely figured out yet. I was mostly 
wanting to point out that it maybe could be a totally separate issue. But, the 
gist is that at compile time, we can mark a function as "ambiguous," meaning we 
don't really know if the function exists (because it isn't fully qualified). 
The main issue with Gina's implementation (if I am understanding it properly, 
and I potentially am not, so take this with a grain of salt) is that this 
(https://3v4l.org/0jCpW) could fail if it were pinned, where before, it would 
not.

In my idea, a function becomes pinned until it isn't, with strict rules that 
kick it out of being pinned and I think those rules are complex enough to 
warrant being a completely different RFC or PR.

> 
> 
> > In other words, maybe pinning could be solved more generally in a 
> > future RFC, decrease your RFC’s scope and chance for sharp edge cases.
> 
> If anything, I think it would need to be the other way around: change 
> the name resolution logic, so that an autoloading proposal was more 
> palatable, because it didn't require running the autoloader an unbounded 
> number of times for the same name.
> 
> Proposing both at once seems reasonable, as the autoloading gives an 
> extra benefit to outweigh the breaking change to shadowing behaviour.
> 
> Regards,

I assume you are worried about something like this passing test?

--TEST--
show called only once
--FILE--

--EXPECT--
name=test\strlen
333

In my RFC, I mention that the autoloader is called exactly once. I've committed 
this as another test on the RFC implementation.

> 
> -- 
> Rowan Tommins
> [IMSoP]
> 

— Rob

Re: [PHP-DEV] State of Generics and Collections

2024-08-19 Thread Rob Landers
On Mon, Aug 19, 2024, at 19:08, Derick Rethans wrote:
> Hi!
> 
> Arnaud, Larry, and I have been working on an article describing the 
> state of generics and collections, and related "experiments".
> 
> You can find this article on the PHP Foundation's Blog:
> https://thephp.foundation/blog/2024/08/19/state-of-generics-and-collections/
> 
> cheers,
> Derick
> 

Nice! It is awesome to see some movement here. Just one thing:

> Invariance would make arrays very difficult to adopt, as a library can not 
> start type hinting generic arrays without breaking user code, and users can 
> not pass generic arrays to libraries until they start using generic arrays 
> type declarations.

This seems like a strawman argument, to a degree. It seems like you could 
combine static arrays and fluid arrays to accomplish what you are seeking to 
do: use static arrays, but allow a cast to treat them as "fluid."

For instance, simply cast to get your example to compile:

function f(array<int> $a) {}
function g(array $a) {}

$a = (array<int>) [1]; // array<int> unless cast

f($a); // ok
g((array)$a); // ok

And the other way:

function f(array<int> $a) {}
function g(array $a) {}

$a = [1];

f((array<int>)$a); // ok, type check done during cast
g($a); // ok

— Rob

Re: [PHP-DEV] function autoloading v4 RFC

2024-08-19 Thread Rob Landers
On Mon, Aug 19, 2024, at 14:03, Gina P. Banyard wrote:
> On Thursday, 15 August 2024 at 17:22, Rob Landers  wrote:
>> Hello internals,
>> 
>> I've decided to attempt an RFC for function autoloading. After reading 
>> hundreds of ancient (and recent) emails relating to the topic along with 
>> several abandoned RFCs from the past, and after much review, I've decided to 
>> put forth a variation of a previous RFC, as it seemed the least ambitious 
>> and the most likely to work:
>> 
>> https://wiki.php.net/rfc/function_autoloading4
>> 
>> Please let me know what you think. An implementation should be along before 
>> opening it for a vote (now that I realize how important that is).
> 
> I had a quick glance at the RFC, and I really don't think this is a good 
> approach or good API.
> Having autoloading in SPL causes dumb issues, for example ext/session must 
> register it has a dependency on ext/spl just for the autoloader. [1]
> This means that currently ext/session depends on ext/spl, ext/spl depends on 
> ext/standard, and ext/standard depends on ext/session, a glorious circular 
> dependency.
> 
> This main problem has pushed me again to rebase my PR [2] for the Core 
> Autoloading RFC. [3]
> The wording of the RFC hasn't been updated, as this is frankly my least 
> favourite part of any RFC.
> The PR passes CI and has also produced benchmark results from CI. [4]
> 
> I would rather have some collaboration on a proper solution than, IMHO, this 
> suboptimal solution.
> I still need to get opinions from other people if it makes sense to remove 
> the zend_autoload_class function pointer API and have the VM directly call 
> zend_autoload.
> Because from what I remember 2 years ago, some profiling tools hook into it 
> to track autoloading time.
> This might be improved by introducing new observer hooks.
> 
> 
> Best regards,
> 
> Gina P. Banyard
>> 
> 
> [1] https://github.com/php/php-src/pull/14544#issuecomment-2294907817
> [2] https://github.com/php/php-src/pull/8294
> [3] https://wiki.php.net/rfc/core-autoloading
> [4] https://github.com/php/php-src/actions/runs/10441267948
> 

Hey Gina,

I’d love to collaborate on this feature. For what it’s worth though, I did a 
ton of research on it (mostly reading every discussion I could find on the 
topic, and prior RFCs) and I felt that this API was the most likely to be 
accepted. You even have a comment on your PR asking why not an API similar to 
this one (!) though your reasoning is sound for why it is a bad idea, and I 
believe it is the superior API.

There are some key differences between our two RFCs that I think are worth 
discussing (besides the obvious API differences):
 1. In your RFC, the autoloader is called with the global function name if the 
function isn’t found in either scope. In mine, the autoloader is called exactly 
once, for the local scope only, when the function isn’t found there. This 
seemed to be a highly liked proposal in one of the older discussions 
(2013-ish?) that appeared to not result in a new RFC, as it would bypass a lot 
of perceived performance issues in earlier RFCs. If an autoloader so desires, 
it can check the global scope (by getting the “base name” of the function), but 
autoloading the global scope should be a niche application these days.
 2. I like the “pinning” aspect. I haven’t seen your code yet, but I suspect it 
just registers the global function in the current namespace? If so, does this 
affect the __namespace__ global? Probably not, but I am just curious. What 
happens if I manually require a file with a pinned function in it?
 3. Should we update function_exists like I did in mine to include an autoload 
argument?
 4. You mention no default autoloader for classes and functions. While I agree 
that this should be the case, will the SPL library still provide the default 
class autoloader that can be registered?
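
To make the “base name” idea in point 1 concrete, here is a minimal sketch of how a userland autoloader could derive the global name from the namespaced one it was handed. Both function names below are hypothetical and are not part of either RFC; the sketch only uses string functions and function_exists(), which exist today.

```php
<?php
// Hypothetical helper (not part of the RFC): strip the namespace from a
// qualified function name, e.g. "test\strlen" becomes "strlen".
function function_base_name(string $name): string
{
    $pos = strrpos($name, '\\');
    return $pos === false ? $name : substr($name, $pos + 1);
}

// Hypothetical check an autoloader could run before doing any work:
// does a global function with the same base name already exist?
function global_fallback_exists(string $name): bool
{
    return function_exists(function_base_name($name));
}

var_dump(function_base_name('test\strlen'));     // string(6) "strlen"
var_dump(global_fallback_exists('test\strlen')); // bool(true)
```

An autoloader that never wants to load global functions can simply skip this check, which is the common case assumed above.
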
As far as performance for ambiguous functions goes, I was thinking of submitting 
an RFC where ambiguous function calls are tagged during compilation and always 
resolve lexically, sorta like how it works now:

echo strlen($x); // resolves to global, always
require_once "my-strlen";
echo strlen($x); // resolves to my strlen, always

This works by basically rewriting the function name once resolved and may make 
the code more predictable. If I can pull it off, it can be relegated to a 
technical change that doesn’t need an RFC. Still working on it. 

In other words, maybe pinning could be solved more generally in a future RFC, 
which would decrease your RFC’s scope and the chance of sharp edge cases.

In any case, if you are up for a high bandwidth conversation to collaborate, we 
can join a call, collaborate in the open, or whatever you feel most comfortable 
with. I’m very excited to see this feature.

— Rob

Re: [PHP-DEV] function autoloading v4 RFC

2024-08-18 Thread Rob Landers
On Sun, Aug 18, 2024, at 11:36, Stephen Reay wrote:
> 
> Hi Rob,
> 
>> On 18 Aug 2024, at 04:33, Rob Landers  wrote:
>> 
>> I wouldn't consider it a BC break, no. But (ironically?), Symfony crashes 
>> with this change.
> 
> 
> I wasn't aware of that specific code before but it's exactly the type of 
> issue I was talking about earlier.
> 
>> Ah, good catch. I've updated this and gone through other relevant functions.
> The RFC now says "The spl_autoload function will not be updated.", but that 
> will *also* break if it isn't updated to at least *account for*, even if it 
> doesn't *use* the second argument given.
> 
> However I'm also curious why you would *specifically* make it *not* support 
> function loading?
> The current implementation should work unmodified, once the signature is 
> changed to accept an int as the second parameter (and move the current 2nd 
> parameter to 3rd),  There is nothing "class specific" in the existing 
> implementation except for a couple of variable names.
> 
> 
> I would imagine you also need to update spl_autoload_unregister[1] - it also 
> needs to be able to identify the type of autoloader it's operating on 
> (because the same autoloader might be defined for both types).
> 
> And lastly I think you'd need to adapt spl_autoload_functions[2] too - 
> perhaps the same as the others, introduce a parameter `int $type = 
> SPL_AUTOLOAD_CLASS`, so existing code works as-is, otherwise it'd be 
> impossible to know how a given autoloader was registered.
> 
> 
> 1: https://www.php.net/manual/en/function.spl-autoload-unregister.php
> 2: https://www.php.net/manual/en/function.spl-autoload-functions.php
> 
> 
> Cheers
> 
> Stephen 

Hello again internals,

The implementation has now reached a stable point and things discovered during 
the implementation have been reflected in the RFC text itself.

I did my best to assess the impact to Opcache, but I suspect someone much more 
familiar with how opcache works will need to take a look. From my understanding 
of the changes needed in zend_execute, it looks like there are some changes 
needed to the JIT helpers, but there may be more changes required that aren't 
so obvious. I've added a helper that can be used as a drop-in replacement for 
looking up functions from the function table directly, which should help.

Lastly, I do plan to open a docs PR in the coming week that will probably 
trigger some smaller updates to the RFC; mostly to smooth out the wording and 
make it more friendly for non-technical people, but the technical aspects 
shouldn't change (barring the discussion here, of course).

> The RFC now says "The spl_autoload function will not be updated.", but that 
> will *also* break if it isn't updated to at least *account for*, even if it 
> doesn't *use* the second argument given.

I've updated the text to reflect exactly that; it did require updating. ;)

> However I'm also curious why you would *specifically* make it *not* support 
> function loading?
> The current implementation should work unmodified, once the signature is 
> changed to accept an int as the second parameter (and move the current 2nd 
> parameter to 3rd),  There is nothing "class specific" in the existing 
> implementation except for a couple of variable names.

I mostly decided not to support it to avoid the inevitable bikeshedding over 
how a "default function autoloader" would work. I really want to push that to a 
separate RFC, unless there is a general consensus on an implementation. If we 
are fine reusing the existing default class autoloading, then I am fine with 
that.

> And lastly I think you'd need to adapt spl_autoload_functions[2] too - 
> perhaps the same as the others, introduce a parameter `int $type = 
> SPL_AUTOLOAD_CLASS`, so existing code works as-is, otherwise it'd be 
> impossible to know how a given autoloader was registered.

I've added those functions as well.

— Rob

Re: [PHP-DEV] function autoloading v4 RFC

2024-08-17 Thread Rob Landers
On Sun, Aug 18, 2024, at 00:40, Rowan Tommins [IMSoP] wrote:
> 
> 
> On 17 August 2024 22:33:03 BST, Rob Landers  wrote:
> >I wouldn't consider it a BC break, no. But (ironically?), Symfony crashes 
> >with this change. It really shouldn't but ...
> 
> I don't think it makes sense to say "it breaks existing code, but it's not a 
> compatibility break".
> 
> Perhaps what you're saying is "it's only a BC break for code that's not 
> following best practices"?
> 
> But more relevant than whether you think the current code is "correct" is the 
> fact that a) it will need to be changed to work with your proposal; and b) 
> the change is simple and can be done in advance. 
> 
> So the RFC should acknowledge this BC break, but could argue that it's small 
> enough to include in a minor version. This is actually really common - RFCs 
> that introduce a new global function often acknowledge that it would break 
> existing userland functions with that name. Between that and obviously 
> serious BC breaks like *removing* a function, there's a big grey area where 
> we have to make a judgement call.
> 
> Regards,
> Rowan Tommins
> [IMSoP]
> 

Hey Rowan,

Ah, that's a good tip and point. Thank you. I'll update the RFC!

— Rob

Re: [PHP-DEV] function autoloading v4 RFC

2024-08-17 Thread Rob Landers
On Fri, Aug 16, 2024, at 02:22, Stephen Reay wrote:
> 
> Sent from my iPhone
> 
> > On 16 Aug 2024, at 04:45, Rob Landers  wrote:
> > 
> > Userland functions don't throw that error, so it shouldn't be an issue. 
> > (You can pass as many arguments as you want to a userland function, as long 
> > as there are enough of them).
> 
> Hi Rob,
> 
> I didn't mean the actual internal error about too many args I meant the 
> concept - it's possible a userland autoloader callback has some signature 
> other than a single string parameter.
> 
> I don't know if that happens in the wild or if that's considered a BC or not 
> but there is a hypothetical case there where old code would stop working.
> 
> Cheers
> 
> Stephen 
> 

I wouldn't consider it a BC break, no. But (ironically?), Symfony crashes with 
this change. It really shouldn't, but if you take a look at 
https://github.com/symfony/symfony/blob/7.2/src/Symfony/Component/Config/Resource/ClassExistenceResource.php#L70
 and its definition in 
https://github.com/symfony/symfony/blob/7.2/src/Symfony/Component/Config/Resource/ClassExistenceResource.php#L142,
 you'll see that they define a function that accepts two parameters. It crashes 
when it receives a second parameter of `int` instead of its expected exception.

I've opened a drive-by PR as an attempt to fix it 
(https://github.com/symfony/symfony/pull/58030), but I suspect there will be 
other projects as well that do something silly like that. They'll just have to 
fix it.
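
For anyone who wants to see the failure mode without reading the Symfony source, here is a simplified stand-in. It is not Symfony's code: the two closures and the `$type` value are made up for illustration, and the only assumption is that an autoloader gets called with an extra integer argument.

```php
<?php
// A conventional autoloader: one parameter, so extra arguments are ignored.
$plain = function (string $name): string {
    return "loaded $name";
};

// Simplified stand-in for a callback with a typed second parameter.
// NOT Symfony's actual code.
$typed = function (string $name, ?\Exception $previous = null): string {
    return "loaded $name";
};

$type = 1; // hypothetical "autoload type" flag passed as a second argument

$result = $plain('test\strlen', $type); // fine: the extra argument is dropped
echo $result, "\n";

$threw = false;
try {
    $typed('test\strlen', $type); // an int cannot satisfy ?Exception
} catch (\TypeError $e) {
    $threw = true;
    echo "TypeError raised for the typed callback\n";
}
```

So callbacks with a single parameter are unaffected, while anything that already uses the second position for something else needs a small fix.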

— Rob

Re: [PHP-DEV] [RFC] Re: Decoding HTML and the Ambiguous Ampersand

2024-08-16 Thread Rob Landers
Hey Dennis,

This looks like top posting because you’ve got a lot to read — and well written 
— but I want to reply to some points inline. 

On Fri, Aug 16, 2024, at 20:43, Dennis Snell wrote:
> >On Fri, Aug 16, 2024, at 02:59, Dennis Snell wrote
> 
> Thanks for the question, Rob, I hope this finds you well!
> 
> >The RFC mentions that encoding must be utf-8. How are programmers supposed 
> >to work with this if the php file itself isn’t utf-8
> 
> From my experience it’s the opposite case that is more important to consider. 
> That is, what happens when we mix UTF-8 source code with latin1 or UTF-8 
> source HTML with the system-set locale. I tried to hint at this scenario in 
> the "Character encodings and UTF-8” section.
> 
> Let’s examine the fundamental breakdown case:
> 
> ```php
> "é" === decode_html( "&eacute;" );
> ```
> 
> If the source is UTF-8 there’s no problem. If the source is ISO-8859-1 this 
> will fail because xE9 is on the left while xC3 xA9 is on the right. _Except_ 
> if `zend.multibyte=1` and (`zend.script_encoding=iso-8859-1` _or_ if 
> `declare(encoding='iso-8859-1')` is set). The source code may or may not be 
> converted into a different encoding based on configurations that most 
> developers won’t have access to, or won’t examine.
> 
> Even with source code in ISO-8859-1, the `zend.script_encoding` and 
> `zend.multibyte` set, `html_entity_decode()` _still_ reports UTF-8 unless 
> `zend.default_charset` is set _or_ one of the `iconv` or `mbstring` internal 
> charsets is set.

I just want to pause here and say, “holy crap.” That is quite complex and those 
edges seem sharp!
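
A small sketch of the byte-level mismatch described above, using only the existing html_entity_decode() so it runs on current PHP. The choice of `&eacute;` as the input is mine, picked because it matches the xE9 versus xC3 xA9 bytes mentioned in the quoted text.

```php
<?php
// The same character reference decoded into two different target encodings.
$utf8   = html_entity_decode('&eacute;', ENT_QUOTES | ENT_HTML5, 'UTF-8');
$latin1 = html_entity_decode('&eacute;', ENT_QUOTES | ENT_HTML5, 'ISO-8859-1');

echo bin2hex($utf8), "\n";   // c3a9
echo bin2hex($latin1), "\n"; // e9

// A byte-wise comparison therefore fails unless both sides agree on encoding.
var_dump($utf8 === $latin1); // bool(false)
```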

> 
> My point I’m trying to make is that the current situation today is a 
> minefield due to a dizzying array of system-dependent settings. Most modern 
> code will either be running UTF-8 source code or will be converting source 
> code _to_ UTF-8 or many other things will already be helplessly broken beyond 
> this one issue.

Unfortunately, we don’t always get to choose the code we work on. There is 
someone on this list using SHIFT_JIS. They probably know more about the ins and 
outs of dealing with utf-8 centric systems from that encoding. Hopefully they 
can comment more about why this would or would not be a bad idea. 

> 
> UTF-8 is the unifier that lets us escape this by having a defined and 
> explicit encoding at the input and output.

Utf-8 is pretty good, right now, but I don’t think we should marry the language 
to it. Will it be “the standard” in 10 years, 20 years, 100 years? Languages 
change, cultures change. Some people I know use a font to change triple equals 
from a literal === to ≡. How long until php recognizes that as a literal 
operator?

But anyway, to get back on topic; I, personally, would rather see something 
more flexible, with sane defaults for utf-8.

> 
> > or the input is meaningless in utf-8 or if changing it to utf-8 and back 
> > would result in invalid text?
> 
> There shouldn't be input that’s meaningless in UTF-8 if it’s valid in any 
> other encoding. Indeed, I have placed the burden on the calling code to 
> convert into UTF-8 beforehand, but that’s not altogether different than 
> asking someone to declare into what encoding the character references ought 
> to be decoded.

There’s a huge performance difference between converting a string to and from 
different encodings and instructing a function what to parse in the current 
encoding. The latter would also be useful when the page itself is not utf8. 

> 
> ```diff
> -html_entity_decode( $html, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5, 
> 'ISO-8859-1' );
> +$html = mb_convert_encoding( $html, 'UTF-8', 'ISO-8859-1' );
> +$html = decode_html( HTML_TEXT, $html );
> +$html = mb_convert_encoding( $html, 'ISO-8859-1', 'UTF-8' );
> ```
> 
> If an encoding can go into UTF-8 (which it should) then it should also be 
> able to return for all supported inputs. That is, we cannot convert into 
> UTF-8 and produce a character that is unrepresentable in the source encoding, 
> because that would imply it was there in the source to begin with. 
> Furthermore, if the HTML decodes into a code point unsupported in the 
> destination encoding, it would be invalid either directly via decoding, or 
> indirectly via conversion.
> 
> ```diff
> -"\x1A" === html_entity_decode( "&#x1F170;", ENT_QUOTES | ENT_SUBSTITUTE | 
> ENT_HTML5, 'ISO-8859-1' );
> +"?" === mb_convert_encoding( decode_html( HTML_TEXT, "&#x1F170;" ), 
> 'ISO-8859-1', 'UTF-8' );
> ```
> 
> This gets really confusing because neither of these outputs is a proper 
> decoding, as character encodings that don’t support the full Unicode code 
> space cannot adequately represent all valid HTML inputs. HTML is a Unicode 
> decoding by specification, so even in a browser with 
> `<meta charset="ISO-8859-1">&#x1F170;` the text content will still be `🅰`, 
> not `?` 
> or the invisible ASCII control code SUB.

I was of the understanding that meta charset was too late to set the encoding 
(but it’s been a while since I’ve read the html5 spec) 

Re: [PHP-DEV] String enums & __toString()

2024-08-16 Thread Rob Landers


On Fri, Aug 16, 2024, at 21:57, John Coggeshall wrote:
>> Didja really?
>> 
>> https://wiki.php.net/rfc/auto-implement_stringable_for_string_backed_enums
> I swear I did. 
> 
> That said, looking at that RFC it's a slightly different take than what I am 
> suggesting. This RFC suggests that `string`  enums automatically implement 
> `Stringable` . I am pointing out that it's a little silly IMO that we don't 
> allow `__toString()`  at all and let the developer choose if they want to 
> implement `Stringable` . Looking through the PR comments it's pretty clear 
> there was a decent amount of support for this alternative idea. It also looks 
> like the PR for this died on the vine and I'm curious is there is any 
> interest in reviving the discussion?
> 
> I'm not seeing an obvious upside to forbidding straight out `__toString()` , 
> especially (as is pointed out in the PR comments) that you can use other 
> native interfaces like `JsonSerializable`  without issue.

Hello,

I personally find it an interesting design choice. I agree with it, and 
disagree with it at the same time. Like, anytime I find myself wanting to reach 
for enums as strings, I realize I want a constant instead. But then someone 
sees a string type and manually types it instead of using the constant (heh, 
I’ve been guilty of that myself a few times).

That being said, I would like to be able to use | and & on integer enums more 
than I would strings as stringables. Something like “flags” mode in C#. Maybe 
even make “flag” a backing type of enums. It would make a ton of json flags 
much simpler to reason about (and filters/sanitizers).
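
To show what I mean, this is roughly what it takes today to emulate “flags” with an int-backed enum. The enum and its method names are hypothetical; the integer values are the ones behind JSON_UNESCAPED_SLASHES, JSON_PRETTY_PRINT and JSON_UNESCAPED_UNICODE, written as literals to keep the sketch portable.

```php
<?php
// Hypothetical enum for illustration only.
enum JsonFlag: int
{
    case UnescapedSlashes = 64;  // same value as JSON_UNESCAPED_SLASHES
    case PrettyPrint      = 128; // same value as JSON_PRETTY_PRINT
    case UnescapedUnicode = 256; // same value as JSON_UNESCAPED_UNICODE

    // Enums do not support | directly, so combine the backing values by hand.
    public static function combine(self ...$flags): int
    {
        $mask = 0;
        foreach ($flags as $flag) {
            $mask |= $flag->value;
        }
        return $mask;
    }

    // Same story for &: test the backing value against a mask.
    public function isSetIn(int $mask): bool
    {
        return ($mask & $this->value) === $this->value;
    }
}

$mask = JsonFlag::combine(JsonFlag::PrettyPrint, JsonFlag::UnescapedSlashes);
echo json_encode(['path' => 'a/b'], $mask), "\n";
var_dump(JsonFlag::UnescapedUnicode->isSetIn($mask)); // bool(false)
```

With a “flag” backing type, the combine/isSetIn boilerplate could presumably go away and `JsonFlag::PrettyPrint | JsonFlag::UnescapedSlashes` would just work.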



— Rob

Re: [PHP-DEV] Time-out Posting to PHP Internals List

2024-08-16 Thread Rob Landers


On Fri, Aug 16, 2024, at 22:16, Kris Craig wrote:
> 
> 
> On Fri, Aug 16, 2024 at 8:11 AM Derick Rethans  wrote:
>> Hi,
>> 
>> In the last months, you have had numerous suggestions to keep it civil 
>> on the mailing list, in the "C++ Enhancements" thread [1,2,3,4,5,6], and 
>> multiple other threads [7,8,9], and by many individuals. Despite that, 
>> you did continue with ad hominems and related belligerent language, 
>> bordering on trolling.
>> 
>> You have therefore been given a 3-month timeout, which will be enforced 
>> by blocking you from sending email to the php.net domain, including 
>> subdomains.
>> 
>> Evading, or attempts to evade, this restriction will result in longer 
>> expulsions.
>> 
>> with kind regards,
>> Derick Rethans
>> 
>> [1] https://externals.io/message/124879#124905
>> [2] https://externals.io/message/124879#124931
>> [3] https://externals.io/message/124879#124936
>> [4] https://externals.io/message/124879#124948
>> [5] https://externals.io/message/124879#124950
>> [6] https://externals.io/message/124879#124951
>> [7] https://externals.io/message/123573#123605
>> [8] https://externals.io/message/124139#124168
>> [9] https://externals.io/message/123769#123888
> 
> This message kinda freaked me out until I realized it wasn't directed at me 
> lol.  Couldn't this have just been sent to the individual privately instead?  
> I really don't need to see a notification whenever somebody gets themselves 
> temp-banned from the list, just saying.
> 
> --Kris
>  

I saw the first few lines on my phone and had a similar wtaf reaction. Adding 
the person’s name in the salutation would help instead of a general “hello”. 
But otherwise, I think it is healthy for it to be public. Voices silently 
silenced, even for a good reason, are never a good thing. Also, it gives people 
a chance to advocate for the banned or object to the banning, if they so 
choose. 

— Rob

Re: [PHP-DEV] Re: Decoding HTML and the Ambiguous Ampersand

2024-08-16 Thread Rob Landers


On Fri, Aug 16, 2024, at 02:59, Dennis Snell wrote:
> 
>> On Jul 9, 2024, at 4:55 PM, Dennis Snell  wrote:
>> 
>> Greetings all,
>> 
>> The `html_entity_decode( … ENT_HTML5 … )` function has a number of issues 
>> that I’d like to correct.
>> 
>>  - It’s missing 720 of HTML5’s specified named character references.
>>  - 106 of these are named character references which do not require a 
>> trailing semicolon, such as `&acute`
>>  - It’s unaware of the ambiguous ampersand rule, which allows these 106 in 
>> special circumstances.
>> 
>> HTML5 asserts that the list of named character references will not expand in 
>> the future. It can be found authoritatively at the following URL:
>> 
>> https://html.spec.whatwg.org/entities.json
>> 
>> The ambiguous ampersand rule smoothes over legacy behavior from before HTML5 
>> where ampersands were not properly encoded in attribute values, specifically 
>> in URL values. For example, in a query string for a search, one might find 
>> `?q=dog&not=cat`. The `&not` in that value would decode to U+AC (¬), but 
>> since it’s in an attribute value it will be left as plaintext. Inside normal 
>> HTML markup it would transform into `?q=dog¬=cat`. There are related nuances 
>> when numeric character references are found at the end of a string or 
>> boundary without the semicolon.
>> 
>> The function signature of `html_entity_decode()` does not currently allow 
>> for correcting this behavior. I’d like to propose an RFC or a bug fix which 
>> either extends the function (perhaps by adding a new flag like 
>> `ENT_AMBIGUOUS_AMPERSAND`) or preferably creates a new function. For the 
>> missing character references I wonder if it would be enough to add them to 
>> the list of default translatable references.
>> 
>> One challenge with the existing function is that the concept of the 
>> translation table stands in contrast with the fixed and static nature of 
>> HTML5’s replacement tables. A new function or set of functions could open up 
>> spec-compliant decoding while providing helpful methods that are necessary 
>> in many common server-side operations:
>> 
>>   - `html_decode( 'attribute' | 'data', $raw_text, $input_encoding = 'utf-8' 
>> )`
>>   - `html_text_contains( 'attribute' | 'data', $raw_haystack, $needle, 
>> $input_encoding = 'utf-8' )`
>>   - `html_text_starts_with( 'attribute' | 'data', $raw_haystack, $needle, 
>> $input_encoding = 'utf-8' )`
>> 
>> These methods are handy for inspecting things like encoded attribute values 
>> in a memory-efficient and processing-efficient way, when it’s not necessary 
>> to decode the entire value. In common situations, one encounters data-URIs 
>> with potentially megabytes of image data and processing only the first few 
>> or tens of bytes can save a lot of overhead.
>> 
>> We’re exploring pure-PHP solutions to these problems in WordPress in 
>> attempts to improve the reliability and safety of handling HTML. I’d love to 
>> hear your thoughts and know if anyone is willing to work with me to create 
>> an RFC or directly propose patches. We’ve created a step function which 
>> allows finding the next character reference and decoding it separately, 
>> enabling some novel features like highlighting the character references in 
>> source text.
>> 
>> Should I propose an RFC for this?
>> 
>> Warmly,
>> Dennis Snell
>> Automattic Inc.
> 
> All,
> 
> I have submitted an RFC draft for including the proposed feature from this 
> issue. Thanks to everyone who helped me in this process. It’s my first RFC, 
> so I apologize in advance for any mistakes I’ve made in the process.
> 
> https://wiki.php.net/rfc/decode_html
> 
> This is proposed for a future PHP version after 8.4.
> 
> Warmly,
> Dennis Snell

Hey Dennis,

The RFC mentions that encoding must be utf-8. How are programmers supposed to 
work with this if the php file itself isn’t utf-8 or the input is meaningless 
in utf-8 or if changing it to utf-8 and back would result in invalid text?

— Rob

Re: [PHP-DEV] [DISCUSSION] Class Constant Enums?

2024-08-16 Thread Rob Landers
On Fri, Aug 16, 2024, at 16:17, Larry Garfield wrote:
> On Fri, Aug 16, 2024, at 6:35 AM, Alexandru Pătrănescu wrote:
> > Hi Nick,
> >> 
> >> Is there any interest in having enums as class constants?
> >> 
> >> I'm often finding cases where I would like to have an enum inside of a
> >> class, but don't want a free-floating enum that's basically like
> >> another class.
> >> 
> > 
> > .. 
> > 
> >> 
> >> class SSHClient {
> >> 
> >>public const enum CommandResult
> >>{
> >>case Success;
> >>case Failure;
> >>case Unknown;
> >>case Timeout;
> >>}
> >> 
> >>// ...
> >> }
> >> 
> >> 
> >> // Usage:
> >> 
> >> SSHClient::CommandResult::Success
> >
> >
> > I feel this topic could be maybe more broad and be called "nested 
> > classes" that are already supported in multiple languages: Java, Swift, 
> > Python, C#, C++, JavaScript, etc.
> >
> > The syntax you showed is usually identical with what other languages 
> > use, except that probably the const is unnecessary.
> > The nested class can have visibility as sometimes having it private 
> > makes sense.
> > Accessing it through `::` is probably fine, but a deeper look at the 
> > grammar might be necessary.
> > The nested class would have access to parent class private properties 
> > and methods.
> >
> > I also mentioned this topic on the subject of defining a type in an 
> > autoloader compatible way.
> > And indeed, a type could also be defined nested in a class if we want 
> > to support that as well.
> >
> > Now, this feature is not simple, and I think it needs proper 
> > sponsorship from someone experienced with internals.
> >
> > Regards,
> > Alex
> 
> I agree with Alexandru.  Since enums are 90% syntactic sugar over classes, 
> "inner enums" would be 80% of the way to "inner classes".  And I would be in 
> favor of inner classes. :-)  There's a lot of potential benefits there, but 
> also a lot of edge cases to sort out regarding visibility, what is allowed to 
> extend from what, etc.  But that would support inner enums as well.

From recently looking into this for totally unrelated reasons, nested enums 
would be far easier to implement on a grammar level. Enums also have some 
constraints that make it simpler than the general “nested classes,” such as 
rules regarding inheritance.

As for the actual implementation, it’ll be the edges that kill you.

I would recommend just doing enums and keeping the scope smaller. 


Re: [PHP-DEV] [RFC] On the need of a `is_int_string` ?

2024-08-15 Thread Rob Landers
On Thu, Aug 15, 2024, at 17:42, Vincent Langlet wrote:
> Hi,
> 
> When a string is used as an array key, it's sometimes cast to an int.
> As explained in https://www.php.net/manual/en/language.types.array.php:
> "Strings containing valid decimal ints, unless the number is preceded by a + 
> sign, will be cast to the int type. E.g. the key "8" will actually be stored 
> under 8. On the other hand "08" will not be cast, as it isn't a valid decimal 
> integer."
> 
> This behavior cause some issues, especially for static analysis. As an 
> example https://phpstan.org/r/5a387113-de45-4bef-89af-b6c52adc5f69
> vs real life https://3v4l.org/pDkoB
> 
> Currently most of static analysis rely on one/many native php functions to 
> describe types.
> PHPStan/Psalm supports a `numeric-string` thanks to the `is_numeric` method.
> 
> I don't think there is a native function to know if the key will be casted to 
> an int. The implementation would be something similar (but certainly better 
> and in C) to 
> ```
> function is_int_string(string $s): bool
> {
> if (!is_numeric($s)) {
> return false;
> } 
> 
> $a[$s] = $s;
> 
> return array_keys($a) !== array_values($a);
> }
> ```
> 
> Which gives:
> is_numeric('08') => true
> ctype_digit('08') => true
> is_int_string('08') => false
> 
> is_numeric('8') => true
> ctype_digit('8') => true
> is_int_string('8') => true
> 
> is_numeric('+8') => true
> ctype_digit('+8') => false
> is_int_string('+8') => false
> 
> is_numeric('8.4') => true
> ctype_digit('8.4') => false
> is_int_string('8.4') => false
> 
> Such a function would make it easy to introduce an `int-string` type in static 
> analysis and the opposite, a `non-int-string` one (cf 
> https://github.com/phpstan/phpstan/issues/10239#issuecomment-1837571316).
> 
> WDYT about adding an `is_int_string` function then?
> 
> Thanks

Hello,

At the risk of bikeshedding, it would probably be better to define it in the 
`array_*` space, maybe something like `array_key_is_string(string $key): bool`?

As for your function definition, it can be simplified a bit:

return (($s[0] ?? '') > 0 || (($s[0] ?? '') === '-' && ($s[1] ?? '') > 0)) && 
is_numeric($s);

I believe that covers all the cases that I could think of:

01, -01, +01, 1, 1.2, -1, -1.2, ~1, ~01
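
For comparison, here is a minimal sketch of a check that simply asks the engine what it would do with the key. The name `is_int_array_key` is made up for illustration (it is not an existing or proposed native function), and the only behaviour assumed is the documented one: string keys holding valid decimal integers are cast to int.

```php
<?php
// Hypothetical helper (not a native function): true when PHP would
// store the string key $s as an int.
function is_int_array_key(string $s): bool
{
    // Let the engine normalise the key, then inspect the resulting key type.
    return is_int(array_key_first([$s => null]));
}

foreach (['8', '08', '+8', '8.4', '-1', '0'] as $candidate) {
    printf("%-4s => %s\n", $candidate, var_export(is_int_array_key($candidate), true));
}
```

Because it defers to the array implementation itself, it should stay in step with whatever the engine actually does, at the cost of allocating a throwaway array per call.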

— Rob

Re: [PHP-DEV] function autoloading v4 RFC

2024-08-15 Thread Rob Landers
On Thu, Aug 15, 2024, at 20:40, Rowan Tommins [IMSoP] wrote:
> On 15/08/2024 16:22, Rob Landers wrote:
>> Hello internals,
>> 
>> I've decided to attempt an RFC for function autoloading. After reading 
>> hundreds of ancient (and recent) emails relating to the topic along with 
>> several abandoned RFCs from the past, and after much review, I've decided to 
>> put forth a variation of a previous RFC, as it seemed the least ambitious 
>> and the most likely to work:
>> 
>> https://wiki.php.net/rfc/function_autoloading4
> 
> 
> Hi Rob,
> 
> While brevity can sometimes be a virtue, I feel like there's a lot left to 
> the reader's interpretation here.
> 
> Specifically, one of the main issues that has come up in previous discussions 
> of the topic is the behaviour of unqualified function names, which check the 
> current namespace first, then fall back to global scope. Your RFC implies an 
> approach to this, but doesn't actually spell it out, nor discuss its pros and 
> cons.
> 

It doesn't go into much detail here on purpose, especially since there was 
some recent discussion on changing the order. That being said, while writing 
the reply below, I realized it simply wasn't clear enough. I've updated the RFC 
to be clearer about the current behavior.

> The fully qualified case is straight-forward: the autoloader is called, and 
> if still not defined, an error is thrown. But for the unqualified case, there 
> are multiple scenarios, and you only give the behaviour for one of them:
> 

To fill in your chart:

> Defined in current namespace? | Defined in global namespace? | Proposed behaviour
> ------------------------------+------------------------------+-------------------
> No                            | No                           | Prefixed name
> No                            | Yes                          | Prefixed name
> Yes                           | No                           | No change
> Yes                           | Yes                          | No change
> 
> The third and fourth cases (where the function exists in the current 
> namespace) are straight-forward, although it wouldn't hurt to spell them out: 
> presumably, the namespaced function is used as now, so no autoloading is 
> needed.
> 
> The complex case has always been the second one: the function doesn't exist 
> in the current namespace, but *does* exist in the global namespace. (Or, an 
> autoloader *defines* it in the global namespace.)
> 

This should have the same behavior as in the class autoloader. In my attempt to 
be vague (in case the loading order is changed, which I assumed would affect 
class autoloading as well), I wasn't very clear on this. That is, if you define 
a class called "Fiber" and a function called "strlen" in the current namespace, 
in files that have not been loaded yet, an autoloader should be given the 
opportunity to load them.

I should also probably define some vocabulary so this is all less confusing. 
And done.


> In concrete terms, what does this code output:
> 
> spl_autoload_register( function($function, $type) { echo "$function..."; }, 
> type:SPL_AUTOLOAD_FUNCTION);
> 
> namespace Foo {
>foreach (['hello', 'goodbye'] as $word) {
>   echo strlen($word), ';';
>}
> }
> 
> a) "Foo\strlen...5;Foo\strlen...7;" (the autoloader is called every time the 
> function is encountered)
> b) "Foo\strlen...5;7;" (the autoloader is called once, then somehow marked 
> not to run again for this name)
> c) "5;7;" (the autoloader is never run for this code)
> 

I believe the "most correct" answer is (a) -- and is what the current class 
autoloader does as well. Option (b) would make it impossible to do any kind of 
dynamic code generation or old-fashioned file-based deployments.


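namespace global {
    // Register a class autoloader that only reports the name it was asked for.
    spl_autoload_register(function ($function) {
        echo "$function...";
    });
}

namespace Foo {
    foreach (['hello', 'goodbye'] as $word) {
        // Fiber::class resolves to Foo\Fiber here, so the autoloader is
        // consulted again on every iteration.
        if (!class_exists(Fiber::class)) {
            echo "Test...";
        }
    }
}
// Prints: Foo\Fiber...Test...Foo\Fiber...Test...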

Since I foresee (based on previous conversations) people being worried about 
performance:

I think this is best left to autoloader authors to manage. If there isn't a 
function autoloader, then there won't be any performance impact (or at least, 
minimal). However, if there is one, the author can take into account how their 
codebase works. For example, I highly suspect FIG will create a PSR for this 
(eventually), based on how they expect things to work. I suspect a project like 
WordPress could make use of it and implement things 

Re: [PHP-DEV] function autoloading v4 RFC

2024-08-15 Thread Rob Landers
On Thu, Aug 15, 2024, at 18:18, Stephen Reay wrote:
> 
> 
>> On 15 Aug 2024, at 22:22, Rob Landers  wrote:
>> 
>> Hello internals,
>> 
>> I've decided to attempt an RFC for function autoloading. After reading 
>> hundreds of ancient (and recent) emails relating to the topic along with 
>> several abandoned RFCs from the past, and after much review, I've decided to 
>> put forth a variation of a previous RFC, as it seemed the least ambitious 
>> and the most likely to work:
>> 
>> https://wiki.php.net/rfc/function_autoloading4
>> 
>> Please let me know what you think. An implementation should be along before 
>> opening it for a vote (now that I realize how important that is).
>> 
>> — Rob
> 
> Hi Rob,
> 
> I like the simplicity of this, however your RFC doesn't document the changes 
> required to `spl_autoload`[1] to allow it to keep working with this new 
> functionality.

Ah, good catch. I've updated this and gone through other relevant functions.

> 
> The same issue (unexpected additional argument) potentially affects userland 
> autoloaders too, but obviously the individual authors can fix that themselves 
> (whether this would count as a BC break is not immediately clear to me)

Userland functions don't throw that error, so it shouldn't be an issue. (You 
can pass as many arguments as you want to a userland function, as long as there 
are enough of them).
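
A quick sketch of that difference, assuming a made-up `legacy_loader` function standing in for an existing userland autoloader that only declares one parameter:

```php
<?php
// Hypothetical autoloader written before any second argument existed.
function legacy_loader(string $name): string
{
    return "asked for $name";
}

// An extra argument is accepted and silently ignored by a userland function.
echo legacy_loader('Foo\\Bar', 2), "\n";

// Too few arguments, on the other hand, throws.
try {
    legacy_loader();
} catch (ArgumentCountError $e) {
    echo get_class($e), "\n";
}
```

Internal functions are stricter about surplus arguments, which is why `spl_autoload` itself needed attention while userland autoloaders should not.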

> 
> 
> Slightly tangentially, you may also want to look at a change to 
> `spl_autoload_call` to accept a `SPL_AUTOLOAD_*` argument, so that it works 
> consistently.

Done.

Thank you Stephen for pointing out this oversight.

> 
> Cheers
> 
> Stephen
> 
> 1: https://www.php.net/spl_autoload
> 

— Rob

Re: [PHP-DEV] [RFC] [VOTE] Transform exit() from a language construct into a standard function

2024-08-08 Thread Rob Landers


On Thu, Aug 8, 2024, at 16:10, Andreas Heigl wrote:
> Hey Gina, hey all
> 
> Am 08.08.24 um 15:44 schrieb Gina P. Banyard:
> > On Wednesday, 7 August 2024 at 17:07, Andreas Heigl  
> > wrote:
> >> Stupid question maybe, but are we voting on the RFC or on the patch?
> >>
> >> If the patch does not match what the RFC proposes, then the patch has 
> >> a problem. That should IMO though not affect voting on an RFC.
> >>
> >> Or am I missing something?
> > 
> > In theory, it is the RFC idea.
> > In practice, a lot of the times it is the patch for complex features.
> > 
> > However, it is still within the purview of core developers to veto the 
> > implementation of an RFC.
> > Which could be the case here rather than voting against the RFC outright.
> 
> I have no problem that core developers veto a certain implementation of 
> an RFC. I actually expect them to do so.
> 
> But the vote should IMO *always and exclusively* be based on the RFC. 
> Not on the implementation. If the voting happens based on the 
> implementation due to the complexity of the features that means that the 
> RFC is not well written and needs to be improved. Or the implementation 
> is problematic and needs to be vetoed by the core developers.
> 
> Why do I think so? It becomes completely opaque why an RFC was 
> rejected when the voting happens based on a meager implementation as 
> after some years no one will understand why a well written RFC was 
> rejected. Especially when the discussion then also happens off-list and 
> on the actual code in github as that tears apart the information sources 
> that need to be taken into consideration in hindsight.

I would expect any implementation done before the RFC is voted on to be 
entirely proof-of-concept, and not expected to be mergeable as-is. Basically, 
it is a way to test out the proposed RFC, though it may have issues (such as 
memory leaks or not all edge cases implemented). I, personally, wouldn’t expect 
an RFC to be declined based on the initial patch. That’s good information to 
add to the RFC how-to page. 

— Rob

Re: [PHP-DEV] [Discussion] Sandbox API

2024-08-06 Thread Rob Landers


On Tue, Aug 6, 2024, at 20:59, Niels Dossche wrote:
> On 06/08/2024 10:41, Nick Lockheart wrote:
> > 
> > Sandbox: Security
> > 
> > A SandBox has two use cases:
> > 
> > 1. Unit Testing of code with mocks or stubs, and also, allowing testing
> > with different environments.
> > 
> > 2. The secure running of 3rd party code inside a 1st party application.
> > 
> 
> The use-case of securely running 3rd party code inside your application is 
> impossible at this moment, and will still be impossible after a sandbox API 
> is introduced.
> The reason is that the PHP interpreter as it is today is not memory safe. It 
> is relatively easy to cause memory corruption by only using PHP code by 
> abusing things like custom error handlers set from userland. This in turn can 
> be used to gain arbitrary read/write primitives which has been shown to 
> circumvent disable_functions & open_basedir, and some PoCs can even run 
> arbitrary commands. It would be doable to extend these tricks to circumvent a 
> sandboxing API.
> As such, a sandboxing API for securely executing 3rd party code is only 
> possible after the interpreter has become memory safe.
> Although some work has been done in PHP 8.3 to plug many of these memory 
> safety bugs in the VM, much more work remains and would likely require 
> complicated changes.
> So therefore I propose to only focus on the mocking functionality of your 
> proposal for now, until the time comes that the interpreter is memory safe.
> I would therefore also not call it "sandbox".
> 
> Introducing a sandbox API for security also opens up a can of worms for the 
> security policy.
> Right now we are assuming an attacker model of a remote attacker, and that 
> the code running on your server is trusted.
> But that would change when an official sandbox API is introduced.
> 
> Kind regards
> Niels
> 
Hey Niels,

I find this assertion kind of scary from a shared hosting perspective or even 
from a 3v4l kind of perspective. How do these services protect themselves if 
php is inherently insecure?

— Rob

Re: [PHP-DEV] [Discussion] Sandbox API

2024-08-06 Thread Rob Landers
Hey Nick,

Looking forward to the RFC!

On Tue, Aug 6, 2024, at 19:28, Nick Lockheart wrote:
> > 
> > This looks quite valuable, and I assume auto loading would work just
> > like normal? Register an autoloader that will eventually require the
> > file and call this function?
> > 
> > It would be nice to provide a simplified api as well, maybe
> > “CopyCurrentEnvironment()” or something?  In most cases, it is
> > easier/faster to find things to remove vs. adding everything on every
> > plugin/request every time. 
> > 
> > In saying that, it would be great if there was an api for “sharing” a
> > base-sandbox pool via shm (or similar to a pool) so that the base vm
> > doesn’t need to be recreated potentially hundreds of times per
> > request. 
> > 
> 
> I didn't want to be too overwhelming on the first post, but since it
> seems the feedback is positive, here's a more complete list of what I
> think should be included:
> 
> 
> // Passthroughs:
> 
> // Make all user and built-in global functions
> // available inside the sandbox:
> SPLSandBox::PassGlobalFunctions();

Bike shed: maybe have PassGlobals() instead? Or rather, PassNamespace and have 
\ be a valid namespace. 

> 
> // Make all built-in (but not user) functions
> // available inside the sandbox:
> SPLSandBox::PassBuiltInFunctions();
> 
> 
> // Make all built-in (but not user) functions
> // available inside the sandbox, EXCEPT blacklisted functions:
> SPLSandBox::PassBuiltInFunctionsExcept(['eval','exit']);
> 
> (assuming exit becomes a function).

I feel like exit (function or not) should just return from the sandbox and 
shouldn’t be disable-able. For example, a plugin might detect a valid etag, 
set the response status to 304, and exit. 

> 
> 
> // Allow only specific functions to be called (whitelist method):
> $aWhiteList = ['array_key_exists','in_array'];
> SPLSandBox::PassFunctions($aWhiteList);

Might be a good idea to combine these two? Allow passing a whitelist AND a 
blacklist? Are these supposed to be static or on an instance of a sandbox?

> 
> // Allow specific classes to be used by sandbox code:
> $aClassList = ['\MyAPP\PluginAPI'];
> SPLSandBox::PassClasses($aClassList);
> 
> 
> // Allow specific constants to be seen by sandbox code:
> SPLSandBox::PassConstants(['\DB_USERNAME','\DB_PASSWORD']);
> 
> 
> 
> // Language Construct Callbacks:
> 
> The callbacks allow the outer code to control and monitor certain
> language features of the sandboxed code during execution.
> 
> // Called when the sandbox code tries to include or require something:
> SPLSandBox::RegisterIncludeHandler();

Does including a file from outside the sandbox (next call) call this handler?

> 
> // Includes a file into the sandbox:
> SPLSandBox::Include('path/to/file.php');
> 
> // Your sandbox autoloader logic could be incorporated here:
> SPLSandBox::RegisterAutoLoadHandler();
> 
> 
> // But, for unit testing with mocks and stubs,
> // it might be better to use:
> SPLSandBox::RegisterNewHandler();
> 
> The NewHandler callback is called every time sandboxed code tries to
> instantiate an object with `new`.

Why not use the current autoload logic?

> // Example: Override what `new` returns to code running in the sandbox:
> function MyNewHandler(string $ClassName, array $aConstructorArgs){
> 
>    if($ClassName === '\DateTime'){
>       return new FakeDate();
>    }
>    // Spread the array so each element becomes its own constructor argument:
>    return new $ClassName(...$aConstructorArgs);
> }
> 
> 
> // Every time a sandboxed class calls a method, call this first:
> SPLSandBox::RegisterMethodCallHandler();
> 
> Useful for unit testing to monitor if the tested class is calling the
> methods it should be calling. Ignores visibility rules.
> 
> Could also allow for infinite recursion detection from the outside.

I think this is handled automatically now. 

> 
> 
> // The companion for static method calls, gets called
> // every time a method is called on a class statically:
> SPLSandBox::RegisterStaticMethodCallHandler();
> 
> 
> 
> // Each time a sandboxed loop iterates, call this first:
> // Allows the outer code to put limit breaks on the sandboxed code.
> SPLSandBox::RegisterLoopHandler();
> 
> The callback takes the type of loop, and the variables that make up the
> loop ($i for for(), $Key => $value for foreach(), etc)
> 
> 
> // If the sandboxed code calls echo, print, or
> // causes any output to occur (ie outside of <?php tags):
> SPLSandBox::RegisterEchoHandler();
> 
> Could be used to make sure templates behave as desired, but perhaps
> even more useful, it lets you *fail* unit tests if any output occurs
> from a test that shouldn't produce output.
> 
> ie. Catch echo statements used in testing, or whitespace after a
> closing ?> tag.
> 
> 
> // If the sandbox code tries to use `exit` or `die`,
> // call this function instead:
> SPLSandBox::RegisterExitHandler();
> 
> You'll probably want to destroy the sandbox from the outside (see
> below), rather than letting sandboxed code halt the test framework or
> main application.
> 
> 
> // If sandboxed code throws

Re: [PHP-DEV] [Discussion] Sandbox API

2024-08-06 Thread Rob Landers


On Tue, Aug 6, 2024, at 10:41, Nick Lockheart wrote:
> 
> Sandbox: Security
> 
> A SandBox has two use cases:
> 
> 1. Unit Testing of code with mocks or stubs, and also, allowing testing
> with different environments.
> 
> 2. The secure running of 3rd party code inside a 1st party application.
> 
> 
> For the second use case, I will use a fictional blogging software
> called "Hot Blog" as the example.
> 
> Hot Blog is a very popular Open Source blogging platform. Hot Blog
> allows third party developers to write plugins.
> 
> While many Hot Blog plugin developers have the best of intentions, some
> of them are novice coders that make security mistakes.
> 
> So let's talk about how Hot Blog could benefit from using the new
> SandBox API.
> 
> By default, a SandBox instance is a blank slate. There's nothing inside
> of it, unless the SandBox is told to have something in it.
> 
> That means that sandboxed code that tries to read $_SESSION will find
> an empty array. Same with $_SERVER, $_POST, and $_GET.
> 
> That's by default. This allows the code that controls the sandbox to
> create custom access to application level resources.
> 
> Let's say that Hot Blog wants plugin developers to be able to access
> certain $_POST variables, but only *after* Hot Blog has checked the
> strings for multi-byte attack vulnerabilities.
> 
> To do this, Hot Blog creates a class called PluginAPI with a
> GetCleanPost method. This lets sandboxed plugins get $_POST data
> without being able to bypass Hot Blog's mandatory security check.
> (Remember, $_POST is empty inside the sandbox).
> 
> The code looks like this:
> 
> $oSandbox = new SPLSandBox();
> $oSandbox->MockClass('\HotBlog\PluginAPI','\HotBlog\PluginAPI');
> $oUserPlugin = $oSandbox->GetInstance('BobsMagicPlugin');
> $oUserPlugin->Run();
> 
> Because "Bob" has written his plugin as a Hot Blog plugin and knows
> that Hot Blog's rules require him to use
> \HotBlog\PluginAPI::GetCleanPost() to access a $_POST variable, he
> calls that instead of using $_POST.
> 
> Now, Hot Blog can impose mandatory security checks on incoming data
> making their application more secure.
> 
> Next, let's talk about includes. By default, if sandboxed code tries to
> include or require a file, a SandBoxAccessViolation is thrown.
> 
> Letting sandboxed code include whatever it wants defeats the point of a
> sandbox, at least for security use cases.
> 
> Of course, includes are useful, and plugins may need them. But the
> outer application should be able to control that access.
> 
> Enter SPLSandBox::RegisterIncludeHandler().
> 
> RegisterIncludeHandler accepts a callable.
> 
> The callable's signature is:
> 
> (int $IncludeType, string $FileName, string $FilePath)
> 
> Where:
> 
> $IncludeType is:
> 0 - require
> 1 - require_once
> 2 - include
> 3 - include_once
> 
> $FileName is the file without the path, and $FilePath is the path with
> trailing `/`.
> 
> If the sandbox should allow includes, the sandbox should have an
> Include Handler registered.
> 
> The SandBox API calls the include handler, if defined, when sandboxed
> code tries to include or require files.
> 
> Let's setup a function so our plugin authors can include files, but
> only from their own plugin directory:
> 
> // Sandbox setup for includes:
> 
> $oSandbox = new SPLSandBox();
> $oSandbox->RegisterIncludeHandler('HotBlogInclude');
> 
> $oUserPlugin = $oSandbox->GetInstance('BobsMagicPlugin');
> $oUserPlugin->Run();
> 
> 
> // Include Handler:
> 
> function HotBlogInclude($Type, $FileName, $FilePath){
>    // Both variables live in the outer scope, so import them explicitly:
>    global $oSandbox, $PluginDirectory;
> 
>    if(file_exists($PluginDirectory.$FileName)){
>       $oSandbox->Include($PluginDirectory.$FileName);
>       return 0;
>    }
>    return 1; // error!
> }
> 
> In the above example, $FilePath contained the path that Bob requested
> with his include statement. But we ignored it! Bob is only allowed to
> include from his plugin's own directory, so we see if the file is in
> $PluginDirectory instead.
> 
> If the file is in Bob's directory, we include it *into* the sandbox
> with SPLSandBox::Include(), making it available to Bob's code, but
> keeping the main application code clean of any registrations the
> include may cause.
> 
> 
> ** Back to Unit Testing **
> 
> For the Unit Testing use case, however, certain code under test may
> normally read from $_GET, and that shouldn't change under test.
> 
> In this next example, we are running a unit test on a FrontController
> class, and we want to see if it works with many different URL
> structures.
> 
> Normally, the web server will map example.com/a/b/c to $_GET vars, so
> the FrontController class expects something like:
> 
> $_GET = [
>    'a' => 'Forum',
>    'b' => 'Post',
>    'c' => '123'
> ];
> 
> Let's make sure our FrontController is doing everything right with a
> battery of tests:
> 
> $aControllerTests = [
>['Forum','Post','123'],
>['Blog','Post','123'],
>['Article','acb'],
>['Cart','Product','723'],
>['Cart','Category','Jeans']
> ];
> 
> $aT

Re: [PHP-DEV] [RFC] Add Directive to Make All Namespaced Function Calls Global

2024-08-05 Thread Rob Landers


On Mon, Aug 5, 2024, at 20:49, Ilija Tovilo wrote:
> On Mon, Aug 5, 2024 at 3:33 PM Nick Lockheart  wrote:
> >
> > This is a different problem that could be solved by a sandbox API.
> 
> Not sure which case we were talking about then. ClockMock is what I've
> been referencing all along.
> 
> > > Well, ok. But then we're back to prefixing global calls, which
> > > defeats the purpose of the proposal.
> >
> > Global functions would only need a prefix `\` in the *very rare* cases
> > where local functions are set as the default. For most people, the \
> > would be omitted, as globals would be set as default for unqualified
> > function names.
> 
> Right. But apart from mocking, what are these cases? If performance
> were no longer an issue, "using global functions" just makes the
> language harder to use, removing the local fallback. "using local
> functions" may be useful in namespaced code making many calls to
> functions within the same namespace. In that case, it would probably
> be more useful to switch the lookup order back instead. If you want to
> pay zero performance penalty, you can prefix global calls with \.
> You'd need to do that with "using local functions" anyway.

So. Fun story. I’ve seen this technique used to patch out fgetcsv due to a 
memory leak, with a pure php polyfill, in at least four unrelated codebases. I 
believe the leak is still there too and now that I know so much more about zend 
strings, I can probably guess what the issue is as well.

I digress. The point is, there are code bases that use this technique to get 
around php issues or even “implement older versions” of core functions to 
retain backwards compatibility until the code can be updated to deal with the 
new core version.
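
To make the technique concrete, here is a rough sketch of such a namespaced stand-in. Everything in it is illustrative: the `App\Import` namespace and `read_rows` helper are invented, and the replacement is deliberately simplified (it reads one physical line at a time, so quoted fields that span lines are not handled).

```php
<?php
namespace App\Import;

// Simplified stand-in for the built-in fgetcsv(); illustration only.
function fgetcsv($stream, ?int $length = null, string $separator = ',',
                 string $enclosure = '"', string $escape = '\\')
{
    $line = \fgets($stream);
    if ($line === false) {
        return false;
    }
    return \str_getcsv(\rtrim($line, "\r\n"), $separator, $enclosure, $escape);
}

function read_rows($stream): array
{
    $rows = [];
    // Unqualified call: resolves to App\Import\fgetcsv(), not the built-in.
    while (($row = fgetcsv($stream)) !== false) {
        $rows[] = $row;
    }
    return $rows;
}

$stream = \fopen('php://memory', 'w+');
\fwrite($stream, "id,name\n1,\"Ada L\"\n");
\rewind($stream);
\var_dump(read_rows($stream));
```

None of the calling code has to change, which is exactly why this shows up in codebases that cannot be refactored quickly.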

> 
> As for mocking: If the code needs to change either way, why not make
> it testable in the first place, e.g. through dependency injection for
> time()? At least this only requires changing the calls that are
> mocked, instead of all the calls that aren't.

Have you ever worked on some legacy code where you aren’t really sure how it is 
working in the first place? Even something as simple as shimming out time() 
could cause race conditions in the overall system. Refactoring these systems is 
an art form all by itself and you attempt to add tests to understand the system 
end-to-end, long before you ever change a line of production code. 

> 
> The main benefit of the approach from ClockMock is that your code
> (probably) doesn't need to change. I do think that the entire approach
> is hacky, and probably worth solving on a language-level, at least if
> possible without adding limitations to the engine. A good first start
> would be to know what functions are commonly mocked with this
> approach.
> 
> Ilija
> 

Time functions are the most obvious ones, and then any function that changes 
between versions and breaks something would be the non-obvious ones. (Ex: 
counting null: https://3v4l.org/hmNiL) This allows you to upgrade php / support 
higher versions while slowly upgrading core function calls.
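
As a sketch of the "counting null" case, assuming a made-up `App\Legacy` namespace and `cart_size` caller: on PHP 8 the built-in `count(null)` throws a TypeError, while a namespaced shim can keep returning 0 until the callers are fixed.

```php
<?php
namespace App\Legacy;

// Namespaced shim restoring the old "count(null) is 0" result.
function count($value, int $mode = COUNT_NORMAL): int
{
    if ($value === null) {
        return 0;
    }
    return \count($value, $mode);
}

function cart_size($items): int
{
    // Unqualified call: resolves to App\Legacy\count() first.
    return count($items);
}

echo cart_size(null), ' ', cart_size(['a', 'b']), "\n";
```

Once every caller has been updated, deleting the shim restores the built-in behaviour without touching the call sites.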

— Rob

Re: [PHP-DEV] [RFC] Transform exit() from a language construct into a standard function

2024-08-05 Thread Rob Landers


On Mon, Aug 5, 2024, at 19:03, Tim Düsterhus wrote:
> >>> If there is a function call to a function with a 'never' return
> >>> type, then that function will potentially throw, or exit.
> >>>
> >>> But that's not relevant for the analysis, as these userland
> >>> functions will have their own oparrays with their entry and exit
> >>> points.
> >>
> >> The compiler has the function table available. It is used to optimize
> >> specific functions into dedicated Opcodes. Thus you should be able to
> >> look up the function within the function table and then check its
> >> return type.
> > 
> > Yes, but functions that call exit are not required to have the 'never'
> > return type, so that's not useful.
> 
> The `exit()` function as implemented in Gina's PR has the `never` return 
> type, thus the situation is no worse than the current situation. Instead 
> of checking for the ZEND_EXIT opcode, you check for a function call and 
> if the function has the `never` return type (which `exit()` is 
> guaranteed to have), then you can treat it as a path ending there. Or 
> you can just hardcode a check for a call to the `exit()` function, as 
> outlined above.

Just to clarify: a never return type doesn't mean execution ends there.

throwing:
https://3v4l.org/s3KDM

infinite loop:
https://3v4l.org/Lpbtt
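
The throwing case can be sketched like this (function names are invented for the example; `never` needs PHP 8.1 or later):

```php
<?php
// `never` only promises that the function does not return normally.
function fail(string $why): never
{
    throw new \RuntimeException($why);
}

function attempt(): string
{
    try {
        fail('boom');
    } catch (\RuntimeException $e) {
        // Control flow continues here, in the caller.
        return 'caught: ' . $e->getMessage();
    }
}

echo attempt(), "\n";
```

So a call to a `never` function ends the current straight-line path, but an enclosing `catch` can still pick execution back up.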

— Rob

Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-02 Thread Rob Landers


On Fri, Aug 2, 2024, at 18:51, Ilija Tovilo wrote:
> It also
> sparked some related ideas, like providing modules that lock
> namespaces and optimize multiple files as a singular unit. That said,
> such approaches would likely be significantly more complex than the
> approach proposed here (~30 lines of C code).

There was an entire thread about modules and packages and shenanigans not too 
long ago. It’s rather fascinating. Highly recommend participating or starting a 
new thread. It seems that people are interested in it, and want it. 

— Rob

Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

2024-08-02 Thread Rob Landers


On Fri, Aug 2, 2024, at 18:51, Ilija Tovilo wrote:
> Hi everyone
> 
> As you probably know, a common performance optimization in PHP is to
> prefix global function calls in namespaced code with a `\`. In
> namespaced code, relative function calls (meaning, not prefixed with
> `\`, not imported and not containing multiple namespace components)
> will be looked up in the current namespace before falling back to the
> global namespace. Prefixing the function name with `\` disambiguates
> the called function by always picking the global function.
> 
> Not knowing exactly which function is called at compile time has a
> couple of downsides to this:
> 
> * It leads to the aforementioned double-lookup.
> * It prevents compile-time-evaluation of pure internal functions.
> * It prevents compiling to specialized  opcodes for specialized
> internal functions (e.g. strlen()).
> * It requires branching for frameless functions [1].
> * It prevents an optimization that looks up internal functions by
> offset rather than by name [2].
> * It prevents compiling to more specialized argument sending opcodes
> because of unknown by-value/by-reference passing.
> 
> All of these are enabled by disambiguating the call. Unfortunately,
> prefixing all calls with `\`, or adding a `use function` at the top of
> every file is annoying and noisy. We recently got a feature request to
> change how functions are looked up [3]. The approach that appears to
> cause the smallest backwards incompatibility is to flip the order in
> which functions are looked up: Check in global scope first, and only
> then in local scope. With this approach, if we can find a global
> function at compile-time, we know this is the function that will be
> picked at run-time, hence automatically enabling the optimizations
> above. I created a PoC implementing this approach [4].
> 
> Máté has kindly benchmarked the patch, measuring an improvement of
> ~3.9% for Laravel, and ~2.1% for Symfony
> (https://gist.github.com/kocsismate/75be09bf6011630ebd40a478682d6c17).
> This seems quite significant, given that no changes were required in
> either of these two codebases.

So, what you’re saying is that symfony and laravel can get a performance 
increase by simply adding a \ in the right places? Why don’t they do that 
instead of changing the language?

> 
> There are a few noteworthy downsides:
> 
> * Unqualified calls to functions in the same namespace would be
> slightly slower, because they now involve checking global scope first.
> I believe that unqualified, global calls are much more common, so this
> change should still result in a net positive. It's also possible to
> avoid this cost by adding a `use function` to the top of the file.

For functions/classes in the same exact namespace, you don’t need a use 
statement. But after this change, you do in certain cases?

namespace Foo;

function array_sum($bar) {}

function baz($bar) {
  return array_sum($bar);
}

So, how do you use that function in the same file?
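
For reference, this is how the three spellings resolve under the lookup rules as they stand today (a sketch only; the return values are placeholders so the chosen function is visible):

```php
<?php
namespace Foo;

// Local function that shadows the built-in for unqualified calls.
function array_sum(array $bar): string
{
    return 'Foo\array_sum';
}

function baz(array $bar): array
{
    return [
        array_sum($bar),           // unqualified: local namespace is checked first today
        namespace\array_sum($bar), // explicitly the current namespace
        \array_sum($bar),          // explicitly the global function
    ];
}

var_dump(baz([1, 2, 3]));
```

Under the flipped order, only the first spelling would change meaning; the `namespace\` prefix or a `use function Foo\array_sum;` import would be the way to keep calling the local one.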

> * Introducing new functions in the global namespace could cause a BC
> break for unqualified calls, if the function happens to have the same
> name. This is unfortunate, but likely rare. Since new functions are
> only introduced in minor/major versions, this should be manageable,
> but must be considered for every PHP upgrade.

We can only see open source code when doing impact analysis. This means picking 
even a slightly “popular” name could go very poorly.

> * Some mocking libraries (e.g. Symfony's ClockMock [5]) intentionally
> declare functions called from some file in the files namespace to
> intercept these calls. This use-case would break. That said, it is
> somewhat of a fragile approach to begin with, given that it wouldn't
> work for fully qualified calls, or unnamespaced code.

See above. I’ve seen this “trick” used on many closed source projects. I’ve 
also seen it used when PHP has a bug and the workaround is to implement it in 
php like this. 

> 
> I performed a small impact analysis [6]. There are 484 namespaced
> functions shadowing global, internal functions in the top 1000
> composer packages. However, the vast majority (464) of these functions
> come from thecodingmachine/safe, whose entire purpose is offering
> safer wrappers around internal functions. Excluding this library,
> there are only 20 shadowing functions, which is surprisingly little.
> Furthermore, the patch would have no impact on users of
> thecodingmachine/safe, only on the library code itself.
> 
> As for providing a migration path: One approach might be to introduce
> an INI setting that performs the function lookup in both local and
> global scope at run-time, and informs the user about the behavioral
> change in the future. To mitigate it, an explicit `use function` would
> need to be added to the top of the file, or the call would need to be
> prefixed with `namespace\`. The impact analysis [6] also provides a
> script that looks for shadowing functions in your 

[PHP-DEV] Debugging tools

2024-07-28 Thread Rob Landers
Hello internals

I've always found tooling to be one of the biggest accelerators for 
understanding what is going on in a codebase, assuming you know how to use 
them. If you've ever tried modifying the language AST and friends, you know 
this can be extremely tricky at times.

For those that want to hack on the language itself, I've been working on an 
AST debugging tool running on top of GDB. This allows you to hack on the 
language, recompile, and see how it affects the PHP compilation. Under the 
hood, it is really just automated breakpoints and outputting usefulish 
details... it's not too special. Most of the complexity comes in outputting 
something that makes sense to a human.

Here's a brief animation of it working: 
https://asciinema.org/a/u9oOUvX5eoha93YJWJ5gX2gCb (yes, I'm manually stepping 
through the code, but this is sped up)

There's still a long way to go here, but it really helps to see the yacc rules 
matching and being applied, or AST being skipped and causing a memory leak. You 
can even break into a normal gdb session at any point, if needed.

I plan to open this up on GitHub within the next couple of weeks. There are 
some parts that are still a mess that I want to clean up before sharing the 
code with the world (and a few bugs in the AST view), as well as adding opcodes 
output.

So far, I've used it to hunt down my own bugs and grok newer features, like 
property hooks (you can see that one here: 
https://asciinema.org/a/OPZzW2oBr2o8AuOu9YbsBBGap).

Does anything like this exist currently? Would anyone else find this useful?

— Rob

Re: [PHP-DEV] [RFC] [VOTE] Deprecations for PHP 8.4

2024-07-27 Thread Rob Landers
On Sun, Jul 28, 2024, at 00:14, Morgan wrote:
> On 2024-07-28 00:36, Rowan Tommins [IMSoP] wrote:
> > 
> > 
> > On 27 July 2024 00:58:17 BST, Morgan  wrote:
> >>
> >> I'm not talking about the MD5 or SHA1 algorithms or whether they should or 
> >> shouldn't be used. I'm just talking about the functions themselves. md5(), 
> >> md5_file(), sha1(), and sha1_file(). They only exist because there wasn't 
> >> the generic hash algorithm extension when they were created.
> > 
> > I understand what is being claimed (and you're not the only one claiming 
> > it), I'm just not convinced it's true.
> 
> I'm just looking at the manual's version information about when the 
> functions were introduced. Seems pretty unambiguous: md5, sha1, hash: 
> versions 3, 4, and 5 (via PECL).
> 
> > I think they have standalone functions for the same reason we added 
> > str_contains and str_starts_with - because it's convenient to have 
> > straightforward functions for common use cases.
> > 
> Because there weren't any purpose-built functions that did the job, 
> forcing users to use other functions in expensive ways for what is 
> internally a pretty simple task. There is a purpose-built function for 
> hashing.
> 
> > The hash() function is like a 60-piece set of interchangeable screwdriver 
> > heads, which only professionals and enthusiasts need; md5() and sha1() are 
> > like the flat-head and Phillips screwdrivers that everyone has in a drawer 
> > somewhere.
> > 
> > The thing that always surprises me is that PHP *doesn't* have a standalone 
> > function for SHA-256, which is the only other I've ever used.
> > 
> 
> Why a SHA2 algorithm? Why not a SHA3 one? How about standalone functions 
> for both, and then when SHA4 comes along (as it inevitably will) another 
> standalone function for one of its variants?
> 
> 
> > To continue the analogy, we're missing a Pozidriv screwdriver, so people 
> > are misusing the Phillips one. The RFC is suggesting that we take away 
> > their flat-head and Phillips screwdrivers, and leave them with the 60-piece 
> > set, and no instructions.
> > 
> > My suggestion is we instead give them a Pozidriv screwdriver, and write 
> > some tips on how to use it correctly.
> > 
> Or leave them the 60-piece set (which includes flat-head and 
> Phillips screwdrivers, so they're not being taken away), and write some 
> tips on how to use it correctly.
> 
> > Regards,
> > Rowan Tommins
> > [IMSoP]
> 

I'd love to see a "hashing" namespace and all of these given their own 
functions with docblocks and manual pages instead of the current generic "god 
of hash" page, which doesn't even list the hash functions available; you have 
to click on hash_algos and then look at the var_dump of hash algorithms. From 
there, you can google each one and try to understand what each one is good at 
and why you would use murmur3a over murmur3f, then try to figure out which one 
is the version that is compatible with JavaScript but not compatible with C# or 
maybe the other way around... (I recently got to go on that ride).
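
To make the "generic API versus standalone functions" point concrete, here is a minimal sketch using only hash_algos() and hash(). The input string is arbitrary, and the murmur3a guard is there because I'm assuming not every build ships that algorithm:

```php
<?php
// List what this build supports; the set depends on the PHP version.
$algos = hash_algos();
echo count($algos), " algorithms available\n";

// The standalone functions are shorthand for the generic API.
var_dump(md5('hello') === hash('md5', 'hello'));   // bool(true)
var_dump(sha1('hello') === hash('sha1', 'hello')); // bool(true)

// SHA-256 has no standalone function, so it goes through hash().
echo hash('sha256', 'hello'), "\n";

// murmur3a is only present on newer builds, hence the guard.
if (in_array('murmur3a', $algos, true)) {
    echo hash('murmur3a', 'hello'), "\n";
}
```

Nothing in that output tells you what any of the listed algorithms is good for, which is the documentation gap I mean.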

If we are going to deprecate the standalone functions (see the sha1 page, which 
at least links to a page about the sha1 algorithm, or the md5 page, which links 
to the md5 RFC), we should seriously invest in documenting these hashing 
algorithms and explaining them. At the very least, link to their respective 
RFCs.

— Rob

Re: [PHP-DEV] [RFC] Improve language coherence for the behaviour of offsets and containers

2024-07-27 Thread Rob Landers
On Thu, Jul 4, 2024, at 15:52, Gina P. Banyard wrote:
> Hello internals,
> 
> I would like to formally open the discussion on an RFC I've been working on 
> for the past year:
> https://wiki.php.net/rfc/container-offset-behaviour
> 
> As DokuWiki is a bit of a faff at times, the Markdown sources are available 
> on GitHub:
> https://github.com/Girgias/php-rfcs/blob/master/container-offset-behaviour.md
> 
> The implementation is basically done, other than some mysterious JIT issues 
> that I haven't been able to pinpoint yet.
> 
> 
> Best regards,
> 
> Gina P. Banyard
> 

I apparently was sitting on a draft in this thread:

Hey Gina,

Is there potentially a 12th type of operation? It is implied multiple times, 
but not spelled out: "offset exists" (i.e., "array_key_exists()" for arrays). 
The recent discussion on pattern matching makes it seem like there is 
potentially a huge difference between "an offset that exists" vs. "an offset 
with no value"; whether or not that is the case, I believe it is an operation 
on a container? Maybe. You are surely the subject-matter expert on this by 
this point, so you would know. In any case, why would or wouldn't it be 
considered an operation?
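
For plain arrays, the distinction I mean can be sketched like this (the key names are made up for illustration):

```php
<?php
// An offset that exists but holds null, versus one that is absent.
$container = ['present' => null];

var_dump(array_key_exists('present', $container)); // bool(true)
var_dump(isset($container['present']));            // bool(false)
var_dump(array_key_exists('absent', $container));  // bool(false)
var_dump(isset($container['absent']));             // bool(false)
```

isset() cannot tell the first and third cases apart, while array_key_exists() can, which is why "offset exists" looks like its own operation to me.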

— Rob

Re: [PHP-DEV] [RFC] Working With Substrings

2024-07-27 Thread Rob Landers
On Sat, Jul 27, 2024, at 15:26, Christoph M. Becker wrote:
> On 15.02.2023 at 06:18, Rowan Tommins wrote:
> 
> > On 15 February 2023 02:35:42 GMT, Thomas Hruska  
> > wrote:
> >
> >> On 2/14/2023 2:02 PM, Rowan Tommins wrote:
> >>
> >> I thought about that but didn't know how well it would be received nor, 
> >> perhaps more importantly, the direction it should take (i.e. a formal Zend 
> >> type in the engine, extending the existing zend_string type, a class, some 
> >> combination, or something else entirely).  All of the more advanced 
> >> options I came up with would have required some code changes to the PHP 
> >> source itself with a new data type being the most involved and probably 
> >> the most controversial.
> >
> > My instinct was that it could just be a built-in class, with an internal 
> > pointer to a zend_string that's completely invisible to userland. Something 
> > like how the SimpleXML and DOM objects just point into a libxml parse 
> > result.
> >
> > Then to add to existing functions requires changing an argument type from 
> > string to string|Buffer, rather than adding new arguments.
> >
> > No change to the type system needed, internally or externally, just some 
> > code to unwrap the pointer. But perhaps I'm being naive and 
> > oversimplifying, as I don't have a deep understanding of the engine.
> >
> >> I'm not entirely sure what the next step here should be.  Should I go 
> >> research the above, or go back and develop/test and then propose something 
> >> concrete in an OO direction and gather feedback at that point, or should 
> >> we hash it out a bit more here on the list to get a more specific 
> >> direction to go in?
> >
> > Well, those were just my thoughts; maybe someone else will come along 
> > shortly with a very different take.
> 
> I'm very late on this discussion, but I think it is an interesting
> topic, and maybe , which I
> had written long ago just to check some assumptions, can serve as POC.
> It is certainly possible to have such a string buffer class without
> having to patch the engine; it could even be made available as PECL
> extension (first).
> 
> Note that this StringBuilder uses `smart_str`s[1] what might be a good
> idea or not.  But certainly you could use some other internal handling;
> interoperability with `zend_string`s[2] requires to copy the char arrays
> in most cases anyway, since these have a fixed length, and if these
> copies are reduced to a minimum (i.e. the new class has enough
> flexibility to work without casting to and from string), that should be
> bearable.
> 
> Not sure if that would work for the "gd imageexportpixels() and
> imageimportpixels()" RFC[3], but it might be worth investigating.
> 
> [1]
> 
> [2]
> 
> [3] 
> 
> Cheers,
> Christoph
> 

Huh, I am also very late, and this is somewhat poignant: last weekend, I 
managed to refactor all zend_strings to contain a char* instead of char[1], 
with the char* pointing to the memory just after the pointer. It increased 
zend_string by a few bytes on a 64-bit machine, but would allow for some nice 
optimizations, such as zend_strings sharing memory (effectively removing the 
need for the current interned strings implementation). I ended up ditching it 
because it would break literally every extension that does its own allocations 
instead of calling zend_string_alloc|init(), and it was also hard to manage 
when copying strings, which some core extensions also do by hand instead of 
calling the core zend_string_* functions. Needless to say, "vanilla php" worked 
fine and all tests passed.

I did submit a small part of my refactoring here: 
https://github.com/php/php-src/pull/15054 but even something that simple didn't 
seem well received. So, I won't continue this approach.

But, fwiw, I wouldn't advise changing zend_strings too much; many extensions 
appear to do at least one of three things: their own allocations, their own 
copying, and/or their own freeing.

— Rob

Re: [PHP-DEV] Explicit callee defaults

2024-07-27 Thread Rob Landers
On Sat, Jul 27, 2024, at 11:50, Bilge wrote:
> On 27/07/2024 10:41, Rob Landers wrote:
>> 
>> This seems like a case for code generation
> I don't understand how this has anything to do with code generation. I 
> understand what composer-attribute-collector is doing, and see no application 
> for it (or something like it) here. Could you explain a bit further?

In your example you show how to write code that does what you want, but also 
that it is annoying to write by hand. Thus, generating the code instead of 
writing it by hand might be better than a new language feature; and there are 
many things that would benefit from that.
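
A minimal sketch of what I mean by generating it, assuming a hypothetical generateProxySource() helper (not an existing tool) and giving query() a stand-in body so the example runs. It drops parameter types from the generated proxy for brevity, and uses eval() only to keep the sketch in one file; a real generator would write the source to disk at build time:

```php
<?php
// Callee from the quoted example; the body is a stand-in so the sketch runs.
function query(string $sql, int $limit = PHP_INT_MAX, int $offset = 0): array
{
    return ['sql' => $sql, 'limit' => $limit, 'offset' => $offset];
}

// Hypothetical generator: returns PHP source for a proxy whose optional
// parameters default to null and fall back to the callee's own defaults.
function generateProxySource(string $callee, string $proxyName): string
{
    $params = [];
    $args = [];
    foreach ((new ReflectionFunction($callee))->getParameters() as $param) {
        $name = '$' . $param->getName();
        if ($param->isDefaultValueAvailable()) {
            $params[] = $name . ' = null';
            $args[] = $name . ' ?? ' . var_export($param->getDefaultValue(), true);
        } else {
            $params[] = $name;
            $args[] = $name;
        }
    }

    return sprintf(
        'function %s(%s) { return %s(%s); }',
        $proxyName,
        implode(', ', $params),
        $callee,
        implode(', ', $args)
    );
}

// Regenerating after query()'s defaults change keeps the proxy in sync.
eval(generateProxySource('query', 'myQuery'));
```

Calling myQuery('SELECT 1', null, 20) then uses the callee's limit default and an offset of 20, without the defaults being copied by hand.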

— Rob

Re: [PHP-DEV] Explicit callee defaults

2024-07-27 Thread Rob Landers


On Fri, Jul 26, 2024, at 23:54, Bilge wrote:
> Hi Internals,
> 
> New RFC idea just dropped. When writing a function, we can specify defaults 
> for its parameters, and when calling a function we can leverage those 
> defaults *implicitly* by not specifying those arguments or by "jumping over" 
> some of them using named parameters. However, we cannot *explicitly* use the 
> defaults. But why would we want to?
> 
> Sometimes we want to effectively *inherit* the defaults of a function we're 
> essentially just proxying. One way to do that is copy and paste the entire 
> method signature, but if the defaults of the proxied method change, we're now 
> overriding them with our own, which is not what we wanted to do. It is 
> possible, in a roundabout way, to avoid specifying the optional parameters by 
> filtering them out (as shown in the next example). The final possibility is 
> to use reflection and literally query the default value for each optional 
> parameter, which is the most awkward and verbose way to inherit defaults.
> 
> Let's use a concrete example for clarity.
> 
> function query(string $sql, int $limit = PHP_INT_MAX, int $offset = 0);
> 
> 
> 
> function myQuery(string $sql, ?int $limit = null, ?int $offset = null) {
> query(...array_filter(func_get_args(), fn ($arg) => $arg !== null));
> }
> 
> 
> In this way we are able to filter out the null arguments to inherit the 
> callee defaults, but this code is quite ugly. Moreover, it makes the 
> (sometimes invalid) assumption that we're able to use `null` for all the 
> optional arguments.
> 
> In my new proposal for *explicit *callee defaults, it would be possible to 
> use the `default` keyword to expressly use the default value of the callee in 
> that argument position. For example, the above implementation for myQuery() 
> could be simplified to the following.
> 
> 
> 
> function myQuery(string $sql, ?int $limit = null, ?int $offset = null) {
> query($sql, $limit ?? default, $offset ?? default);
> }
> 
> 
> Furthermore, it would also be possible to "jump over" optional parameters 
> *without* using named parameters.
> 
> json_decode($json, true, default, JSON_THROW_ON_ERROR);
> 
> This proposal is built on the assumption that it is possible to specify that 
> PHP should only accept the `default` expression in method and function call 
> contexts. For example, it would not be valid to return `default` from a 
> function and substitute it that way; my proposal is to only permit `default` 
> in literal function calling contexts. My knowledge of internals is 
> insufficient (read: non-existent) to know whether or not this restriction is 
> possible to implement, but if it is, I think this is a good idea. What do you 
> think?
> 
> 
> 
> Cheers,
> Bilge
> 
> 

This seems like a case for code generation — and an RFC that provides hooks for 
code generation would probably be better IMHO.

There are a couple of neat tools out there doing this and hooking into 
composer, like 
https://packagist.org/packages/olvlvl/composer-attribute-collector

There are many things that could benefit from this, such as DI containers, 
scanning for attributes, generating efficient serializers/deserializers, etc. 

— Rob

Re: [PHP-DEV] Explicit callee defaults

2024-07-27 Thread Rob Landers


On Sat, Jul 27, 2024, at 02:04, Mike Schinkel wrote:
> > On Jul 26, 2024, at 6:42 PM, Christoph M. Becker  wrote:
> > 
> > I have only skimmed your suggestion, but it sounds quite similar to
> > .
> 
> I would really love to hear from some of those who voted "no" ~9 years ago 
> why they did so, and if they still feel the same.
> 
> > On Jul 26, 2024, at 5:54 PM, Bilge  wrote:
> > When writing a function, we can specify defaults for its parameters, and 
> > when calling a function we can leverage those defaults implicitly by not 
> > specifying those arguments or by "jumping over" some of them using named 
> > parameters. However, we cannot explicitly use the defaults. But why would 
> > we want to?
> > 
> > Sometimes we want to effectively inherit the defaults of a function we're 
> > essentially just proxying. One way to do that is copy and paste the entire 
> > method signature, but if the defaults of the proxied method change, we're 
> > now overriding them with our own, which is not what we wanted to do. It is 
> > possible, in a roundabout way, to avoid specifying the optional parameters 
> > by filtering them out (as shown in the next example). The final possibility 
> > is to use reflection and literally query the default value for each 
> > optional parameter, which is the most awkward and verbose way to inherit 
> > defaults.
> > 
> > Let's use a concrete example for clarity.
> > 
> > function query(string $sql, int $limit = PHP_INT_MAX, int $offset = 0);
> > 
> > function myQuery(string $sql, ?int $limit = null, ?int $offset = null) {
> > query(...array_filter(func_get_args(), fn ($arg) => $arg !== null));
> > }
> > 
> 
> Yes, I run into that issue all the time.  So much so that I adopted an 
> "optional args" pattern which itself has many downsides but in my code it has 
> had fewer downsides than doing it other ways.
> 
> So for your example I would write it like this (though I hate having to 
> double-quote the names of the optional args):
> function query(string $sql, array $args = []);
> 
> function myQuery(string $sql, array $args = []) {
> query($sql, $args);
> }
> 
> myQuery("SELECT * FROM table;", ["offset" => 100]);
> 
> This is very flexible and a pattern that served me well for 10+ years. 
> 
> But of course this has none of the type safety nor code-completion benefits 
> of named parameters, which I would sorely like to have.
> 
> 
> > In this way we are able to filter out the null arguments to inherit the 
> > callee defaults, but this code is quite ugly. Moreover, it makes the 
> > (sometimes invalid) assumption that we're able to use `null` for all the 
> > optional arguments.
> > 
> > In my new proposal for explicit callee defaults, it would be possible to 
> > use the `default` keyword to expressly use the default value of the callee 
> > in that argument position. For example, the above implementation for 
> > myQuery() could be simplified to the following.
> > 
> > function myQuery(string $sql, ?int $limit = null, ?int $offset = null) {
> > query($sql, $limit ?? default, $offset ?? default);
> > }
> > 
> > Furthermore, it would also be possible to "jump over" optional parameters 
> > without using named parameters.
> > 
> 
> To me, while nice I feel it is very much like only a quarter step in the 
> right direction.
> 
> I do not mean to hijack your thread; feel free to ignore this if you 
> feel committed to pursuing only your stated approach and are not interested in 
> what I would like to see.  But I mention in hopes you and others see value in 
> a more complete approach:
> 
> To begin, we can already currently do this (although I do not see many people 
> doing it in PHP; fyi this approach is the norm for many Go programmers):
> 
> class QueryArgs {
>public function __construct(
>  public ?int $limit = 100,
>  public ?int $offset = 0,
>){}
> }
> 
> function myQuery(string $sql, QueryArgs $args = new QueryArgs()) {
>query($sql, $args);
> }
> 
> myQuery("SELECT * FROM table;", new QueryArgs(offset:100));
> 
> I can only give my reasons for not doing it myself in PHP and speculate that 
> others may have similar reasons and/or have never even considered it:
> 
> 1.  Not having an easy and consistent way to declare the "args" class in the 
> same file where the functions that use it are declared which keeps it 
> out-of-sight and makes it harder to keep updated than it needs to be.

There’s nothing stopping you from doing that, except your autoloader. If you 
wanted every class ending with Arg to load from the file of the same class 
without the Arg suffix, so that QueryArg and Query are usually in the same file 
(but can be in separate files too), you could do something like this:
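
A minimal sketch of such an autoloader follows. The helper name registerArgAwareAutoloader(), the one-file-per-class directory layout, and the exact suffix handling are assumptions for illustration only:

```php
<?php
// Hypothetical layout: "$baseDir/Query.php" may declare both Query and
// QueryArg, while "$baseDir/QueryArg.php" is still honoured if it exists.
function registerArgAwareAutoloader(string $baseDir): void
{
    spl_autoload_register(static function (string $class) use ($baseDir): void {
        // Try the class's own file first, then the file of the base class.
        $candidates = [$class];
        if (str_ends_with($class, 'Arg')) {
            $candidates[] = substr($class, 0, -3); // QueryArg -> Query
        }

        foreach ($candidates as $name) {
            $file = $baseDir . '/' . str_replace('\\', '/', $name) . '.php';
            if (is_file($file)) {
                require_once $file;
                if (class_exists($class, false)) {
                    return;
                }
            }
        }
    });
}
```

With that registered, `new QueryArg(offset: 100)` triggers a load of Query.php when no QueryArg.php exists, so the args class can sit next to the code that uses it.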



Re: [PHP-DEV] [RFC] Asymmetric Visibility, v2

2024-07-26 Thread Rob Landers
On Fri, Jul 26, 2024, at 16:27, Larry Garfield wrote:
> On Fri, Jul 26, 2024, at 12:58 PM, Rob Landers wrote:
> 
> >> And now that I see it spelled out more, I do agree that while it appears a 
> >> bit more verbose, and this "(set)" looks odd at first, having all the 
> >> visibility upfront is a lot clearer than having to read through the hooks 
> >> to see what visibility applies.
> >
> > On a large property hook that potentially spans hundreds of lines, I'd 
> > rather only need to scroll up to the "set =>" to see how it is set vs. 
> > going all the way back up to the property itself.
> 
> If someone has a property hook that is hundreds of lines long, their code is 
> already crap and there's no hope for them.
> 
> --Larry Garfield
> 

True, but we don't always get to choose what code we work on.

— Rob

Re: [PHP-DEV] [RFC] [VOTE] Deprecations for PHP 8.4

2024-07-26 Thread Rob Landers


On Fri, Jul 26, 2024, at 08:44, Rowan Tommins [IMSoP] wrote:
> 
> 
> On 25 July 2024 23:54:53 BST, Nick Lockheart  wrote:
> >Doesn't password_hash() handle this automatically? The result of the
> >password_hash() function includes the hash and the algorithm used to
> >hash it. That way password_verify() magically works with the string
> >that came from password_hash().
> 
> For password hashing, you are always retrieving the hash for a specific user, 
> and then making a yes/no decision about it. Indeed, it's an explicit aim that 
> an attacker can't take a password and quickly scan a captured database for 
> matching hashes.

You’d be surprised how many projects get this wrong and claim it isn't a 
security issue. If you can get the hashes, you likely have the ability to run 
arbitrary SQL commands, and since password_hash stores the salt right in the 
hash, you just need to crack one easy-to-guess password -- or just run 
password_hash on your own machine ... then copy the result to whatever user you 
want to log in as. Very few PHP projects salt the passwords with something 
application/user specific (see: Symfony's legacy password implementation, which 
does, and the new one, which does not; and yes I reported it, and yes, it "isn't 
a security issue") to prevent this from happening.
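
To illustrate, a minimal sketch: the first half shows that a password_hash() result is self-describing and not tied to any user, and the second half shows one hypothetical way to bind it to a user identifier. hashForUser() and verifyForUser() are names I made up, and the HMAC pre-hash is an assumption for illustration, not a vetted scheme:

```php
<?php
// The hash string carries its own algorithm, cost and salt.
$hash = password_hash('correct horse', PASSWORD_BCRYPT);
echo password_get_info($hash)['algoName'], "\n"; // bcrypt

// Nothing in it refers to a user, so it verifies wherever it is pasted.
var_dump(password_verify('correct horse', $hash)); // bool(true)

// Hypothetical binding: mix a per-user value in before hashing, so a hash
// copied onto another account no longer verifies there.
function hashForUser(string $userId, string $password): string
{
    return password_hash(hash_hmac('sha256', $password, $userId), PASSWORD_BCRYPT);
}

function verifyForUser(string $userId, string $password, string $hash): bool
{
    return password_verify(hash_hmac('sha256', $password, $userId), $hash);
}
```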

There are other bad defaults, such as pdo_mysql allowing more than one SQL 
statement per query (no other driver does, and neither does mysqli), making it 
even easier to open yourself up to getting hacked if you use PDO with MySQL: a 
single injection can be used to insert/update or even drop tables.

Security is something hard to get right, for any language and framework. PHP 
isn't an exception here; you have to pay attention to what you are doing and 
think like an attacker, every step of the way.

> 
> For other uses of hashes, though, the opposite is true: you want to search 
> for matching hashes. For instance, when you store a file in git, it 
> calculates the SHA1 hash of its content to use as a lookup key. If that key 
> already exists in the local database, it assumes the content is the same.
> 
> That also demonstrates another difference: hashes are often shared between 
> applications, where they need to be using an agreed algorithm. If a package 
> manager requires SHA1 hashes of each file, you can't just substitute SHA256 
> hashes without some other agreed changes.
> 
> Tempting though a "secure_hash" function is, I don't think it's practical for 
> a lot of the places hashing is used.

I think we can borrow from a recent RFC to return more than one thing:

secure_hash($data, $algorithm = null): [$algorithm, $hash, $updated_algorithm, 
$updated_hash];

If you pass in an algorithm, it has to have been considered "secure" within the 
last two major versions*. It also returns an optional "updated" part, which can 
be used to update the hash in your database, if needed.
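
A userland sketch of that shape, to make the idea concrete. No such function exists in PHP; the preferred and accepted algorithm lists here are assumptions for illustration, not a policy anyone has agreed on:

```php
<?php
// Hypothetical policy lists; which algorithms count as "secure" is an
// assumption made for this sketch.
const SECURE_HASH_PREFERRED = 'sha256';
const SECURE_HASH_ACCEPTED = ['sha256', 'sha512', 'sha3-256'];

function secure_hash(string $data, ?string $algorithm = null): array
{
    $algorithm ??= SECURE_HASH_PREFERRED;
    if (!in_array($algorithm, SECURE_HASH_ACCEPTED, true)) {
        throw new InvalidArgumentException("$algorithm is not accepted");
    }

    $hash = hash($algorithm, $data);
    if ($algorithm === SECURE_HASH_PREFERRED) {
        // Already current: nothing to update.
        return [$algorithm, $hash, null, null];
    }

    // Still accepted, but the caller may want to migrate.
    return [
        $algorithm,
        $hash,
        SECURE_HASH_PREFERRED,
        hash(SECURE_HASH_PREFERRED, $data),
    ];
}

[$algo, $hash, $newAlgo, $newHash] = secure_hash('payload', 'sha512');
```

The caller can compare $hash against the stored value as before, and store $newHash alongside $newAlgo when they are not null.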

— Rob

Re: [PHP-DEV] [RFC] [VOTE] Deprecations for PHP 8.4

2024-07-26 Thread Rob Landers
On Fri, Jul 26, 2024, at 15:02, Tim Düsterhus wrote:
> Hi
> 
> On 7/26/24 14:50, Rob Landers wrote:
> >>> $_SESSION['token'] = md5(uniqid(mt_rand(), true));
> >>
> >> *Exactly* the md5-uniqid construction that is called out as unsafe in
> >> the RFC and used in a security context.
> > 
> > In regards to hashing, this is likely fine; for now. There still isn't an 
> > arbitrary pre-image attack on md5 (that I'm aware of). Can you create a 
> > random file with a matching hash? Yes, in a few seconds, on modern 
> > hardware. But you cannot yet make it have arbitrary contents in our 
> > lifetime. The NSA probably has something like this though, but if so, this 
> > isn't widely known.
> 
> Neither collision-, nor pre-image resistance is relevant here. The 
> attack vector is a brute force attack / an attacker guessing the token 
> rather than the token's contents.

You do realize that GUIDs and md5 hashes are the same size? One does not simply 
"guess" a GUID or an md5 hash. Gravatar used md5 until a couple of years ago, 
and had millions? billions? of email addresses and zero collisions.

> > That being said, this is just randomly creating a random id without leaking 
> > its internal construction, no different than putting an md5 in a UUID-v8. 
> > The real issue here is the use of uniqid() and rand(), making it quite 
> > likely (at scale, at least) that a session id will overlap with another 
> > session id.
> 
> The point is that it showcases a fundamental misunderstanding of what 
> MD5 (or really any other hash algorithm) does for you. The application 
> of the MD5 does not make the token more random or more unique or 
> whatever positive adjective you would like to use. It would be equally 
> strong (or rather weak) if the output of `uniqid(mt_rand(), true)` was 
> used directly.

Yes, it does, but probably not how you think. It would be much weaker to leak 
the internal construction (uniqid(mt_rand(), true)) because then someone could 
literally guess a working id if they knew when the id was generated (depending 
on the size of mt_rand, rate limits, etc).

Once it is wrapped in an md5, it is literally unguessable how it is 
constructed, but the construction is still crap in this case.

> 
> As per Kerckhoffs's principle, the security of the algorithm must not 
> rely on the attacker not knowing how it's implemented. Given how 
> prevalent constructions like the above are, an attacker could make an 
> educated guess about how it looks like and match their own token against 
> a precomputed table to find out if it matches.

In this example, an ID is being constructed. If it needs uniqueness, the ID is 
being constructed incorrectly, but if you can argue that a GUID would fit the 
bill here, an md5 has more "entropy" than a GUIDv4. But due to how the md5 
input is constructed here, it actually has less entropy. So, I think we both 
can agree that the construction is crap. However, the usage of md5 doesn't 
matter here. If it really bothers you, craft a GUIDv8 from it.
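
A minimal sketch of crafting a GUIDv8 from an md5, assuming the usual UUID layout (version in the high nibble of byte 6, variant in the top two bits of byte 8). uuidV8FromMd5() is a name I made up for illustration:

```php
<?php
// Hypothetical helper: reuse the 16 raw md5 bytes as a version 8 UUID.
function uuidV8FromMd5(string $input): string
{
    $bytes = md5($input, true); // 16 raw bytes

    // Overwrite the version nibble with 8 and the variant bits with 10.
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x80);
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    $hex = bin2hex($bytes);

    return sprintf(
        '%s-%s-%s-%s-%s',
        substr($hex, 0, 8),
        substr($hex, 8, 4),
        substr($hex, 12, 4),
        substr($hex, 16, 4),
        substr($hex, 20, 12)
    );
}

echo uuidV8FromMd5('some input'), "\n";
```

Six of the 128 bits get overwritten in the process, so the result is no stronger than the md5 it came from; it only changes the presentation.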

But as for Kerckhoffs's principle, that concerns encryption ... this is not 
encryption.

— Rob

Re: [PHP-DEV] [RFC] Asymmetric Visibility, v2

2024-07-26 Thread Rob Landers


On Fri, Jul 26, 2024, at 13:36, Jordi Boggiano wrote:
> On 21.07.2024 11:21, Rob Landers wrote:
>> 
>> On Sat, Jul 20, 2024, at 23:51, Larry Garfield wrote:
>>> On Sat, Jul 20, 2024, at 7:22 AM, Rodrigo Vieira wrote:
>>> > Will the alternative syntax on hook not even be put to a vote?
>>> 
>>> It was, a year and a half ago when Aviz was first proposed.  The preference 
>>> was split, but leaned toward the prefix-style syntax.  So we went with 
>>> that.  I don't think we'll ever get everyone to want the same syntax, but 
>>> we're using the one that was both somewhat more popular, and (as discussed 
>>> in the RFC) arguably superior.
>>> 
>>> As the "comments in yield from" thread has shown, *any* even slight change 
>>> to PHP's syntax will require work from static analysis tools.  That's the 
>>> nature of the problem space, regardless of the syntax specifics.
>>> 
>>> --Larry Garfield
>>> 
>> 
>> Just to play devil’s advocate, it was also before we had property hooks, 
>> which advertised themselves as a way to “wrap and guard access to object 
>> properties”, but we are simply ignoring their existence here.
>> 
> Just to compare them, because my initial gut feel was to say "yes please just 
> put this together with hooks"..
> 
> As far as I understand these would be the two options?
> 
>  class C {
>      public protected(set) $answer { get => 42; set => { $this->answer = $value * 2; } }
>      public private(set) $publicReadOnly;
>  }
> 
> 
>  class C {
>      public $answer { get => 42; protected set => { $this->answer = $value * 2; } }
>      public $publicReadOnly { private set; }
>  }
> 
> And now that I see it spelled out more, I do agree that while it appears a 
> bit more verbose, and this "(set)" looks odd at first, having all the 
> visibility upfront is a lot clearer than having to read through the hooks to 
> see what visibility applies.

On a large property hook that potentially spans hundreds of lines, I'd rather 
only need to scroll up to the "set =>" to see how it is set vs. going all the 
way back up to the property itself.

Re: [PHP-DEV] [RFC] [VOTE] Deprecations for PHP 8.4

2024-07-26 Thread Rob Landers
On Fri, Jul 26, 2024, at 13:58, Tim Düsterhus wrote:
> Hi
> 
> On 7/26/24 08:35, Peter Stalman wrote:
> > How prevalent is this exactly? PHP 4 ended support in 2008.  I think
> > putting warning labels on these things in the docs is enough, but we can't
> > go around locking up every kitchen knife just because there are some idiots
> > out there who read a book from the 50s about the war.
> 
> I just Googled "PHP tutorial" and found https://www.phptutorial.net/ as 
> the second search result, which considers itself to be "the modern PHP 
> tutorial".
> 
> I've clicked at the CSRF section 
> (https://www.phptutorial.net/php-tutorial/php-csrf/) and what do I find:
> 
> > $_SESSION['token'] = md5(uniqid(mt_rand(), true));
> 
> *Exactly* the md5-uniqid construction that is called out as unsafe in 
> the RFC and used in a security context.

With regard to hashing, this is likely fine, for now. There still isn't an 
arbitrary pre-image attack on md5 (that I'm aware of). Can you create a random 
file with a matching hash? Yes, in a few seconds, on modern hardware. But you 
cannot yet make it have arbitrary contents in our lifetime. The NSA probably 
has something like this, but if so, it isn't widely known.

That being said, this is just randomly creating a random id without leaking 
its internal construction, no different than putting an md5 in a UUID-v8. The 
real issue here is the use of uniqid() and mt_rand(), making it quite likely 
(at scale, at least) that a session id will overlap with another session id.
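
For comparison, a minimal sketch of the quoted construction next to a token of the same length drawn from random_bytes(). The variable names are mine; the only point is that the md5 step does not change where the randomness comes from:

```php
<?php
// The construction quoted from the tutorial: 128 bits of output, but the
// input is a timestamp plus a Mersenne Twister value.
$legacy = md5(uniqid((string) mt_rand(), true));

// Same length, drawn from the operating system's CSPRNG instead.
$token = bin2hex(random_bytes(16));

echo strlen($legacy), ' ', strlen($token), "\n"; // 32 32
```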

— Rob
