Hey Larry,

On 1.12.2025 22:36:21, Larry Garfield wrote:
Hi folks.  Ilija and I would like to present our latest RFC endeavor, pattern 
matching:

https://wiki.php.net/rfc/pattern-matching

You may note the date on the RFC is from 2020.  Yes, we really have had this 
one in-progress for 5 years. :-)  (Though it was inactive for many of those 
years, in fairness.)  Pattern matching was intended as the next follow up to 
Enums, as it's a stepping stone toward full ADT support.  However, we also feel 
it has enormous benefit on its own for simplifying complex comparisons.

This RFC has been through numerous iterations, including a full implementation 
rewrite just recently that made a number of features much easier.  We have 
therefore included two patterns that were previously slated for later inclusion 
but turned out to be trivially easy in the new approach.  (Variable pinning and 
numeric comparison.)

Nonetheless, there are two outstanding questions on which we are looking for 
feedback.

Naturally given the timing, we will not be calling a vote until at least late 
January, regardless of how the discussion goes.  So, plenty of time to express 
your support. :-)


Thanks for bringing pattern matching up for discussion again.

----


I'd like to note that the class-access is very ugly.

// Shorthand
if ($p is Point(:$z, x: 3, :$y)) {
    print "x is 3 and y is $y and z is $z.";
}

The RFC's reasoning is that the colon prefix is needed to support positional parameters in ADTs. Sure, it's fine to anticipate those.

But what's not fine is using an inconsistent syntax for variable bindings across different contexts. In arrays, a binding is just a bare variable. In objects, it suddenly needs a colon? What.

Also, with future ADTs a colon is very easy to miss: Point::2D($y, $x) vs. Point::2D(:$y, :$x). These mean completely different things, yet the only difference is whether the colon is there, and getting that wrong is a serious problem.

Can we instead find some solution, which satisfies both and still delivers consistency?


An earlier iteration of the RFC had the following very nice construction:

$p is Point&{ $z, x: 3, $y }

This just worked. It's a Point class, and then it matches the properties of the object. Nice.


This also works for future ADTs: Move::Forward&{ $amount }. And if there's a desire to actually match an object *positionally*, it's logical to use a parenthesized expression, as for a tuple, i.e.:

$move is Move::Forward($a)

Where $a is assigned the first value passed to Move::Forward.


Similarly, destructuring without a class name no longer works:

$json = json_decode($myInput);
if ($json is stdClass(type: "store", :$value)) {
   // Why do I need to know/specify that it's a stdClass?! I'm just interested in the properties.
}

vs.

if ($json is { type: "store", $value }) {
   //
}


This satisfies the requirement of keeping the language clear and intuitive:
- Any standalone variable is bound. No weird colon shenanigans. The syntax is consistent.
- Positional binding quite intuitively uses parentheses: you construct the enum with Foo::Bar($var) and you read it back on the right hand side with Foo::Bar($var).
- It naturally allows destructuring without class name.
- It makes it hard to accidentally write something totally different to what was meant.

(Also, it's likely more intuitive to users coming from other languages, like Rust, which also uses {} for named fields and () for positional ones.)


Further, this particular syntax works nicely with object destructuring as future scope, akin to array destructuring. As an example:

function addVec(Point $p, Vec $v) {
  // I already know this is a Point, why do I need to repeat it? It also looks ugly and quite a bit
  // like a left-hand function call. Like... assigning something to a returned reference?
  // Or would you do {$px, $py} for object destructuring? Well, that's now truly inconsistent.
  Point(:$px, :$py) = $p;
  Vec(:$vx, :$vy) = $v;
  return new Point($px + $vx, $py + $vy);
}

vs.

function addVec(Point $p, Vec $v) {
  {$px, $py} = $p; // Plain and simple. Perfectly straightforward.
  {$vx, $vy} = $v;
  return new Point($px + $vx, $py + $vy);
}


I've also heard a concern about "Foo::Bar & { $var }" being ambiguous with respect to "is Foo::Bar now a const or an ADT class". This may be resolved in the VM. I don't consider this a major issue; it is simply something that can be disambiguated at optimizer time or run time, depending on what type of symbol it is.


----


I'm deeply dissatisfied with the handling of object properties:

"Note that matching against a property's value implies reading that property's value", "If the property is uninitialized, an error will be thrown." and "If the property is undefined and none of the above apply, it will evaluate to null and a Warning will be issued."

This is wildly inconsistent with arrays:

"Of particular note, the pattern matching approach automatically handles array_key_exists() checking. That means a missing array element will not trigger a warning, whereas with a traditional if ($foo['bar'] === 'baz') approach missing values must be accounted for by the developer manually."


Sure, a pattern match will read an object's property, just like it reads an array's entry. I assume the goal is "let's warn when an object property is typoed". But it just makes for two tiers: arrays get array_key_exists(), properties do not get property_exists(). I welcome surprises.

From my point of view, pattern matching is an "is" operation. Thus it ought to express isset-like semantics. I.e. the approach for arrays is correct, and should be mirrored for objects.

I definitely think the approach of "let's warn about typos" is laudable, but consistency is important.
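
For reference, a minimal sketch of what "isset-like semantics" already looks like in today's PHP, without any pattern matching syntax. The variable names and sample data are made up for illustration only:

```php
<?php
// Hypothetical sample data, only for illustration.
$arr = ['type' => 'store'];
$obj = new stdClass();
$obj->type = 'store';

// isset() and ?? do not warn on a missing array key...
$arrHasValue = isset($arr['value']);       // false
$arrValue    = $arr['value'] ?? 'none';    // 'none'

// ...and they behave the same way for a missing object property.
$objHasValue = isset($obj->value);         // false
$objValue    = $obj->value ?? 'none';      // 'none'

var_dump($arrHasValue, $arrValue, $objHasValue, $objValue);
```

Both the array and the object case go through the same existence check here, which is the symmetry argued for above.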

It also means that uninitialized properties forcibly throw. This has subtle ordering implications for the semantics, given that the implementation internally short-circuits. E.g. (assuming something like "class ResponseOrError { string $type; Exception $exception; string $response; }"):

if ($obj is ResponseOrError { type: "error", exception: $e }) { throw $e; }

does not throw if $exception is uninitialized and $type is not "error". But $obj is ResponseOrError { exception: $e, type: "error" } will certainly throw.
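
The ordering effect can be approximated in today's PHP with plain && short-circuiting. This is only a sketch: the class below is a hypothetical stand-in for the one described above, and the hand-written conditions are an assumption about how a short-circuiting pattern match would behave, not the RFC's implementation:

```php
<?php
// Hypothetical class modelled on the one described above.
class ResponseOrError {
    public string $type;
    public Exception $exception;
    public string $response;
}

$obj = new ResponseOrError();
$obj->type = 'ok';
$obj->response = 'hello';   // $exception deliberately stays uninitialized

// "type" checked first: the comparison fails, && short-circuits,
// and the uninitialized property is never read.
$typeFirst = $obj->type === 'error' && ($e = $obj->exception) !== null;

// "exception" read first: reading an uninitialized typed property throws an Error.
$threw = false;
try {
    $exceptionFirst = ($e = $obj->exception) !== null && $obj->type === 'error';
} catch (Error $err) {
    $threw = true;
}

// isset() reports the uninitialized property as absent without throwing.
$isSet = isset($obj->exception);

var_dump($typeFirst, $threw, $isSet);
```

With isset-like semantics, the last check is what both orderings would reduce to, so the result would no longer depend on the order in which the properties are listed.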

It further means that there needs to be some internal check, and you cannot simply write:

if ($obj is ResponseOrError { exception: $e }) { throw $e; }


This is bad design and takes away a lot of flexibility, just for the sake of being typo-safe.

There are better approaches to typo-safety. E.g. in the future (PHP 9) we could change isset() and all other similar checks (coalesce and this proposal) to throw immediately when checking the existence of a property whose name is not declared on a class that is not marked #[\AllowDynamicProperties].
We should make use of that instead of shoehorning typo detection into this proposal.


----


Open questions:

- match() "is" placement:
I prefer match() is {} rather than an "is" inside the construct. It seems simpler to me, but I think either choice is fine.

- Positional array enforcement:
It's relatively simple to intentionally get positional arrays via array_values(). I also don't think it's unexpected. That's just how PHP's arrays work. Enforcing positional arrays, however, will be quite surprising if e.g. an entry was removed:

$a = [1, 2, 3];
unset($a[1]);
if ($a is [1, 3]) {
   // huh? It's [1, 2 => 3], not [1, 3].
}
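
For context, this is how the same array behaves in current PHP, independent of the RFC. A small sketch; array_is_list() requires PHP 8.1 or later:

```php
<?php
$a = [1, 2, 3];
unset($a[1]);

// After unset() the keys are 0 and 2, so the array is no longer a list.
$isList      = array_is_list($a);            // false
$sameAsKeyed = ($a === [0 => 1, 2 => 3]);    // true
$sameAsList  = ($a === [1, 3]);              // false

// array_values() reindexes and yields a proper list again.
$reindexed = array_values($a);               // [1, 3]

var_dump($isList, $sameAsKeyed, $sameAsList, $reindexed);
```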


Thanks,

Bob
