Hey Larry,
On 1.12.2025 22:36:21, Larry Garfield wrote:
> Hi folks. Ilija and I would like to present our latest RFC endeavor, pattern
> matching:
> https://wiki.php.net/rfc/pattern-matching
> You may note the date on the RFC is from 2020. Yes, we really have had this
> one in-progress for 5 years. :-) (Though it was inactive for many of those
> years, in fairness.) Pattern matching was intended as the next follow up to
> Enums, as it's a stepping stone toward full ADT support. However, we also feel
> it has enormous benefit on its own for simplifying complex comparisons.
> This RFC has been through numerous iterations, including a full implementation
> rewrite just recently that made a number of features much easier. We have
> therefore included two patterns that were previously slated for later inclusion
> but turned out to be trivially easy in the new approach. (Variable pinning and
> numeric comparison.)
> Nonetheless, there are two outstanding questions on which we are looking for
> feedback.
> Naturally given the timing, we will not be calling a vote until at least late
> January, regardless of how the discussion goes. So, plenty of time to express
> your support. :-)
Thanks for bringing pattern matching up for discussion again.
----
I'd like to note that the class pattern syntax is very ugly:
// Shorthand
if ($p is Point(:$z, x: 3, :$y)) {
    print "x is 3 and y is $y and z is $z.";
}
The RFC's reasoning is that the colon prefix is needed to support
positional parameters in ADTs. Sure, it's fine to anticipate those.
But what's not fine is using inconsistent syntax for variable bindings
across different contexts. In array patterns a binding is just a bare
variable. In object patterns it suddenly needs a colon? What?
Also, with future ADTs the colon is very easy to miss:
Point::2D($y, $x) vs. Point::2D(:$y, :$x). These mean completely
different things (positional vs. named binding), yet they differ only
by the colons. Getting that wrong is a serious problem.
Can we instead find a solution that satisfies both needs and still
delivers consistency?
An earlier iteration of the RFC had the following very nice construction:
$p is Point&{ $z, x: 3, $y }
This just worked: it checks that $p is a Point, and then matches the
properties of the object. Nice.
This also works for future ADTs: Move::Forward&{ $amount }. Then, if
there's a desire to actually match an object *positionally*, it's
logical to use a parenthesized expression, as for a tuple, i.e.:
$move is Move::Forward($a)
Where $a is assigned the first value passed to Move::Forward.
Similarly, destructuring without a class name no longer works:
$json = json_decode($myInput);
if ($json is stdClass(type: "store", :$value)) {
    // Why do I need to know/specify that it's a stdClass?! I'm just
    // interested in the properties.
}
vs.
if ($json is { type: "store", $value }) {
    //
}
This satisfies the requirement of keeping the language clear and intuitive:
- Any standalone variable is bound. No weird colon shenanigans. The
syntax is consistent.
- Positional binding intuitively uses parentheses: you construct the
enum with Foo::Bar($var) and read it back on the right-hand side with
Foo::Bar($var).
- It naturally allows destructuring without a class name.
- It makes it hard to accidentally write something totally different
from what was meant.
(Also, it's likely more intuitive to users coming from other languages,
like Rust, which also uses {} for named fields and () for positional ones.)
Furthermore, this syntax works nicely with a possible future extension,
object destructuring, akin to array destructuring. As an example:
function addVec(Point $p, Vec $v) {
    Point(:$px, :$py) = $p; // I already know this is a Point, why do I
    // need to repeat it? It also looks ugly and quite a bit like a function
    // call on the left-hand side. Like... assigning something to a
    // returned reference?
    // Or would you use {$px, $py} for object destructuring? Well, that's
    // now truly inconsistent.
    Vec(:$vx, :$vy) = $v;
    return new Point($px + $vx, $py + $vy);
}
vs.
function addVec(Point $p, Vec $v) {
    {$px, $py} = $p; // Plain and simple. Perfectly straightforward.
    {$vx, $vy} = $v;
    return new Point($px + $vx, $py + $vy);
}
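For comparison, the closest thing available today is probably keyed array destructuring over get_object_vars(). A minimal sketch in current PHP (8.0+ for constructor promotion), assuming hypothetical Point and Vec classes with public x and y properties; it only emulates what {$px, $py} = $p would do and says nothing about how the RFC would implement it:

```php
<?php
// Hypothetical classes, for illustration only.
final class Point {
    public function __construct(public int $x, public int $y) {}
}
final class Vec {
    public function __construct(public int $x, public int $y) {}
}

function addVec(Point $p, Vec $v): Point {
    // Keyed destructuring (PHP 7.1+) over the object's public properties;
    // the class name never has to be repeated.
    ['x' => $px, 'y' => $py] = get_object_vars($p);
    ['x' => $vx, 'y' => $vy] = get_object_vars($v);
    return new Point($px + $vx, $py + $vy);
}

$sum = addVec(new Point(1, 2), new Vec(10, 20));
echo $sum->x, ',', $sum->y, "\n"; // prints 11,22
```

The detour through an intermediate array is exactly the boilerplate a native {...} form would remove.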
I've also heard a concern that "Foo::Bar & { $var }" is ambiguous with
respect to whether Foo::Bar is a constant or an ADT class. I don't
consider this a major issue: it may be resolved in the VM, by simply
disambiguating at optimization time or at run time, depending on what
kind of symbol it is.
----
I'm deeply dissatisfied with the handling of object properties:
"Note that matching against a property's value implies reading that
property's value", "If the property is uninitialized, an error will be
thrown." and "If the property is undefined and none of the above apply,
it will evaluate to null and a Warning will be issued."
This is wildly inconsistent with arrays:
"Of particular note, the pattern matching approach automatically handles
array_key_exists() checking. That means a missing array element will not
trigger a warning, whereas with a traditional if ($foo['bar'] === 'baz')
approach missing values must be accounted for by the developer manually."
Sure, a pattern match reads an object's property, just like it reads an
array's entry.
I assume the goal is "let's warn when an object property name has a
typo". But that just creates two tiers: arrays get array_key_exists()
semantics, properties do not get property_exists() semantics. I welcome
surprises.
From my point of view, pattern matching is an "is" operation. Thus it
ought to express isset-like semantics. That is, the approach taken for
arrays is correct and should be mirrored for objects.
I definitely think the approach of "let's warn about typos" is laudable,
but consistency is important.
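To make "mirror the array semantics" concrete, here is a minimal sketch in current PHP of array_key_exists()-like matching on object properties. propsMatch() is a hypothetical helper, not part of the RFC; it only compares literal values (no bindings or nested patterns) and relies on get_object_vars() omitting uninitialized typed properties and, when called from outside the class, non-public ones:

```php
<?php
// Hypothetical helper: a missing or uninitialized property simply makes
// the match fail, the same way array patterns treat a missing key.
function propsMatch(object $obj, array $expected): bool {
    // Public, initialized properties only (called from outside the class).
    $vars = get_object_vars($obj);
    foreach ($expected as $name => $value) {
        if (!array_key_exists($name, $vars) || $vars[$name] !== $value) {
            return false; // no Warning, no Error
        }
    }
    return true;
}

class ResponseOrError {
    public string $type;
    public Exception $exception;
    public string $response;
}

$obj = new ResponseOrError();
$obj->type = 'ok';
var_dump(propsMatch($obj, ['type' => 'ok']));    // bool(true)
var_dump(propsMatch($obj, ['response' => 'x'])); // bool(false): uninitialized
var_dump(propsMatch($obj, ['typo' => 'ok']));    // bool(false): undefined
```

Under these semantics a typo silently fails the match, just like a misspelled array key does today; that is the trade-off being argued for here.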
The RFC's approach also means that uninitialized properties always
throw. And it has subtle ordering implications for the semantics, given
that the implementation short-circuits internally. E.g. (assuming
something like "class ResponseOrError { public string $type; public
Exception $exception; public string $response; }"):
if ($obj is ResponseOrError { type: "error", exception: $e }) { throw $e; }
does not throw if $exception is uninitialized and $type is not "error".
But $obj is ResponseOrError { exception: $e, type: "error" } will
throw in that same situation.
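Since the matching short-circuits, the effect resembles what today's && operator does when reading the properties directly. A sketch in current PHP, reusing the ResponseOrError shape assumed above; it emulates the ordering issue and is not the RFC's implementation:

```php
<?php
class ResponseOrError {
    public string $type;
    public Exception $exception; // deliberately left uninitialized below
    public string $response;
}

$obj = new ResponseOrError();
$obj->type = 'ok';
$obj->response = 'body';

// Type checked first: && short-circuits, so $exception is never read.
$typeFirst = $obj->type === 'error' && $obj->exception instanceof Exception;
var_dump($typeFirst); // bool(false)

// Same conditions in the opposite order: reading the uninitialized
// typed property throws an Error before the type is ever compared.
$threw = false;
try {
    $exceptionFirst = $obj->exception instanceof Exception && $obj->type === 'error';
} catch (Error $e) {
    $threw = true;
    echo get_class($e), ': ', $e->getMessage(), "\n";
}
```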
The RFC's approach further means that some explicit check is needed
beforehand; you cannot simply write:
if ($obj is ResponseOrError { exception: $e }) { throw $e; }
This is bad design and sacrifices a lot of flexibility just for typo safety.
There are better approaches to typo safety. E.g. in the future (PHP 9)
we could change isset() and all other similar checks (the ?? coalesce
operator and this proposal) to throw immediately when checking for the
existence of a property whose name is not declared on a class that is
not marked #[\AllowDynamicProperties].
We should make use of that instead of shoehorning typo checks into this
proposal.
----
Open questions:
- match() "is" placement:
I prefer match() is {} over an "is" inside the construct.
It's simpler to me, but I think either choice is fine.
- Positional array enforcement:
It's relatively simple to get positional arrays intentionally via
array_values(). I also don't think it's unexpected; that's just how
PHP's arrays work. Enforcing positional arrays, however, will be quite
surprising if e.g. an entry was removed:
$a = [1, 2, 3];
unset($a[1]);
if ($a is [1, 3]) {
    // huh? It's [1, 2 => 3], not [1, 3].
}
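For reference, this is how current PHP sees $a after the unset(): key 1 is removed rather than the array being reindexed, so $a no longer equals [1, 3] until array_values() is applied. (array_is_list() requires PHP 8.1.)

```php
<?php
$a = [1, 2, 3];
unset($a[1]);

var_dump(array_keys($a));              // keys 0 and 2: no reindexing
var_dump($a === [1, 3]);               // bool(false): keys differ
var_dump(array_is_list($a));           // bool(false)
var_dump(array_values($a) === [1, 3]); // bool(true) after reindexing
```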
Thanks,
Bob