On 03/04/2024 00:01, Ilija Tovilo wrote:
> Data classes are classes with a single additional
> zend_class_entry.ce_flags flag. So unless customized, they behave as
> classes. This way, we have the option to tweak any behavior we would
> like, but we don't need to.
>
> Of course, this will still require an analysis of what behavior we
> might want to tweak.

Regardless of the implementation, there are a lot of interactions we will want to consider; and we will have to keep considering new ones as we add to the language. For instance, the Property Hooks RFC would probably have needed a section on "Interaction with Data Classes".

On the other hand, maybe having two types of objects to consider each time is better than having to consider combinations of lots of small features.


On a practical note, a few things I've already thought of to consider:

- Can a data class have readonly properties (or be marked "readonly data class")? If so, how will they behave?
- Can you explicitly use the "clone" keyword with an instance of a data class? Does it make any difference?
- Tied into that: can you implement __clone(), and when will it be called?
- If you implement __set(), will copy-on-write be triggered before it's called?
- Can you implement __destruct()? Will it ever be called?



> Consider this example, which would
> work with the current approach:
>
> $shapes[0]->position->zero!();

I find this concise example confusing, and I think there are a few things to unpack here...


Firstly, there's putting a data object in an array:

$numbers = [ new Number(42) ];
$cow = $numbers;
$cow[0]->increment!();
assert($numbers !== $cow);

This is fairly clearly equivalent to this:

$numbers = [ 42 ];
$cow = $numbers;
$cow[0]++;
assert($numbers !== $cow);

CoW is triggered on the array for both, because ++ and ->increment!() are both clearly modifications.
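
For contrast, here is a small sketch of what happens in PHP today, where there are no data classes. Counter is a hypothetical plain class I've made up for illustration, and its increment() is an ordinary method, not the proposed increment!() syntax. The integer case separates the array; the object case doesn't, because both arrays hold the same object handle:

```php
<?php
// Value-type element: the write separates the array (copy-on-write).
$numbers = [42];
$cow = $numbers;
$cow[0]++;
var_dump($numbers[0]); // int(42)
var_dump($cow[0]);     // int(43)

// Hypothetical plain (non-data) class, for illustration only.
final class Counter
{
    public function __construct(public int $value) {}

    public function increment(): void
    {
        $this->value++;
    }
}

// Ordinary object element: copying the array copies the handle, not the
// object, so the method call is visible through both variables.
$objects = [new Counter(42)];
$shared = $objects;
$shared[0]->increment();
var_dump($objects[0]->value); // int(43)
var_dump($objects === $shared); // bool(true): same keys, same instances
```

The data class proposal would make the second half behave like the first half.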


Second, there's putting a data object into another data object:

$shape = new Shape(new Position(42,42));
$cow = $shape;
$cow->position->zero!();
assert($shape !== $cow);

This is slightly less obvious, because it presumably depends on the definition of Shape. Assuming Position is a data class:

- If Shape is a normal class, changing the value of $cow->position just happens in place, and the assertion fails

- If Shape is a readonly class (or position is a readonly property on a normal class), changing the value of $cow->position shouldn't be allowed, so this will presumably give an error

- If Shape is a data class, changing the value of $cow->position implies a "mutation" of $cow itself, so we get a separation before anything is modified, and the assertion passes

Unlike in the array case, this behaviour can't be resolved until you know the run-time type of $shape.
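
To make the first and third cases concrete, here is a rough emulation in current PHP. This is only my reading of the semantics, not the proposed implementation: Shape and Position are hypothetical plain classes, the zeroing is done by direct property writes rather than a zero!() method, and the "data class" separation is spelled out by hand with clone:

```php
<?php
// Hypothetical plain classes, for illustration only.
final class Position
{
    public function __construct(public int $x, public int $y) {}
}

final class Shape
{
    public function __construct(public Position $position) {}
}

// Case 1: Shape is a normal class. $cow and $shape are the same object,
// so zeroing the position happens in place and is seen through both.
$shape = new Shape(new Position(42, 42));
$cow = $shape;
$cow->position->x = 0;
$cow->position->y = 0;
var_dump($shape === $cow);      // bool(true)
var_dump($shape->position->x);  // int(0)

// Case 3, emulated by hand: separate the outer object, then the inner
// one, and only then modify. A data class would do this implicitly.
$shape2 = new Shape(new Position(42, 42));
$cow2 = $shape2;
$cow2 = clone $cow2;                      // separate the Shape
$cow2->position = clone $cow2->position;  // separate the Position
$cow2->position->x = 0;
$cow2->position->y = 0;
var_dump($shape2 !== $cow2);     // bool(true)
var_dump($shape2->position->x);  // int(42)
var_dump($cow2->position->x);    // int(0)
```

The statement being executed is the same in both cases; only the definition of Shape decides which of the two happens.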


Now, back to your example:

$shapes = [ new Shape(new Position(42,42)) ];
$cow = $shapes;
$shapes[0]->position->zero!();
assert($cow !== $shapes);

This combines the two, meaning that now we can't know whether to separate the array until we know (at run-time) whether Shape is a normal class or a data class.

But once that is known, the whole of "->position->zero!()" is a modification to $shapes[0], so we need to separate $shapes.


> Without such a class-wide marker, you'll need to remember to add the
> special syntax exactly where applicable.
>
> $shapes![0]!->position!->zero();


The array access doesn't need any special marker, because there's no ambiguity. The ambiguous call is the reference to ->position: in your current proposal, this represents a modification *if Shape is a data class, and is itself being modified*. My suggestion (or really, thought experiment) was that it would represent a modification *if it has a ! in the call*.

So if Shape is a readonly class:

$shapes[0]->position->!zero();
// Error: attempting to modify readonly property Shape::$position

$shapes[0]->!position->!zero();
// OK; an optimised version of:
$shapes[0] = clone $shapes[0] with [
    'position' =>  (clone $shapes[0]->position with ['x'=>0,'y'=>0])
];

If ->! is only allowed if the RHS is either a readonly property or a mutating method, then this can be reasoned about statically: it will either error, or cause a CoW separation of $shapes. It also allows classes to mix aspects of "data class" and "normal class" behaviour, which might or might not be a good idea.
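
Since neither ->! nor "clone with" exists in PHP today, here is roughly what that expansion looks like with what we have now (PHP 8.1+). ReadonlyShape, ReadonlyPosition, withPosition() and zeroed() are all names I've invented for this sketch; the point is only that the final assignment to $shapes[0] is a plain array write, so the array separation can be seen statically:

```php
<?php
// Hypothetical immutable classes, for illustration only.
final class ReadonlyPosition
{
    public function __construct(
        public readonly int $x,
        public readonly int $y,
    ) {}

    // Returns a new instance instead of mutating this one.
    public function zeroed(): self
    {
        return new self(0, 0);
    }
}

final class ReadonlyShape
{
    public function __construct(public readonly ReadonlyPosition $position) {}

    // Returns a new shape with the given position.
    public function withPosition(ReadonlyPosition $position): self
    {
        return new self($position);
    }
}

$shapes = [new ReadonlyShape(new ReadonlyPosition(42, 42))];
$cow = $shapes;

// Hand-written equivalent of $shapes[0]->!position->!zero();
// the array element is reassigned, which separates $shapes from $cow.
$shapes[0] = $shapes[0]->withPosition($shapes[0]->position->zeroed());

var_dump($cow[0]->position->x);    // int(42)
var_dump($shapes[0]->position->x); // int(0)
var_dump($cow !== $shapes);        // bool(true)
```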


This is mostly just a thought experiment, but I am a bit concerned that code like this is going to be confusingly ambiguous:

$item->shape->position->zero!();

What is going to be CoW cloned, and what is going to be modified in place? I can't actually know without knowing the definition behind both $item and $item->shape. It might even vary depending on input.


Regards,

--
Rowan Tommins
[IMSoP]
