On Fri, May 28, 2021 at 11:01 PM Mike Schinkel <m...@newclarity.net> wrote:

> Hi Nikita,
>
> Thank you for taking the time to explain in detail.
>
> One more question below.
>
> -Mike
>
> On May 28, 2021, at 10:31 AM, Nikita Popov <nikita....@gmail.com> wrote:
>
> On Fri, May 28, 2021 at 3:11 AM Mike Schinkel <m...@newclarity.net> wrote:
>
>> > On May 26, 2021, at 7:44 PM, Hendra Gunawan <the.liquid.me...@gmail.com>
>> wrote:
>> >
>> > Hello.
>> >
>> >>
>> >> Yes, but Nikita wrote this note about technical limitations at the
>> bottom of the repo README:
>> >>
>> >> Due to technical limitations, it is not possible to create mutable
>> APIs for
>> >> primitive types. Modifying $self within the methods is not possible (or
>> >> rather, will have no effect, as you'd just be changing a copy).
>> >>
>> >
>> > If it is solved, this is a great accomplishment for PHP. But I think
>> > scalar object is not going anywhere in the near future. If you are not
>> > convinced, please take a look at
>> >
>> https://github.com/nikic/scalar_objects/issues/20#issuecomment-569520181.
>>
>> Nikita's comment actually raises more questions for me, not fewer.
>>
>> Nikita says "We need to know that $a[$b][$c] is an array in order to
>> determine that the call should be performed by-reference. However, we
>> already need to convert $a, $a[$b] and $a[$b][$c] into references before we
>> know about that."
>>
>> How then are we able to do the following?:
>>
>> $a[$b][$c][] = 1;
>>
>
> In this case, we're clearly performing a write operation on the array. If
> you want to know the technical details, the compiler will convert this into
> a sequence of FETCH_DIM_W ops followed by ASSIGN_DIM. The "W" bit here is
> for "write", which will perform all the necessary special handling, such as
> copy-on-write separation and auto-vivification.
>
> How also can we do this:
>>
>> byref($a[$b][$c]);
>> function byref(&$x) {
>>     $x[]= 2;
>> }
>>
>> See https://3v4l.org/aPvTD
>>
>
> This is a more complex case. In this case the compiler doesn't know in
> advance whether the argument is passed by value or by reference. What
> happens here is:
>
> 1. INIT_FCALL determines that we're calling byref().
> 2. CHECK_FUNC_ARG for the first arg determines that this argument is
> passed by-reference for this function.
> 3. FETCH_DIM_FUNC_ARG on the array will perform either a FETCH_DIM_R or a
> FETCH_DIM_W operation, depending on what CHECK_FUNC_ARG determined.
>
> I assume that in both my examples $a[$b][$c] would be considered an
>> "lvalue"[1] and can be a target of assignment triggered by either the
>> assignment operator or calling the function and passing to a by-ref
>> parameter.
>>
>> [1]
>> https://en.wikipedia.org/wiki/Value_(computer_science)#Assignment:_l-values_and_r-values
>>
>> So is there a reason that -> on an array could not trigger the same?  Is
>> Nikita saying that the performance of those calls performed by-reference
>> would not matter because they are always being assigned, at least in the
>> former case, but to do so with array expressions would be problematic?
>> (Ignoring there is no code in the wild that currently uses the -> operator,
>> or does that matter?)
>>
>
> Note that the byref($a[$b][$c]) case only works because we know which
> function is being called at the time the argument is passed. If you have
> $a[$b][$c]->test() we need to pass $a[$b][$c] by reference (FETCH_DIM_W) or
> by value (FETCH_DIM_R) depending on whether $a[$b][$c]->test() accepts the
> argument by-value or by-reference. But we can only know that once we have
> already evaluated $a[$b][$c] and found out that it is indeed an array.
>
> The only way around this is to *always* perform a for-write fetch of
> $a[$b][$c], even though we don't know that the end result is going to be an
> array. However, doing so would pessimize the performance of code operating
> on objects. Consider $some_huge_shared_array[0]->foo(). If we fetch
> $some_huge_shared_array for write, we'll be required to perform a full
> duplication of the array in preparation for a possible future write. If it
> turns out that $some_huge_shared_array[0] is actually an object, or that
> $some_huge_shared_array[0] is an array and the performed operation is
> by-value, then we have performed this copy unnecessarily.
>
> I don't believe this is acceptable.
>
> I ask honestly to understand, and not as a rhetorical question.
>>
>> Additionally, if the case of updating an array variable is not a problem
>> but updating an array expression is a problem then why not just limit the
>> -> operator to only work on expressions for immutable methods and require
>> variables for mutable methods?  I would think it should be easy enough to
>> throw an error for those specific "methods" that would be mutable, such as
>> shift() and unshift() if $a[$b][$c]->shift('foo') were called?
>>
>
> There are externalities associated even with the simple $x->foo() case,
> though they are less severe. They primarily involve reduced ability to
> analyze code in opcache.
>
>
> In either case, this limitation does not seem reasonable to me from a
> language design perspective. If $a->push($b) works, then $a[$k]->push($b)
> can reasonably be expected to work as well.
>
>
>> Or maybe just limit the -> operator to array variables entirely, and not
>> support it on any array expressions, for consistency. There is already
>> precedent in PHP for operators that work on variables and not on
>> expressions:  ++, --, and &.
>>
>> IF we can get a thumbs up from Nikita that one of these would actually be
>> possible then I think the next step should be to write up a list of
>> proposed array methods that would be implemented to support the -> operator
>> with arrays and put them in an RFC, and to flesh out any edge cases.
>>
>
> The only correct way to resolve this issue is to not support mutable
> operations.
>
>
> I don't think I agree that this is the only correct way, but I respect
> your position of authority on the matter.
>
> I don't think there's much need for mutable operations. sort() and
> shuffle() would be best implemented by returning a new array instead.
> array_push() is redundant with $array[]. array_shift() and array_unshift()
> should never be used.
>
>
> Why do you say array_shift() and array_unshift() should never be used?
> When I wrote the above questions the use-case I was thinking about most was
> $a->unshift($value) as I use array_unshift() more than most of the other
> array functions.
>
> Do you mean that these, if applied as "methods" to an array, should not be
> used mutably (meaning in-place is bad but returning an array value that
> has been shifted would be okay), or do you have some other reason you
> believe that shifting an array is bad?  Note the reason I have used them in
> the past is when I need to pass an array to a function written by someone
> else that expects the array to be ordered.
>
> Also, what about very large arrays?  I assume (which could be a bad
> assumption) that PHP internally can be more efficient about how it handles
> array_unshift() instead of just duplicating the large array so as to add an
> element at the beginning?
>

Arrays only support efficient push/pop operations. Performing an
array_shift() or array_unshift() requires going through the whole array to
reindex all the keys, even though you're only adding/removing one element.
In other words, array_shift() and array_unshift() are O(n) operations, not
O(1) as one would intuitively expect. If you use shift/unshift as common
operations, you're better off using a different data-structure or
construction approach.
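
As a rough illustration of what "a different data-structure or
construction approach" could look like, here is a minimal sketch. The
function names are hypothetical and exist only for this example; the
built-ins used (array_unshift(), array_reverse(), SplDoublyLinkedList,
iterator_to_array()) are standard PHP. All three functions produce the
same list, but only the first one renumbers the whole array on every
insertion.

```php
<?php
// Illustration only: these helper names are hypothetical.

// Repeated array_unshift(): each call renumbers every integer key,
// so building a list of n elements this way costs O(n^2) overall.
function prependWithUnshift(array $values): array
{
    $result = [];
    foreach ($values as $value) {
        array_unshift($result, $value);
    }
    return $result;
}

// Alternative construction: append with [] (amortized O(1) per element)
// and reverse once at the end, for O(n) overall.
function prependWithPushAndReverse(array $values): array
{
    $result = [];
    foreach ($values as $value) {
        $result[] = $value;
    }
    return array_reverse($result);
}

// Alternative data structure: SplDoublyLinkedList supports unshift()
// at the head without renumbering the existing elements.
function prependWithLinkedList(array $values): array
{
    $list = new SplDoublyLinkedList();
    foreach ($values as $value) {
        $list->unshift($value);
    }
    // false: discard the list's keys and return a plain 0-indexed array.
    return iterator_to_array($list, false);
}
```

For the input [1, 2, 3], each of the three functions returns [3, 2, 1].
Which alternative fits best depends on whether the code needs a plain
array at the end or can work with the list object directly.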

Regards,
Nikita
