Hi Nikita,

Thank you for taking the time to explain in detail. One more question below.

-Mike

> On May 28, 2021, at 10:31 AM, Nikita Popov wrote:
>
> On Fri, May 28, 2021 at 3:11 AM Mike Schinkel wrote:
>
> > On May 26, 2021, at 7:44 PM, Hendra Gunawan wrote:
> >
> > > Hello.
> > >
> > > > Yes, but Nikita wrote this note about technical limitations at the
> > > > bottom of the repo README:
> > > >
> > > > Due to technical limitations, it is not possible to create mutable
> > > > APIs for primitive types. Modifying $self within the methods is not
> > > > possible (or rather, will have no effect, as you'd just be changing
> > > > a copy).
> > >
> > > If it is solved, this is a great accomplishment for PHP. But I think
> > > scalar object is not going anywhere in the near future. If you are not
> > > convinced, please take a look
> > > <https://github.com/nikic/scalar_objects/issues/20#issuecomment-569520181>.
> >
> > Nikita's comment actually causes me more questions, not fewer.
> >
> > Nikita says "We need to know that $a[$b][$c] is an array in order to
> > determine that the call should be performed by-reference. However, we
> > already need to convert $a, $a[$b] and $a[$b][$c] into references before
> > we know about that."
> >
> > How then are we able to do the following?:
> >
> > $a[$b][$c][] = 1;
>
> In this case, we're clearly performing a write operation on the array. If
> you want to know the technical details, the compiler will convert this into
> a sequence of FETCH_DIM_W ops followed by ASSIGN_DIM. The "W" bit here is
> for "write", which will perform all the necessary special handling, such as
> copy-on-write separation and auto-vivification.
>
> > How also can we do this:
> >
> > byref($a[$b][$c]);
> > function byref(&$x) {
> >     $x[] = 2;
> > }
> >
> > See https://3v4l.org/aPvTD
>
> This is a more complex case.
> In this case the compiler doesn't know in advance whether the argument is
> passed by value or by reference. What happens here is:
>
> 1. INIT_FCALL determines that we're calling byref().
> 2. CHECK_FUNC_ARG for the first arg determines that this argument is passed
>    by-reference for this function.
> 3. FETCH_DIM_FUNC_ARG on the array will perform either a FETCH_DIM_R or a
>    FETCH_DIM_W operation, depending on what CHECK_FUNC_ARG determined.
>
> > I assume that in both my examples $a[$b][$c] would be considered an
> > "lvalue"[1] and can be a target of assignment triggered by either the
> > assignment operator or calling the function and passing to a by-ref
> > parameter.
> >
> > [1] https://en.wikipedia.org/wiki/Value_(computer_science)#Assignment:_l-values_and_r-values
> >
> > So is there a reason that -> on an array could not trigger the same? Is
> > Nikita saying that the performance of those calls performed by-reference
> > would not matter because they are always being assigned, at least in the
> > former case, but to do so with array expressions would be problematic?
> > (Ignoring there is no code in the wild that currently uses the ->
> > operator, or does that matter?)
>
> Note that the byref($a[$b][$c]) case only works because we know which
> function is being called at the time the argument is passed. If you have
> $a[$b][$c]->test() we need to pass $a[$b][$c] by reference (FETCH_DIM_W) or
> by value (FETCH_DIM_R) depending on whether $a[$b][$c]->test() accepts the
> argument by-value or by-reference. But we can only know that once we have
> already evaluated $a[$b][$c] and found out that it is indeed an array.
>
> The only way around this is to *always* perform a for-write fetch of
> $a[$b][$c], even though we don't know that the end result is going to be an
> array. However, doing so would pessimize the performance of code operating
> on objects.
> Consider $some_huge_shared_array[0]->foo(). If we fetch
> $some_huge_shared_array for write, we'll be required to perform a full
> duplication of the array in preparation for a possible future write. If it
> turns out that $some_huge_shared_array[0] is actually an object, or that
> $some_huge_shared_array[0] is an array and the performed operation is
> by-value, then we have performed this copy unnecessarily.
>
> I don't believe this is acceptable.
>
> > I ask honestly to understand, and not as a rhetorical question.
> >
> > Additionally, if the case of updating an array variable is not a problem
> > but updating an array expression is a problem then why not just limit
> > the -> operator to only work on expressions for immutable methods and
> > require variables for mutable methods? I would think it should be easy
> > enough to throw an error for those specific "methods" that would be
> > mutable, such as shift() and unshift(), if $a[$b][$c]->shift('foo') were
> > called?
>
> There are externalities associated even with the simple $x->foo() case,
> though they are less severe. They primarily involve reduced ability to
> analyze code in opcache.
>
> In either case, this limitation does not seem reasonable to me from a
> language design perspective. If $a->push($b) works, then $a[$k]->push($b)
> can reasonably be expected to work as well.
>
> > Or maybe just completely limit using the -> operator to array variables.
> > Don't work on any array expressions, for consistency. There is already
> > precedent in PHP for operators that work on variables and not on
> > expressions: ++, --, and &.
> >
> > IF we can get a thumbs up from Nikita that one of these would actually
> > be possible then I think the next step should be to write up a list of
> > proposed array methods that would be implemented to support the ->
> > operator with arrays and put them in an RFC, and to flesh out any edge
> > cases.
>
> The only correct way to resolve this issue is to not support mutable
> operations.

I don't think I agree that this is the only correct way, but I respect your position of authority on the matter.

> I don't think there's much need for mutable operations. sort() and shuffle()
> would be best implemented by returning a new array instead. array_push() is
> redundant with $array[]. array_shift() and array_unshift() should never be
> used.

Why do you say array_shift() and array_unshift() should never be used? When I wrote the above questions the use-case I was thinking about most was $a->unshift($value), as I use array_unshift() more than most of the other array functions.

Do you mean that these, if applied as "methods" to an array, should not be used mutably (meaning in-place is bad but returning an array value that has been shifted would be okay), or do you have some other reason you believe that shifting an array is bad?

Note the reason I have used them in the past is when I need to pass an array to a function written by someone else that expects the array to be ordered.

Also, what about very large arrays? I assume (which could be a bad assumption) that PHP internally can be more efficient about how it handles array_unshift() instead of just duplicating the large array so as to add an element at the beginning?

-Mike
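
P.S. For anyone following along, here is a minimal sketch of the three cases discussed above: a direct write fetch, a by-reference argument, and a by-value argument. It only shows the behavior visible from userland, not the opcodes involved. The keys 'x' and 'y' and the byval() helper are illustrative names I made up for this example, and the type declarations assume PHP 7.1 or later.

```php
<?php
// Write fetch: the nested arrays are created on demand (auto-vivification).
$a = [];
$a['x']['y'][] = 1;

// By-reference parameter: the callee appends to the caller's array.
function byref(array &$x): void {
    $x[] = 2;
}
byref($a['x']['y']);
var_dump($a['x']['y'] === [1, 2]); // bool(true)

// By-value parameter (hypothetical helper): the callee only changes its own
// copy, so the caller's array is left untouched.
function byval(array $x): array {
    $x[] = 3;
    return $x;
}
$copy = byval($a['x']['y']);
var_dump($a['x']['y'] === [1, 2]); // bool(true)
var_dump($copy === [1, 2, 3]);     // bool(true)
```

The byval() case is the situation a mutable "method" on a primitive would be in if $self were passed by value: the change lands on a copy and is lost unless the result is returned and reassigned.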
