Hi internals,

> > > I've created a new RFC https://wiki.php.net/rfc/deque to add a `final 
> > > class Deque`
> > >
> > > This is based on the `Teds\Deque` implementation I've worked on
> > > for the https://github.com/TysonAndre/pecl-teds PECL.
> > >
> > > While `SplDoublyLinkedList` and its subclass `SplQueue`/`SplStack` exist 
> > > in the SPL, they have several drawbacks
> > > that are addressed by this RFC to add a `Deque` class (to use instead of 
> > > those):
> > >
> > > 1. `SplDoublyLinkedList` is internally represented by a doubly linked 
> > > list,
> > >    making it use roughly twice as much memory as the proposed `Deque`
> > > 2. `push`/`pop`/`unshift`/`shift` from `SplDoublyLinkedList` are slower 
> > > due to
> > >    needing to allocate or free the linked list nodes.
> > > 3. Reading values in the middle of the `SplDoublyLinkedList` is 
> > > proportional to the length of the list,
> > >    due to needing to traverse the linked list nodes.
> > > 4. `foreach` Iteration behavior cannot be understood without knowing what 
> > > constructed the
> > >    `SplDoublyLinkedList` instance or set the flags.
> > >
> > > It would be useful to have an efficient `Deque` container in the standard 
> > > library
> > > to provide an alternative without those drawbacks,
> > > as well as for the following reasons:
> > >
> > > 1. To save memory in applications or libraries that may need to store 
> > > many lists of values or run for long periods of time.
> > >    Notably, PHP's `array` type will never release allocated capacity.
> > >    See 
> > > https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html
> > > 2. To provide a better alternative to `SplDoublyLinkedList`, `SplStack`, 
> > > and `SplQueue`
> > >    for use cases that require stacks or queues.
> > > 3. As a more efficient option than `array` and `SplDoublyLinkedList`
> > >    as a queue or `Deque`, especially for `unshift`.
> > >
> > > A `Deque` is more efficient than an `array` when used as a queue, more 
> > > readable, and easier to use correctly.
> > > While it is possible to efficiently remove elements from the start of an 
> > > `array` (in terms of insertion order) (though this makes 
> > > reset()/array_key_first() inefficient),
> > > it is very inefficient to prepend elements to the start of a large 
> > > `array` due to needing to either copy the array
> > > or move all elements in the internal array representation,
> > > and an `array` would use much more memory than a `Deque` when used that 
> > > way (and be slower).
> > >
> > > There are also several pitfalls to using an array as a queue for larger 
> > > queue sizes,
> > > some of which are not obvious and discovered while writing the benchmarks.
> > > (Having a better (double-ended) queue datastructure (`Deque`) than the 
> > > `SplDoublyLinkedList`
> > > would save users from needing to write code with these pitfalls):
> > >
> > > 1. `array_key_first()` and reset()`takes time proportional to the number 
> > > of elements `unset` from the start of an array,
> > >    causing it to unexpectedly be extremely slow (quadratic time) after 
> > > unsetting many elements at the start of the queue.
> > >    (when the array infrequently runs out of capacity, buckets are moved 
> > > to the front)
> > > 2. `reset()` or `end()` will convert a variable to a reference,
> > >    and php is less efficient at reading or writing to reference.
> > >    Opcache is also less efficient at optimizing uses of variables using 
> > > references.
> > > 3. More obviously, `array_unshift` and `array_shift` will take time 
> > > proportional to the number of elements in the array
> > >    (to reindex and move existing/remaining elements).
> >
> > I plan to start voting on https://wiki.php.net/rfc/deque on Friday, 
> > February 4th.
> >
> > Several changes have been made to https://wiki.php.net/rfc/deque#changelog
> > after the feedback in https://externals.io/message/116100
> >
> > - The class is now named `Collections\Deque`
> > - The api documentation in https://wiki.php.net/rfc/deque#proposal was 
> > expanded for methods.
> > - Benchmarks were updated.
> > - Like other standard datastructures, iteration over the deque is now over 
> > the original object (instead of creating a copy),
> >   and mutating the deque will be reflected in `$iterator->current()` (and 
> > moving the end with push()/pop() will affect where iteration ends).
> > - Iteration will account for calls to shift/unshift moving the start of the 
> > deque.
> >   the offsets will be corrected and values won't be skipped or iterated 
> > over multiple times.
> >   (no matter how many iterators were created by `Deque->getIterator()`)
> >   See https://wiki.php.net/rfc/deque#iteration_behavior
> > - The get()/set() methods were removed, after feedback in 
> > https://externals.io/message/116100#116214
> >
> > A WebAssembly demo is available at 
> > https://tysonandre.github.io/php-rfc-demo/deque/
> 
> I've updated the RFC https://wiki.php.net/rfc/deque yet again (no 
> implementation changes). I now plan to start voting on Saturday, February 5th.
> 
> I've also updated the WebAssembly demo at 
> https://tysonandre.github.io/php-rfc-demo/deque/ to include a benchmark to 
> illustrate the differences in performance of shift/unshift at various 
> deque/array sizes.
> A more detailed section on the performance of unshift/shift compared to array 
> and existing data structures was added to the RFC.
> 
> I've also added a discussion section for the request to `return $this` 
> (https://externals.io/message/116100#116967), removed duplicated quotes, and 
> clarified some parts of the RFC.

There's more changes I found this morning while working on test cases for 
unrelated datastructures, so I'll be postponing the vote.

1. The maximum `Collections\Deque` size is probably going to be reduced to be 
the same as the maximum array size.
   For 64-bit php builds, that's roughly 2 billion (2147483648 elements, using 
at least 32 gigabytes of memory for array or Deque),
   which was more memory than I'd assumed (16 bytes per `zval`), and needing 
that much is expected to be a rare use case that can be revisited, e.g. if 
array size limits are raised.

   Collections would need to be converted to/from arrays for serialization, 
unserialization, debug output (var_export, json_encode, etc.), etc.,
   so having a much larger maximum size didn't seem to have a compelling use 
case.

2. I found and proposed potential improvements for detection of infinite 
recursion in var_export/debug_zval_dump/json_encode affecting userland classes, 
existing spl classes, etc.,
   which seemed worth finishing discussing before starting the vote.

3. Miscellaneous refactorings

Thanks,
Tyson

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php

Reply via email to