Re: list.pop(0) vs. collections.dequeue

Alf P. Steinbach Sat, 23 Jan 2010 01:42:40 -0800

* Steve Howell:

On Jan 23, 12:32 am, "Alf P. Steinbach" <al...@start.no> wrote:

* Steve Howell:

On Jan 23, 12:13 am, Terry Reedy <tjre...@udel.edu> wrote:

Challenge yes, mock no.
Part of writing good basic data structures is not adding needless
complication from featuritis and not penalizing 99.99% of access to
satify a .01% need better satisfied another way.

I would like to challenge your assertion that advancing ob_item
instead of doing memmove during list_ass_slice would impact the
performance of list accesses in any way.  It would only slow down
operations that add/insert items into the list by, and then only by a
single conditional statement, and those add/insert operations are
already O(N) to begin with.

I'm sorry, no, the last part is incorrect.

Appending to a 'list' can currently be constant time, if OS reallocation is
constant time (as the string '+' optimization relies on).


That's true, but it's also irrelevant, as the pop optimization would
happen in a branch of code that never gets executed during list
appending:

        if (d < 0) { /* Delete -d items */
                /* ADD CODE HERE TO AVOID memmove
                   when ilow == 0 (i.e. ihigh + d == 0)
                */
                memmove(&item[ihigh+d], &item[ihigh],
                        (Py_SIZE(a) - ihigh)*sizeof(PyObject *));
                list_resize(a, Py_SIZE(a) + d);
                item = a->ob_item;
        }

With the pop optimization it can no longer be constant time without risking an
accumulation of unused memory, a memory leak, although it can be amortized
constant time, at the cost of wasting some percentage of memory.


list_resize already overallocates memory to allow room for growth.
Whenever you did an append to the list that would force a realloc, you
could first check to see if there is unused stuff at the front and do
the memmove then and reclaim the unfreed memory.  So instead of doing
a paying for memmove on every pop [at front], you only pay for it when
the list goes to size 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, etc.

Oh. If 'list' already uses a doubling buffer then the only overhead from theoptimization would, AFAICS, be a single add in every indexing. Which might beacceptable (at least it sounds very reasonable in the context of Python).

Re terminology: I write "doubling buffer" to mean increase of buffer size by afactor. It's often 2, but might be e.g. 1.5, whatever. The point of using aconstant factor is to avoid quadratic time for loops doing appending, i.e. theconstant factor size increase yields amortized constant time per append.

The sizes that you quote above, on the other hand, look like rather arbitrarybuffer size increases where the size to increase by is limited to a certainsmall range. With copying or moving of everything at each buffer size increasethat would not yield amortized constant time. It yield linear time, andquadratic time for a loop doing appends.

But, anyway, if 'list' already uses a doubling buffer then the only overheadfrom the pop optimization would, AFAICS, be a single add in every indexing.

On the third & gripping hand, however, a proof-of-concept actual implementation(patch) would be needed to ensure that one doesn't overlook any showstopper orserious problem, and to provide timings. And the special case would have to bedocumented as a special case. Hm, it would be nice if the Python docs offeredcomplexity (time) guarantees in general...



Cheers,

- Alf
--
http://mail.python.org/mailman/listinfo/python-list

Re: list.pop(0) vs. collections.dequeue

Reply via email to