* Steve Howell:
On Jan 23, 12:32 am, "Alf P. Steinbach" <al...@start.no> wrote:
* Steve Howell:

On Jan 23, 12:13 am, Terry Reedy <tjre...@udel.edu> wrote:
Challenge yes, mock no.
Part of writing good basic data structures is not adding needless
complication from featuritis and not penalizing 99.99% of access to
satify a .01% need better satisfied another way.
I would like to challenge your assertion that advancing ob_item
instead of doing memmove during list_ass_slice would impact the
performance of list accesses in any way.  It would only slow down
operations that add/insert items into the list by, and then only by a
single conditional statement, and those add/insert operations are
already O(N) to begin with.
I'm sorry, no, the last part is incorrect.

Appending to a 'list' can currently be constant time, if OS reallocation is
constant time (as the string '+' optimization relies on).


That's true, but it's also irrelevant, as the pop optimization would
happen in a branch of code that never gets executed during list
appending:

        if (d < 0) { /* Delete -d items */
                /* ADD CODE HERE TO AVOID memmove
                   when ilow == 0 (i.e. ihigh + d == 0)
                */
                memmove(&item[ihigh+d], &item[ihigh],
                        (Py_SIZE(a) - ihigh)*sizeof(PyObject *));
                list_resize(a, Py_SIZE(a) + d);
                item = a->ob_item;
        }


With the pop optimization it can no longer be constant time without risking an
accumulation of unused memory, a memory leak, although it can be amortized
constant time, at the cost of wasting some percentage of memory.

list_resize already overallocates memory to allow room for growth.
Whenever you did an append to the list that would force a realloc, you
could first check to see if there is unused stuff at the front and do
the memmove then and reclaim the unfreed memory.  So instead of doing
a paying for memmove on every pop [at front], you only pay for it when
the list goes to size 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, etc.

Oh. If 'list' already uses a doubling buffer then the only overhead from the optimization would, AFAICS, be a single add in every indexing. Which might be acceptable (at least it sounds very reasonable in the context of Python).

Re terminology: I write "doubling buffer" to mean increase of buffer size by a factor. It's often 2, but might be e.g. 1.5, whatever. The point of using a constant factor is to avoid quadratic time for loops doing appending, i.e. the constant factor size increase yields amortized constant time per append.

The sizes that you quote above, on the other hand, look like rather arbitrary buffer size increases where the size to increase by is limited to a certain small range. With copying or moving of everything at each buffer size increase that would not yield amortized constant time. It yield linear time, and quadratic time for a loop doing appends.

But, anyway, if 'list' already uses a doubling buffer then the only overhead from the pop optimization would, AFAICS, be a single add in every indexing.

On the third & gripping hand, however, a proof-of-concept actual implementation (patch) would be needed to ensure that one doesn't overlook any showstopper or serious problem, and to provide timings. And the special case would have to be documented as a special case. Hm, it would be nice if the Python docs offered complexity (time) guarantees in general...


Cheers,

- Alf
--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to