--- On Mon, 1/25/10, Benjamin Peterson <benja...@python.org> wrote:

> From: Benjamin Peterson <benja...@python.org>
> Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
> To: "Steve Howell" <showel...@yahoo.com>
> Cc: python-dev@python.org
> Date: Monday, January 25, 2010, 3:15 PM
> 2010/1/25 Steve Howell <showel...@yahoo.com>:
> >> From: Raymond Hettinger <raymond.hettin...@gmail.com>
> >>
> >> On Jan 25, 2010, at 12:36 PM, Steve Howell wrote:
> >>
> >> >
> >> > Deque does not support all the operations
> that list
> >> does.  It is also roughly twice as slow for
> accessing
> >> elements (I've measured it).
> >>
> >>
> >> ISTM that apps that want to insert or pop from the
> front of
> >> list are also apps that don't care about accessing
> arbitrary
> >> elements in the middle using the position index.
>  When
> >> lists are growing or shrinking from the front, the
> meaning
> >> of the i-th element changes.   So, it doesn't
> >> make sense for an application to track indices of
> objects in
> >> such a list.
> >>
> >>    i = s.find('abc')
> >>    s.pop(0)
> >>    print s[i]    # i no longer
> >> points at 'abc'
> >>
> >
> > I am not going to directly address your point, but I'd
> like to give a examples of code that uses pop(0) from the
> standard library.
> >
> > If you look at the code for
> multiprocessing/connection.py, you will see that
> PipeListener creates _handle_queue as an ordinary Python
> list, and in line 317 it uses pop(0) to pop the first handle
> off the top of the queue.
> >
> > Why does that code not use a deque?  In hindsight, it
> probably should.  But to make the change now, it's not a
> simple matter of fixing just PipeListener, because
> PipeListener passes off _handle_queue to Finalize, which
> also expects a list.
> >
> > In order to understand why Finalize expects a list,
> you need to look at how it uses args, and here is one
> example usage:
> >
> > res = self._callback(*self._args, **self._kwargs)
> >
> > Ok, so now you need to know what self._callback is
> doing, so now you have to trace through all callers of
> Finalize are passing in for their args.
> >
> > So what seems like a trivial matter--switching over a
> list to a deque--actually requires a lot of thinking.
> >
> > It turns out that the callback for PipeListener just
> iterates through the remaining handles and closes them.  So
> a deque would not break that code.
> >
> > If you look at difflib.py, it also does pop(0) in a
> loop.  Why doesn't it use a deque?  Simplicity, maybe?
> >
> > codecs.py also deletes from the top of the list:
> >
> > del self.linebuffer[0]
> 
> Yes, but in either of these cases is there an excellent
> performance
> improvement to be gained and is it worth the complexity of
> your
> optimization? I think not.
> 

I am starting to think that the optimization would be drowned out by the cost 
of processing each line, unless you had some combination of the following:
 
 1) a pretty large list (plausible)
 2) a very inexpensive operation that you were applying to each line (rare)
 3) a really slow memmove implementation (extremely doubtful)

The complexity of the optimization does not phase me for some reason.  If 
ruthless simplicity were the only goal, then I'd also simplify/remove some of 
this code:

    /* Bypass realloc() when a previous overallocation is large enough
       to accommodate the newsize.  If the newsize falls lower than half
       the allocated size, then proceed with the realloc() to shrink the list.
    */
    if (allocated >= newsize && newsize >= (allocated >> 1)) {
        assert(self->ob_item != NULL || newsize == 0);
        Py_SIZE(self) = newsize;
        return 0;
    }

    /* This over-allocates proportional to the list size, making room
     * for additional growth.  The over-allocation is mild, but is
     * enough to give linear-time amortized behavior over a long
     * sequence of appends() in the presence of a poorly-performing
     * system realloc().
     * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
     */
    new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);

In my mind, though, the complexity within CPython does not have to leak up to 
the Python level.



_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to