* Steven D'Aprano:
On Sat, 23 Jan 2010 09:57:04 -0500, Roy Smith wrote:

In article <hje979$kc...@news.eternal-september.org>,
 "Alf P. Steinbach" <al...@start.no> wrote:

But it would IMHO have been better if it wasn't called "list", which
brings in the wrong associations for someone used to other languages.
+1.

When I first started using Python (back in the 1.4 days), I assumed a
list was a singly-linked list.

Why would you do that? I can think of at least eight different implementations of the abstract list data structure:

constant-size array
variable-size array
variable-size array with amortised O(1) appends
singly-linked list
singly-linked list with CDR coding
doubly-linked list
skip list
indexable skip list

One can reasonably disregard constant-sized arrays as a possibility, given that Python lists aren't fixed size (pity the poor Pascal and Fortran coders who are stuck with static arrays!), but the rest are all reasonable possibilities.
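For concreteness, the singly-linked variant could be sketched like this -- an illustrative sketch only, not how CPython implements list (CPython uses the variable-size array with amortised O(1) appends):

class LinkedList:
    """Minimal singly-linked list exposing list-style indexing."""

    class _Node:
        __slots__ = ("value", "next")
        def __init__(self, value, next=None):
            self.value = value
            self.next = next

    def __init__(self, iterable=()):
        self._head = None
        self._size = 0
        # Prepend in reverse so the original order is preserved.
        for item in reversed(list(iterable)):
            self._head = self._Node(item, self._head)
            self._size += 1

    def __len__(self):
        return self._size

    def __getitem__(self, index):
        # Must walk the chain from the head: O(index), not O(1).
        if not 0 <= index < self._size:
            raise IndexError("list index out of range")
        node = self._head
        for _ in range(index):
            node = node.next
        return node.value

xs = LinkedList(range(10))
print(xs[7])   # 7, reached only after following seven 'next' pointers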

A linked list implementation would yield O(n) indexing. A great many loops in e.g. Python library code that now run in linear time would then take quadratic time, O(n^2). Those libraries would then be effectively unusable without extensive rewriting: one version for ordinary Python and one for 'list-as-list' Pythons...

Thus, the linked list implementations are IMO *not* reasonable.

And the reason is precisely the implied complexity guarantees, especially on indexing -- which could reasonably be O(log n), but not worse than that.
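One way to see that blow-up without writing a linked list at all is collections.deque, whose indexing away from the ends is documented as O(n). The rough timing sketch below (illustrative only; absolute numbers depend on the machine) shows an index-based loop that is linear over a list turning roughly quadratic over a deque:

from collections import deque
from timeit import timeit

def index_sum(seq):
    # Linear if seq[i] is O(1); roughly quadratic if seq[i] is O(i).
    total = 0
    for i in range(len(seq)):
        total += seq[i]
    return total

for n in (1_000, 10_000, 100_000):
    as_list = list(range(n))
    as_deque = deque(range(n))
    t_list = timeit(lambda: index_sum(as_list), number=1)
    t_deque = timeit(lambda: index_sum(as_deque), number=1)
    print(f"n={n:>7}: list {t_list:.4f}s   deque {t_deque:.4f}s")

The list timings grow about linearly with n, the deque timings much faster than that -- exactly the effect that would hit library code written against O(1) indexing.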


Why assume one specific implementation in the absence of documentation promising certain performance characteristics?


Oddly enough, I was going to write in the above paragraph, "like a C++
STL list", until I happened to glance at the STL docs and refreshed my
memory that an STL list is doubly-linked.  Which just goes to show that
making assumptions based on names is a bad idea.

Exactly :)



So, we're right back to my statement earlier in this thread that the
docs are deficient in that they describe behavior with no hint about
cost. Given that, it should be no surprise that users make incorrect
assumptions about cost.

There are quite a few problems with having the documentation specify cost:

(1) Who is going to do it? Any volunteers?

This problem must have been addressed each time the documentation for some version of Python was written or updated.


(2) Big-oh notation can be misleading, especially for naive users, or those whose intuition for what's fast has been shaped by other languages. Big-oh doesn't tell you whether something is fast or slow, only how it scales -- and sometimes not even then.

It's how things scale that is of interest. :-)

Big-oh tells you an upper asymptotic limit.

That's sufficient for e.g. the C++ standard -- which, by the way, constitutes a concrete example of the practicality of specifying complexity.
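For a small illustration of that distinction: both of the following are O(n), yet on typical CPython builds the constant factors differ by an order of magnitude or more, so big-oh alone says nothing about raw speed (timings are machine-dependent and merely indicative):

from timeit import timeit

data = list(range(1_000_000))

def py_loop_sum(seq):
    # O(n), but every step goes through the bytecode interpreter.
    total = 0
    for x in seq:
        total += x
    return total

print("builtin sum :", timeit(lambda: sum(data), number=10))          # O(n), C loop
print("Python loop :", timeit(lambda: py_loop_sum(data), number=10))  # O(n), interpreted

Both scale linearly; only one of them is what most users would call "fast".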


(3) Having documented a particular performance, that discourages implementation changes. Any would-be patch or new implementation not only has to consider that the functional behaviour doesn't change, but that the performance doesn't either.

In practice the Python developers are unlikely to make an implementation change which leads to radically worse performance, particularly for critical types like list and dict. But in other cases, they might choose to change big-oh behaviour, and not wish to be tied down by documentation of the cost of operations.

Say that there was a documented O(log n) worst-case complexity for 'list' indexing. Above you have described it as "reasonable" to break that, having O(n) complexity... But in light of my comments on that, and especially thinking a bit about maintenance of two or more(!) versions of various libraries, don't you agree that it would be Just Bad(TM)?


(4) How much detail is necessary? What about degenerate cases? E.g. dict lookup in CPython is typically O(1) amortised, but if all the keys hash to the same value, it falls to O(N).

From N1745, Technical Report 1 on C++ library extensions (which will be part of the C++0x standard), table 21, specifying the general requirements of unordered associative containers:

expression:      b.find(k)
return type:     iterator;
assertion:       Returns an iterator pointing to an element with key equivalent
                 to k, or b.end() if no such element exists.
complexity:      Average case O(1), worst case O(b.size()).
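
That degenerate case is easy to reproduce in CPython: give every key the same hash value and dict lookup degrades towards a linear scan of the colliding entries. A rough sketch (illustrative only; timings will vary):

from timeit import timeit

class BadKey:
    """Key class whose instances all collide in the hash table."""
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        return 42                     # every key lands on the same probe chain
    def __eq__(self, other):
        return isinstance(other, BadKey) and self.n == other.n

good = {n: None for n in range(2_000)}
bad = {BadKey(n): None for n in range(2_000)}   # building this is already O(n^2)

print("normal keys   :", timeit(lambda: 1_999 in good, number=1_000))
print("colliding keys:", timeit(lambda: BadKey(1_999) in bad, number=1_000))

The second lookup has to compare against the colliding keys one by one, so it degrades towards O(len(bad)) -- the worst case the TR1 table above spells out for the C++ equivalent.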


(5) Should the language guarantee such degenerate behaviour? Who decides which costs are guaranteed and which are not?

I think the C++ standard (the latest draft of C++0x is freely available as a PDF from the committee pages) provides good guidance in this regard. :-)


(6) Such performance guarantees should be implementation specific, not language specific. CPython is only one implementation of the language out of many.

Disagree Very Strongly. An implementation may offer stricter guarantees. But what matters regarding e.g. avoiding having to maintain two or three or umpteen versions of a library is the set of language-level complexity guarantees.


Cheers,

- Alf
