Re: set and dict iteration

Ian Kelly Mon, 27 Aug 2012 12:23:54 -0700

On Thu, Aug 23, 2012 at 10:49 AM, Aaron Brady <castiro...@gmail.com> wrote:
> The patch for the above is only 40-60 lines.  However it introduces two new 
> concepts.


Is there a link to the patch?

> The first is a "linked list", a classic dynamic data structure, first 
> developed in 1955, cf. http://en.wikipedia.org/wiki/Linked_list .  Linked 
> lists are absent in Python, including the standard library and CPython 
> implementation, beyond the weak reference mechanism and garbage collector.  
> The "collections.deque" structure shares some of the linked list interface 
> but uses arrays.
>
> The second is "uncounted references".  The uncounted references are 
> references to "set iterators" exclusively, exist only internally to "set" 
> objects, and are invisible to the rest of the program.  The reason for the 
> exception is that iterators are unique in the Python Data Model; iterators 
> consist of a single immutable reference, unlike both immutable types such as 
> strings and numbers, as well as container types.  Counted references could be 
> used instead, but would be consistently wasted work for the garbage 
> collector, though the benefit to programmers' peace of mind could be 
> significant.
>
> Please share your opinion!  Do you agree that the internal list resolves the 
> inconsistency?  Do you agree with the strategy?  Do you agree that uncounted 
> references are justified to introduce, or are counted references preferable?

This feature is a hard sell as it is; I think that adding uncounted
references into the mix is only going to make that worse.  May I
suggest an alternate approach?  Internally tag each set or dict with a
"version", which is just a C int.  Every time the hash table is
modified, increment the version.  When an iterator is created, store
the current version on the iterator.  When the iterator is advanced,
check that the iterator version matches the dict/set version.  If
they're not equal, raise an error.

This should add less overhead than the linked list without any
concerns about reference counting.  It does introduce a small bug in
that an error condition could be "missed", if the version is
incremented a multiple of 2**32 or 2**64 times between iterations --
but how often is that really likely to occur?  Bearing in mind that
this error is meant for debugging and not production error handling,
you could even make the version a single byte and I'd still be fine
with that.

Cheers,
Ian
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: set and dict iteration

Reply via email to