On Sun, Mar 05, 2017 at 07:19:43AM +0000, Elliot Gorokhovsky wrote:
> You may remember seeing some messages on here about optimizing list.sort()
> by exploiting type-homogeneity: since comparing apples and oranges is
> uncommon (though possible, i.e. float to int), it pays off to check if the
> list is type-homogeneous
I sometimes need to know if a list is homogenous, but unfortunately
checking large lists for a common type in pure Python is quote slow.
Here is a radical thought... why don't lists track their common type
themselves? There's only a few methods which can add items:
- append
- extend
- insert
- __setitem__
Suppose we gave lists a read-only attrribute, __type_hint__, which
returns None for hetrogeneous lists and the type for homogeneous lists. Adding
an item to the list does as follows:
- if the list is empty, adding an item sets __type_hint__ to type(item);
- if the list is not empty, adding an item tests whether type(item)
is identical to (not a subclass) of __type_hint__, and if not, sets
__type_hint__ to None;
- removing an item doesn't change the __type_hint__ unless the list
becomes empty, in which case it is reset to None;
- if the internal allocated space of the list shrinks, that triggers
a recalculation of the __type_hint__ if it is currently None.
(There's no need to recalculate the hint if it is not None.)
Optional: doing a list.sort() could also recalculate the hint.
The effect will be:
- if __type_hint__ is a type object, then you can be sure that
the list is homogeneous;
- if the __type_hint__ is None, then it might still be
homogeneous, but it isn't safe to assume so.
Not only could sorting take advantage of the type hint without needing
to do a separate O(N) scan of the list, but so could other code. I know
I would be interested in using this. I have a fair amount of code that
has to track the type of any items seen in a list, and swap to a "type
agnostic but slow" version if the list is not homogeneous. I could
probably replace that with some variation of:
if thelist.__type_hint__ is None:
process_slow(thelist)
else:
process_fast(thelist)
At the very least, I'd be interested in experimenting with this.
Thoughts?
--
Steve
_______________________________________________
Python-ideas mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/