I said:
> I agree with Collin Winter: losing __cmp__ is a loss  (see 
> http://oakwinter.com/code/).

Steven Bethard said:
>Why can't this just be supplied with a mixin?  Here's a recipe
>providing the appropriate mixins if you want to define a __key__
>function:
>    http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/510403

That _works_ from a functional perspective, and if Python3 fails to include 
direct support for __cmp__, then I think providing a built-in mixin is 
necessary.

But mixins for comparison are a BIG LOSER for sort performance if your 
fundamental operator is a cmp-like function. Sorting is completely dominated by 
comparison time, and the mixin is a big lose for performance.  Basically, sorts 
always involve an inner loop that does comparisons, so any time comparison is 
slow, you're potentially dooming the whole program to a slow inner loop.  A 
mixin-calling-cmp model doubles the function call work; it has to find the 
mixin, call it, which eventually has to find and call the final cmp operator.

I did a test (see below), and the mixin using a simulated cmp took 50% MORE 
time to sort a list using Python 2.5 (see code below) than when __cmp__ is used 
directly (as you CAN do in Python 2.5).  A few tests with varying list size 
lead me to believe that this isn't linear - as the lists get longer the % 
performance hit increases.  In other words, it's a LOT slower, and as the size 
increases it gets far worse.   That kind of nasty performance hit will probably 
lead people to write lots of code that duplicates comparison functionality in 
each __lt__, __gt__, etc.  When the comparisons THEMSELVES are nontrivial, that 
will result in lots of duplicate, potentially-buggy code.  All of which is 
avoidable if __cmp__ is directly supported, as it ALREADY is in Python 1 and 2.

In addition, even IF the performance wasn't a big deal (and I think it is), I 
believe __cmp__ is the better basic operator in most cases.  As a style issue, 
I strongly prefer __cmp__ unless I have a specific need for comparisons which 
are atypical, e.g., where sometimes both __lt__ and __ge__ will return false 
given the same data (IEEE floats do this if you need exactly-IEEE-specified 
behavior of NaNs, etc.).  By preferring __cmp__ I eliminate lots of duplicate 
code, and once it's right, it's always right for ALL comparisons.  Sometimes 
__lt__ and friends are absolutely needed, e.g., when __lt__(x,y)==__gt__(x,y) 
for some values of x,y, but most of the time I find that they're an indicator 
of bad code and that __cmp__ should have been used instead.   Direct support of 
__cmp__ is a GOOD thing, not a wart or obsolete feature.

Adding a standard comparison mixin in a library is probably a good idea as 
well, but restoring __cmp__ is in my mind more important.  I can write my own 
mixin, but working around a failure to call __cmp__ gives a big performance hit.

--- David A. Wheeler


========================================
Here's my quick test code, in two files cmptest and cmptest2.
The whitespace may be munged by my mailer or the list, sorry if it is.

==== cmptest2 ====

#!/usr/bin/env python2

# cmp-test2.py

import timeit


time1 = timeit.Timer("x = sorted(list)", """
import cmptest
import random
randomlist = [random.randrange(1,100000) for x in range(100000)]
list = [cmptest.NumberWithCmp(x) for x in randomlist]
""")
time2 = timeit.Timer("x = sorted(list)", """
import cmptest
import random
randomlist = [random.randrange(1,100000) for x in range(100000)]
list = [cmptest.NumberMixinCmp(x) for x in randomlist]
""")

finaltime1 = time1.timeit(3)
finaltime2 = time2.timeit(3)

print finaltime1
print finaltime2



====== cmptest ======

#!/usr/bin/env python2

# cmp-test.py

import random
import timeit

class NumberWithCmp(object):
    "Store 'x' for comparison"
    def __init__(self, data):
       self.x = data
    def __str__(self):
        return str(self.x)
    def __cmp__(self, other):
        if self.x == other.x: return 0
        return (-1 if self.x < other.x else 1)

# Mixin, similar to 
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/510403

class ComparisonMixin(object):
    "Implement <, >, etc. by invoking a 'cmp' function."
    def __lt__(self, other):
        return self.cmp(other) < 0
    def __le__(self, other):
        return self.cmp(other) <= 0
    def __gt__(self, other):
        return self.cmp(other) > 0
    def __ge__(self, other):
        return self.cmp(other) >= 0

class NumberMixinCmp(ComparisonMixin):
    def __init__(self, data):
       self.x = data
    def __str__(self):
        return str(self.x)
    def cmp(self, other):
        if self.x == other.x: return 0
        return (-1 if self.x < other.x else 1)



_______________________________________________
Python-3000 mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: 
http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com

Reply via email to