On 3/12/2011 8:28 PM, Steven D'Aprano wrote:
Fredrik Johansson wrote:

Consider sorting a list of pairs representing fractions. This can be
done easily in Python 2.x with the comparison function lambda
(p,q),(r,s): cmp(p*s, q*r). In Python 2.6, this is about 40 times
faster than using fractions.Fraction as a key function.


[steve@sylar ~]$ python2.7 -m timeit -s "L = [(1,2), (3,4), (0,5),
(9,100), (3,7), (2,8)]" "sorted(L, lambda (p,q),(r,s): cmp(p*s, q*r))"
10000 loops, best of 3: 25.1 usec per loop

[steve@sylar ~]$ python2.7 -m timeit -s "L = [(1,2), (3,4), (0,5),
(9,100), (3,7), (2,8)]" -s "from fractions import Fraction" "sorted(L,
key=lambda t: Fraction(*t))"
1000 loops, best of 3: 236 usec per loop


So for a short list, I get a factor of ten difference. For a longer
list, I'd expect the key function to win out. Much to my astonishment,
it doesn't -- I get similar results regardless of the size of L.

Size of L key/cmp
========== =========
6 9.4
600 13.9
60000 7.0
6000000 6.7

Interesting. Comparing two Fractions must do the same thing as the cmp function, compare two products, but it does so with a *lot* more overhead:

571     def _richcmp(self, other, op):
572         """Helper for comparison operators, for internal use
574         Implement comparison between a Rational instance and
575         either another Rational instance or a float `other`. If
576         `other` is not a Rational instance or a float, return
577         NotImplemented. `op` should be one of the six standard
578         comparison operators.
580         """
581         # convert other to a Rational instance where reasonable.
582         if isinstance(other, numbers.Rational):
583             return op(self._numerator * other.denominator,
584                       self._denominator * other.numerator)
585         if isinstance(other, float):
586             if math.isnan(other) or math.isinf(other):
587                 return op(0.0, other)
588             else:
589                 return op(self, self.from_float(other))
590         else:
591             return NotImplemented
592
593     def __lt__(a, b):
594         """a < b"""
595         return a._richcmp(b, operator.lt)

For this example, and any with suitably restricted values, key=float would work and should beat the cmp version. But of course, someone will have a use case for which that will not work. (But then, one could do a near linear scan to properly re-sort slices with equal float keys.)

Hmm. As I remember, one rationale for deleting cmp is that nlogn slow cmp() calls are slower than n slow key() calls, but that only matters if the nlogn '<' comparisions of stored keys are fast compared to cmp calls (as tends to be the case when the keys are builtin class instances).

But in this case, they are much slower. To be faster, one would need something like "key=lambda p,q:p*(lcm//q)", where lcm is the least common multiple of of all the q's (denominators). For the example above, lcm = 700. But for longer lists, it could be outrageously large.

--
Terry Jan Reedy

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to