[issue20481] Clarify type coercion rules in statistics module

Wolfgang Maier Sun, 02 Feb 2014 14:05:02 -0800

Wolfgang Maier added the comment:

Hi Oscar,
well, I haven't used sympy much, and I have no experience with the others, but 
in light of your comment I quickly checked sympy and gmpy2.
You are right that they still do not use the numbers ABCs. However, on your 
advice I also checked how the current statistics module implementation handles 
their numeric types, and the answer is: it doesn't, and this is totally 
independent of the _coerce_types issue.
For sympy: the problem lies with statistics._exact_ratio, which cannot convert 
sympy numeric types to a numerator/denominator tuple (a prerequisite for _sum).
For gmpy2: the problem occurs just one step further down the road. 
gmpy2.Rationals have numerator and denominator properties, so _exact_ratio 
knows how to handle them, but the elements of the returned tuple are of type 
gmpy2.mpz (gmpy2's integer equivalent) and when _sum tries to convert the tuple 
into a Fraction you get:


TypeError: both arguments should be Rational instances

which is precisely because the mpz type is not integrated into the numbers 
tower.
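
The failure described above can be reproduced without installing gmpy2. The 
following is only a sketch: FakeMpz is a hypothetical stand-in class (it is not 
gmpy2.mpz) that has numerator and denominator attributes but is not registered 
with any numbers ABC, which is the situation reported for mpz.

```python
from fractions import Fraction
import numbers


class FakeMpz:
    """Hypothetical stand-in for a third-party integer type.

    It duck-types numerator/denominator but is NOT registered with
    any numbers ABC. It is not gmpy2.mpz itself.
    """

    def __init__(self, value):
        self._value = int(value)

    @property
    def numerator(self):
        return self._value

    @property
    def denominator(self):
        return 1


n, d = FakeMpz(3), FakeMpz(4)

# The attributes exist, but the ABC check still fails.
is_rational = isinstance(n, numbers.Rational)

try:
    Fraction(n, d)
    error = None
except TypeError as exc:
    # Fraction rejects two arguments that are not Rational instances.
    error = str(exc)

print(is_rational, error)
```

So having the right attributes is not enough for Fraction; it checks the 
arguments against numbers.Rational.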

I think this last example is very illustrative because it shows that the 
standard library (the fractions module in this case) already requires numeric 
types to comply with the numeric tower, so statistics would not be without 
precedent. I think this is totally justified: after all, this is the standard 
library (can't believe I'm saying this since I really got into this sort of by 
accident). Third-party libraries should seek compatibility, but the standard 
library just needs to be self-consistent.

I guess using ABCs over a duck-typing approach when coercing types in fact 
offers a huge advantage for third-party libraries: they only need to register 
their types with the numbers ABCs to achieve compatibility, whereas with the 
current approach they need to consider complicated subclassing schemes. (Of 
course, I am only talking about compatibility with _coerce_types here, which is 
the focus of this issue. Other parts of statistics may impose further 
restrictions, as we've just seen for _sum.)
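
To illustrate what registering with the numbers ABCs means in practice, here is 
a small sketch. ThirdPartyInt is a hypothetical class invented for this 
example; only numbers.Integral.register and issubclass are standard library 
calls.

```python
import numbers


class ThirdPartyInt:
    """Hypothetical third-party integer type with no numbers base class."""

    def __init__(self, value):
        self.value = int(value)


# Before registration, the type is invisible to ABC-based checks.
before = issubclass(ThirdPartyInt, numbers.Integral)

# One call by the library author; the class hierarchy is unchanged.
numbers.Integral.register(ThirdPartyInt)

# Integral is the most specific ABC of the tower, so the type is now
# also recognised as Rational and Real.
after = [issubclass(ThirdPartyInt, abc)
         for abc in (numbers.Integral, numbers.Rational, numbers.Real)]

print(before, after)
```

Note that registration is only a declaration: it adds no methods to the class, 
so the type still has to implement the arithmetic that callers rely on.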

Finally, regarding speed. The fundamental difference between the current 
implementation and my proposed change is that the current version calls 
_coerce_types for every number in the input sequence, so performance is 
critical there. In my version, _coerce_types gets called only once and then 
operates on a really small set of input types, so it is absolutely not the 
time-critical step in the overall performance of _sum.
For this very reason I made no effort at all to optimize the code, but just 
tried to keep it as simple and clear as possible.
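
The "call once on a small set of types" idea could look roughly like the sketch 
below. This is not the patch attached to the issue: tower_rank, 
coerce_types_once and result_type are hypothetical names, and the ranking rule 
(prefer the type highest in the tower, break ties by module and name) is an 
assumption made only to keep the example short.

```python
import numbers
from fractions import Fraction

# Assumed ordering for this sketch: most specific ABC first.
_TOWER = (numbers.Integral, numbers.Rational, numbers.Real)


def tower_rank(T):
    """Return the index of the first tower ABC that T belongs to."""
    for rank, abc in enumerate(_TOWER):
        if issubclass(T, abc):
            return rank
    # Types outside these ABCs are out of scope for this sketch.
    raise TypeError("unsupported numeric type: %r" % (T,))


def coerce_types_once(types):
    """Pick one result type for a whole set of input types.

    The choice depends only on the set, not on the order of the data.
    """
    return max(
        types,
        key=lambda T: (tower_rank(T), T.__module__, T.__qualname__),
        default=int,  # assumed result for empty input
    )


def result_type(data):
    seen = {type(x) for x in data}   # one cheap pass over the data
    return coerce_types_once(seen)   # one call on a tiny set


data = [1, Fraction(1, 2), 2.5, 3]
print(result_type(data), result_type(reversed(data)))
```

Because the coercion step only ever sees a handful of distinct types, its cost 
does not grow with the length of the input, which is why it needs no 
optimization in this design.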

This, in fact, is IMHO the second major benefit of my proposal for 
_coerce_types (besides making its result order-independent). Read the current 
code for _coerce_types, then the proposed one. Try to consider all their 
ramifications and side-effects and decide which one's easier to understand and 
maintain.

Best,
Wolfgang

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue20481>
_______________________________________