[issue20481] Clarify type coercion rules in statistics module

2014-02-08 Thread Nick Coghlan

Nick Coghlan added the comment:

Claiming to commit before 3.4rc1 (as I believe Steven's SSH key still needs to 
be added)

--
assignee: stevenjd - ncoghlan

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20481
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20481] Clarify type coercion rules in statistics module

2014-02-08 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 5db74cd953ab by Nick Coghlan in branch 'default':
Close #20481: Disallow mixed type input in statistics
http://hg.python.org/cpython/rev/5db74cd953ab

--
nosy: +python-dev
resolution:  - fixed
stage: needs patch - committed/rejected
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20481
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20481] Clarify type coercion rules in statistics module

2014-02-08 Thread Nick Coghlan

Nick Coghlan added the comment:

OK, I committed a slight variant of Steven's patch:

- I moved the issue 20389 doc changes to a separate patch that I uploaded over 
there

- Steven had marked one of the tests as potentially obsolete, I just removed it 
entirely. If we decide we eventually want it back, it's still there in the VCS 
history :)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20481
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20481] Clarify type coercion rules in statistics module

2014-02-08 Thread Oscar Benjamin

Oscar Benjamin added the comment:

 Close #20481: Disallow mixed type input in statistics

If I understand correctly the reason for hastily pushing this patch
through is that it's the safe option: disallow mixing types as a quick
fix for soon to be released 3.4. If we want to allow mixing types then
that can be a new feature for 3.5.

Is that correct?

If so should the discussion about what to do in 3.5 take place in this
issue (i.e. reopen for 3.5) or should it be a new issue? issue20499
would benefit from clarity about the statistics module policy for
mixing types.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20481
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20481] Clarify type coercion rules in statistics module

2014-02-08 Thread Nick Coghlan

Nick Coghlan added the comment:

Yes, a new RFE to sensibly handle mixed type input in 3.5 would make sense (I 
did something similar for the issue where we removed the special casing of 
Counter for 3.4).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20481
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20481] Clarify type coercion rules in statistics module

2014-02-07 Thread Steven D'Aprano

Steven D'Aprano added the comment:

Attached is a patch which:

- documents that mixed types are not currently supported;

- changes the behaviour of _sum to raise TypeError on mixed input 
  types (mixing int and other is allowed, but nothing else);

- updates the tests;

- adds some documentation changes for issue 20389.

--
Added file: http://bugs.python.org/file33978/stats.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20481
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20481] Clarify type coercion rules in statistics module

2014-02-04 Thread Nick Coghlan

Nick Coghlan added the comment:

I think it's also acceptable at this point for the module docs to just say that 
handling of mixed type input is undefined and implementation dependent, and 
recommend doing map(int, input_data), map(float, input_data), map(Decimal, 
input_data) or map(Fraction, input_data) to ensure getting a consistent 
answer for mixed type input.

I believe it would also be acceptable for the module to just fail immediately 
as soon as it detects an input type that differs from the type of the first 
value rather than attempting to guess the most appropriate behaviour.

This close to 3.4rc1 (Sunday 9th February), I don't think we want to be 
committing to *any* particular implementation of type coercion.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20481
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20481] Clarify type coercion rules in statistics module

2014-02-04 Thread Oscar Benjamin

Oscar Benjamin added the comment:

I was working on the basis that we were talking about Python 3.5.

But now I see that it's a 3.4 release blocker. Is it really that urgent?

I think the current behaviour is very good at handling a wide range of types.
It would be nice to consistently report errors for incompatible types but it
can also just be documented as a thing that users shouldn't do.

If there were a situation where it silently returned a highly inaccurate
value I would consider that urgent but I don't think there is.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20481
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20481] Clarify type coercion rules in statistics module

2014-02-04 Thread Nick Coghlan

Nick Coghlan added the comment:

Changing the behaviour is not urgent - documenting that it may change
in the future is essential.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20481
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20481] Clarify type coercion rules in statistics module

2014-02-04 Thread Wolfgang Maier

Wolfgang Maier added the comment:

Hi Nick and Oscar,
my patch including tests is ready. What else should I say than that I think it 
is ok, but of course with only days remaining and still no feedback from 
Steven, I'm not sure there is any chance for it going into 3.4 still.
Anyway, before uploading I wanted to ask you what the best procedure is:
upload the module and its tests as one diff or separately so that it is easier 
to see where the current module version fails?
Best,
Wolfgang

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20481
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20481] Clarify type coercion rules in statistics module

2014-02-04 Thread Steven D'Aprano

Steven D'Aprano added the comment:

Wolfgang,

Thanks for the patch, I have some concerns about it, but the basic idea 
does look reasonable. However, I've been convinced that supporting mixed 
types at all needs more careful thought. Under the circumstances, I'm 
more concerned about making sure we don't lock in any behaviour we'll 
regret, so for 3.4 I'm going to follow Nick's advice and reject 
mixed-type calculations.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20481
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20481] Clarify type coercion rules in statistics module

2014-02-04 Thread Wolfgang Maier

Wolfgang Maier added the comment:

Hi Steven,
sounds reasonable, still here's the patch in diff version.
Its rules for type coercion are detailed in _coerce_types's docstring.
Questions and comments are welcome.
Best,
Wolfgang

--
keywords: +patch
Added file: http://bugs.python.org/file33910/statistics.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20481
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20481] Clarify type coercion rules in statistics module

2014-02-03 Thread Oscar Benjamin

Oscar Benjamin added the comment:

It's not as simple as registering with an ABC. You also need to provide the
interface that the ABC represents:

 import sympy
 r = sympy.Rational(1, 2)
 r
1/2
 r.numerator
Traceback (most recent call last):
  File stdin, line 1, in module
AttributeError: 'Half' object has no attribute 'numerator'

AFAIK there are no plans by any third part libraries to increase their
inter-operability with the numeric tower.

My point is that in choosing what types to accept and how to coerce them you
should focus on actual practical benefits rather than theoretical ones. If it
can be done in a way that means it works for more numeric types then that's
great. But when I say works I mean that it should ideally achieve the best
possible accuracy for each type.

If that's not possible then it might be simplest to just document how it works
for combinations of the std lib types (and perhaps subclasses thereof) and
then say that it will fall back on coercing to float for anything else. This
approach is simpler to document and for end-users to understand. It also has
the benefit that it will work for all non std lib types (that I'm aware of)
without pretending to achieve more accuracy than it can.

 import sympy, fractions, gmpy
 fractions.Fraction(sympy.Rational(1, 2))
Traceback (most recent call last):
  File stdin, line 1, in module
  File /usr/lib/python2.7/fractions.py, line 148, in __new__
raise TypeError(argument should be a string 
TypeError: argument should be a string or a Rational instance
 float(sympy.Rational(1, 2))
0.5
 fractions.Fraction(gmpy.mpq(1, 2))
Traceback (most recent call last):
  File stdin, line 1, in module
  File /usr/lib/python2.7/fractions.py, line 148, in __new__
raise TypeError(argument should be a string 
TypeError: argument should be a string or a Rational instance
 float(gmpy.mpq(1, 2))
0.5

Coercion to float via __float__ is well supported in the Python ecosystem.
Consistent support for getting exact integer ratios is (unfortunately) not.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20481
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20481] Clarify type coercion rules in statistics module

2014-02-03 Thread Wolfgang Maier

Wolfgang Maier added the comment:

 there are currently two strict requirements for any numeric type to be usable 
 with statistics._sum:

I meant *three* of course (remembered one only during writing).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20481
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20481] Clarify type coercion rules in statistics module

2014-02-03 Thread Wolfgang Maier

Wolfgang Maier added the comment:

Just to make sure that this discussion is not getting on the wrong track,
there are currently two strict requirements for any numeric type to be usable 
with statistics._sum:

(1) the type has to provide either
- numerator/denominator properties or
- an as_integer_ratio method or
- an as_tuple method that mimicks the Decimal method of that name

this requirement comes from statistics._exact_ratio.

This is where, for example, the sympy numeric types fail because they do not 
provide any of these interfaces.



For completeness, this is the code of this function:

def _exact_ratio(x):
Convert Real number x exactly to (numerator, denominator) pair.

 _exact_ratio(0.25)
(1, 4)

x is expected to be an int, Fraction, Decimal or float.

try:
try:
# int, Fraction
return (x.numerator, x.denominator)
except AttributeError:
# float
try:
return x.as_integer_ratio()
except AttributeError:
# Decimal
try:
return _decimal_to_ratio(x)
except AttributeError:
msg = can't convert type '{}' to numerator/denominator
raise TypeError(msg.format(type(x).__name__)) from None
except (OverflowError, ValueError):
# INF or NAN
if __debug__:
# Decimal signalling NANs cannot be converted to float :-(
if isinstance(x, Decimal):
assert not x.is_finite()
else:
assert not math.isfinite(x)
return (x, None)

*

(2) Essentially, the numerator and the denominator returned by _exact_ratio 
have to be valid arguments for the Fraction constructor.
This is a consequence of this block of code in _sum:

for d, n in sorted(partials.items()):
total += Fraction(n, d)

Of note, Fraction(n, d) requires both arguments to be members of 
numbers.Rational and this is where, for example, the gmpy.mpq type fails.

(3) The type's constructor has to work with a Fraction argument.
This is because _sum tries to

return T(total) # where T is the coerced type and total the calculated sum 
as a Fraction
The gmpy.mpq type, for example, fails at this step again.


ACTUALLY: THIS SHOULD BE RAISED AS ITS OWN ISSUE HERE as soon as the coercion 
part is settled because it means that _sum may succeed with certain mixed input 
types even though some of the same types may fail, when they are the only type.


IMPORTANTLY, neither requirement has anything to do with the module's type 
coercion, which is the topic of this discussion.

IN ADDITION, the proposed patch involving _coerce_types adds the following 
(soft) requirement:
in order to be able to coerce a sequence of numbers of mixed numeric types 
without loss of precision all involved types need to be integrated into the 
hierarchy of numeric abstract base classes as defined in the numbers module 
(otherwise all types are coerced to float).

From this it should be clear that the compatibility bottleneck is not in this 
patch, but in other parts of the module.

What is more, if a custom numeric type implements the numerator/denominator 
properties, then it is simple enough to register it as a virtual subclass of 
numbers.Rational or .Integral to enable lossless coercion in the presence of 
mixed types.

Example with gmpy2 numeric type:

 from gmpy2 import mpq # mpq is gmpy's Fraction equivalent
 import numbers

 numbers.Rational.register(type(mpq()))
 r = mpq(7,5)
 type(r)
class 'mpq'

 issubclass(type(r), numbers.Rational)
True

= the mpq type could now be coerced correctly even in mixed input types 
situations, but remains incompatible with _sum for reasons (2) and (3).


SUMMARY: Making _sum work with custom types is very complicated, BUT:
this is not due to type coercion as it is discussed here.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20481
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20481] Clarify type coercion rules in statistics module

2014-02-03 Thread Oscar Benjamin

Oscar Benjamin added the comment:

I agree that supporting non-stdlib types is in some ways a separate issue from
how to manage coercion with mixed stdlib types. Can you provide a complete
patch (e.g. hg diff  coerce_types.patch).
http://docs.python.org/devguide/

There should probably also be tests added for situations where the current
implementation behaves undesirably. Something like these ones:
http://hg.python.org/cpython/file/a97ce3ecc96a/Lib/test/test_statistics.py#l1445

Note that when I said non-stdlib types can be handled by coercing to float I
didn't mean that the output should be coerced to float but rather the input
should be coerced to float because __float__ is the most consistent interface
available on third party numeric types.

Once the input numbers are converted to float statistics._sum can handle them
perfectly well. In this case I think the output should also be a float so that
it's clear that precision may have been lost. If the precision of float is not
what the user wants then the documentation can point them toward
Fraction/Decimal.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20481
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20481] Clarify type coercion rules in statistics module

2014-02-03 Thread Wolfgang Maier

Wolfgang Maier added the comment:

 Once the input numbers are converted to float statistics._sum can handle
 them perfectly well. In this case I think the output should also be a float so
 that it's clear that precision may have been lost. If the precision of float 
 is not
 what the user wants then the documentation can point them toward
 Fraction/Decimal.

Ah, I'm getting it now. That is actually a very interesting thought. Still I 
don't think this should be part of this issue discussion, but I'll think about 
it and file a new enhancement issue if I have an idea.


As for providing the complete patch, I can do that I guess, but formulating the 
tests may take a while. I'll try to do it though, if Steven thinks it's worth 
it. After all he's the one who'd have to approve it and the patch does alter 
his design quite a bit.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20481
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20481] Clarify type coercion rules in statistics module

2014-02-02 Thread Steven D'Aprano

Changes by Steven D'Aprano steve+pyt...@pearwood.info:


--
assignee:  - stevenjd

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20481
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20481] Clarify type coercion rules in statistics module

2014-02-02 Thread Oscar Benjamin

Oscar Benjamin added the comment:

Wolfgang have you tested this with any third party numeric types from
sympy, gmpy2, mpmath etc.?

Last I checked no third party types implement the numbers ABCs e.g.:

 import sympy, numbers
 r = sympy.Rational(1, 2)
 r
1/2
 isinstance(r, numbers.Rational)
False

AFAICT testing against the numbers ABCs is just a slow way of testing
against the stdlib types:

$ python -m timeit -s 'from numbers import Integral' 'isinstance(1, Integral)'
10 loops, best of 3: 2.59 usec per loop
$ python -m timeit -s 'from numbers import Integral' 'isinstance(1, int)'
100 loops, best of 3: 0.31 usec per loop

You can at least make it faster using a tuple:

$ python -m timeit -s 'from numbers import Integral' 'isinstance(1,
(int, Integral))'
100 loops, best of 3: 0.423 usec per loop

I'm not saying that this is necessarily a worthwhile optimisation but
rather that the numbers ABCs are in practice not really very useful
(AFAICT).

I don't know how well the statistics module currently handles third
party numeric types but if the type coercion is to be changed then
this something to look at. The current implementation tries to
duck-type to some extent and yours uses ABCs but does either approach
actually have any practical gain for interoperability with non-stdlib
numeric types? If not then it would be simpler just to explicitly
hard-code exactly how it works for the powerset of stdlib types.

OTOH if it could be made to do sensible things with non-stdlib types
then that would be great. Falling back on float isn't a bad choice but
if it can be made to do exact things for exact types (e.g.
sympy.Rational) then that would be great. Similarly mpmath.mpf
provides multi-precision floats. It would be great to be able to take
advantage of that higher precision rather than downgrade everything to
float.

This is in general a hard problem though so I don't think it's
unreasonable to make restrictions about what types it can work with -
achieving optimal accuracy for all types without restrictions is
basically impossible.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20481
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20481] Clarify type coercion rules in statistics module

2014-02-02 Thread Wolfgang Maier

Wolfgang Maier added the comment:

Hi Oscar,
well, I haven't used sympy much, and I have no experience with the others, but 
in light of your comment I quickly checked sympy and gmpy2.
You are right about them still not using the numbers ABCs, however, on your 
advise I also checked how the current statistics module implementation handles 
their numeric types and the answer is: it doesn't, and this is totally 
independent of the _coerce_types issue.
For sympy: the problem lies with statistics._exact_ratio, which cannot convert 
sympy numeric types to a numerator/denominator tuple (a prerequisite for _sum)
For gmpy2: the problem occurs just one step further down the road. 
gmpy2.Rationals have numerator and denominator properties, so _exact_ratio 
knows how to handle them, but the elements of the returned tuple are of type 
gmpy2.mpz (gmpy2's integer equivalent) and when _sum tries to convert the tuple 
into a Fraction you get:

TypeError: both arguments should be Rational instances

which is precisely because the mpz type is not integrated into the numbers 
tower.

This last example is very illustrative I think because it shows that already 
now the standard library (the fractions module in this case) requires numeric 
types to comply with the numeric tower, so statistics would not be without 
precedent, and I think this is totally justified:
after all this is the standard library (can't believe I'm saying this since I 
really got into this sort of by accident) and third party libraries should seek 
compatibility, but the standard library just needs to be self-consistent.

I guess using ABCs over a duck-typing approach when coercing types, in fact, 
offers a huge advantage for third party libraries since they only need to 
register their types with the numbers ABC to achieve compatibility, while they 
need to consider complicated subclassing schemes with the current approach (of 
course, I am only talking about compatibility with _coerce_types here, which is 
the focus of this issue. Other parts of statistics may impose further 
restrictions as we've just seen for _sum).

Finally, regarding speed. The fundamental difference between the current 
implementation and my proposed change is that the current version calls 
_coerce_types for every number in the input sequence, so performance is 
critical here, but in my version _coerce_types gets called only once and then 
operates on a really small set of input types, so it is absolutely not the 
time-critical step in the overall performance of _sum.
For this very reason I made no effort at all to optimize the code, but just 
tried to keep it as simple and clear as possible.

This, in fact, is IMHO the second major benefit of my proposal for 
_coerce_types (besides making its result order-independent). Read the current 
code for _coerce_types, then the proposed one. Try to consider all their 
ramifications and side-effects and decide which one's easier to understand and 
maintain.

Best,
Wolfgang

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20481
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20481] Clarify type coercion rules in statistics module

2014-02-01 Thread Nick Coghlan

New submission from Nick Coghlan:

I haven't completely following the type coercion discussion on python-ideas. 
but the statistics module at least needs a docs clarification (to explain that 
the current behaviour when mixing input types is not fully defined, especially 
when Decimal is involved), and potentially a behavioural change to disallow 
certain type combinations where the behaviour may change in the future (see
https://mail.python.org/pipermail/python-ideas/2014-February/025214.html for 
example)

Either option seems reasonable to me (with a slight preference for the latter), 
but it's at least clear that we need to avoid locking ourselves into the exact 
coercion behaviour of the current implementation indefinitely.

--
components: Library (Lib)
messages: 209934
nosy: gregory.p.smith, larry, ncoghlan, oscarbenjamin, stevenjd, wolma
priority: release blocker
severity: normal
stage: needs patch
status: open
title: Clarify type coercion rules in statistics module
type: behavior
versions: Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20481
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20481] Clarify type coercion rules in statistics module

2014-02-01 Thread Wolfgang Maier

Wolfgang Maier added the comment:

Thanks Nick for filing this!
I've been working on modifications to statistics._sum and 
statistics._coerce_types that together make the module's behaviour independent 
of the order of input types (by making the decision based on the set of input 
types) and, specifically, disallow certain combinations involving Decimal 
reliably.
Specifically, my modified version, like the original, returns the input type if 
there is only one; for mixed input types, differently than the original, it 
tries to return the most basic representative of the most narrow number type 
from the numbers tower (i.e., int if all input types are Integral, Fraction if 
all are Rational, float with all Real); Decimal is treated in the following 
way: if Decimal is the only input type or if the input types consist of only 
Decimal and Integral, Decimal is returned; my two versions of _coerce_types 
differ in that the first raises TypeError with any other combination involving 
Decimal, the second allows combinations of Decimal with float (returning float) 
and raises TypeError with all others.
I sent the first version to Steven D'Aprano and Oscar Benjamin for review.

--
Added file: http://bugs.python.org/file33862/_sum_coerce.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20481
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com