[Python-ideas] Heterogeneous numeric data in statistics library

Steven D'Aprano Thu, 12 May 2022 07:21:36 -0700

Users of the statistics module, how often do you use it with 
heterogeneous data (mixed numeric types)?

Currently most of the functions try hard to honour homogeneous data, 
e.g. if your data is Decimal or Fraction, you will (usually) get Decimal 
or Fraction results:

>>> import statistics
>>> from decimal import Decimal
>>> from fractions import Fraction
>>> statistics.variance([Decimal('0.5'), Decimal(2)/3, Decimal(5)/2])
Decimal('1.231481481481481481481481481')
>>> statistics.variance([Fraction(1, 2), Fraction(2, 3), Fraction(5, 2)])
Fraction(133, 108)
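
Not every function in the module does this: statistics.fmean (added in Python 3.8) always converts its input to float. A short sketch of the difference, reusing the Fraction data from the variance example:

```python
import statistics
from fractions import Fraction

data = [Fraction(1, 2), Fraction(2, 3), Fraction(5, 2)]

# mean() gives back the same type as its homogeneous input, computed exactly.
exact = statistics.mean(data)
print(repr(exact))  # Fraction(11, 9)

# fmean() converts everything to float first and always returns a float.
approx = statistics.fmean(data)
print(type(approx).__name__)  # float
```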

With mixed types, the functions usually try to coerce the values into a 
sensible common type, honouring subclasses:

>>> class MyFloat(float):
...     def __repr__(self):
...         return "MyFloat(%s)" % super().__repr__()
... 
>>> statistics.mean([1.5, 2.25, MyFloat(1.0), 3.125, 1.75])
MyFloat(1.925)

but that's harder than you might expect, and the extra complexity carries 
a significant performance cost. Not all combinations are supported 
either; Decimal is particularly difficult.
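
The coercion rule described above can be sketched as a pairwise function folded over the types in the data. This is a minimal sketch, not the statistics module's internal code: common_type, coercible and the precedence order are hypothetical, chosen so that the MyFloat example comes out as shown. Decimal refuses implicit arithmetic with float and Fraction (it does mix with int), so the sketch simply rejects those pairs.

```python
from decimal import Decimal
from fractions import Fraction
from functools import reduce


def common_type(t, s):
    """Choose a result type for mixing values of types t and s (hypothetical)."""
    # bool is numeric in Python but should never be the result type.
    t = int if t is bool else t
    s = int if s is bool else s
    if t is s:
        return t
    # Prefer the more derived type, so subclasses such as MyFloat survive.
    if issubclass(s, t):
        return s
    if issubclass(t, s):
        return t
    # int values can be absorbed by any other numeric type.
    if issubclass(s, int):
        return t
    if issubclass(t, int):
        return s
    # Exact Fraction mixed with inexact float: settle on float.
    if issubclass(t, Fraction) and issubclass(s, float):
        return s
    if issubclass(t, float) and issubclass(s, Fraction):
        return t
    # Anything else, e.g. Decimal with float or Fraction, is unsupported.
    raise TypeError(f"can't coerce {t.__name__} and {s.__name__}")


def coercible(t, s):
    """Report whether common_type accepts the pair (hypothetical helper)."""
    try:
        common_type(t, s)
        return True
    except TypeError:
        return False


class MyFloat(float):
    """A float subclass, as in the example above."""


data = [1.5, 2.25, MyFloat(1.0), 3.125, 1.75]
print(reduce(common_type, map(type, data)).__name__)  # MyFloat
print(coercible(int, Decimal))    # True
print(coercible(Decimal, float))  # False
```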

If you use the statistics module, how important to you is the ability to 
**mix** numeric types in the same data set?

Which combinations do you care about?

Would you be satisfied with a rule that said that the statistics 
functions expect homogeneous data and that the result of calling the 
functions on mixed types is not guaranteed?
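
Under such a rule, code with mixed input would convert it to a single type before calling the library. A minimal sketch, assuming an exact Fraction result is acceptable; the data values are illustrative only, and converting with float() or calling fmean would be the inexact alternative.

```python
import statistics
from decimal import Decimal
from fractions import Fraction

mixed = [1, 0.5, Decimal("2.5")]

# Fraction() accepts int, float, Decimal and Fraction values exactly.
# Note that floats convert to their exact binary value, so
# Fraction(0.1) != Fraction(1, 10).
as_fractions = [Fraction(x) for x in mixed]

# The data is now homogeneous, so the result type is predictable.
result = statistics.mean(as_fractions)
print(repr(result))  # Fraction(4, 3)
```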

-- 
Steve
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/AGMUQK7DQOCWU2X7VBNTCA2F3AUMDJIW/