New submission from Raymond Hettinger <[email protected]>:
The current code for mode() does a good deal of extra work to support its two
error outcomes (empty input and multimodal input). That latter case is
informative but doesn't provide any reasonable way to find just one of those
modes, where any of the most popular would suffice. This arises in nearest
neighbor algorithms for example. I suggest adding an option to the API:
from collections import Counter

def mode(seq, *, first_tie=False):
    if first_tie:
        # CHOOSE FIRST x ∈ S | ∄ y ∈ S : x ≠ y ∧ count(y) > count(x)
        return Counter(seq).most_common(1)[0][0]
    ...
Use it like this:
>>> data = 'ABBAC'
>>> assert mode(data, first_tie=True) == 'A'
With the current API, there is no reasonable way to get to 'A' from 'ABBAC'.
Also, the new code path is much faster than the existing code path because it
extracts only the single most common item using max(), rather than the n most
common, which requires sorting the whole items() list. New path: O(n).
Existing path: O(n log n).
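To make the complexity difference concrete, here is a rough sketch of the two
approaches side by side. The function names are made up for illustration and
are not part of the statistics module; both rely on Counter preserving
insertion order (Python 3.7+), so ties resolve to the first value encountered.

```python
from collections import Counter

def first_mode_linear(seq):
    # Hypothetical helper: one pass to count, one pass with max().
    # max() returns the first maximal item, so ties go to the
    # first-encountered value. Overall O(n).
    counts = Counter(seq)
    if not counts:
        raise ValueError('no mode for empty data')
    return max(counts.items(), key=lambda kv: kv[1])[0]

def first_mode_sorted(seq):
    # Hypothetical helper: sorts every (value, count) pair just to
    # read off the top one. The sort is stable, so ties still go to
    # the first-encountered value, but the cost is O(n log n).
    counts = Counter(seq)
    if not counts:
        raise ValueError('no mode for empty data')
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[0][0]
```

Both return the same answer; only the amount of work differs.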
Note, the current API is somewhat awkward to use. In general, a user can't
know in advance that the data only contains a single mode. Accordingly, every
call to mode() has to be wrapped in a try-except. And if the user just wants
one of those modal values, there is no way to get to it. See
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mode.html for
comparison.
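As an illustration of the awkwardness, here is a minimal sketch of the wrapping
a caller ends up writing. strict_mode() is a hypothetical stand-in that mimics
the two error outcomes described above (empty input and multimodal input); it
is not the actual library source, and mode_or_none() is likewise made up.

```python
from collections import Counter
from statistics import StatisticsError

def strict_mode(seq):
    # Hypothetical stand-in: raises on empty data or on a tie.
    ranked = Counter(seq).most_common()
    if not ranked:
        raise StatisticsError('no mode for empty data')
    top_count = ranked[0][1]
    modes = [value for value, count in ranked if count == top_count]
    if len(modes) > 1:
        raise StatisticsError(
            'no unique mode; found %d equally common values' % len(modes))
    return modes[0]

def mode_or_none(seq):
    # The caller cannot know in advance whether the data is unimodal,
    # so every call needs a try-except like this one.
    try:
        return strict_mode(seq)
    except StatisticsError:
        return None
```

Note that even with the wrapper, the caller of mode_or_none('ABBAC') only
learns that there was a tie and still has no way to recover 'A' or 'B'.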
There may be better names for the flag. "tie_goes_to_first_encountered" seemed
a bit long though ;-)
----------
assignee: steven.daprano
components: Library (Lib)
messages: 334796
nosy: rhettinger, steven.daprano
priority: normal
severity: normal
status: open
title: Fix awkwardness of statistics.mode() for multimodal datasets
type: behavior
versions: Python 3.8
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue35892>
_______________________________________