Raymond Hettinger <raymond.hettin...@gmail.com> added the comment:

The attraction to having a first_tie=False flag is that it is backwards 
compatible with the current API.

However on further reflection, people would almost never want the existing 
behavior of raising an exception rather than returning a useful result.

So, we could make first_tie the default behavior and not have a flag at all.  
We would also add a separate multimode() function with a clean signature, 
always returning a list.  FWIW, MS Excel evolved to this solution as well (it 
has MODE.SGNL and MODE.MULT).

This would give us a clean, fast API with no flags:

    mode(Iterable) -> scalar
    multimode(Iterable) -> list  

ISTM this is an all round win:

* Fixes the severe usability problems with mode().  Currently, it can't be used 
without a try/except. The except-clause can't distinguish between an empty data 
condition and no unique mode condition, nor can the except clause access the 
dataset counts to get to a useful answer.

* It makes mode() memory friendly and faster.  Currently, mode() has to convert 
the counter items into a list() and run a full sort.  However, if we only need 
the first mode, we can feed the items iterator to max() and make only a single 
pass, never having to create a list.

However, there would be an incompatibility in the API change.  It is possible 
that a person really does want an exception instead of a value when there are 
multiple modes.  In that case, this would break their code so that they have to 
switch to multimode() as a remediation.  OTOH, it is also possible that people 
have existing code that uses mode() but their code has a latent bug because 
they weren't using a try/except and haven't yet encountered a multi-modal 
dataset. 

So here are some options:

* Change mode() to return the first tie instead of raising an exception.  This 
is a behavior change but leaves you with the cleanest API going forward.

* Add a Deprecation warning to the current behavior of mode() when it finds 
multimodal data.  Then change the behavior in Python 3.9.  This is 
uncomfortable for one release, but does get us to the cleanest API and best 
performance.

* Change mode() to have a flag where first_tie defaults to False.  This is 
backwards compatible, but it doesn't fix latent bugs, it leaves us with 
on-going API complexities, and the default case remains the least usable option.

* Leave mode() as-is and just document its limitations.

For any of those options, we should still add a separate multimode() function.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35892>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to