[issue41773] Clarify handling of infinity and nan in random.choices

Raymond Hettinger Sat, 12 Sep 2020 18:18:10 -0700


Raymond Hettinger <raymond.hettin...@gmail.com> added the comment:


In general NaNs wreak havoc on any function that uses comparisons:

    >>> max(10, nan)
    10
    >>> max(nan, 10)
    nan

It isn't the responsibility of the functions to test for NaNs.  Likewise, it 
would not make sense to document the NaN behavior in every function that uses a 
comparison.  That would clutter the docs and it puts the responsibility in the 
wrong place.
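
The order dependence in the max() calls above follows from every ordered comparison with a NaN being false. Here is a minimal sketch of that effect; first_max is a hypothetical stand-in for a comparison loop, not the actual implementation of max():

```python
import math

nan = math.nan

def first_max(*values):
    """Keep the first value unless a later one compares strictly greater."""
    best = values[0]
    for value in values[1:]:
        if value > best:      # always False when either side is a NaN
            best = value
    return best

print(first_max(10, nan))     # 10 stays because nan > 10 is False
print(first_max(nan, 10))     # nan stays because 10 > nan is False
```

Whichever argument comes first wins, since nothing ever compares greater than it.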

It is the NaN itself that is responsible for the behavior you're seeing.  It is 
up to the NaN documentation to discuss its effects on downstream code.  For 
example, sorting isn't consistent when NaNs are present:

    >>> sorted([10, nan, 5])
    [10, nan, 5]
    >>> sorted([5, nan, 10])
    [5, nan, 10]

Also, a NaN is just one of many possible objects that create weird downstream 
effects.  For example, sets have a partial ordering and would also create odd 
results with sorted(), bisect(), max(), min(), etc.
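
A small illustration of the set case, using two disjoint sets chosen only for this example. Neither is a subset of the other, so they are unordered in the same way a NaN is unordered relative to a number:

```python
a, b = {1}, {2}

# Neither set is a subset of the other, so every ordered comparison is False.
incomparable = not (a < b) and not (b < a) and a != b
print(incomparable)

# As with NaN, max() keeps whichever argument it saw first.
print(max(a, b))
print(max(b, a))
```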

The situation with Infinity is similar.  It is a special object that has the 
unusual property that inf+1==inf, so bisection of cumulative sums will give the 
rightmost infinity.
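
The absorbing property is easy to see in the cumulative sums themselves. This is a short sketch with weights picked for illustration:

```python
from itertools import accumulate
from math import inf

weights = [5, inf, 2]
cum_weights = list(accumulate(weights))

print(inf + 1 == inf)     # True
print(cum_weights)        # [5, inf, inf]
```

Once the running total reaches infinity, every later cumulative weight is the same infinity.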

Consider a population 'ABC' and weights of [5, 7, 2].  We have

    Element   Weight   Cumulative Weight  Range (half-open interval)
    -------   ------   -----------------  --------------------------
      A         5      5                  0.0  <= X < 5.0
      B         7      12                 5.0  <= X < 12.0
      C         2      14                 12.0 <= X < 14.0
              ------
               14

The selector X comes from:  random() * 14.0 which gives a range of: 0.0 <= X < 
14.0.
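
The table lookup can be sketched with accumulate() and bisect(). The helper name pick is hypothetical, and the selector is passed in explicitly (rather than drawn from random() * total) so that the result is reproducible:

```python
from bisect import bisect
from itertools import accumulate

def pick(population, weights, x):
    """Return the element whose half-open range contains the selector x."""
    cum_weights = list(accumulate(weights))
    # bisect() returns the first index whose cumulative weight exceeds x.
    # Capping the search at the last index keeps the result in range.
    return population[bisect(cum_weights, x, 0, len(population) - 1)]

for x in (0.0, 4.9, 5.0, 11.9, 12.0, 13.9):
    print(x, pick('ABC', [5, 7, 2], x))
```

Selectors below 5.0 give A, those from 5.0 up to (but not including) 12.0 give B, and the rest give C.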

Now consider a population 'ABC' and weights of [5, 0, 2].  We have

    Element   Weight   Cumulative Weight  Range (half-open interval)
    -------   ------   -----------------  --------------------------
      A         5      5                  0.0  <= X < 5.0
      B         0      5                  5.0  <= X < 5.0
      C         2      7                  5.0  <= X < 7.0
              ------
                7

If X is 5.0, we have to choose C because B has a zero selection probability.  
So we have to pick the rightmost (bottommost) range that has 5.0, giving C.
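
The tie can be checked directly against the bisect module. This is a sketch only; it compares the two bisection variants on the cumulative weights from the table above:

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate

population = 'ABC'
cum_weights = list(accumulate([5, 0, 2]))    # [5, 5, 7]

# The selector ties with two equal cumulative weights.
right = bisect_right(cum_weights, 5.0)       # 2: skips past both 5s
left = bisect_left(cum_weights, 5.0)         # 0: stops before both 5s

print(population[right])                     # C
print(population[left])                      # A, whose range excludes 5.0
```

Only the rightmost insertion point lands on an element whose range actually contains 5.0.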

Now, replace B's weight with float('Inf'):

    Element   Weight   Cumulative Weight  Range (half-open interval)
    -------   ------   -----------------  --------------------------
      A         5      5                  0.0  <= X < 5.0
      B        Inf     Inf                5.0  <= X < Inf
      C         2      Inf                Inf  <= X < Inf
              ------
               Inf

Since Inf+2 is Inf and Inf==Inf, the latter two ranges are undifferentiated.  
The selector itself is always Inf because "X = random() * inf" always gives 
inf.  Using the previous rule, we have to choose the rightmost Inf which is C.  
This is in fact what choices() does:

    >>> choices('ABC', [5, float('Inf'), 2])
    ['C']

It isn't an error.  That is the same behavior that bisect() has when searching 
for an infinite value (or any other object that universally compares larger than 
anything except itself):

    >>> bisect([10, 20, inf], inf)
    3
    >>> bisect([10, 20, 30], inf)
    3

When searching for infinity, you always get the rightmost insertion point even 
if the cut points are finite.  Accordingly, it makes sense that if one of the 
weights is infinite, then the total is infinite, and the selector is infinite, 
so you always get the rightmost value.
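
The whole chain can be traced by hand. Because the handling of non-finite weights in choices() may differ between Python versions, this sketch reproduces the bisection step directly instead of calling choices(). The factor 0.5 is an assumed stand-in for a nonzero random() result:

```python
from bisect import bisect
from itertools import accumulate
from math import inf

population = 'ABC'
cum_weights = list(accumulate([5, inf, 2]))   # [5, inf, inf]
total = cum_weights[-1]

# 0.5 stands in for a nonzero random() result.
x = 0.5 * total                                # inf

# Limit the search to the last valid index, as a weighted picker must.
index = bisect(cum_weights, x, 0, len(population) - 1)
print(population[index])                       # C
```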

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41773>
_______________________________________