[issue47114] random.choice and random.choices have different distributions

Mark Bell Thu, 24 Mar 2022 16:57:00 -0700


New submission from Mark Bell <mark00b...@googlemail.com>:


The docstring for `random.choices` indicates that
```
import random
random.choices(population, k=1)
```
should produce a list containing one item, where each item of `population` has 
equal likelihood of being selected. However, `random.choices` draws elements for 
its sample by doing `population[floor(random() * len(population))]` and so 
relies on floating point numbers. Therefore the items are not all equally likely 
to be chosen, since floats are not uniformly dense in [0, 1], and this problem 
becomes worse as `population` becomes larger.
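
A toy model may help show the effect. The sketch below is only an 
illustration, not the code in `random.py`: it assumes that `random()` returns 
one of `2**bits` equally spaced values `k / 2**bits`, and the helper name 
`index_counts` and the tiny `bits=3` are hypothetical choices. Even under that 
simplifying assumption, `floor(r * n)` cannot hit every index equally often 
unless `n` divides `2**bits`.

```python
from collections import Counter


def index_counts(n, bits):
    """Count how many of the 2**bits values r = k / 2**bits map to each
    index floor(r * n).

    Hypothetical helper for illustration; assumes random() yields
    equally spaced multiples of 2**-bits.
    """
    # (k * n) >> bits equals floor((k / 2**bits) * n), computed exactly
    # with integers so the demonstration itself has no rounding error.
    return Counter((k * n) >> bits for k in range(1 << bits))


counts = index_counts(n=3, bits=3)
print(sorted(counts.items()))  # [(0, 3), (1, 3), (2, 2)]
```

With 8 possible values and 3 items, index 2 is reached by only 2 of the 8 
values, while indices 0 and 1 are each reached by 3.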

Note that this issue does not apply to `random.choice(population)` since this 
uses `random.randint` to choose a random element of `population` and performs 
exact integer arithmetic. Compare 
https://github.com/python/cpython/blob/main/Lib/random.py#L371 and 
https://github.com/python/cpython/blob/main/Lib/random.py#L490

Could `random.choices` fall back to doing 
`return [choice(population) for _ in _repeat(None, k)]` if no weights are 
given? Similarly, would it also be feasible to rely only on `random.randint` 
and integer arithmetic if all of the (cumulative) weights given to 
`random.choices` are integers?
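
A minimal sketch of what is being suggested, written as a standalone function 
rather than a patch to `random.py`. The name `choices_exact` is hypothetical, 
it uses `random.randrange` (which `random.randint` is built on) for the integer 
draw, and it assumes non-negative integer weights; it is not meant as the 
definitive implementation.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def choices_exact(population, weights=None, *, k=1):
    """Hypothetical variant of random.choices that avoids float arithmetic.

    Assumes weights, when given, are non-negative integers.
    """
    if weights is None:
        # Unweighted case: defer to random.choice for every draw.
        return [random.choice(population) for _ in range(k)]
    if not all(isinstance(w, int) for w in weights):
        raise TypeError("this sketch only handles integer weights")
    cum_weights = list(accumulate(weights))
    if len(cum_weights) != len(population):
        raise ValueError("the number of weights does not match the population")
    if not cum_weights or cum_weights[-1] <= 0:
        raise ValueError("total of weights must be greater than zero")
    total = cum_weights[-1]
    # randrange(total) returns an integer in [0, total); bisect_right maps
    # it to the first index whose cumulative weight exceeds that integer.
    return [
        population[bisect_right(cum_weights, random.randrange(total))]
        for _ in range(k)
    ]
```

For example, `choices_exact(["a", "b"], [1, 2], k=5)` picks `"b"` for two of 
the three possible integer draws and `"a"` for the remaining one.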

----------
components: Library (Lib)
messages: 415981
nosy: Mark.Bell
priority: normal
severity: normal
status: open
title: random.choice and random.choices have different distributions
versions: Python 3.11

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue47114>
_______________________________________