Re: random.sample with large weighted sample-sets?

duncan smith Sun, 16 Feb 2014 09:42:06 -0800

On 16/02/14 16:35, Charles Allen wrote:

How efficient does this thing need to be?


You can always just turn it into a two-dimensional sampling problem by
thinking of the data as a function f(x=item), generating a random x=xr
in [0,x], then generating a random y in [0,max(f(x))].  The xr is
accepted if 0 < y <= max(f(xr)), or rejected (and another attempt made) if
y > max(f(xr)).

You can avoid rejection by constructing an alias table. A list can beconstructed such that each list element contains a pair of values and acutoff. e.g.


[("apple", 20), ("orange", 50), ("grape", 30)]

would become (using one particular algorithm)

[(("apple", "orange"), 0.6),
 (("orange", "apple"), 1.0),
 (("grape", "orange"), 0.9)]

Generate a random index, then select one of the values on the basis ofthe cutoff. For short enough lists you can generate a single 0-1 randomvariate, u, and use int(n*u) for the index and compare n*u - int(n*u) tothe cutoff, where n is the length of the list. It's still sampling withreplacement though.


Duncan


--
https://mail.python.org/mailman/listinfo/python-list

Re: random.sample with large weighted sample-sets?

Reply via email to