On 16/02/14 16:35, Charles Allen wrote:
How efficient does this thing need to be?

You can always just turn it into a two-dimensional sampling problem by
thinking of the data as a function f(x=item), generating a random x=xr
in [0,x], then generating a random y in [0,max(f(x))].  The xr is
accepted if 0 < y <= max(f(xr)), or rejected (and another attempt made) if
y > max(f(xr)).


You can avoid rejection by constructing an alias table. A list can be constructed such that each list element contains a pair of values and a cutoff. e.g.

[("apple", 20), ("orange", 50), ("grape", 30)]

would become (using one particular algorithm)

[(("apple", "orange"), 0.6),
 (("orange", "apple"), 1.0),
 (("grape", "orange"), 0.9)]

Generate a random index, then select one of the values on the basis of the cutoff. For short enough lists you can generate a single 0-1 random variate, u, and use int(n*u) for the index and compare n*u - int(n*u) to the cutoff, where n is the length of the list. It's still sampling with replacement though.

Duncan


--
https://mail.python.org/mailman/listinfo/python-list

Reply via email to