Serhiy Storchaka added the comment:

Thank you Neil. It is interesting.

Vose's alias method has followed disadvantages (in comparison with the roulette 
wheel selection proposed above):

1. It operates with probabilities and uses floats, therefore it can be a little 
less accurate.

2. It consumes two random number (an integer and a float) for generating one 
sample. It can be fixed however (in the cost of additional precision lost).

3. While it has same time and memory O(n) cost for initialization, it has 
larger multiplication, Vose's alias method requires several times larger time 
and memory for initialization.

4. It requires more memory in process of generating samples.

However it has an advantage. It really has constant time cost to generate each 
sample.

Here are some benchmark results. "Roulette Wheel" is proposed above 
implementation. "Roulette Wheel 2" is its modification with normalized 
cumulative sums. It has twice more initialization time, but 1.5-2x faster 
generates each sample. "Vose's Alias" is an implementation of Vose's alias 
method directly translated from Java. "Vose's Alias 2" is optimized 
implementation which uses Python specific.

Second column is a size of distribution, third column is initialization time 
(in milliseconds), fourth column is time to generate each sample (in 
microseconds), fifth column is a number of generated samples after which this 
method will overtake "Roulette Wheel" (including initialization time).

Roulette Wheel          10     0.059     7.165         0
Roulette Wheel 2        10     0.076     4.105         5
Vose's Alias            10     0.129    13.206         -
Vose's Alias 2          10     0.105     6.501        69
Roulette Wheel         100     0.128     8.651         0
Roulette Wheel 2       100     0.198     4.630        17
Vose's Alias           100     0.691    12.839         -
Vose's Alias 2         100     0.441     6.547       148
Roulette Wheel        1000     0.719    10.949         0
Roulette Wheel 2      1000     1.458     5.177       128
Vose's Alias          1000     6.614    13.052         -
Vose's Alias 2        1000     3.704     6.531       675
Roulette Wheel       10000     7.495    13.249         0
Roulette Wheel 2     10000    14.961     6.051      1037
Vose's Alias         10000    69.937    13.830         -
Vose's Alias 2       10000    37.017     6.746      4539
Roulette Wheel      100000    73.988    16.180         0
Roulette Wheel 2    100000   148.176     8.182      9275
Vose's Alias        100000   690.099    13.808    259716
Vose's Alias 2      100000   391.367     7.095     34932
Roulette Wheel     1000000   743.415    19.493         0
Roulette Wheel 2   1000000  1505.409     8.930     72138
Vose's Alias       1000000  7017.669    13.798   1101673
Vose's Alias 2     1000000  4044.746     7.152    267507

As you can see Vose's alias method has very large initialization time. 
Non-optimized version will never overtake "Roulette Wheel" with small 
distributions (<100000), and even optimized version will never overtake 
"Roulette Wheel" with small distributions (<100000). Only with very large 
distributions Vose's alias method has an advantage (when you needs very larger 
number of samples).

Because for generating only one sample we need a method with fastest 
initialization we need "Roulette Wheel" implementation. And because large 
distributions are rare, I think there is no need in alternative implementation. 
In worst case for generating 1000000 samples from 1000000-elements distribution 
the difference between "Roulette Wheel" and "Vose's Alias 2" is a difference 
between 20 and 11 seconds.

----------
Added file: http://bugs.python.org/file31734/wcg_bench.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18844>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to