jian wang created DATAFU-21:
-------------------------------
Summary: Probability weighted sampling without reservoir
Key: DATAFU-21
URL: https://issues.apache.org/jira/browse/DATAFU-21
Project: DataFu
Issue Type: New Feature
Environment: Mac OS, Linux
Reporter: jian wang
This issue tracks the investigation into a weighted sampler that does not use
an internal reservoir.
At present, SimpleRandomSample implements a good acceptance-rejection sampling
algorithm for probability-based random sampling. The weighted sampler could
reuse the simple random sample with a slight modification.
The modification is this: the present simple random sample generates, for each
item, a uniform random number in (0, 1) and uses it as the random variable
that decides whether to accept or reject the item. The weighted sampler could
instead generate this random variable from the item's weight, provided that it
still lies in (0, 1) and that the random variables of different items remain
independent of each other.
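
To make the idea concrete, here is a minimal sketch of one possible
weight-dependent random variable. It is not the SimpleRandomSample code and not
a settled design. It assumes the transformation key = 1 - u^(1/w), which is
borrowed from the key used in Efraimidis-Spirakis weighted sampling, and a
plain "accept if key < p" rule. The class name, method names and the parameter
p are hypothetical. Under these assumptions an item of weight w is accepted
with probability 1 - (1 - p)^w, which reduces to p when w = 1. Whether this
yields the intended weighted sample still has to be verified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch only; all names here are hypothetical, not DataFu APIs.
class WeightedAcceptRejectSketch {

    // Map a uniform draw u in (0, 1) and a positive weight to a key that
    // also lies in (0, 1). Heavier items tend to get smaller keys.
    // With weight == 1 the key is 1 - u, which is again uniform.
    static double weightedKey(double u, double weight) {
        if (weight <= 0.0) {
            throw new IllegalArgumentException("weight must be positive");
        }
        return 1.0 - Math.pow(u, 1.0 / weight);
    }

    // Assumed acceptance rule: keep the item when its key falls below p.
    static boolean accept(double key, double p) {
        return key < p;
    }

    // One pass over the weights with no reservoir: each item is decided
    // independently from its own uniform draw. Returns accepted indices.
    static List<Integer> sample(double[] weights, double p, long seed) {
        Random rng = new Random(seed);
        List<Integer> accepted = new ArrayList<>();
        for (int i = 0; i < weights.length; i++) {
            double u;
            do {
                u = rng.nextDouble(); // in [0, 1); redraw on 0 to stay in (0, 1)
            } while (u == 0.0);
            if (accept(weightedKey(u, weights[i]), p)) {
                accepted.add(i);
            }
        }
        return accepted;
    }
}
```

Because each decision uses only the item's own weight and its own draw, the
per-item random variables stay independent, as required above. The output
size is random in this sketch; it is not fixed in advance.
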
The correctness of this solution, and how to implement it efficiently, still
need further thought.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)