[ https://issues.apache.org/jira/browse/DATAFU-21?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944358#comment-13944358 ]
Xiangrui Meng commented on DATAFU-21:
-------------------------------------
Wow, I'm surprised to see you have made it this far! I didn't check the
details, but since you know you need to solve a nonlinear equation for q1 and
q2 whose terms depend on distributed data, you are probably heading in the
right direction. You can discretize the weights to compress the data while
maintaining a certain level of accuracy, then solve the relaxed inequality on a
single node.
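Below is a minimal Java sketch of the "discretize, then solve on a single node" idea. It rests on assumptions that are not in the comment:

- Each item gets a key u^(1/w), the construction of Efraimidis and Spirakis.
- An item is kept when its key exceeds a threshold t, so the expected number kept is f(t) = sum_i (1 - t^w_i).
- Weights are compressed by rounding log2(w) to a fixed number of bins per octave. This is just one possible discretization.

The class and method names (WeightHistogramSolver, add, merge, solveThreshold) are hypothetical. The sketch solves the plain expectation equation f(t) = m by bisection. It is not the q1/q2 equation from this thread, and a high-probability version would shift m by a concentration margin that is not shown.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: compress weights into a log-scale histogram, then solve
 * for a key threshold t on a single node. Assumes keys are u^(1/w) and that an
 * item is kept when its key exceeds t.
 */
class WeightHistogramSolver {
    // Bin index -> number of items whose log2(weight) rounds to that bin.
    private final TreeMap<Integer, Long> counts = new TreeMap<>();
    private final int binsPerOctave;

    WeightHistogramSolver(int binsPerOctave) {
        this.binsPerOctave = binsPerOctave;
    }

    /** Counts one item of weight w > 0 in its log-scale bin. */
    void add(double w) {
        int bin = (int) Math.round(Math.log(w) / Math.log(2.0) * binsPerOctave);
        counts.merge(bin, 1L, Long::sum);
    }

    /** Adds in another partial histogram, e.g. one built on a different node. */
    void merge(WeightHistogramSolver other) {
        other.counts.forEach((bin, c) -> counts.merge(bin, c, Long::sum));
    }

    int numBins() {
        return counts.size();
    }

    /** Representative weight of a bin: 2^(bin / binsPerOctave). */
    private double binWeight(int bin) {
        return Math.pow(2.0, (double) bin / binsPerOctave);
    }

    /** Approximate expected number of keys above t: sum over bins of c_b * (1 - t^w_b). */
    double expectedAbove(double t) {
        double sum = 0.0;
        for (Map.Entry<Integer, Long> e : counts.entrySet()) {
            sum += e.getValue() * (1.0 - Math.pow(t, binWeight(e.getKey())));
        }
        return sum;
    }

    /** Bisection for t in (0, 1) with expectedAbove(t) == target; expectedAbove decreases in t. */
    double solveThreshold(double target) {
        double lo = 0.0, hi = 1.0;
        for (int iter = 0; iter < 100; iter++) {
            double mid = 0.5 * (lo + hi);
            if (expectedAbove(mid) > target) {
                lo = mid; // too many keys above mid, so move the threshold up
            } else {
                hi = mid;
            }
        }
        return 0.5 * (lo + hi);
    }

    public static void main(String[] args) {
        WeightHistogramSolver a = new WeightHistogramSolver(4);
        WeightHistogramSolver b = new WeightHistogramSolver(4);
        for (int i = 0; i < 1000; i++) {
            (i % 2 == 0 ? a : b).add(1.0 + (i % 10)); // two "nodes", weights 1..10
        }
        a.merge(b); // only the small bin -> count maps travel to one node
        double t = a.solveThreshold(100.0);
        System.out.println(a.numBins() + " bins, t = " + t + ", E[kept] = " + a.expectedAbove(t));
    }
}
```

Here the ten distinct weights collapse into nine bins, because 9 and 10 round to the same bin. How coarse the bins can be is the accuracy trade-off the comment mentions.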
> Probability weighted sampling without reservoir
> -----------------------------------------------
>
> Key: DATAFU-21
> URL: https://issues.apache.org/jira/browse/DATAFU-21
> Project: DataFu
> Issue Type: New Feature
> Environment: Mac OS, Linux
> Reporter: jian wang
> Assignee: jian wang
>
> This issue tracks the investigation into a weighted sampler that does not
> use an internal reservoir.
>
> At present, SimpleRandomSample implements a good acceptance-rejection
> algorithm for probability random sampling. The weighted sampler could reuse
> SimpleRandomSample with a slight modification.
>
> One possible modification: the present SimpleRandomSample generates a uniform
> random number in (0, 1) as the random variable that decides whether to accept
> or reject an item. The weighted sampler could instead generate this random
> variable from the item's weight, such that it still lies in (0, 1) and the
> items' random variables remain mutually independent.
>
> The correctness of this approach, and how to implement it efficiently, still
> need further thought and experimentation.
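One known construction that fits the modification described above is the key u^(1/w) from Efraimidis and Spirakis. For u uniform in (0, 1) and weight w > 0:

- The key also lies in (0, 1).
- Keys of different items are independent.
- Keeping the k largest keys gives a weighted sample without replacement.

The Java sketch below illustrates that construction under this assumption; it is not DataFu's code. The class and method names are hypothetical, and the full sort stands in for whatever reservoir-free selection the issue is looking for. The Monte Carlo check covers the simplest "experiment" case: with k = 1, item i should be chosen with probability w_i / sum(w).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

/** Hypothetical sketch of weight-dependent keys in (0, 1) (Efraimidis and Spirakis). */
class WeightedKeySketch {

    /** Key u^(1/w) for an item of weight w > 0; mathematically in (0, 1) like u. */
    static double key(double weight, Random rng) {
        double u;
        do {
            u = rng.nextDouble(); // nextDouble() is in [0, 1); redraw the rare 0.0
        } while (u == 0.0);
        return Math.pow(u, 1.0 / weight);
    }

    /** Indices of the k largest keys. A full sort is used here only for clarity. */
    static int[] sample(double[] weights, int k, Random rng) {
        int n = weights.length;
        double[] keys = new double[n];
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) {
            keys[i] = key(weights[i], rng); // each key is drawn independently
            idx[i] = i;
        }
        Arrays.sort(idx, Comparator.comparingDouble((Integer i) -> keys[i]).reversed());
        int[] out = new int[k];
        for (int j = 0; j < k; j++) {
            out[j] = idx[j];
        }
        return out;
    }

    /** Monte Carlo check: with k = 1, item i should win with probability w_i / sum(w). */
    static double[] winFrequencies(double[] weights, int trials, long seed) {
        Random rng = new Random(seed);
        double[] freq = new double[weights.length];
        for (int t = 0; t < trials; t++) {
            freq[sample(weights, 1, rng)[0]] += 1.0 / trials;
        }
        return freq;
    }

    public static void main(String[] args) {
        double[] w = {1, 2, 3, 4};
        System.out.println(Arrays.toString(winFrequencies(w, 200_000, 7L)));
    }
}
```

With weights {1, 2, 3, 4}, the printed frequencies should come out close to 0.1, 0.2, 0.3 and 0.4.

To drop the sort, the SimpleRandomSample pattern could in principle be applied to these keys, with the comparison direction flipped because larger keys win here:

1. Accept keys above an upper cutoff.
2. Reject keys below a lower cutoff.
3. Sort only the waitlisted middle.

Deriving those cutoffs for weighted keys is the unsolved part this issue is about.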
--
This message was sent by Atlassian JIRA
(v6.2#6252)