[ 
https://issues.apache.org/jira/browse/DATAFU-21?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109228#comment-14109228
 ] 

jian wang commented on DATAFU-21:
---------------------------------

A simple R script to plot the weight-to-frequency graph:

# Read the two weight/frequency tables (V1 = weight, V2 = frequency).
q1_reducer <- read.table('/Users/wjian/project/datafu_apache_20140320/incubator-datafu/datafu-pig/src/main/java/datafu/experiment/sampling/uniform_sample_weight_distrib', header=FALSE)
q2_reducer <- read.table('/Users/wjian/project/datafu_apache_20140320/incubator-datafu/datafu-pig/src/main/java/datafu/experiment/sampling/uniform_sample_weight_distrib_baseline', header=FALSE)
# Use shared axis limits so both series are drawn on the same scale.
xlim <- range(q1_reducer$V1, q2_reducer$V1)
ylim <- range(q1_reducer$V2, q2_reducer$V2)
plot(q1_reducer$V1, q1_reducer$V2, xlab="weight", ylab="frequency", xlim=xlim, ylim=ylim)
# Overlay the baseline in green on the existing axes.
points(q2_reducer$V1, q2_reducer$V2, col="green")


> Probability weighted sampling without reservoir
> -----------------------------------------------
>
>                 Key: DATAFU-21
>                 URL: https://issues.apache.org/jira/browse/DATAFU-21
>             Project: DataFu
>          Issue Type: New Feature
>         Environment: Mac OS, Linux
>            Reporter: jian wang
>            Assignee: jian wang
>         Attachments: DATAFU-21.patch
>
>
> This issue tracks the investigation into a weighted sampler that does not 
> use an internal reservoir. 
> At present, SimpleRandomSample implements a good acceptance-rejection 
> sampling algorithm for probability random sampling. The weighted sampler 
> could reuse the simple random sample with a slight modification.
> The modification is as follows: the present simple random sample generates a 
> uniform random number in (0, 1) as the random variable used to accept or 
> reject an item. The weighted sampler would instead generate this random 
> variable based on the item's weight; the value still lies in (0, 1), and 
> the random variables of different items remain independent of each other.
> The correctness of this solution, and how to implement it efficiently, 
> still need further thought and experimentation.
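
A minimal R sketch of one way the idea in the description could look. It is not the DataFu implementation: the functions weighted_key and weighted_accept, the fixed threshold, and the choice of u^(1/w) (the Efraimidis-Spirakis key) as the weight-based random variable are all assumptions made for illustration. It only shows that such a variable stays in (0, 1), is independent per item, and makes heavier items more likely to be accepted.

```r
# Hypothetical sketch: weighted acceptance-rejection without a reservoir.
# weighted_key, weighted_accept and threshold are illustrative names only;
# none of them are part of DataFu.

# Map a uniform draw u in (0, 1) to a weight-dependent key that is still in
# (0, 1). u^(1/w) is the Efraimidis-Spirakis key; a larger weight w pushes
# the key toward 1.
weighted_key <- function(u, w) {
  stopifnot(all(u > 0 & u < 1), all(w > 0))
  u^(1 / w)
}

# Accept each item independently when its key exceeds a fixed threshold t,
# so an item with weight w is accepted with probability 1 - t^w.
weighted_accept <- function(weights, threshold, u = runif(length(weights))) {
  weighted_key(u, weights) > threshold
}

set.seed(42)
weights <- c(rep(1, 50000), rep(4, 50000))
accepted <- weighted_accept(weights, threshold = 0.9)

# Empirical acceptance rate per weight, next to the expected 1 - t^w.
print(tapply(accepted, weights, mean))
print(1 - 0.9^c(1, 4))
```

With a fixed threshold the sample size is random rather than exact, so this only illustrates the weight-based random variable, not how the threshold would be chosen to hit a target sample size.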



--
This message was sent by Atlassian JIRA
(v6.2#6252)
