Greetings. My wife is teaching an introductory stat class at UC Davis. The class emphasizes the use of simulations, rather than mathematics, to get insight into statistics, and R is the mandated tool. A student in the class recently inquired about different approaches to sampling from a binomial distribution. I've appended some code that exhibits the idea, the gist of which is that using sample(c(0, 1), ...) and rbinom(...) should give equivalent results.
The surprising (to me) result is that the two approaches DO give the same result, EXCEPT when the probability is exactly 0.5. See Appendix A for the code and Appendix B for the output. I don't think this issue is system-dependent, but I've put my session information in Appendix C. Another wrinkle in this is that if I omit the "prob" parameter from the call to sample, meaning to take the default value of 0.5, the two methods DO give the same result. Any thoughts about this? Thanks. --Mike Appendix A: some R code that exhibits the problem ================================================= ppp <- seq(0, 1, by = 0.01) result <- do.call(rbind, lapply(ppp, function(p) { set.seed(1) sampleRes <- sample(c(0, 1), size = 1, replace = TRUE, prob=c(1-p, p)) set.seed(1) rbinomRes <- rbinom(1, size = 1, prob = p) data.frame(prob = p, equivalent = all(sampleRes == rbinomRes)) })) result Appendix B: the output from the R code ====================================== prob equivalent 1 0.00 TRUE 2 0.01 TRUE 3 0.02 TRUE 4 0.03 TRUE 5 0.04 TRUE 6 0.05 TRUE 7 0.06 TRUE 8 0.07 TRUE 9 0.08 TRUE 10 0.09 TRUE 11 0.10 TRUE 12 0.11 TRUE 13 0.12 TRUE 14 0.13 TRUE 15 0.14 TRUE 16 0.15 TRUE 17 0.16 TRUE 18 0.17 TRUE 19 0.18 TRUE 20 0.19 TRUE 21 0.20 TRUE 22 0.21 TRUE 23 0.22 TRUE 24 0.23 TRUE 25 0.24 TRUE 26 0.25 TRUE 27 0.26 TRUE 28 0.27 TRUE 29 0.28 TRUE 30 0.29 TRUE 31 0.30 TRUE 32 0.31 TRUE 33 0.32 TRUE 34 0.33 TRUE 35 0.34 TRUE 36 0.35 TRUE 37 0.36 TRUE 38 0.37 TRUE 39 0.38 TRUE 40 0.39 TRUE 41 0.40 TRUE 42 0.41 TRUE 43 0.42 TRUE 44 0.43 TRUE 45 0.44 TRUE 46 0.45 TRUE 47 0.46 TRUE 48 0.47 TRUE 49 0.48 TRUE 50 0.49 TRUE 51 0.50 FALSE 52 0.51 TRUE 53 0.52 TRUE 54 0.53 TRUE 55 0.54 TRUE 56 0.55 TRUE 57 0.56 TRUE 58 0.57 TRUE 59 0.58 TRUE 60 0.59 TRUE 61 0.60 TRUE 62 0.61 TRUE 63 0.62 TRUE 64 0.63 TRUE 65 0.64 TRUE 66 0.65 TRUE 67 0.66 TRUE 68 0.67 TRUE 69 0.68 TRUE 70 0.69 TRUE 71 0.70 TRUE 72 0.71 TRUE 73 0.72 TRUE 74 0.73 TRUE 75 0.74 TRUE 76 0.75 TRUE 77 0.76 TRUE 78 0.77 TRUE 79 0.78 TRUE 80 0.79 TRUE 81 0.80 TRUE 82 0.81 TRUE 83 0.82 TRUE 84 0.83 TRUE 85 0.84 TRUE 86 0.85 TRUE 87 0.86 TRUE 88 0.87 TRUE 89 0.88 TRUE 90 0.89 TRUE 91 0.90 TRUE 92 0.91 TRUE 93 0.92 TRUE 94 0.93 TRUE 95 0.94 TRUE 96 0.95 TRUE 97 0.96 TRUE 98 0.97 TRUE 99 0.98 TRUE 100 0.99 TRUE 101 1.00 TRUE Appendix C: Session information =============================== > sessionInfo() R version 3.0.0 (2013-04-03) Platform: x86_64-redhat-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.