> Duncan Murdoch:
>
> and you can see it in the original m with
>
>   x <- sample(m, 1000000, replace = TRUE)
>   plot(density(x[x %% 2 == 0]))
OK.  Thanks.  I see there is a real problem.

One option to fix it, while mostly retaining backwards compatibility, would be to add extra bits from a second RNG call only when m is large - e.g., larger than 2^27.  That would retain reproducibility for most analyses of small to moderate size data sets.  Of course, there would still be some small, detectable error for values a bit less than 2^27, but perhaps that's tolerable.  (The 2^27 threshold could obviously be debated.)  A rough R-level sketch of this idea is appended at the end of this message.

R Core made a similar decision in the case of sampling without replacement, when it implemented a new hashing algorithm that produces different results: the new algorithm is enabled by default only when m > 1e7 and no more than half the values are to be sampled.  As was noted:

> Note that it wouldn't be the first time that sample() changes behavior
> in a non-backward compatible way:
>
>   https://stat.ethz.ch/pipermail/r-devel/2012-October/065049.html
>
> Cheers,
> H.

That incompatibility could have been avoided.  A year ago I posted a fast hashing algorithm that produces the same results as the simple algorithm, here:

  https://stat.ethz.ch/pipermail/r-devel/2017-October/075012.html

The latest version of this will be in the soon-to-be-released new version of pqR, and will of course be enabled automatically whenever it seems desirable, for a considerable speed gain in many cases.

   Radford Neal
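A rough R-level sketch of the "extra bits for large m" idea proposed above (hypothetical: an actual fix would live in R's C code and use unif_rand(); runif(), the 2^27 cutoff, and the function name here are just stand-ins):

  ## Draw an integer in 1..m the old way for small m, but refine the
  ## uniform value with a second RNG call when m is large, so that the
  ## truncation bias in floor(u * m) becomes negligible.
  rint_extra_bits <- function(m) {
    u <- runif(1)                 # first draw: limited number of random bits
    if (m > 2^27) {
      ## keep the top 27 bits of u and fill in lower-order bits from a
      ## second draw; results for m <= 2^27 are unchanged
      u <- (floor(u * 2^27) + runif(1)) / 2^27
    }
    floor(u * m) + 1              # integer in 1..m
  }

  ## e.g., a check in the spirit of Duncan's, with a large m:
  ##   x <- replicate(1e5, rint_extra_bits(2^30))
  ##   plot(density(x[x %% 2 == 0]))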
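And a sketch of how a hash table can be used for sampling without replacement while reproducing the simple algorithm's results exactly, which is the property claimed above (again hypothetical and R-level, not the code from the linked post: sample_hash, sample_simple, and the use of runif() are illustrative assumptions).  The hash table records only the slots of the virtual vector 1..n that have been disturbed, while the RNG is consumed exactly as in the simple partial-shuffle algorithm, so both versions return identical results:

  ## Hash-based sampling of 'size' values from 1..n without replacement.
  sample_hash <- function(n, size) {
    n <- as.integer(n)
    h <- new.env(hash = TRUE)     # records slots of 1..n whose value changed
    slot <- function(i) {         # current value of the virtual x[i]
      key <- as.character(i)
      if (exists(key, envir = h, inherits = FALSE)) get(key, envir = h) else i
    }
    y <- integer(size)
    for (k in seq_len(size)) {
      j <- as.integer(floor(runif(1) * n)) + 1L    # same draw as below
      y[k] <- slot(j)                              # y[k] <- x[j]
      assign(as.character(j), slot(n), envir = h)  # x[j] <- x[n]
      n <- n - 1L
    }
    y
  }

  ## The simple algorithm it mimics, allocating the full vector 1..n.
  sample_simple <- function(n, size) {
    n <- as.integer(n)
    x <- seq_len(n)
    y <- integer(size)
    for (k in seq_len(size)) {
      j <- as.integer(floor(runif(1) * n)) + 1L
      y[k] <- x[j]
      x[j] <- x[n]
      n <- n - 1L
    }
    y
  }

  ## set.seed(1); a <- sample_hash(1e6, 10)
  ## set.seed(1); b <- sample_simple(1e6, 10)
  ## identical(a, b)    # TRUE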