Mersenne Twister generator is known to be sensitive to the algorithm used to generate its initial state. The initialization used in R generates the initial state in a way, which leaves linear dependencies mod 2 among the bits in the initial state. Since Mersenne Twister performs only operations, which are linear mod 2, these dependencies propagate to the output sequence.
An easy to see consequence of this may be demonstrated by the following script: pattern <- function(m=1400) { x <- runif(m) y <- matrix(nrow=m,ncol=32) for (j in 1:32) { y[,j] <- floor(2*x) x <- 2*x - y[,j] } u <- rep(0,times=32) u[c(3,7,10,14,21,25,32)] <- 1 c(y %*% u) %% 2 } RNGkind("default") # or set.seed() with any seed z <- pattern() abs(diff(z,lag=2)) # sequence with long constant subsequences It should be pointed out that it is indeed a consequence of the initialization. If e.g. runif(10000) is run after RNGkind/set.seed, then the effect disappears. Note that each row in matrix y used in function pattern() contains the bit representation of one of the numbers from runif(m). Different elements of z are derived from different rows in y and, hence, from different elements of runif(m). Consequently, they should mimic an i.i.d. sequence. However, abs(diff(z,lag=2)) allows to reject such an assumption easily. The pattern is even better visible graphically, for example using for (i in 1:20) { set.seed(i) z <- pattern() z <- abs(diff(z,lag=2)) if (i == 1) { plot(cumsum(2*z-1),type="l") } else { lines(cumsum(2*z-1)) } } The resulting curve is almost the same for all the seeds. I have a working patch, which solves this problem by adding a new generator called Mersenne-Twister-52, which is the standard Mersenne Twister with the following modifications: - It uses MRG32k5a by P.L'Ecuyer for generating the initial state (This generator works modulo odd primes and so does not generate dependencies of the kind to which Mersenne Twister is sensitive.) - Combines 26 bits of two consequtive numbers into a single number with 52 random bits (this explains its name) and adds a constant shift 2^-53 to guarantee that the result is always in (0,1). Combining the two changes together allows to keep the current Mersenne Twister implementation intact for backward compatibility and provides more reasons to add a new name than just a different seeding. In my opinion, there may be applications, which can benefit from more then 32 random bits in the numbers from runif(n). I would be pleased to send the patch to R-devel, if the proposed solution is of the sort, which could be considered. Petr Savicky. ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel