Wow! Your reply amply demonstrates the power of understanding the math (or finding someone who does) before turning on the computer.
One last R question...how could I efficiently enumerate all distinguishable permutations of a vector of signs? vect <- c( -1, -1, 1, 1, 1) permn(vect) #length: factorial(length(vect)) ???? #length: choose(n, n1) Best Regards. --Dale On Thu, Feb 11, 2010 at 10:04 PM, Greg Snow <greg.s...@imail.org> wrote: > Try this: > > druns2 <- function(x, n1, n2) { > > if( x %% 2 ) { # odd x > choose( n1 - 1, n1 - (x+1)/2 ) * choose( n2 - 1, n2 - (x-1)/2 > ) / choose( n1+n2, n1 ) + > choose( n2 - 1, n2 - (x+1)/2 ) * choose( n1 - 1, n1 - (x-1)/2 > ) / choose( n1+n2, n2 ) > } else { # even x > choose( n1 - 1, n1 - x/2 ) * choose( n2 - 1, n2 - x/2 ) / > choose( n1 + n2, n1 ) + > choose( n2 - 1, n2 - x/2 ) * choose( n1 - 1, n1 - x/2 ) / > choose( n1 + n2, n2 ) > } > } > > exp.2 <- sapply( 2:7, druns2, n1=3, n2=4 ) > exp.2 - exp.r/factorial(7) > > > > -- > Gregory (Greg) L. Snow Ph.D. > Statistical Data Center > Intermountain Healthcare > greg.s...@imail.org > 801.408.8111 > > >> -----Original Message----- >> From: Dale Steele [mailto:dale.w.ste...@gmail.com] >> Sent: Thursday, February 11, 2010 5:28 PM >> To: Greg Snow >> Cc: R-help@r-project.org >> Subject: Re: [R] Code find exact distribution for runs test? >> >> Thanks for this! My original query was probably unclear. I think you >> have answered a different, possibly more interesting question. My >> goal was to find an exact distribution of a total number of runs R in >> samples of two types of size (n1, n2) under the null hypothesis of >> randomness. >> >> The horribly inefficient code below generates results which match >> Table 10 in Wackerly, Mendehall and Scheaffer for the distribution of >> the total number of runs R in samples of size (n1, n2); P(R <= r), >> under the null hypothesis of randomness. Horribly inefficient because >> couldn't figure out how to generate only the distinguishable >> permutations of my sample vector. Hope this brute force approach >> illustrates what I am after. >> >> >> nruns <- function(x) { >> signs <- sign(x) >> runs <- rle(signs) >> r <- length(runs$lengths) >> return(r) >> } >> >> library(combinat) >> # for example (n1=3, n2=4) >> n1 <- 3; n2 <- 4; n <- n1+n2 >> vect <- rep( c(-1,1), c(3,4)) >> vect >> >> # ! 'nruns(vect)' generates factorial(7) permutations, only >> choose(7,3) are distinguishable >> >> exp.r <- table(unlist(permn(vect, nruns ))) >> cumsum(dist/factorial(7)) >> >> # Result agrees with Table 10, pg 814 (n1=3, n2=4) >> #in Wackerly, Mendenhall and Scheaffer, 2002, presumably done using >> # exact calculations. >> >> Thanks. --Dale >> >> On Thu, Feb 11, 2010 at 6:08 PM, Greg Snow <greg.s...@imail.org> wrote: >> > OK, I think I have it worked out for both cases: >> > >> > library(vcd) >> > >> > druns <- function(x, n) { # x runs in n data points, not vectorized >> > # based on sample >> median >> > >> > if( n%%2 ) stop('n must be even') >> > >> > if( x %% 2 ) { # odd x >> > choose( n/2 - 1, n/2-(x+1)/2 ) * choose( n/2 - 1, n/2- >> (x-1)/2 )/ >> > choose(n,n/2) * 2 >> > } else { # even x >> > choose( n/2 - 1, n/2-x/2 ) * choose( n/2 - 1, n/2-x/2 >> )/ >> > choose(n,n/2) * 2 >> > } >> > } >> > >> > ## population median >> > out1 <- replicate( 100000, {x <- rnorm(10); >> length(rle(sign(x))$lengths) } ) >> > >> > rootogram( table(out1), dbinom(0:9, size=9, prob=0.5)*100000 ) >> > chisq.test( table(out1), p=dbinom(0:9, size=9, prob=0.5) ) >> > >> > >> > ## sample median >> > out2 <- replicate( 100000, {x <- rnorm(10); x <- x - median(x); >> length(rle(sign(x))$lengths) } ) >> > >> > fit <- sapply( 2:10, druns, n=10 ) >> > >> > rootogram( table(out2), fit * 100000 ) >> > chisq.test( table(out2), p=fit ) >> > >> > >> > The tests only fail to prove me wrong, not a firm proof that I am >> correct. But given the simulation size I am at least pretty close. I >> can see why people worked out approximations before we had good >> computers, I would not want to calculate the above by hand. >> > >> > Enjoy, >> > >> > -- >> > Gregory (Greg) L. Snow Ph.D. >> > Statistical Data Center >> > Intermountain Healthcare >> > greg.s...@imail.org >> > 801.408.8111 >> > >> > >> >> -----Original Message----- >> >> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- >> >> project.org] On Behalf Of Greg Snow >> >> Sent: Thursday, February 11, 2010 12:19 PM >> >> To: Dale Steele; R-help@r-project.org >> >> Subject: Re: [R] Code find exact distribution for runs test? >> >> >> >> I am not an expert in this area, but here are some thoughts that may >> >> get you started towards an answer. >> >> >> >> First, there are 2 ways (possibly more) that can lead to the data >> for a >> >> runs test that lead to different theoretical distributions: >> >> >> >> 1. We have a true or hypothesized value of the median that we >> >> subtracted from the data, therefore each value has 50% probability >> of >> >> being positive/negative and all are independent of each other >> (assuming >> >> being exactly equal to the median is impossible or discarded). >> >> >> >> 2. We have subtracted the sample median from each sample value (and >> >> discarded any 0's) leaving us with exactly half positive and half >> >> negative and not having independence. >> >> >> >> In case 1, the 1st observation will always start a run. The second >> >> observation has a 50% chance of being the same sign (F) or different >> >> sign (S), with the probability being 0.5 for each new observation >> and >> >> them all being independent (under assumption of random selections >> from >> >> population with known/hypothesized median) and the number of runs >> >> equaling the number of S's, this looks like a binomial to me (with >> some >> >> '-1's inserted in appropriate places. >> >> >> >> In case 2, this looks like a hypergeometric distribution, there >> would >> >> be n!/((n/2)!(n/2)!) possible permutations, just need to compute how >> >> many of those permutations result in x runs to get the probability. >> >> There is probably a way to think about this in terms of balls and >> urns, >> >> but I have not worked that out yet. >> >> >> >> Hope this helps, >> >> >> >> -- >> >> Gregory (Greg) L. Snow Ph.D. >> >> Statistical Data Center >> >> Intermountain Healthcare >> >> greg.s...@imail.org >> >> 801.408.8111 >> >> >> >> >> >> > -----Original Message----- >> >> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- >> >> > project.org] On Behalf Of Dale Steele >> >> > Sent: Wednesday, February 10, 2010 6:16 PM >> >> > To: R-help@r-project.org >> >> > Subject: [R] Code find exact distribution for runs test? >> >> > >> >> > I've been attempting to understand the one-sample run test for >> >> > randomness. I've found run.test{tseries} and run.test{lawstat}. >> >> Both >> >> > use a large sample approximation for distribution of the total >> number >> >> > of runs in a sample of n1 observations of one type and n2 >> >> observations >> >> > of another type. >> >> > >> >> > I've been unable to find R code to generate the exact distribution >> >> and >> >> > would like to see how this could be done (not homework). >> >> > >> >> > For example, given the data: >> >> > >> >> > dtemp <- c(12, 13, 12, 11, 5, 2, -1, 2, -1, 3, 2, -6, -7, -7, -12, >> - >> >> 9, >> >> > 6, 7, 10, 6, 1, 1, 3, 7, -2, -6, -6, -5, -2, -1) >> >> > >> >> > The Monte Carlo permutation approach seems to get me part way. >> >> > >> >> > >> >> > # calculate the number of runs in the data vector >> >> > nruns <- function(x) { >> >> > signs <- sign(x) >> >> > runs <- rle(signs) >> >> > r <- length(runs$lengths) >> >> > return(r) >> >> > } >> >> > >> >> > MC.runs <- function(x, nperm) { >> >> > RUNS <- numeric(nperm) >> >> > for (i in 1:nperm) { >> >> > RUNS[i] <- nruns(sample(x)) >> >> > } >> >> > cdf <- cumsum(table(RUNS))/nperm >> >> > return(list(RUNS=RUNS, cdf=cdf, nperm=nperm)) >> >> > } >> >> > >> >> > MC.runs(dtemp, 100000) >> >> > >> >> > Thanks. --Dale >> >> > >> >> > ______________________________________________ >> >> > R-help@r-project.org mailing list >> >> > https://stat.ethz.ch/mailman/listinfo/r-help >> >> > PLEASE do read the posting guide http://www.R-project.org/posting- >> >> > guide.html >> >> > and provide commented, minimal, self-contained, reproducible code. >> >> >> >> ______________________________________________ >> >> R-help@r-project.org mailing list >> >> https://stat.ethz.ch/mailman/listinfo/r-help >> >> PLEASE do read the posting guide http://www.R-project.org/posting- >> >> guide.html >> >> and provide commented, minimal, self-contained, reproducible code. >> > > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.