Re: [R] Kolmogorov Smirnov Test

Kerry Thu, 11 Nov 2010 18:31:59 -0800

Thanks Ted and Greg. I had actually tried pnorm and, after having
problems, thought maybe I was misunderstanding dnorm as an argument to
ks.test due to over- (more likely under-) thinking it. I'm assuming now
that ks.test will consider my data in cumulative form (makes sense now
that I think about it, but I didn't want to assume any steps that the
R version of the K-S test takes). I plan to explore the ideas and run the
simulations you sent in full over the weekend.
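
The "cumulative form" can be checked by hand. What follows is only a minimal sketch on simulated data: the vector x, its parameters, and the names d_hand and d_ks are made up for illustration and are not data from this thread. It computes the largest gap between the empirical CDF and pnorm, then compares it with the statistic that ks.test reports.

```r
set.seed(42)                        # arbitrary seed, for reproducibility only
x <- rnorm(68, mean = 100, sd = 3)  # hypothetical sample standing in for mydata
n <- length(x)

# Hypothesised CDF evaluated at the sorted observations
Fx <- pnorm(sort(x), mean = 100, sd = 3)

# The empirical CDF jumps from (i-1)/n to i/n at the i-th sorted value,
# so the largest gap has to be checked on both sides of each jump
d_hand <- max(seq_len(n) / n - Fx, Fx - (seq_len(n) - 1) / n)

# Statistic reported by ks.test for the same fully specified null
d_ks <- unname(ks.test(x, "pnorm", mean = 100, sd = 3)$statistic)

all.equal(d_hand, d_ks)
```

If the two values agree, ks.test is indeed working with the cumulative distribution, which is why it needs pnorm rather than dnorm.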


Thanks again!
Kerry

On Nov 11, 12:05 pm, Greg Snow <greg.s...@imail.org> wrote:
> Consider the following simulations (also fixing the pnorm instead of dnorm 
> that Ted pointed out and I missed):
>
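> # Case 1: the null distribution is fully specified in advance
> out1 <- replicate(10000, {
>         x <- rnorm(1000, 100, 3)
>         ks.test( x, pnorm, mean=100, sd=3 )$p.value
>         } )
>
> # Case 2: the mean and sd of the null are estimated from the same sample
> out2 <- replicate(10000, {
>         x <- rnorm(1000, 100, 3)
>         ks.test( x, pnorm, mean=mean(x), sd=sd(x) )$p.value
>         } )
>
> # Compare the two p-value distributions
> par(mfrow=c(2,1))
> hist(out1)
> hist(out2)
>
> # Proportion of simulations rejected at alpha = 0.05
> mean(out1 <= 0.05 )
> mean(out2 <= 0.05 )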
>
> In both cases the null hypothesis is true (or at least a meaningful 
> approximation to true) so the p-values should follow a uniform distribution.  
> In the case of out1 where the mean and sd are specified as part of the null 
> the p-values are reasonably uniform and the rejection rate is close to alpha 
> (should asymptotically approach alpha as the number of simulations 
> increases).  However looking at out2, where the parameters are set not by 
> outside knowledge or tests, but rather from the observed data, the p-values 
> are clearly not uniform and the rejection rate is far from alpha.
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111
>
>
>
> > -----Original Message-----
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> > project.org] On Behalf Of Kerry
> > Sent: Thursday, November 11, 2010 12:02 AM
> > To: r-h...@r-project.org
> > Subject: Re: [R] Kolmogorov Smirnov Test
>
> > Thanks for the feedback. My goal is to run a simple test to show that
> > the data cannot be rejected as either normally or uniformly
> > distributed (depending on the variable), which is what a previous K-S
> > test run using SPSS had shown. The actual distribution I compare my
> > sample to matters only in that it should be rejected if my data were
> > multimodal. This way I can suggest the data is from the same
> > population. I later run PCA and cluster analyses to confirm this, but
> > I want an easy stat to start with for the individual variables.
>
> > I didn't think I was comparing my data against itself, but rather
> > against a normal distribution with the same mean and standard deviation.
> > Using the mean seems necessary, so is it incorrect to have the same
> > standard deviation too? I need to go back and read on the K-S test to
> > see what the appropriate constraints are before bothering anyone for
> > more help. Sorry, I thought I had it.
>
> > Thanks again,
> > kbrownk
>
> > On Nov 11, 12:40 am, Greg Snow <greg.s...@imail.org> wrote:
> > > The way you are running the test the null hypothesis is that the
> > > data comes from a normal distribution with mean=0 and standard
> > > deviation = 1.  If your minimum data value is 0, then it seems very
> > > unlikely that the mean is 0.  So the test is being strongly
> > > influenced by the mean and standard deviation not just the shape of
> > > the distribution.
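
A small illustration of the default-null point, as a sketch only: the sample x and its mean of 10 and sd of 3 are hypothetical, chosen so that the data sit well away from 0. The first call leaves the parameters out, the second states them, and the third passes the density dnorm in place of the CDF pnorm.

```r
set.seed(1)                        # arbitrary seed
x <- rnorm(68, mean = 10, sd = 3)  # hypothetical data, far from mean 0 / sd 1

# No parameters given: the null is N(0, 1), so location and scale alone
# are enough to produce a rejection
p_default <- ks.test(x, "pnorm")$p.value

# Null stated with the parameters that generated the data
p_stated <- ks.test(x, "pnorm", mean = 10, sd = 3)$p.value

# Passing the density instead of the CDF runs without error, but it
# compares the empirical CDF with a curve that is not a CDF at all
p_density <- ks.test(x, dnorm, mean = 10, sd = 3)$p.value

c(default = p_default, stated = p_stated, density = p_density)
```

Both the default and the density calls should return a vanishingly small p-value even though the sample is normal by construction, so a tiny p-value here says nothing about the shape of the data.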
>
> > > Note that the KS test was not designed to test against a
> > > distribution with parameters estimated from the same data (you can
> > > do the test, but it makes the p-value inaccurate).  You can do a
> > > little better by simulating the process and comparing the KS
> > > statistic to the simulations rather than looking at the computed
> > > p-value.
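
The simulation idea can be sketched roughly as follows. This is an illustration under stated assumptions, not a method spelled out in the thread: x is simulated stand-in data, ks_stat is a helper name made up here, and 2000 replicates is an arbitrary choice. Each simulated sample is drawn from a normal with the fitted mean and sd, its own parameters are re-estimated, and the observed statistic is then ranked against the simulated ones.

```r
set.seed(123)                       # arbitrary seed
x <- rnorm(68, mean = 100, sd = 3)  # hypothetical stand-in for mydata

# Hypothetical helper: KS statistic against a normal whose mean and sd
# are estimated from the very sample being tested
ks_stat <- function(v) {
  unname(ks.test(v, "pnorm", mean = mean(v), sd = sd(v))$statistic)
}

d_obs <- ks_stat(x)

# Reference distribution of the statistic: simulate from the fitted normal
# and repeat the same estimate-then-test procedure on every draw
d_sim <- replicate(2000, ks_stat(rnorm(length(x), mean(x), sd(x))))

# Simulated p-value: share of simulated statistics at least as large
# as the observed one
p_sim <- mean(d_sim >= d_obs)
p_sim
```

Because the estimation step is repeated inside every simulated draw, the reference distribution accounts for the parameters having been fitted, which the p-value printed by ks.test does not.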
>
> > > However you should ask yourself why you are doing the normality
> > > tests in the first place.  The common reasons that people do this
> > > don't match with what the tests actually test (see the fortunes on
> > > normality).
>
> > > --
> > > Gregory (Greg) L. Snow Ph.D.
> > > Statistical Data Center
> > > Intermountain Healthcare
> > > greg.s...@imail.org
> > > 801.408.8111
>
> > > > -----Original Message-----
> > > > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> > > > project.org] On Behalf Of Kerry
> > > > Sent: Wednesday, November 10, 2010 9:23 PM
> > > > To: r-h...@r-project.org
> > > > Subject: [R] Kolmogorov Smirnov Test
>
> > > > I'm using ks.test (mydata, dnorm) on my data. I know some of my
> > > > different variable samples (mydata1, mydata2, etc) must be normally
> > > > distributed but the p value is always < 2.0^-16 (the 2.0 can change
> > > > but not the exponent).
>
> > > > I want to test mydata against a normal distribution. What could I
> > > > be doing wrong?
>
> > > > I tried instead using rnorm to create a normal distribution: y =
> > > > rnorm (68,mean=mydata, sd=mydata), where N= the sample size from
> > > > mydata. Then I ran the k-s: ks.test (mydata,y). Should this work?
>
> > > > One issue I had was that some of my data has a minimum value of 0,
> > > > but rnorm run as I have it above will potentially create negative
> > > > numbers.
>
> > > > Also some of my variables will likely be better tested against
> > > > non-normal distributions (uniform etc.), but I figure I should
> > > > learn how to even use ks.test first.
>
> > > > I used to use SPSS but am really trying to jump into R instead, but
> > > > I find the help assumes too much statistical knowledge.
>
> > > > I'm guessing I have a long road before I get this, so any bits of
> > > > information that may help me get a bit further will be appreciated!
>
> > > > Thanks,
> > > > kbrownk
>
> > > > ______________________________________________
> > > > r-h...@r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide
> > > > http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
>

