On 2010-06-23 12:05, Ralf B wrote:
Hi all,

I have two very large samples of data (10000+ data points) and would
like to perform normality tests on them. I know that p < .05 means that
a data set is considered not normal by either of the two tests. I am
also aware that large samples tend to lead more likely to normal
results (Andy Field, 2005).

I think that depends on what you mean by 'tend to lead ...'


I have a few questions to ensure that I am using them right.

1) The Shapiro-Wilk test requires me to provide a mean and sd. Is it
correct to supply here the mean and sd of the data itself (since I am
comparing to a normal distribution with the same parameters)?

mySD <- sd(mydata$myfield)
myMean <- mean(mydata$myfield)
shapiro.test(rnorm(100, mean = myMean, sd = mySD))

I don't think that your understanding of the S-W test is correct.
You would just do:

 shapiro.test(mydata$myfield)

to test for Normality. However, shapiro.test() won't accept
sample sizes greater than 5000. So use ks.test. Or use a
graphical method: I like qq.plot in the 'car' package.
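
A rough sketch of those options, using base R only. Here x is
simulated stand-in data (not your mydata$myfield), and subsampling
to 5000 values is just one assumed way around the shapiro.test()
limit. Note that plugging the sample mean and sd into ks.test()
makes the reported p-value conservative, so treat it as a guide
rather than an exact test:

```r
set.seed(1)                              # only to make the sketch reproducible
x <- rnorm(12000, mean = 50, sd = 10)    # hypothetical stand-in for mydata$myfield

# shapiro.test() accepts between 3 and 5000 values, so test a random subsample
sw <- shapiro.test(sample(x, 5000))

# One-sample K-S test against a Normal cdf with parameters estimated from x.
# Estimating the parameters from the same data makes the p-value too large.
ks <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))

# Graphical check: the points should lie close to the reference line
qqnorm(x)
qqline(x)

sw$p.value
ks$p.value
```

With 10000+ points the Q-Q plot is usually the more informative of
the three, since the tests will flag even tiny departures.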


2) If I just want to test each distribution individually, I assume
that I am doing a one-sample Kolmogorov-Smirnov test. Is that correct?

I don't understand this. What do you mean by 'test ... individually'?


3) If I simply want to know if normality exists or not, what should I
put for the parameter 'alternative' ? Does it actually matter?

alternative = c("two.sided", "less", "greater")

Leave it at the default 'two.sided' unless you have good
reason to suspect that the cdf lies above or below the Normal cdf.
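
To see what the three choices do, here is a small sketch on
simulated data (z is a hypothetical standardized sample, not your
data). The two-sided statistic is simply the larger of the two
one-sided ones:

```r
set.seed(2)                 # only to make the sketch reproducible
z <- rnorm(10000)           # hypothetical sample, compared to the standard Normal

two <- ks.test(z, "pnorm")                            # default: "two.sided"
gr  <- ks.test(z, "pnorm", alternative = "greater")   # cdf of z above the Normal cdf
le  <- ks.test(z, "pnorm", alternative = "less")      # cdf of z below the Normal cdf

# D (two-sided) equals max(D+, D-)
c(two = unname(two$statistic),
  greater = unname(gr$statistic),
  less = unname(le$statistic))
```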

  -Peter Ehlers


Thank you,
Ralf


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
