Re: [R] shapiro.test

Rolf Turner Fri, 21 Feb 2014 15:16:29 -0800

On 22/02/14 11:04, Rui Barradas wrote:

> Hello,
>
> Not a direct answer to your question, but if the sample size is a
> documented problem with shapiro.test and you want a normality test, why
> don't you use ?ks.test?
>
> m <- mean(HP_TrinityK25$V2)
> s <- sd(HP_TrinityK25$V2)
>
> ks.test(HP_TrinityK25$V2, "pnorm", m, s)

Strictly speaking this is not a valid test. The KS test is used for testing against a *completely specified* distribution. If there are parameters to be estimated, the null distribution is no longer applicable. This may not be a "real" problem if the parameters are *well* estimated, as they would be in this instance (given that the sample size is over-large). I'm not sure about this.
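To illustrate the point about estimated parameters, here is a small simulation sketch in base R. The sample size (500), number of replicates (2000) and seed are arbitrary choices, not taken from the thread. It draws samples that really are normal, runs ks.test() with the mean and sd estimated from each sample, and records how often the nominal 5% test rejects. If the KS null distribution still applied, the rate would be close to 0.05; with estimated parameters it comes out far lower, even at this fairly large n, so the test is very conservative rather than approximately correct.

```r
## Sketch: actual size of the KS test when mu and sigma are estimated.
## Assumptions (not from the thread): n = 500, 2000 replicates, seed 1.
set.seed(1)
n    <- 500
reps <- 2000

p_naive <- replicate(reps, {
  x <- rnorm(n, mean = 10, sd = 3)              # data that truly are normal
  ks.test(x, "pnorm", mean(x), sd(x))$p.value   # parameters plugged in from x
})

## Proportion of nominal 5% tests that reject: about 0.05 if the usual KS
## null distribution applied, much smaller when parameters are estimated.
rejection_rate <- mean(p_naive < 0.05)
rejection_rate
```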


The "Lilliefors" test is theoretically available in this context when mu and sigma are estimated, but according to the Wikipedia article, the Lilliefors distribution is not known analytically and the critical values must be determined by Monte Carlo methods. There is a "LillieTest" function in the "DescTools" package which makes use of some approximations to get p-values.
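The Monte Carlo route can be sketched directly in base R. The helper lillie_mc() below is hypothetical, written only for illustration; it is not DescTools::LillieTest (mentioned above), which uses approximations rather than simulation. It computes the KS distance to N(mean(x), sd(x)) and calibrates it against the same statistic computed on simulated normal samples of the same size. Because the statistic is location/scale invariant once mu and sigma are estimated, standard normal samples are enough for the simulation.

```r
## Sketch of a Lilliefors-type test with a Monte Carlo p-value.
## lillie_mc() is a hypothetical helper, not part of any package.
lillie_mc <- function(x, B = 999) {
  x <- x[is.finite(x)]
  n <- length(x)
  ## KS distance between y and a normal with y's own mean and sd
  ks_stat <- function(y) {
    unname(ks.test(y, "pnorm", mean(y), sd(y))$statistic)
  }
  d_obs <- ks_stat(x)
  ## Null distribution of the statistic: same procedure on normal samples
  d_sim <- replicate(B, ks_stat(rnorm(n)))
  ## Monte Carlo p-value (counts the observed sample as one of B + 1)
  p <- (1 + sum(d_sim >= d_obs)) / (B + 1)
  list(statistic = d_obs, p.value = p)
}

set.seed(2)
res_norm <- lillie_mc(rnorm(300, mean = 5, sd = 2))
res_exp  <- lillie_mc(rexp(300))
res_norm$p.value   # typically not small for normal data
res_exp$p.value    # small: exponential data are clearly non-normal
```

B = 999 is an arbitrary choice; a larger B gives a finer-grained p-value at the cost of more ks.test() calls.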

However, I think that a better approach would be to use a chi-squared goodness-of-fit test, whereby you can adjust for estimated parameters simply by reducing the degrees of freedom. I believe that the chi-squared test is somewhat low in power, but with a very large sample this should not be a problem.

The difficulty with the chi-squared test is that the choice of "bins" is somewhat arbitrary. I believe the best approach is to take the bin boundaries to be the quantiles of the normal distribution (with parameters "m" and "s") corresponding to equispaced probabilities on [0,1], with the number of such probabilities being k+1, where k = floor(n/5), n being the sample size. This makes the expected counts all equal to n/k >= 5, so that the chi-squared test is "valid". The degrees of freedom are then k-3 (k - 1 - #estimated parameters).
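A minimal sketch of the binning scheme just described, in base R. The function name chisq_normal() is hypothetical, chosen for illustration; it follows the recipe above: k = floor(n/5) bins, boundaries at normal quantiles of equispaced probabilities, equal expected counts n/k, and k - 3 degrees of freedom for the two estimated parameters.

```r
## Sketch of the equiprobable-bin chi-squared test for normality.
## chisq_normal() is a hypothetical helper name, not an existing function.
chisq_normal <- function(x) {
  x <- x[is.finite(x)]
  n <- length(x)
  k <- floor(n / 5)                      # number of bins, so n/k >= 5
  if (k < 4) stop("need at least 20 observations so that df = k - 3 >= 1")
  m <- mean(x)
  s <- sd(x)
  ## k + 1 equispaced probabilities on [0, 1] -> boundaries from -Inf to Inf
  breaks   <- qnorm(seq(0, 1, length.out = k + 1), mean = m, sd = s)
  observed <- tabulate(findInterval(x, breaks), nbins = k)
  expected <- n / k                      # identical for every bin
  stat <- sum((observed - expected)^2 / expected)
  df   <- k - 3                          # k - 1 - (number of estimated parameters)
  list(statistic = stat, df = df,
       p.value = pchisq(stat, df = df, lower.tail = FALSE),
       observed = observed)
}

set.seed(3)
chi_norm <- chisq_normal(rnorm(1000))
chi_exp  <- chisq_normal(rexp(1000))
chi_norm$df        # floor(1000/5) - 3 = 197
chi_exp$p.value    # tiny: the exponential sample is far from normal
```

For the original question this would be called as chisq_normal(HP_TrinityK25$V2); unlike shapiro.test() it has no upper limit on the sample size.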

One last comment: I believe that it is generally considered that testing for normality is a waste of time and a pseudo-intellectual exercise of academic interest at best.


cheers,

Rolf Turner



> Hope this helps,
>
> Rui Barradas

> On 21-02-2014 15:59, Gonzalo Villarino Pizarro wrote:
>
> > Dear R users,
> > Please help with this maybe basic question. I am trying to see if my
> > data are normal, but it is a large file and the test does not work.
> > I keep getting the message: "Error in shapiro.test(x = HP_TrinityK25$V2) :
> > sample size must be between 3 and 5000"
> > Thanks!
> >
> >   shapiro.test(x=HP_TrinityK25$V2)
> > Error in shapiro.test(x = HP_TrinityK25$V2) : sample size must be between 3 and 5000
> >
> > ## Note:
> > ## HP_TrinityK25 = my file
> > ## HP_TrinityK25$V2 = data in my file
>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

