On 2011-02-04 11:00, DB1984 wrote:

Hi Greg,

In addition to the reply above, to address your questions: I fully
appreciate that my understanding of the code is basic, as this is my first
attempt at putting this together.

My starting point is a data frame with numeric and text columns, but I can
cut columns to make a fully numeric matrix if that is easier to handle.

"apply(y, 1, shapiro.test)" works for a second dataframe, yes. I guess that
I chose a bad example dataset for 'nt'!
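
For readers following along, here is a minimal sketch of the row-wise test described above, starting from a data frame that mixes text and numeric columns. The data frame `df`, its `id` column, and the 0.05 cutoff are assumptions made up for illustration; they are not from the original data. Note that `apply(y, 1, shapiro.test)` returns a list of full test objects, so the sketch extracts only the p-value for each row.

```r
# Hypothetical example data: 'df' and its columns are invented for illustration.
set.seed(1)
df <- data.frame(id = paste0("row", 1:5),
                 matrix(rnorm(5 * 20), nrow = 5),
                 stringsAsFactors = FALSE)

# Keep only the numeric columns and convert them to a numeric matrix.
num <- as.matrix(df[, sapply(df, is.numeric)])

# shapiro.test() returns an "htest" list; keep just the p-value for each row.
pvals <- apply(num, 1, function(x) shapiro.test(x)$p.value)

# Rows that the test flags at the (assumed) 5% level.
flagged <- df[pvals < 0.05, ]
```

Keeping the p-values in a plain numeric vector makes the later subsetting step a one-liner, and `shapiro.test()` needs between 3 and 5000 non-missing values per row.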


The overall aim is to test the normality of the distribution of the values
in each row. I would then subset out the non-normal distributions to
interrogate further. The shapiro.test seems a simple first pass at this. I'd
like to move on to plotting residuals of a QQplot next, to see if that is
more or less sensitive at detecting non-normal distributions in the dataset.

If you would recommend an alternative approach, I'd appreciate the input,
thanks.

I don't know what your overall scientific aim is, but here's
something to ponder:

Suppose that you randomly sample 400,000 observations from a
NORMAL distribution and put these into a matrix of 20,000
rows by 20 columns and then perform your row-wise Normality
tests, storing the p-values.

If you now select those rows with p-value < 0.05, you will
get about ..... many rows. (Fill in your best guess.)

Question: what does that imply for your scientific analysis?

Answer: Normality testing may not be your best line of attack.
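
The thought experiment above can be run directly. This is only a sketch: the seed is arbitrary and the variable names are made up for illustration. Because every row really is drawn from a Normal distribution, each row that falls below the 0.05 cutoff is a false alarm, and a reasonably calibrated test should flag roughly 5% of the rows.

```r
set.seed(42)  # arbitrary seed, used only to make the run reproducible

# 400,000 draws from a standard Normal, arranged as 20,000 rows by 20 columns.
m <- matrix(rnorm(20000 * 20), nrow = 20000, ncol = 20)

# Row-wise Shapiro-Wilk test, storing only the p-values.
pvals <- apply(m, 1, function(x) shapiro.test(x)$p.value)

# Number and proportion of rows "rejected" even though all rows are Normal.
n_flagged <- sum(pvals < 0.05)
prop_flagged <- mean(pvals < 0.05)
print(n_flagged)
print(prop_flagged)
```

Comparing `n_flagged` with the number of rows flagged in the real dataset gives a rough sense of how many of those rows could be chance rejections.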

Peter Ehlers

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
