On 2011-02-04 11:00, DB1984 wrote:
Hi Greg,

In addition to the reply above, to address your questions - I fully appreciate that my understanding of the code is basic - this is my first attempt at putting this together...

My starting point is a data frame with numeric and text columns, but I can cut columns to make a fully numeric matrix if that is easier to handle.

"apply(y, 1, shapiro.test)" works for a second dataframe, yes. I guess that I chose a bad example dataset for 'nt'!

The overall aim is to test the normality of the distribution of the values in each row. I would then subset out the non-normal distributions to interrogate further. The shapiro.test seems a simple first pass at this. I'd like to move on to plotting residuals of a QQplot next, to see if that is more or less sensitive at detecting non-normal distributions in the dataset.

If you would recommend an alternative approach, I'd appreciate the input, thanks.
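For the mechanics described above (drop the text columns, test each row, subset on the result), here is a minimal sketch in R. The data frame `df`, its `id` column, and the simulated values are made up for illustration; they are not the poster's 'nt' or 'y' data, and the 0.05 cutoff is only an assumed threshold.

```r
# Toy data standing in for the real data frame (hypothetical names and values).
set.seed(42)                          # arbitrary seed, only for reproducibility
m <- matrix(rnorm(6 * 20), nrow = 6)  # 6 rows of 20 normal draws
m[6, ] <- rexp(20)                    # one deliberately skewed row
df <- data.frame(id = paste0("row", 1:6), m, stringsAsFactors = FALSE)

# Keep only the numeric columns and convert to a matrix.
num <- as.matrix(df[sapply(df, is.numeric)])

# Row-wise Shapiro-Wilk tests, keeping just the p-value from each test.
p_values <- apply(num, 1, function(row) shapiro.test(row)$p.value)
names(p_values) <- df$id

# Rows whose p-value falls below the assumed 0.05 threshold.
flagged <- df[p_values < 0.05, ]
```

Extracting `$p.value` inside the function gives a plain numeric vector, which is easier to subset on than the full test objects that apply(y, 1, shapiro.test) returns.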
I don't know what your overall scientific aim is, but here's something to ponder:

Suppose that you randomly sample 400,000 observations from a NORMAL distribution, put these into a matrix of 20,000 rows by 20 columns, and then perform your row-wise normality tests, storing the p-values. If you now select those rows with p-value < 0.05, you will get about ..... rows. (Fill in your best guess.)

Question: what does that imply for your scientific analysis?

Answer: Normality testing may not be your best line of attack.

Peter Ehlers

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
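The thought experiment in the reply above can be run directly to check your guess. The following is a minimal sketch in R, assuming shapiro.test as the row-wise normality test (the one mentioned earlier in the thread); the seed and variable names are arbitrary choices, not part of the original message.

```r
# Every row is drawn from a Normal distribution, so no row is truly "non-normal".
set.seed(1)                      # arbitrary seed, only for reproducibility
n_rows <- 20000
n_cols <- 20
x <- matrix(rnorm(n_rows * n_cols), nrow = n_rows, ncol = n_cols)

# Row-wise Shapiro-Wilk tests, storing only the p-values.
p_values <- apply(x, 1, function(row) shapiro.test(row)$p.value)

# Number of rows that would be selected as "non-normal" at the 0.05 level.
n_flagged <- sum(p_values < 0.05)
n_flagged

# Optional: look at how the p-values are distributed across rows.
hist(p_values, breaks = 20, main = "Row-wise p-values, all data Normal")
```

Compare `n_flagged` with your guess, then consider how many of the rows selected from real data might have been picked up in the same way.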