Hi,
I guess your original dataset produces some empty list elements when split, i.e. some TERM/INST_NUM combinations have no rows.

Clean<- structure(list(GRADE = c(1, 2, 3, 1.5, 1.75, 2, 0.5, 2, 3.5,
3.5, 3.75, 4), TERM = c(9L, 9L, 9L, 8L, 8L, 8L, 9L, 9L, 9L, 8L, 8L, 8L),
INST_NUM = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L)),
.Names = c("GRADE", "TERM", "INST_NUM"), class = "data.frame",
row.names = c(NA, -12L))

lapply(split(Clean,list(Clean$TERM,Clean$INST_NUM)),function(x) shapiro.test(x$GRADE))
#$`8.1`
#
#    Shapiro-Wilk normality test
#
#data:  x$GRADE
#W = 1, p-value = 1
#
#$`9.1`
#
#    Shapiro-Wilk normality test
#
#data:  x$GRADE
#W = 1, p-value = 1
-----------------------------------------------------

sapply(split(Clean,list(Clean$TERM,Clean$INST_NUM)),function(x) shapiro.test(x$GRADE)$p.value)
#8.1 9.1 8.2 9.2
#  1   1   1   1

with(Clean, aggregate(GRADE,list(TERM,INST_NUM),FUN=shapiro.test)) #the output is a list, hence the warning
#  Group.1 Group.2 x
#1       8       1 1
#2       9       1 1
#3       8       2 1
#4       9       2 1
#Warning message:
#In format.data.frame(x, digits = digits, na.encode = FALSE) :
#  corrupt data frame: columns will be truncated or padded with NAs

library(plyr)
ldply(dlply(Clean,.(TERM,INST_NUM), function(x) shapiro.test(x$GRADE)), summarize, pval=p.value)
#  TERM INST_NUM pval
#1    8        1    1
#2    8        2    1
#3    9        1    1
#4    9        2    1

Now, consider this example:

Clean1<- structure(list(GRADE = c(1, 2, 3, 1.5, 1.75, 2, 0.5, 2, 3.5,
3.5, 3.75, 4, 4.5, 4.25, 4.32), TERM = c(9L, 9L, 9L, 8L, 8L, 8L, 9L,
9L, 9L, 8L, 8L, 8L, 10L, 10L, 10L), INST_NUM = c(1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("GRADE", "TERM",
"INST_NUM"), class = "data.frame", row.names = c(NA, -15L))

lapply(split(Clean1,list(Clean1$TERM,Clean1$INST_NUM)),function(x) shapiro.test(x$GRADE))
#Error in shapiro.test(x$GRADE) : sample size must be between 3 and 5000

split(Clean1,list(Clean1$TERM,Clean1$INST_NUM))[[6]] ###0 rows
#[1] GRADE    TERM     INST_NUM
#<0 rows> (or 0-length row.names)

#keep only the non-empty groups before testing
lst1<-split(Clean1,list(Clean1$TERM,Clean1$INST_NUM))
lapply(lst1[sapply(lst1,nrow)>0], function(x) shapiro.test(x$GRADE))
#$`8.1`
#
#    Shapiro-Wilk normality test
#
#data:  x$GRADE
#W = 1, p-value = 1

You could do this directly with:

ldply(dlply(Clean1,.(TERM,INST_NUM), function(x) shapiro.test(x$GRADE)), summarize, pval=p.value)
#  TERM INST_NUM      pval
#1    8        1 1.0000000
#2    8        2 1.0000000
#3    9        1 1.0000000
#4    9        2 1.0000000
#5   10        1 0.5248807

ldply(dlply(Clean1,.(TERM,INST_NUM), function(x) shapiro.test(x$GRADE)), summarize, pval=p.value,stat1=statistic)
#  TERM INST_NUM      pval     stat1
#1    8        1 1.0000000 1.0000000
#2    8        2 1.0000000 1.0000000
#3    9        1 1.0000000 1.0000000
#4    9        2 1.0000000 1.0000000
#5   10        1 0.5248807 0.9393788

#or
with(Clean1, aggregate(GRADE,list(TERM,INST_NUM),FUN=function(x) shapiro.test(x)$p.value))
#  Group.1 Group.2         x
#1       8       1 1.0000000
#2       9       1 1.0000000
#3      10       1 0.5248807
#4       8       2 1.0000000
#5       9       2 1.0000000

#If you want both the p-value and the statistic
with(Clean1, aggregate(GRADE,list(TERM,INST_NUM),FUN=function(x) {
  res <- shapiro.test(x) #run the test once per group
  cbind(res$p.value,res$statistic)}))
#  Group.1 Group.2       x.1       x.2
#1       8       1 1.0000000 1.0000000
#2       9       1 1.0000000 1.0000000
#3      10       1 0.5248807 0.9393788
#4       8       2 1.0000000 1.0000000
#5       9       2 1.0000000 1.0000000

Hope this helps.
A.K.

________________________________
From: Robert Lynch <robert.b.ly...@gmail.com>
To: arun <smartpink...@yahoo.com>
Cc: R help <r-help@r-project.org>
Sent: Tuesday, August 20, 2013 8:00 PM
Subject: Re: [R] ave function

I tried
> lapply(split(Clean,list(Clean$TERM,Clean$INST_NUM)),function(x)
> shapiro.test(x$GRADE))
and I got
>Error in shapiro.test(x$GRADE.) : sample size must be between 3 and 5000

I also tried
with(Clean, aggregate(GRADE,list(TERM,INST_NUM),FUN=shapiro.test))
and got
  Group.1 Group.2         x
1  201001  689809 0.9546164
2  201201  689809 0.9521624
3  201301  689809 0.9106206
4  200701  994474 0.8862705
5  200710  994474 0.9176743
6  201203 1105752 0.9382688
.
.
.
72 201001 1759272 0.9291295
73 201101 1759272 0.9347072
74 201110 1897809 0.9395375
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs

I am not sure how to interpret the output of the second.
Thanks!

On Tue, Aug 13, 2013 at 11:01 AM, arun <smartpink...@yahoo.com> wrote:

Hi,
>You could try:
> lapply(split(Clean,list(Clean$TERM,Clean$INST_NUM)),function(x)
>shapiro.test(x$GRADE))
>A.K.
>
>
>----- Original Message -----
>From: Robert Lynch <robert.b.ly...@gmail.com>
>To: r-help@r-project.org
>Cc:
>Sent: Tuesday, August 13, 2013 1:46 PM
>Subject: [R] ave function
>
>I've written the following function
>CoursePrep <- function (Source, SaveName) {
>
>
>  Clean$TERM <- as.factor(Clean$TERM)
>
>  Clean$INST_NUM <- as.factor(Clean$INST_NUM)
>  Clean$zGrade <- with(Clean, ave(GRADE., list(TERM, INST_NUM), FUN =
>scale))
>  write.csv(Clean,paste(SaveName, "csv", sep ="."), row.names = FALSE)
>  return(Clean)
>}
>
>which is all well and good, but I want to throw a shapiro.test in before I
>normalize. That is, I don't really understand quite how I did (I got help)
>what I wanted to in the
>Clean$zGrade <- with(Clean, ave(GRADE., list(TERM, INST_NUM), FUN = scale))
>That code, for the whole of Clean, finds all sets of GRADE.'s that have the
>same INST_NUM and TERM, computes a mean, subtracts off the mean and divides
>by the standard deviation. I would like, for each one of those sets of
>grades, to call shapiro.test() on the set, to see if it is normal *before* I
>assume it is.
>
>I know the naive
>with(Clean, shapiro.test( list(TERM, INST_NUM)))
>doesn't work.
>with(Clean, ave(GRADE., list(TERM, INST_NUM), FUN =
>function(x)shapiro.test(x)))
>
>which returns
>Error in shapiro.test(x) : sample size must be between 3 and 5000
>and I have checked that the sets selected are all of length between 3 and
>5000.
>using the following on my full data
>
>ClassSize <- with(Clean, ave(GRADE., list(TERM, INST_NUM), FUN =
>function(x)length(x)))
>> summary(ClassSize)
>   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   22.0   198.0   241.0   244.4   279.0   466.0
>
>here is some sample data
>GRADE TERM INST_NUM
>1, 9, 1
>2, 9, 1
>3, 9, 1
>1.5, 8, 2
>1.75, 8, 2
>2, 8, 2
>0.5, 9, 2
>2, 9, 2
>3.5, 9, 2
>3.5, 8, 1
>3.75, 8, 1
>4, 8, 1
>
>and hopefully the code would test the following set of grades
>(1,2,3)(1.5,1.75,2)(0.5,2,3.5)(3.5,3.75,4)
>
>Thanks Robert
>
>    [[alternative HTML version deleted]]
>
>______________________________________________
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
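
A further note on the "sample size must be between 3 and 5000" error discussed in this thread: split() has a drop argument, and drop = TRUE discards TERM/INST_NUM combinations that never occur, so no zero-length piece reaches shapiro.test(). The following is only a minimal sketch, assuming the Clean1 example data from the reply; the names groups, ok and pvals are illustrative and do not come from the original messages.

```r
# Example data from the reply (Clean1), written out with data.frame()
Clean1 <- data.frame(
  GRADE = c(1, 2, 3, 1.5, 1.75, 2, 0.5, 2, 3.5, 3.5, 3.75, 4, 4.5, 4.25, 4.32),
  TERM = c(9L, 9L, 9L, 8L, 8L, 8L, 9L, 9L, 9L, 8L, 8L, 8L, 10L, 10L, 10L),
  INST_NUM = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L)
)

# drop = TRUE removes TERM/INST_NUM combinations with no rows (here 10.2)
groups <- split(Clean1$GRADE, list(Clean1$TERM, Clean1$INST_NUM), drop = TRUE)

# shapiro.test() needs between 3 and 5000 values, so guard on group size too
ok <- lengths(groups) >= 3

# one p-value per remaining group, named by "TERM.INST_NUM"
pvals <- sapply(groups[ok], function(g) shapiro.test(g)$p.value)
pvals
```

If TERM and INST_NUM in the full data are factors with unused combinations, the same drop = TRUE call should avoid the empty groups there as well; the size guard is an extra precaution rather than something the thread requires.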