Re: [R] anova planned comparisons/contrasts
Others have shown some approaches that work well for after you fit the model. Here is another approach starting with the model fit itself: tmp - c(control, glucose, fructose, gluc+fruct, sucrose) treatment - factor(rep( tmp, each=10 ), levels=tmp) length - c(75, 67, 70, 75, 65, 71, 67, 67, 76, 68, 57, 58, 60, 59, 62, 60, 60, 57, 59, 61, 58, 61, 56, 58, 57, 56, 61, 60, 57, 58, 58, 59, 58, 61, 57, 56, 58, 57, 57, 59, 62, 66, 65, 63, 64, 62, 65, 65, 62, 67) cm1 - rbind( int=1/5, cntrl = c(4, -1,-1,-1,-1)/4, plus=c( 0, -1, -1, 3, -1 )/3, sug1=c( 0, -1, 1, 0, 0), sug2=c( 0, -1, -1, 0, 2)/2 ) cm2 - zapsmall(solve(cm1)) contrasts(treatment) - cm2[,-1] treatment sugars - data.frame( length=length, treatment=treatment ) fit - aov( length ~ treatment, data=sugars ) summary.lm(fit) summary(fit) summary(fit, split=list(treatment=list( 'Control vs. Sugars'=1, 'Gluc+Fruc vs. sugars'=2, 'Sugars'=3:4))) Is this more what you were looking for? -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tyler Smith Sent: Thursday, November 22, 2007 12:38 PM To: [EMAIL PROTECTED] Subject: [R] anova planned comparisons/contrasts Hi, I'm trying to figure out how anova works in R by translating the examples in Sokal And Rohlf's (1995 3rd edition) Biometry. I've hit a snag with planned comparisons, their box 9.4 and section 9.6. It's a basic anova design: treatment - factor(rep(c(control, glucose, fructose, gluc+fruct, sucrose), each = 10)) length - c(75, 67, 70, 75, 65, 71, 67, 67, 76, 68, 57, 58, 60, 59, 62, 60, 60, 57, 59, 61, 58, 61, 56, 58, 57, 56, 61, 60, 57, 58, 58, 59, 58, 61, 57, 56, 58, 57, 57, 59, 62, 66, 65, 63, 64, 62, 65, 65, 62, 67) sugars - data.frame(treatment, length) The basic analysis is easy enough: anova(lm(length ~ treatment, sugars)) SR proceed to calculate planned comparisons between control and all other groups, between gluc+fruct and the remaining sugars, and among the three pure sugars. I can get the first two comparisons using the contrasts() function: contrasts(sugars$treatment) - matrix(c(-4, 1, 1, 1, 1, 0, -1, 3, -1, -1), 5, 2) summary(lm(length ~ treatment, sugars)) The results appear to be the same, excepting that the book calculates an F value, whereas R produces an equivalent t value. However, I can't figure out how to calculate a contrast among three groups. For those without access to Sokal and Rohlf, I've written an R function that performs the analysis they use, copied below. Is there a better way to do this in R? Also, I don't know how to interpret the Estimate and Std. Error columns from the summary of the lm with contrasts. It's clear the intercept in this case is the grand mean, but what are the other values? They appear to be the difference between one of the contrast groups and the grand mean -- wouldn't an estimate of the differences between the contrasted groups be more appropriate, or am I (likely) misinterpreting this? Thanks! Tyler MyContrast - function(Var, Group, G1, G2, G3=NULL) { ## Var == data vector, Group == factor ## G1, G2, G3 == character vectors of factor levels to contrast nG1 = sum(Group %in% G1) nG2 = sum(Group %in% G2) GrandMean = mean(Var[Group %in% c(G1, G2, G3)]) G1Mean = mean(Var[Group %in% G1]) G2Mean = mean(Var[Group %in% G2]) if(is.null(G3)) MScontr = nG1 * ((G1Mean - GrandMean)^2) + nG2 * ((G2Mean - GrandMean)^2) else { nG3 = sum(Group %in% G3) G3Mean = mean(Var[Group %in% G3]) MScontr = (nG1 * ((G1Mean - GrandMean)^2) + nG2 * ((G2Mean - GrandMean)^2) + nG3 * ((G3Mean - GrandMean)^2))/2 } An - anova(lm(Var ~ Group)) MSwithin = An[3]['Residuals',] DegF = An$Df[length(An$Df)] Fval = MScontr / MSwithin Pval = 1 - pf(Fval, 1, DegF) return (list(MS_contrasts = MScontr, MS_within = MSwithin, F_value = Fval, P_value = Pval)) } ## The first two contrasts produce the same (+/- rounding error) ## p-values as obtained using contrasts() MyContrast(sugars$length, sugars$treatment, 'control', c(fructose, gluc+fruct, glucose, sucrose)) MyContrast(sugars$length, sugars$treatment, 'gluc+fruct', c(fructose, glucose, sucrose)) ## How do you do this in standard R? MyContrast(sugars$length, sugars$treatment, fructose, glucose, sucrose) __ R-help@r-project.org mailing list https
Re: [R] anova planned comparisons/contrasts
On 2007-11-22, Dieter Menne [EMAIL PROTECTED] wrote: Tyler Smith tyler.smith at mail.mcgill.ca writes: I'm trying to figure out how anova works in R by translating the examples in Sokal And Rohlf's (1995 3rd edition) Biometry. I've hit a snag with planned comparisons, their box 9.4 and section 9.6. It's a basic anova design: how to do contrast It's easier to use function estimable in package gmodels, or glht in package multcomp. Isn't glht calculating unplanned comparisons, as opposed to the planned comparisons with contrasts() and summary(..., split= ...)? Can you do planned comparisons with glht, or unplanned comparisons with summary()? contrasts(warpbreaks$tension) - matrix(c(-1,1,0,1,0,-1,0,-1,1), 3, 3) amod - aov(formula = breaks ~ tension, data = warpbreaks) ## this isn't right: summary(amod, split = list(tension = list('L vs M' =1, 'L vs H'=2, 'M vs H' = 3))) ## posthoc contrasts (three ways to do the same test, I think): summary(glht(amod, linfct = mcp(tension = 'Tukey'))) summary(glht(amod, linfct = mcp(tension = matrix(c(-1,1,0,1,0,-1,0,-1,1), 3, 3 TukeyHSD(amod) Thanks, Tyler __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] anova planned comparisons/contrasts
On 2007-11-22, Peter Alspach [EMAIL PROTECTED] wrote: Tyler For balanced data like this you might find aov() gives an output which is more comparable to Sokal and Rohlf (which I don't have): trtCont - C(sugars$treatment, matrix(c(-4,1,1,1,1, 0,-1,3,-1,-1), 5, 2)) sugarsAov - aov(length ~ trtCont, sugars) summary(sugarsAov, split=list(trtCont=list('control vs rest'=1, 'gf vs others'=2))) model.tables(sugarsAov, type='mean', se=T) Thank you Peter, that's a big help! To confirm that I understand correctly, aov is identical to lm, but provides better summary information for balanced anova designs. As such, it is preferred to lm for balanced anova designs, but should be avoided otherwise. Is that correct? Also, it appears that C and contrasts serve pretty much the same purpose. Is there a context in which one is preferable to the other? Cheers, Tyler __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] anova planned comparisons/contrasts
Tyler Smith tyler.smith at mail.mcgill.ca writes: I'm trying to figure out how anova works in R by translating the examples in Sokal And Rohlf's (1995 3rd edition) Biometry. I've hit a snag with planned comparisons, their box 9.4 and section 9.6. It's a basic anova design: how to do contrast It's easier to use function estimable in package gmodels, or glht in package multcomp. Dieter __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] anova planned comparisons/contrasts
Hi, I'm trying to figure out how anova works in R by translating the examples in Sokal And Rohlf's (1995 3rd edition) Biometry. I've hit a snag with planned comparisons, their box 9.4 and section 9.6. It's a basic anova design: treatment - factor(rep(c(control, glucose, fructose, gluc+fruct, sucrose), each = 10)) length - c(75, 67, 70, 75, 65, 71, 67, 67, 76, 68, 57, 58, 60, 59, 62, 60, 60, 57, 59, 61, 58, 61, 56, 58, 57, 56, 61, 60, 57, 58, 58, 59, 58, 61, 57, 56, 58, 57, 57, 59, 62, 66, 65, 63, 64, 62, 65, 65, 62, 67) sugars - data.frame(treatment, length) The basic analysis is easy enough: anova(lm(length ~ treatment, sugars)) SR proceed to calculate planned comparisons between control and all other groups, between gluc+fruct and the remaining sugars, and among the three pure sugars. I can get the first two comparisons using the contrasts() function: contrasts(sugars$treatment) - matrix(c(-4, 1, 1, 1, 1, 0, -1, 3, -1, -1), 5, 2) summary(lm(length ~ treatment, sugars)) The results appear to be the same, excepting that the book calculates an F value, whereas R produces an equivalent t value. However, I can't figure out how to calculate a contrast among three groups. For those without access to Sokal and Rohlf, I've written an R function that performs the analysis they use, copied below. Is there a better way to do this in R? Also, I don't know how to interpret the Estimate and Std. Error columns from the summary of the lm with contrasts. It's clear the intercept in this case is the grand mean, but what are the other values? They appear to be the difference between one of the contrast groups and the grand mean -- wouldn't an estimate of the differences between the contrasted groups be more appropriate, or am I (likely) misinterpreting this? Thanks! Tyler MyContrast - function(Var, Group, G1, G2, G3=NULL) { ## Var == data vector, Group == factor ## G1, G2, G3 == character vectors of factor levels to contrast nG1 = sum(Group %in% G1) nG2 = sum(Group %in% G2) GrandMean = mean(Var[Group %in% c(G1, G2, G3)]) G1Mean = mean(Var[Group %in% G1]) G2Mean = mean(Var[Group %in% G2]) if(is.null(G3)) MScontr = nG1 * ((G1Mean - GrandMean)^2) + nG2 * ((G2Mean - GrandMean)^2) else { nG3 = sum(Group %in% G3) G3Mean = mean(Var[Group %in% G3]) MScontr = (nG1 * ((G1Mean - GrandMean)^2) + nG2 * ((G2Mean - GrandMean)^2) + nG3 * ((G3Mean - GrandMean)^2))/2 } An - anova(lm(Var ~ Group)) MSwithin = An[3]['Residuals',] DegF = An$Df[length(An$Df)] Fval = MScontr / MSwithin Pval = 1 - pf(Fval, 1, DegF) return (list(MS_contrasts = MScontr, MS_within = MSwithin, F_value = Fval, P_value = Pval)) } ## The first two contrasts produce the same (+/- rounding error) ## p-values as obtained using contrasts() MyContrast(sugars$length, sugars$treatment, 'control', c(fructose, gluc+fruct, glucose, sucrose)) MyContrast(sugars$length, sugars$treatment, 'gluc+fruct', c(fructose, glucose, sucrose)) ## How do you do this in standard R? MyContrast(sugars$length, sugars$treatment, fructose, glucose, sucrose) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] anova planned comparisons/contrasts
Tyler For balanced data like this you might find aov() gives an output which is more comparable to Sokal and Rohlf (which I don't have): trtCont - C(sugars$treatment, matrix(c(-4,1,1,1,1, 0,-1,3,-1,-1), 5, 2)) sugarsAov - aov(length ~ trtCont, sugars) summary(sugarsAov, split=list(trtCont=list('control vs rest'=1, 'gf vs others'=2))) Df Sum Sq Mean Sq F valuePr(F) trtCont 4 1077.32 269.33 49.3680 6.737e-16 *** trtCont: control vs rest 1 832.32 832.32 152.5637 4.680e-16 *** trtCont: gf vs others 1 48.13 48.13 8.8228 0.004759 ** Residuals 45 245.505.46 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 model.tables(sugarsAov, type='mean', se=T) Tables of means Grand mean 61.94 trtCont trtCont control fructose gluc+fructglucosesucrose 70.1 58.2 58.0 59.3 64.1 Standard errors for differences of means trtCont 1.045 replic. 10 Since the two specified contrasts are orthogonal, the difference among the remaining three sugars can be obtained by substraction: sugarsSum - summary(sugarsAov, split=list(trtCont=list('control vs rest'=1, 'gf vs others'=2))) # Sum Sq sugarsSum[[1]][1,2]-sum(sugarsSum[[1]][2:3,2]) 196.8667 # Mean Sq (sugarsSum[[1]][1,2]-sum(sugarsSum[[1]][2:3,2]))/2 98.4 # F value ((sugarsSum[[1]][1,2]-sum(sugarsSum[[1]][2:3,2]))/2) / sugarsSum[[1]][4,3] 18.04277 # Pr(F) 1-pf(((sugarsSum[[1]][1,2]-sum(sugarsSum[[1]][2:3,2]))/2) / sugarsSum[[1]][4,3], 2, 45) 1.762198e-06 HTH Peter Alspach [mailto:[EMAIL PROTECTED] On Behalf Of Tyler Smith Subject: [R] anova planned comparisons/contrasts Hi, I'm trying to figure out how anova works in R by translating the examples in Sokal And Rohlf's (1995 3rd edition) Biometry. I've hit a snag with planned comparisons, their box 9.4 and section 9.6. It's a basic anova design: treatment - factor(rep(c(control, glucose, fructose, gluc+fruct, sucrose), each = 10)) length - c(75, 67, 70, 75, 65, 71, 67, 67, 76, 68, 57, 58, 60, 59, 62, 60, 60, 57, 59, 61, 58, 61, 56, 58, 57, 56, 61, 60, 57, 58, 58, 59, 58, 61, 57, 56, 58, 57, 57, 59, 62, 66, 65, 63, 64, 62, 65, 65, 62, 67) sugars - data.frame(treatment, length) The basic analysis is easy enough: anova(lm(length ~ treatment, sugars)) SR proceed to calculate planned comparisons between control and all other groups, between gluc+fruct and the remaining sugars, and among the three pure sugars. I can get the first two comparisons using the contrasts() function: contrasts(sugars$treatment) - matrix(c(-4, 1, 1, 1, 1, 0, -1, 3, -1, -1), 5, 2) summary(lm(length ~ treatment, sugars)) The results appear to be the same, excepting that the book calculates an F value, whereas R produces an equivalent t value. However, I can't figure out how to calculate a contrast among three groups. For those without access to Sokal and Rohlf, I've written an R function that performs the analysis they use, copied below. Is there a better way to do this in R? Also, I don't know how to interpret the Estimate and Std. Error columns from the summary of the lm with contrasts. It's clear the intercept in this case is the grand mean, but what are the other values? They appear to be the difference between one of the contrast groups and the grand mean -- wouldn't an estimate of the differences between the contrasted groups be more appropriate, or am I (likely) misinterpreting this? Thanks! Tyler [Code deleted] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.