Re: [R] anova planned comparisons/contrasts

2007-11-26 Thread Greg Snow
Others have shown some approaches that work well for after you fit the
model.  Here is another approach starting with the model fit itself:

tmp - c(control, glucose, fructose, 
  gluc+fruct, sucrose)

treatment - factor(rep( tmp, each=10 ), levels=tmp)

length - c(75, 67, 70, 75, 65, 71, 67, 67, 76, 68,
57, 58, 60, 59, 62, 60, 60, 57, 59, 61,
58, 61, 56, 58, 57, 56, 61, 60, 57, 58,
58, 59, 58, 61, 57, 56, 58, 57, 57, 59,
62, 66, 65, 63, 64, 62, 65, 65, 62, 67)



cm1 - rbind( int=1/5, cntrl = c(4, -1,-1,-1,-1)/4,
plus=c( 0, -1, -1, 3, -1 )/3,
sug1=c( 0, -1, 1, 0, 0),
sug2=c( 0, -1, -1, 0, 2)/2 )

cm2 - zapsmall(solve(cm1))

contrasts(treatment) - cm2[,-1]

treatment

sugars - data.frame( length=length, treatment=treatment )

fit - aov( length ~ treatment, data=sugars )
summary.lm(fit)
summary(fit)
summary(fit, split=list(treatment=list(
  'Control vs. Sugars'=1, 'Gluc+Fruc vs. sugars'=2, 'Sugars'=3:4)))


Is this more what you were looking for?

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Tyler Smith
 Sent: Thursday, November 22, 2007 12:38 PM
 To: [EMAIL PROTECTED]
 Subject: [R] anova planned comparisons/contrasts
 
 Hi,
 
 I'm trying to figure out how anova works in R by translating 
 the examples in Sokal And Rohlf's (1995 3rd edition) 
 Biometry. I've hit a snag with planned comparisons, their box 
 9.4 and section 9.6. It's a basic anova design:
 
 treatment - factor(rep(c(control, glucose, fructose, 
 gluc+fruct, sucrose), each = 10))
 
 length - c(75, 67, 70, 75, 65, 71, 67, 67, 76, 68,
 57, 58, 60, 59, 62, 60, 60, 57, 59, 61,
 58, 61, 56, 58, 57, 56, 61, 60, 57, 58,
 58, 59, 58, 61, 57, 56, 58, 57, 57, 59,
 62, 66, 65, 63, 64, 62, 65, 65, 62, 67)
 
 sugars - data.frame(treatment, length)
 
 The basic analysis is easy enough:
 
 anova(lm(length ~ treatment, sugars))
 
 SR proceed to calculate planned comparisons between control 
 and all other groups, between gluc+fruct and the remaining 
 sugars, and among the three pure sugars. I can get the first 
 two comparisons using the
 contrasts() function:
 
 contrasts(sugars$treatment) - matrix(c(-4, 1, 1,  1,  1,
 0, -1, 3, -1, -1), 5, 2) 
 
 summary(lm(length ~ treatment, sugars))
 
 The results appear to be the same, excepting that the book 
 calculates an F value, whereas R produces an equivalent t 
 value. However, I can't figure out how to calculate a 
 contrast among three groups. For those without access to 
 Sokal and Rohlf, I've written an R function that performs the 
 analysis they use, copied below. Is there a better way to do 
 this in R?
 
 Also, I don't know how to interpret the Estimate and Std. 
 Error columns from the summary of the lm with contrasts. It's 
 clear the intercept in this case is the grand mean, but what 
 are the other values? They appear to be the difference 
 between one of the contrast groups and the grand mean -- 
 wouldn't an estimate of the differences between the 
 contrasted groups be more appropriate, or am I (likely) 
 misinterpreting this?
 
 Thanks!
 
 Tyler
 
 MyContrast - function(Var, Group, G1, G2, G3=NULL) {
   ## Var == data vector, Group == factor
   ## G1, G2, G3 == character vectors of factor levels to contrast
   
   nG1 = sum(Group %in% G1)
   nG2 = sum(Group %in% G2)
   GrandMean = mean(Var[Group %in% c(G1, G2, G3)])
   G1Mean = mean(Var[Group %in% G1])
   G2Mean = mean(Var[Group %in% G2])
 
   if(is.null(G3))
 MScontr = nG1 * ((G1Mean - GrandMean)^2) + 
 nG2 * ((G2Mean - GrandMean)^2)
   else {
   nG3 = sum(Group %in% G3)
   G3Mean = mean(Var[Group %in% G3])
   MScontr = (nG1 * ((G1Mean - GrandMean)^2) + 
nG2 * ((G2Mean - GrandMean)^2) + 
nG3 * ((G3Mean - GrandMean)^2))/2
 }
   
   An - anova(lm(Var ~ Group))
   MSwithin = An[3]['Residuals',]
   DegF = An$Df[length(An$Df)]
   Fval = MScontr / MSwithin
   Pval = 1 - pf(Fval, 1, DegF)
   
   return (list(MS_contrasts = MScontr, MS_within = MSwithin, 
  F_value = Fval, P_value = Pval)) }
 
 ## The first two contrasts produce the same (+/- rounding 
 error) ## p-values as obtained using contrasts() 
 MyContrast(sugars$length, sugars$treatment, 'control', 
 c(fructose, gluc+fruct, glucose,
 sucrose))
 MyContrast(sugars$length, sugars$treatment, 'gluc+fruct',
 c(fructose, glucose, sucrose))
 
 ## How do you do this in standard R?
 MyContrast(sugars$length, sugars$treatment, fructose, glucose,
 sucrose)
 
 __
 R-help@r-project.org mailing list
 https

Re: [R] anova planned comparisons/contrasts

2007-11-24 Thread Tyler Smith
On 2007-11-22, Dieter Menne [EMAIL PROTECTED] wrote:
 Tyler Smith tyler.smith at mail.mcgill.ca writes:

 I'm trying to figure out how anova works in R by translating the
 examples in Sokal And Rohlf's (1995 3rd edition) Biometry. I've hit a
 snag with planned comparisons, their box 9.4 and section 9.6. It's a
 basic anova design:
 

  how to do contrast

 It's easier to use function estimable in package gmodels, or glht in package
 multcomp.


Isn't glht calculating unplanned comparisons, as opposed to the
planned comparisons with contrasts() and summary(..., split= ...)? Can
you do planned comparisons with glht, or unplanned comparisons with
summary()? 

contrasts(warpbreaks$tension) - matrix(c(-1,1,0,1,0,-1,0,-1,1), 3, 3)
amod - aov(formula = breaks ~ tension, data = warpbreaks)

## this isn't right:
summary(amod, split = list(tension = list('L vs M' =1, 'L vs H'=2, 
  'M vs H' = 3)))

## posthoc contrasts (three ways to do the same test, I think):
summary(glht(amod, linfct = mcp(tension = 'Tukey')))
summary(glht(amod, linfct = mcp(tension = 
matrix(c(-1,1,0,1,0,-1,0,-1,1), 3, 3
TukeyHSD(amod)

Thanks,

Tyler

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] anova planned comparisons/contrasts

2007-11-23 Thread Tyler Smith

On 2007-11-22, Peter Alspach [EMAIL PROTECTED] wrote:
 Tyler 

 For balanced data like this you might find aov() gives an output which
 is more comparable to Sokal and Rohlf (which I don't have):

 trtCont - C(sugars$treatment, matrix(c(-4,1,1,1,1, 0,-1,3,-1,-1), 5,
 2))
 sugarsAov - aov(length ~ trtCont, sugars)
 summary(sugarsAov, split=list(trtCont=list('control vs rest'=1, 'gf vs
 others'=2)))

 model.tables(sugarsAov, type='mean', se=T)

Thank you Peter, that's a big help! To confirm that I understand
correctly, aov is identical to lm, but provides better summary
information for balanced anova designs. As such, it is preferred to lm
for balanced anova designs, but should be avoided otherwise. Is that
correct?

Also, it appears that C and contrasts serve pretty much the same
purpose. Is there a context in which one is preferable to the other? 

Cheers,

Tyler

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] anova planned comparisons/contrasts

2007-11-23 Thread Dieter Menne
Tyler Smith tyler.smith at mail.mcgill.ca writes:

 I'm trying to figure out how anova works in R by translating the
 examples in Sokal And Rohlf's (1995 3rd edition) Biometry. I've hit a
 snag with planned comparisons, their box 9.4 and section 9.6. It's a
 basic anova design:
 

 how to do contrast

It's easier to use function estimable in package gmodels, or glht in package
multcomp.

Dieter

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] anova planned comparisons/contrasts

2007-11-22 Thread Tyler Smith
Hi,

I'm trying to figure out how anova works in R by translating the
examples in Sokal And Rohlf's (1995 3rd edition) Biometry. I've hit a
snag with planned comparisons, their box 9.4 and section 9.6. It's a
basic anova design:

treatment - factor(rep(c(control, glucose, fructose, 
  gluc+fruct, sucrose), each = 10))

length - c(75, 67, 70, 75, 65, 71, 67, 67, 76, 68,
57, 58, 60, 59, 62, 60, 60, 57, 59, 61,
58, 61, 56, 58, 57, 56, 61, 60, 57, 58,
58, 59, 58, 61, 57, 56, 58, 57, 57, 59,
62, 66, 65, 63, 64, 62, 65, 65, 62, 67)

sugars - data.frame(treatment, length)

The basic analysis is easy enough:

anova(lm(length ~ treatment, sugars))

SR proceed to calculate planned comparisons between control and all
other groups, between gluc+fruct and the remaining sugars, and among
the three pure sugars. I can get the first two comparisons using the
contrasts() function:

contrasts(sugars$treatment) - matrix(c(-4, 1, 1,  1,  1,
0, -1, 3, -1, -1), 5, 2) 

summary(lm(length ~ treatment, sugars))

The results appear to be the same, excepting that the book calculates
an F value, whereas R produces an equivalent t value. However, I can't
figure out how to calculate a contrast among three groups. For those
without access to Sokal and Rohlf, I've written an R function that
performs the analysis they use, copied below. Is there a better way to
do this in R?

Also, I don't know how to interpret the Estimate and Std. Error
columns from the summary of the lm with contrasts. It's clear the
intercept in this case is the grand mean, but what are the other
values? They appear to be the difference between one of the contrast
groups and the grand mean -- wouldn't an estimate of the differences
between the contrasted groups be more appropriate, or am I (likely)
misinterpreting this?

Thanks!

Tyler

MyContrast - function(Var, Group, G1, G2, G3=NULL) {
  ## Var == data vector, Group == factor
  ## G1, G2, G3 == character vectors of factor levels to contrast
  
  nG1 = sum(Group %in% G1)
  nG2 = sum(Group %in% G2)
  GrandMean = mean(Var[Group %in% c(G1, G2, G3)]) 
  G1Mean = mean(Var[Group %in% G1])
  G2Mean = mean(Var[Group %in% G2])

  if(is.null(G3))
MScontr = nG1 * ((G1Mean - GrandMean)^2) + 
  nG2 * ((G2Mean - GrandMean)^2)
  else {
  nG3 = sum(Group %in% G3)
  G3Mean = mean(Var[Group %in% G3])
  MScontr = (nG1 * ((G1Mean - GrandMean)^2) + 
 nG2 * ((G2Mean - GrandMean)^2) + 
 nG3 * ((G3Mean - GrandMean)^2))/2
}
  
  An - anova(lm(Var ~ Group))
  MSwithin = An[3]['Residuals',]
  DegF = An$Df[length(An$Df)]
  Fval = MScontr / MSwithin
  Pval = 1 - pf(Fval, 1, DegF)
  
  return (list(MS_contrasts = MScontr, MS_within = MSwithin, 
   F_value = Fval, P_value = Pval))
}

## The first two contrasts produce the same (+/- rounding error)
## p-values as obtained using contrasts()
MyContrast(sugars$length, sugars$treatment, 'control', 
  c(fructose, gluc+fruct, glucose,
  sucrose))
MyContrast(sugars$length, sugars$treatment, 'gluc+fruct',
  c(fructose, glucose, sucrose))

## How do you do this in standard R?
MyContrast(sugars$length, sugars$treatment, fructose, glucose,
  sucrose)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] anova planned comparisons/contrasts

2007-11-22 Thread Peter Alspach
Tyler 

For balanced data like this you might find aov() gives an output which
is more comparable to Sokal and Rohlf (which I don't have):

 trtCont - C(sugars$treatment, matrix(c(-4,1,1,1,1, 0,-1,3,-1,-1), 5,
2))
 sugarsAov - aov(length ~ trtCont, sugars)
 summary(sugarsAov, split=list(trtCont=list('control vs rest'=1, 'gf vs
others'=2)))
   Df  Sum Sq Mean Sq  F valuePr(F)
trtCont 4 1077.32  269.33  49.3680 6.737e-16 ***
  trtCont: control vs rest  1  832.32  832.32 152.5637 4.680e-16 ***
  trtCont: gf vs others 1   48.13   48.13   8.8228  0.004759 ** 
Residuals  45  245.505.46   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
 model.tables(sugarsAov, type='mean', se=T)
Tables of means
Grand mean
  
61.94 

 trtCont 
trtCont
   control   fructose gluc+fructglucosesucrose 
  70.1   58.2   58.0   59.3   64.1 

Standard errors for differences of means
trtCont
  1.045
replic.  10

Since the two specified contrasts are orthogonal, the difference among
the remaining three sugars can be obtained by substraction:

 sugarsSum - summary(sugarsAov,
   split=list(trtCont=list('control vs rest'=1, 'gf
vs others'=2)))
# Sum Sq
 sugarsSum[[1]][1,2]-sum(sugarsSum[[1]][2:3,2])
 
196.8667 
# Mean Sq
 (sugarsSum[[1]][1,2]-sum(sugarsSum[[1]][2:3,2]))/2
 
98.4 
# F value
 ((sugarsSum[[1]][1,2]-sum(sugarsSum[[1]][2:3,2]))/2) /
sugarsSum[[1]][4,3]
 
18.04277 
# Pr(F)
 1-pf(((sugarsSum[[1]][1,2]-sum(sugarsSum[[1]][2:3,2]))/2) /
sugarsSum[[1]][4,3], 2, 45)
 
1.762198e-06 

HTH 

Peter Alspach

 [mailto:[EMAIL PROTECTED] On Behalf Of Tyler Smith
 Subject: [R] anova planned comparisons/contrasts
 
 Hi,
 
 I'm trying to figure out how anova works in R by translating 
 the examples in Sokal And Rohlf's (1995 3rd edition) 
 Biometry. I've hit a snag with planned comparisons, their box 
 9.4 and section 9.6. It's a basic anova design:
 
 treatment - factor(rep(c(control, glucose, fructose, 
 gluc+fruct, sucrose), each = 10))
 
 length - c(75, 67, 70, 75, 65, 71, 67, 67, 76, 68,
 57, 58, 60, 59, 62, 60, 60, 57, 59, 61,
 58, 61, 56, 58, 57, 56, 61, 60, 57, 58,
 58, 59, 58, 61, 57, 56, 58, 57, 57, 59,
 62, 66, 65, 63, 64, 62, 65, 65, 62, 67)
 
 sugars - data.frame(treatment, length)
 
 The basic analysis is easy enough:
 
 anova(lm(length ~ treatment, sugars))
 
 SR proceed to calculate planned comparisons between control 
 and all other groups, between gluc+fruct and the remaining 
 sugars, and among the three pure sugars. I can get the first 
 two comparisons using the
 contrasts() function:
 
 contrasts(sugars$treatment) - matrix(c(-4, 1, 1,  1,  1,
 0, -1, 3, -1, -1), 5, 2) 
 
 summary(lm(length ~ treatment, sugars))
 
 The results appear to be the same, excepting that the book 
 calculates an F value, whereas R produces an equivalent t 
 value. However, I can't figure out how to calculate a 
 contrast among three groups. For those without access to 
 Sokal and Rohlf, I've written an R function that performs the 
 analysis they use, copied below. Is there a better way to do 
 this in R?
 
 Also, I don't know how to interpret the Estimate and Std. 
 Error columns from the summary of the lm with contrasts. It's 
 clear the intercept in this case is the grand mean, but what 
 are the other values? They appear to be the difference 
 between one of the contrast groups and the grand mean -- 
 wouldn't an estimate of the differences between the 
 contrasted groups be more appropriate, or am I (likely) 
 misinterpreting this?
 
 Thanks!
 
 Tyler
[Code deleted]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.