Re: [R] barplot problem

2007-11-20 Thread Tyler Smith
On 2007-11-20, jim holtman <[EMAIL PROTECTED]> wrote:
> Does something like this work for you?  You can vary the mgp parameter
> for placement of the label.
>

You can also use mtext to place the axis label separately:

barplot(rev(modelledprofile),horiz=TRUE,xlim=c(0,1),col="cornflowerblue",names.arg=rev(depthnames),las=1)
mtext("depth", side = 2, line = 3)

The line option sets the distance from the axis - higher numbers will
give you more space. See ?mtext for details.

Tyler

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] anova planned comparisons/contrasts

2007-11-22 Thread Tyler Smith
Hi,

I'm trying to figure out how anova works in R by translating the
examples in Sokal And Rohlf's (1995 3rd edition) Biometry. I've hit a
snag with planned comparisons, their box 9.4 and section 9.6. It's a
basic anova design:

treatment <- factor(rep(c("control", "glucose", "fructose", 
  "gluc+fruct", "sucrose"), each = 10))

length <- c(75, 67, 70, 75, 65, 71, 67, 67, 76, 68,
57, 58, 60, 59, 62, 60, 60, 57, 59, 61,
58, 61, 56, 58, 57, 56, 61, 60, 57, 58,
58, 59, 58, 61, 57, 56, 58, 57, 57, 59,
62, 66, 65, 63, 64, 62, 65, 65, 62, 67)

sugars <- data.frame(treatment, length)

The basic analysis is easy enough:

anova(lm(length ~ treatment, sugars))

S&R proceed to calculate planned comparisons between control and all
other groups, between gluc+fruct and the remaining sugars, and among
the three pure sugars. I can get the first two comparisons using the
contrasts() function:

contrasts(sugars$treatment) <- matrix(c(-4, 1, 1,  1,  1,
0, -1, 3, -1, -1), 5, 2) 

summary(lm(length ~ treatment, sugars))

The results appear to be the same, excepting that the book calculates
an F value, whereas R produces an equivalent t value. However, I can't
figure out how to calculate a contrast among three groups. For those
without access to Sokal and Rohlf, I've written an R function that
performs the analysis they use, copied below. Is there a better way to
do this in R?

Also, I don't know how to interpret the Estimate and Std. Error
columns from the summary of the lm with contrasts. It's clear the
intercept in this case is the grand mean, but what are the other
values? They appear to be the difference between one of the contrast
groups and the grand mean -- wouldn't an estimate of the differences
between the contrasted groups be more appropriate, or am I (likely)
misinterpreting this?

Thanks!

Tyler

MyContrast <- function(Var, Group, G1, G2, G3=NULL) {
  ## Var == data vector, Group == factor
  ## G1, G2, G3 == character vectors of factor levels to contrast
  
  nG1 = sum(Group %in% G1)
  nG2 = sum(Group %in% G2)
  GrandMean = mean(Var[Group %in% c(G1, G2, G3)]) 
  G1Mean = mean(Var[Group %in% G1])
  G2Mean = mean(Var[Group %in% G2])

  if(is.null(G3))
MScontr = nG1 * ((G1Mean - GrandMean)^2) + 
  nG2 * ((G2Mean - GrandMean)^2)
  else {
  nG3 = sum(Group %in% G3)
  G3Mean = mean(Var[Group %in% G3])
  MScontr = (nG1 * ((G1Mean - GrandMean)^2) + 
 nG2 * ((G2Mean - GrandMean)^2) + 
 nG3 * ((G3Mean - GrandMean)^2))/2
}
  
  An <- anova(lm(Var ~ Group))
  MSwithin = An[3]['Residuals',]
  DegF = An$Df[length(An$Df)]
  Fval = MScontr / MSwithin
  Pval = 1 - pf(Fval, 1, DegF)
  
  return (list(MS_contrasts = MScontr, MS_within = MSwithin, 
   F_value = Fval, P_value = Pval))
}

## The first two contrasts produce the same (+/- rounding error)
## p-values as obtained using contrasts()
MyContrast(sugars$length, sugars$treatment, 'control', 
  c("fructose", "gluc+fruct", "glucose",
  "sucrose"))
MyContrast(sugars$length, sugars$treatment, 'gluc+fruct',
  c("fructose", "glucose", "sucrose"))

## How do you do this in standard R?
MyContrast(sugars$length, sugars$treatment, "fructose", "glucose",
  "sucrose")

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] anova planned comparisons/contrasts

2007-11-23 Thread Tyler Smith

On 2007-11-22, Peter Alspach <[EMAIL PROTECTED]> wrote:
> Tyler 
>
> For balanced data like this you might find aov() gives an output which
> is more comparable to Sokal and Rohlf (which I don't have):
>
>> trtCont <- C(sugars$treatment, matrix(c(-4,1,1,1,1, 0,-1,3,-1,-1), 5,
> 2))
>> sugarsAov <- aov(length ~ trtCont, sugars)
>> summary(sugarsAov, split=list(trtCont=list('control vs rest'=1, 'gf vs
> others'=2)))

>> model.tables(sugarsAov, type='mean', se=T)

Thank you Peter, that's a big help! To confirm that I understand
correctly, aov is identical to lm, but provides better summary
information for balanced anova designs. As such, it is preferred to lm
for balanced anova designs, but should be avoided otherwise. Is that
correct?

Also, it appears that C and contrasts serve pretty much the same
purpose. Is there a context in which one is preferable to the other? 

Cheers,

Tyler

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] multiple comparisons/tukey kramer

2007-11-23 Thread Tyler Smith

Thank you for your response. I think you have misunderstood what I'm
asking, though.

On 2007-11-23, Emmanuel Charpentier <[EMAIL PROTECTED]> wrote:
>
> - Tukey HSD will enable you to test the p(p-1)/2 pair differences one
> can create with p groups ;
> - Dunnett's procedure is made to compare (p-1) "treatments" to a common
> control ;
> - Scheffé's procedure is applicable to *any* ("reasonable") set of
> contrasts you can form ;
> - Newman-Keuls : aims to create separate subset of groups (but has
> serious conceptual and technical flaws ! Don't do that nunless you know
> what you're doing...).
> - etc ...
>
> You'll have to refer to the subject matter to make a choice.

Of course. I also have to know which function in R corresponds to which
test, which is my main question.

> Google ("multiple comparisons") will offer you some dubious and quite a
> few good references...

I have indeed found many dubious and a few good references to multiple
comparisons, both from google and r-site-search. Many posts in the
archive, including one made today in response to another question of
mine, point to glht as the appropriate function to use in R.

However, I don't know what exactly glht does, and the help file is
extremely terse. It offers the following options (in contrMat()):

 contrMat(n, type=c("Dunnett", "Tukey", "Sequen", "AVE", 
"Changepoint", "Williams", "Marcus", 
"McDermott"), base = 1)

The only reference to the source of these tests is:

 Frank Bretz, Alan Genz and Ludwig A. Hothorn (2001), On the
 numerical  availability of multiple comparison procedures.
 _Biometrical Journal_, *43*(5), 645-656.

This is a very technical paper, which as far as I can follow, is
primarily a discussion of the numerical methods involved in
calculating these contrasts, rather than the contrasts themselves. I
can't decide which one is appropriate without knowing what the
differences are. Dunnett seems pretty straightforward. Tukey, I think,
may refer to what is referred to as the Tukey-Kramer test in other
sources? Are any of them related to Scheffe? I have no idea. None of
them are related to Newman-Keuls, as several archive messages make
very clear that this is not a valid comparison to use, so R doesn't
implement it.

What I need is a reference to the tests implemented in glht, so I can
decide which one is appropriate for my data. Sequen, Changepoint et
al. may be common terms in some fields, but not in the references I'm
working from.

Thanks,

Tyler

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] multiple comparisons/tukey kramer

2007-11-23 Thread Tyler Smith
Hi,

I'm trying to make sense of the options for multiple comparisons
options in R. I've found the following options:

pairwise.t.test, which provides standard t-tests, with options for
choosing an appropriate correction for multiple comparisons

TukeyHSD, which provides the usual Tukey test

glht(package multcomp), which provides a variety of options

>From the help list, it appears that glht is the preferred approach.
However, I don't understand what the options are. ?glht refers to a
very technical paper on the numerical computations involved, and I
couldn't find a description corresponding to the McDermott or AVE
options. I did notice that the Tukey option provides the same result
as TukeyHSD for balanced data. Is this the same as Tukey-Kramer?

As I understand it, there is no universal consensus as to which test
is best. TukeyHSD appears to be appropriate for balanced designs. I
have an unbalanced design to analyze. I can use glht, but can someone
tell me what each option is actually calculating? A reference to a
paper that describes the procedures would be great, but I'm afraid I
the one offered in ?glht[1] is beyond me.

Thanks,

Tyler

[1]  Frank Bretz, Alan Genz and Ludwig A. Hothorn (2001), On the
 numerical  availability of multiple comparison procedures.
 _Biometrical Journal_, *43*(5), 645-656.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] anova planned comparisons/contrasts

2007-11-24 Thread Tyler Smith
On 2007-11-22, Dieter Menne <[EMAIL PROTECTED]> wrote:
> Tyler Smith  mail.mcgill.ca> writes:
>
>> I'm trying to figure out how anova works in R by translating the
>> examples in Sokal And Rohlf's (1995 3rd edition) Biometry. I've hit a
>> snag with planned comparisons, their box 9.4 and section 9.6. It's a
>> basic anova design:
>> 
>
>  how to do contrast
>
> It's easier to use function estimable in package gmodels, or glht in package
> multcomp.
>

Isn't glht calculating unplanned comparisons, as opposed to the
planned comparisons with contrasts() and summary(..., split= ...)? Can
you do planned comparisons with glht, or unplanned comparisons with
summary()? 

contrasts(warpbreaks$tension) <- matrix(c(-1,1,0,1,0,-1,0,-1,1), 3, 3)
amod <- aov(formula = breaks ~ tension, data = warpbreaks)

## this isn't right:
summary(amod, split = list(tension = list('L vs M' =1, 'L vs H'=2, 
  'M vs H' = 3)))

## posthoc contrasts (three ways to do the same test, I think):
summary(glht(amod, linfct = mcp(tension = 'Tukey')))
summary(glht(amod, linfct = mcp(tension = 
matrix(c(-1,1,0,1,0,-1,0,-1,1), 3, 3
TukeyHSD(amod)

Thanks,

Tyler

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] multiple comparisons/tukey kramer

2007-11-26 Thread Tyler Smith
On 2007-11-23, hadley wickham <[EMAIL PROTECTED]> wrote:
>>
>> What I need is a reference to the tests implemented in glht, so I can
>> decide which one is appropriate for my data. Sequen, Changepoint et
>> al. may be common terms in some fields, but not in the references I'm
>> working from.
>
> Have you read the vignette:
> http://cran.r-project.org/doc/vignettes/multcomp/multcomp.pdf
> ?

Thanks,

I have, and I've now reread it, as well as looking at the examples in
contrMat. I now understand that Tukey, Dunnett, Sequen et al. are
'convenience' terms, specifying all pair-wise, treatments vs control
etc., contrasts. I worked through the numerical example in Dunnett's
original paper, and also found a worked example of Tukey-Kramer
contrasts, and reproduced the results using 

glht(LM, linfct = mcp(fac = 'Dunnett'))
glht(LM, linfct = mcp(fac = 'Tukey'))

I'm still finding the language very difficult. I was taught these
methods using SS/MS language. Linear models were mentioned in passing,
but the focus was on using (and hopefully understanding) the formulas
for filling in the cells in an anova table. Has the linear model focus
now replaced the 'traditional' approach I learned? 

It's been a very frustrating week or two as I try and come to terms
with material I thought I already knew, so what I'm wondering is if
I'm doomed to repeat this process with every variation on Anova
designs I want to analyze in R unless I take the time to study linear
models. Today's task, for example, is figuring out how to convert
Sokal and Rohlf's two-level nested model II anova into R code, and
it's not going well.

Thanks,

Tyler

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Anova(car) SS digits

2007-11-29 Thread Tyler Smith
Hi,

When I use Anova(car) to produce type III SS, 'Sum Sq' is reported in
integers:

> Anova(bot.lm3, type ="III")
Anova Table (Type III tests)

Response: bottemp
 Sum Sq  DfF value  Pr(>F)
(Intercept)   45295   1 29436.4440 < 2e-16
fungroup  3   2 0.8259 0.44006
numsp.fun11   2 3.4503 0.03460
block   419   468.0017 < 2e-16
fungroup:numsp.fun6   4 1.0317 0.39340
fungroup:block9   8 0.7429 0.65346
numsp.fun:block  12   8 0.9501 0.47795
fungroup:numsp.fun:block 27  16 1.0920 0.36882
Residuals   205 133   

I have tried specifying digits = 7 in the call to Anova, but that
doesn't change anything. The output from another stats program for the
same data confirms that the ouptut above is correct, but is rounded to
the nearest integer. The dataset is too large to post here, but I can
confirm that I get the expected results from the examples in ?Anova,
as well as for type = 'II' on my own data. This output, and relevant
options() and sessionInfo are pasted below. What am I doing wrong?

Thanks,

Tyler

>  mod <- lm(conformity ~ fcategory*partner.status, data=Moore, 
+contrasts=list(fcategory=contr.sum, partner.status=contr.sum))
>  Anova(mod)
Anova Table (Type II tests)

Response: conformity
 Sum Sq Df F value   Pr(>F)
fcategory 11.61  2  0.2770 0.759564
partner.status   212.21  1 10.1207 0.002874
fcategory:partner.status 175.49  2  4.1846 0.022572
Residuals817.76 39 
>  Anova(mod, type="III")
Anova Table (Type III tests)

Response: conformity
 Sum Sq Df  F valuePr(>F)
(Intercept)  5752.8  1 274.3592 < 2.2e-16
fcategory  36.0  2   0.8589  0.431492
partner.status239.6  1  11.4250  0.001657
fcategory:partner.status  175.5  2   4.1846  0.022572
Residuals 817.8 39   

> Anova(bot.lm3, type ="II")
Anova Table (Type II tests)

Response: bottemp
 Sum Sq  Df F value  Pr(>F)
fungroup   1.87   2  0.6069 0.54656
numsp.fun 11.14   2  3.6214 0.02942
block486.47   4 79.0379 < 2e-16
fungroup:numsp.fun 6.32   4  1.0275 0.39553
fungroup:block11.43   8  0.9283 0.49542
numsp.fun:block   11.29   8  0.9169 0.50473
fungroup:numsp.fun:block  26.89  16  1.0920 0.36882
Residuals204.65 133

> options('digits')
$digits
[1] 7

> options('contrasts')
$contrasts
[1] "contr.sum"  "contr.poly"

> sessionInfo()
R version 2.5.1 (2007-06-27) 
i486-pc-linux-gnu 

locale:

LC_CTYPE=en_CA.UTF-8;LC_NUMERIC=C;LC_TIME=en_CA.UTF-8;LC_COLLATE=en_CA.UTF-8;
LC_MONETARY=en_CA.UTF-8;LC_MESSAGES=en_CA.UTF-8;LC_PAPER=en_CA.UTF-8;LC_NAME=C;
LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_CA.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] "stats" "graphics"  "grDevices" "utils" "datasets"  "methods"  
[7] "base" 

other attached packages:
car 
"1.2-7"

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Anova(car) SS digits

2007-11-29 Thread Tyler Smith
On 2007-11-29, John Fox <[EMAIL PROTECTED]> wrote:
>
> Anova() creates an object of class "anova" which then gets printed by the
> print method for "anova" objects. The reason that the sums of squares for
> the type II tests are rounded like this is because of the large value for
> the intercept. Although Anova() doesn't take a digits argument,
> print.anova() does, and so, e.g., you could use the command
>
> print(Anova(bot.lm3, type ="III"), digits=7)
>
> I hope this helps,
>  John

Very much. Thanks!

Tyler

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] using apply to loop

2007-12-20 Thread Tyler Smith
On 2007-12-21, Louis Martin <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I am running the following loop, but it takes hours to run as n is big. Is 
> there any way "apply" can be used? Thanks.
> ### Start
> nclass <- dim(data)[[2]] - 1
> z <- matrix(0, ncol = nclass, nrow = nclass)
> n <- dim(data)[[1]]
> x <- c(1:nclass)
> # loop starts
> for(loop in 1:n) { # looping over rows in data
> r <- data[loop, 1:nclass] # vector of length(nclass)
> classified <- x[r == max(r)] # index of rows == max(r)
>
> truth <- data[loop, nclass + 1] # next column, single value
> z[classified, truth] <- z[classified, truth] + 1 # increment
> the values of 
> }
> # loop ends
>

Off the top, data is a bad choice for your dataframe, as it conflicts
with a standard function. Also, including some actual data would make
this easier to work with. I think you're using dim(data)[[1]] to get
the number of rows of data? That can be more clearly expressed as
nrow(data), and dim(data)[[2]] == ncol(data).

Anyways, this might be helpful:

add.mat <- apply(data[,1:nclass], MAR = 1, FUN = function(x) 
 ifelse(x == max(x), 1, 0))

for(i in 1:n)
z[ , data[i, ncol(data)]] <- z[ , data[i, ncol(data)]] + add.mat[,i]

There's still a loop, but it might not be needed depending on what
values 'truth' holds. Most of the calculations are shifted into the
apply() call, so the one line loop should run at least a little faster
than what you started with.

HTH,

Tyler

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R appearance under linux

2007-12-21 Thread Tyler Smith
On 2007-12-21, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
>  Dear R users,
>
> I have just moved to R2.6.1 under Opensuse linux 10.3. I used to
> work with R under XPpro. Is it "normal" to have a visual aspect of R
> under linux different ? 

Yes, that's normal. Under windows, you get a GUI interface with menus
and separate windows etc. by default. In Linux you get to choose for
yourself how you interact with R. One of the better options is Emacs
with ESS. Details can be found at http://ESS.R-project.org/ including
rpms for suse. Other options are presented here:
http://www.sciviews.org/_rgui/

There will be a bit of a learning curve here, but eventually you'll
find you can do everything that you could do with the Windows GUI, and
at least with Emacs and ESS, a whole lot more.

Tyler

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Ecological Detective worked solutions [R-wiki]

2008-01-05 Thread Tyler Smith
Hi,

I've added several pages of worked solutions for the book Ecological
Detective by Hilborn and Mangel to the R-wiki. My hope is that this
will be of use to others working through this book without access to a
local expert. I am certainly not an expert, local or otherwise. 

I have posted solutions for chapters 3-6, which includes some really
horrible direct translations of the pseudocode from the book, and some
less-horrible R versions. I make no claims as to the quality of my
code, other than it appears to provide correct solutions. Anyone
interested in contributing alternative solutions, correcting my
solutions, adding solutions for missing problems or later chapters, or
adding further notes is most welcome to do so.

Thanks,

Tyler

http://wiki.r-project.org/rwiki/doku.php?id=guides:tutorials:ecological_detective

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Discriminant function analysis

2008-02-06 Thread Tyler Smith
On 2008-02-06, Birgit Lemcke <[EMAIL PROTECTED]> wrote:
>
> I am using R 2.6.1 on a PowerBook G4.
> I would like to perform a discriminant function analysis. I found lda  
> in MASS but as far as I understood, is it only working with  
> explanatory variables of the class factor. 

I think you are mistaken. If you reread the help for lda() you'll see
that the grouping has to be a factor, but the explanatory variables
should be numeric.

> My dataset contains variables of the classes factor and numeric. Is
> there another function that is able to handle this?

The numeric variables are fine. The factor variables may have to be
recoded into dummy binary variables, I'm not sure if lda() will deal
with them properly otherwise.

HTH,

Tyler

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Discriminant function analysis

2008-02-07 Thread Tyler Smith
On 2008-02-07, Birgit Lemcke <[EMAIL PROTECTED]> wrote:
>
> Am 06.02.2008 um 21:00 schrieb Tyler Smith:
>>
>>> My dataset contains variables of the classes factor and numeric. Is
>>> there another function that is able to handle this?
>>
>> The numeric variables are fine. The factor variables may have to be
>> recoded into dummy binary variables, I'm not sure if lda() will deal
>> with them properly otherwise.
>
> But aren´t binary variables also factors? Or is there another  
> variable class than factor or numeric?
> Do I have have to set the classe of the binaries as numeric?
>

There is no binary class in R, so you would have to use a numeric
field. For example:

| sample | factor_1 |
|+--|
| A  | red  |
| B  | green|
| C  | blue |

becomes:

| sample | dummy_1 | dummy_2 |
|+-+-|
| A  |   1 |   0 |
| B  |   0 |   1 |
| C  |   0 |   0 |

R can deal with dummy_1 and dummy_2 as numeric vectors. The details
should be explained in a good reference on multivariate statistics
(I'm looking at Legendre and Legendre (1998) section 1.5.7 and 11.5).

HTH,

Tyler

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] constrained clustering

2008-03-04 Thread Tyler Smith
Having searched CRAN and the mailing list archives, I'm pretty sure
there is no function in R that implements constrained clustering.
Before I start writing my own, have I overlooked something? 

Thanks,

Tyler

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.