Re: [R] barplot problem
On 2007-11-20, jim holtman <[EMAIL PROTECTED]> wrote: > Does something like this work for you? You can vary the mgp parameter > for placement of the label. > You can also use mtext to place the axis label separately: barplot(rev(modelledprofile),horiz=TRUE,xlim=c(0,1),col="cornflowerblue",names.arg=rev(depthnames),las=1) mtext("depth", side = 2, line = 3) The line option sets the distance from the axis - higher numbers will give you more space. See ?mtext for details. Tyler __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] anova planned comparisons/contrasts
Hi, I'm trying to figure out how anova works in R by translating the examples in Sokal And Rohlf's (1995 3rd edition) Biometry. I've hit a snag with planned comparisons, their box 9.4 and section 9.6. It's a basic anova design: treatment <- factor(rep(c("control", "glucose", "fructose", "gluc+fruct", "sucrose"), each = 10)) length <- c(75, 67, 70, 75, 65, 71, 67, 67, 76, 68, 57, 58, 60, 59, 62, 60, 60, 57, 59, 61, 58, 61, 56, 58, 57, 56, 61, 60, 57, 58, 58, 59, 58, 61, 57, 56, 58, 57, 57, 59, 62, 66, 65, 63, 64, 62, 65, 65, 62, 67) sugars <- data.frame(treatment, length) The basic analysis is easy enough: anova(lm(length ~ treatment, sugars)) S&R proceed to calculate planned comparisons between control and all other groups, between gluc+fruct and the remaining sugars, and among the three pure sugars. I can get the first two comparisons using the contrasts() function: contrasts(sugars$treatment) <- matrix(c(-4, 1, 1, 1, 1, 0, -1, 3, -1, -1), 5, 2) summary(lm(length ~ treatment, sugars)) The results appear to be the same, excepting that the book calculates an F value, whereas R produces an equivalent t value. However, I can't figure out how to calculate a contrast among three groups. For those without access to Sokal and Rohlf, I've written an R function that performs the analysis they use, copied below. Is there a better way to do this in R? Also, I don't know how to interpret the Estimate and Std. Error columns from the summary of the lm with contrasts. It's clear the intercept in this case is the grand mean, but what are the other values? They appear to be the difference between one of the contrast groups and the grand mean -- wouldn't an estimate of the differences between the contrasted groups be more appropriate, or am I (likely) misinterpreting this? Thanks! Tyler MyContrast <- function(Var, Group, G1, G2, G3=NULL) { ## Var == data vector, Group == factor ## G1, G2, G3 == character vectors of factor levels to contrast nG1 = sum(Group %in% G1) nG2 = sum(Group %in% G2) GrandMean = mean(Var[Group %in% c(G1, G2, G3)]) G1Mean = mean(Var[Group %in% G1]) G2Mean = mean(Var[Group %in% G2]) if(is.null(G3)) MScontr = nG1 * ((G1Mean - GrandMean)^2) + nG2 * ((G2Mean - GrandMean)^2) else { nG3 = sum(Group %in% G3) G3Mean = mean(Var[Group %in% G3]) MScontr = (nG1 * ((G1Mean - GrandMean)^2) + nG2 * ((G2Mean - GrandMean)^2) + nG3 * ((G3Mean - GrandMean)^2))/2 } An <- anova(lm(Var ~ Group)) MSwithin = An[3]['Residuals',] DegF = An$Df[length(An$Df)] Fval = MScontr / MSwithin Pval = 1 - pf(Fval, 1, DegF) return (list(MS_contrasts = MScontr, MS_within = MSwithin, F_value = Fval, P_value = Pval)) } ## The first two contrasts produce the same (+/- rounding error) ## p-values as obtained using contrasts() MyContrast(sugars$length, sugars$treatment, 'control', c("fructose", "gluc+fruct", "glucose", "sucrose")) MyContrast(sugars$length, sugars$treatment, 'gluc+fruct', c("fructose", "glucose", "sucrose")) ## How do you do this in standard R? MyContrast(sugars$length, sugars$treatment, "fructose", "glucose", "sucrose") __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] anova planned comparisons/contrasts
On 2007-11-22, Peter Alspach <[EMAIL PROTECTED]> wrote: > Tyler > > For balanced data like this you might find aov() gives an output which > is more comparable to Sokal and Rohlf (which I don't have): > >> trtCont <- C(sugars$treatment, matrix(c(-4,1,1,1,1, 0,-1,3,-1,-1), 5, > 2)) >> sugarsAov <- aov(length ~ trtCont, sugars) >> summary(sugarsAov, split=list(trtCont=list('control vs rest'=1, 'gf vs > others'=2))) >> model.tables(sugarsAov, type='mean', se=T) Thank you Peter, that's a big help! To confirm that I understand correctly, aov is identical to lm, but provides better summary information for balanced anova designs. As such, it is preferred to lm for balanced anova designs, but should be avoided otherwise. Is that correct? Also, it appears that C and contrasts serve pretty much the same purpose. Is there a context in which one is preferable to the other? Cheers, Tyler __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] multiple comparisons/tukey kramer
Thank you for your response. I think you have misunderstood what I'm asking, though. On 2007-11-23, Emmanuel Charpentier <[EMAIL PROTECTED]> wrote: > > - Tukey HSD will enable you to test the p(p-1)/2 pair differences one > can create with p groups ; > - Dunnett's procedure is made to compare (p-1) "treatments" to a common > control ; > - Scheffé's procedure is applicable to *any* ("reasonable") set of > contrasts you can form ; > - Newman-Keuls : aims to create separate subset of groups (but has > serious conceptual and technical flaws ! Don't do that nunless you know > what you're doing...). > - etc ... > > You'll have to refer to the subject matter to make a choice. Of course. I also have to know which function in R corresponds to which test, which is my main question. > Google ("multiple comparisons") will offer you some dubious and quite a > few good references... I have indeed found many dubious and a few good references to multiple comparisons, both from google and r-site-search. Many posts in the archive, including one made today in response to another question of mine, point to glht as the appropriate function to use in R. However, I don't know what exactly glht does, and the help file is extremely terse. It offers the following options (in contrMat()): contrMat(n, type=c("Dunnett", "Tukey", "Sequen", "AVE", "Changepoint", "Williams", "Marcus", "McDermott"), base = 1) The only reference to the source of these tests is: Frank Bretz, Alan Genz and Ludwig A. Hothorn (2001), On the numerical availability of multiple comparison procedures. _Biometrical Journal_, *43*(5), 645-656. This is a very technical paper, which as far as I can follow, is primarily a discussion of the numerical methods involved in calculating these contrasts, rather than the contrasts themselves. I can't decide which one is appropriate without knowing what the differences are. Dunnett seems pretty straightforward. Tukey, I think, may refer to what is referred to as the Tukey-Kramer test in other sources? Are any of them related to Scheffe? I have no idea. None of them are related to Newman-Keuls, as several archive messages make very clear that this is not a valid comparison to use, so R doesn't implement it. What I need is a reference to the tests implemented in glht, so I can decide which one is appropriate for my data. Sequen, Changepoint et al. may be common terms in some fields, but not in the references I'm working from. Thanks, Tyler __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] multiple comparisons/tukey kramer
Hi, I'm trying to make sense of the options for multiple comparisons options in R. I've found the following options: pairwise.t.test, which provides standard t-tests, with options for choosing an appropriate correction for multiple comparisons TukeyHSD, which provides the usual Tukey test glht(package multcomp), which provides a variety of options >From the help list, it appears that glht is the preferred approach. However, I don't understand what the options are. ?glht refers to a very technical paper on the numerical computations involved, and I couldn't find a description corresponding to the McDermott or AVE options. I did notice that the Tukey option provides the same result as TukeyHSD for balanced data. Is this the same as Tukey-Kramer? As I understand it, there is no universal consensus as to which test is best. TukeyHSD appears to be appropriate for balanced designs. I have an unbalanced design to analyze. I can use glht, but can someone tell me what each option is actually calculating? A reference to a paper that describes the procedures would be great, but I'm afraid I the one offered in ?glht[1] is beyond me. Thanks, Tyler [1] Frank Bretz, Alan Genz and Ludwig A. Hothorn (2001), On the numerical availability of multiple comparison procedures. _Biometrical Journal_, *43*(5), 645-656. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] anova planned comparisons/contrasts
On 2007-11-22, Dieter Menne <[EMAIL PROTECTED]> wrote: > Tyler Smith mail.mcgill.ca> writes: > >> I'm trying to figure out how anova works in R by translating the >> examples in Sokal And Rohlf's (1995 3rd edition) Biometry. I've hit a >> snag with planned comparisons, their box 9.4 and section 9.6. It's a >> basic anova design: >> > > how to do contrast > > It's easier to use function estimable in package gmodels, or glht in package > multcomp. > Isn't glht calculating unplanned comparisons, as opposed to the planned comparisons with contrasts() and summary(..., split= ...)? Can you do planned comparisons with glht, or unplanned comparisons with summary()? contrasts(warpbreaks$tension) <- matrix(c(-1,1,0,1,0,-1,0,-1,1), 3, 3) amod <- aov(formula = breaks ~ tension, data = warpbreaks) ## this isn't right: summary(amod, split = list(tension = list('L vs M' =1, 'L vs H'=2, 'M vs H' = 3))) ## posthoc contrasts (three ways to do the same test, I think): summary(glht(amod, linfct = mcp(tension = 'Tukey'))) summary(glht(amod, linfct = mcp(tension = matrix(c(-1,1,0,1,0,-1,0,-1,1), 3, 3 TukeyHSD(amod) Thanks, Tyler __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] multiple comparisons/tukey kramer
On 2007-11-23, hadley wickham <[EMAIL PROTECTED]> wrote: >> >> What I need is a reference to the tests implemented in glht, so I can >> decide which one is appropriate for my data. Sequen, Changepoint et >> al. may be common terms in some fields, but not in the references I'm >> working from. > > Have you read the vignette: > http://cran.r-project.org/doc/vignettes/multcomp/multcomp.pdf > ? Thanks, I have, and I've now reread it, as well as looking at the examples in contrMat. I now understand that Tukey, Dunnett, Sequen et al. are 'convenience' terms, specifying all pair-wise, treatments vs control etc., contrasts. I worked through the numerical example in Dunnett's original paper, and also found a worked example of Tukey-Kramer contrasts, and reproduced the results using glht(LM, linfct = mcp(fac = 'Dunnett')) glht(LM, linfct = mcp(fac = 'Tukey')) I'm still finding the language very difficult. I was taught these methods using SS/MS language. Linear models were mentioned in passing, but the focus was on using (and hopefully understanding) the formulas for filling in the cells in an anova table. Has the linear model focus now replaced the 'traditional' approach I learned? It's been a very frustrating week or two as I try and come to terms with material I thought I already knew, so what I'm wondering is if I'm doomed to repeat this process with every variation on Anova designs I want to analyze in R unless I take the time to study linear models. Today's task, for example, is figuring out how to convert Sokal and Rohlf's two-level nested model II anova into R code, and it's not going well. Thanks, Tyler __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Anova(car) SS digits
Hi, When I use Anova(car) to produce type III SS, 'Sum Sq' is reported in integers: > Anova(bot.lm3, type ="III") Anova Table (Type III tests) Response: bottemp Sum Sq DfF value Pr(>F) (Intercept) 45295 1 29436.4440 < 2e-16 fungroup 3 2 0.8259 0.44006 numsp.fun11 2 3.4503 0.03460 block 419 468.0017 < 2e-16 fungroup:numsp.fun6 4 1.0317 0.39340 fungroup:block9 8 0.7429 0.65346 numsp.fun:block 12 8 0.9501 0.47795 fungroup:numsp.fun:block 27 16 1.0920 0.36882 Residuals 205 133 I have tried specifying digits = 7 in the call to Anova, but that doesn't change anything. The output from another stats program for the same data confirms that the ouptut above is correct, but is rounded to the nearest integer. The dataset is too large to post here, but I can confirm that I get the expected results from the examples in ?Anova, as well as for type = 'II' on my own data. This output, and relevant options() and sessionInfo are pasted below. What am I doing wrong? Thanks, Tyler > mod <- lm(conformity ~ fcategory*partner.status, data=Moore, +contrasts=list(fcategory=contr.sum, partner.status=contr.sum)) > Anova(mod) Anova Table (Type II tests) Response: conformity Sum Sq Df F value Pr(>F) fcategory 11.61 2 0.2770 0.759564 partner.status 212.21 1 10.1207 0.002874 fcategory:partner.status 175.49 2 4.1846 0.022572 Residuals817.76 39 > Anova(mod, type="III") Anova Table (Type III tests) Response: conformity Sum Sq Df F valuePr(>F) (Intercept) 5752.8 1 274.3592 < 2.2e-16 fcategory 36.0 2 0.8589 0.431492 partner.status239.6 1 11.4250 0.001657 fcategory:partner.status 175.5 2 4.1846 0.022572 Residuals 817.8 39 > Anova(bot.lm3, type ="II") Anova Table (Type II tests) Response: bottemp Sum Sq Df F value Pr(>F) fungroup 1.87 2 0.6069 0.54656 numsp.fun 11.14 2 3.6214 0.02942 block486.47 4 79.0379 < 2e-16 fungroup:numsp.fun 6.32 4 1.0275 0.39553 fungroup:block11.43 8 0.9283 0.49542 numsp.fun:block 11.29 8 0.9169 0.50473 fungroup:numsp.fun:block 26.89 16 1.0920 0.36882 Residuals204.65 133 > options('digits') $digits [1] 7 > options('contrasts') $contrasts [1] "contr.sum" "contr.poly" > sessionInfo() R version 2.5.1 (2007-06-27) i486-pc-linux-gnu locale: LC_CTYPE=en_CA.UTF-8;LC_NUMERIC=C;LC_TIME=en_CA.UTF-8;LC_COLLATE=en_CA.UTF-8; LC_MONETARY=en_CA.UTF-8;LC_MESSAGES=en_CA.UTF-8;LC_PAPER=en_CA.UTF-8;LC_NAME=C; LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_CA.UTF-8;LC_IDENTIFICATION=C attached base packages: [1] "stats" "graphics" "grDevices" "utils" "datasets" "methods" [7] "base" other attached packages: car "1.2-7" __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Anova(car) SS digits
On 2007-11-29, John Fox <[EMAIL PROTECTED]> wrote: > > Anova() creates an object of class "anova" which then gets printed by the > print method for "anova" objects. The reason that the sums of squares for > the type II tests are rounded like this is because of the large value for > the intercept. Although Anova() doesn't take a digits argument, > print.anova() does, and so, e.g., you could use the command > > print(Anova(bot.lm3, type ="III"), digits=7) > > I hope this helps, > John Very much. Thanks! Tyler __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] using apply to loop
On 2007-12-21, Louis Martin <[EMAIL PROTECTED]> wrote: > Hi, > > I am running the following loop, but it takes hours to run as n is big. Is > there any way "apply" can be used? Thanks. > ### Start > nclass <- dim(data)[[2]] - 1 > z <- matrix(0, ncol = nclass, nrow = nclass) > n <- dim(data)[[1]] > x <- c(1:nclass) > # loop starts > for(loop in 1:n) { # looping over rows in data > r <- data[loop, 1:nclass] # vector of length(nclass) > classified <- x[r == max(r)] # index of rows == max(r) > > truth <- data[loop, nclass + 1] # next column, single value > z[classified, truth] <- z[classified, truth] + 1 # increment > the values of > } > # loop ends > Off the top, data is a bad choice for your dataframe, as it conflicts with a standard function. Also, including some actual data would make this easier to work with. I think you're using dim(data)[[1]] to get the number of rows of data? That can be more clearly expressed as nrow(data), and dim(data)[[2]] == ncol(data). Anyways, this might be helpful: add.mat <- apply(data[,1:nclass], MAR = 1, FUN = function(x) ifelse(x == max(x), 1, 0)) for(i in 1:n) z[ , data[i, ncol(data)]] <- z[ , data[i, ncol(data)]] + add.mat[,i] There's still a loop, but it might not be needed depending on what values 'truth' holds. Most of the calculations are shifted into the apply() call, so the one line loop should run at least a little faster than what you started with. HTH, Tyler __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R appearance under linux
On 2007-12-21, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > > Dear R users, > > I have just moved to R2.6.1 under Opensuse linux 10.3. I used to > work with R under XPpro. Is it "normal" to have a visual aspect of R > under linux different ? Yes, that's normal. Under windows, you get a GUI interface with menus and separate windows etc. by default. In Linux you get to choose for yourself how you interact with R. One of the better options is Emacs with ESS. Details can be found at http://ESS.R-project.org/ including rpms for suse. Other options are presented here: http://www.sciviews.org/_rgui/ There will be a bit of a learning curve here, but eventually you'll find you can do everything that you could do with the Windows GUI, and at least with Emacs and ESS, a whole lot more. Tyler __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Ecological Detective worked solutions [R-wiki]
Hi, I've added several pages of worked solutions for the book Ecological Detective by Hilborn and Mangel to the R-wiki. My hope is that this will be of use to others working through this book without access to a local expert. I am certainly not an expert, local or otherwise. I have posted solutions for chapters 3-6, which includes some really horrible direct translations of the pseudocode from the book, and some less-horrible R versions. I make no claims as to the quality of my code, other than it appears to provide correct solutions. Anyone interested in contributing alternative solutions, correcting my solutions, adding solutions for missing problems or later chapters, or adding further notes is most welcome to do so. Thanks, Tyler http://wiki.r-project.org/rwiki/doku.php?id=guides:tutorials:ecological_detective __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Discriminant function analysis
On 2008-02-06, Birgit Lemcke <[EMAIL PROTECTED]> wrote: > > I am using R 2.6.1 on a PowerBook G4. > I would like to perform a discriminant function analysis. I found lda > in MASS but as far as I understood, is it only working with > explanatory variables of the class factor. I think you are mistaken. If you reread the help for lda() you'll see that the grouping has to be a factor, but the explanatory variables should be numeric. > My dataset contains variables of the classes factor and numeric. Is > there another function that is able to handle this? The numeric variables are fine. The factor variables may have to be recoded into dummy binary variables, I'm not sure if lda() will deal with them properly otherwise. HTH, Tyler __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Discriminant function analysis
On 2008-02-07, Birgit Lemcke <[EMAIL PROTECTED]> wrote: > > Am 06.02.2008 um 21:00 schrieb Tyler Smith: >> >>> My dataset contains variables of the classes factor and numeric. Is >>> there another function that is able to handle this? >> >> The numeric variables are fine. The factor variables may have to be >> recoded into dummy binary variables, I'm not sure if lda() will deal >> with them properly otherwise. > > But aren´t binary variables also factors? Or is there another > variable class than factor or numeric? > Do I have have to set the classe of the binaries as numeric? > There is no binary class in R, so you would have to use a numeric field. For example: | sample | factor_1 | |+--| | A | red | | B | green| | C | blue | becomes: | sample | dummy_1 | dummy_2 | |+-+-| | A | 1 | 0 | | B | 0 | 1 | | C | 0 | 0 | R can deal with dummy_1 and dummy_2 as numeric vectors. The details should be explained in a good reference on multivariate statistics (I'm looking at Legendre and Legendre (1998) section 1.5.7 and 11.5). HTH, Tyler __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] constrained clustering
Having searched CRAN and the mailing list archives, I'm pretty sure there is no function in R that implements constrained clustering. Before I start writing my own, have I overlooked something? Thanks, Tyler __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.