Re: [R] How do anova() and Anova(type="III") handle incomplete designs?
Thanks for your response, John. That was helpful. I was using Type III from Anova() as a comparison to some results I had obtained JMP, which I've lost access to and have moved on to R, and I was confused by the error. Given that I do have a continuous covariate, the analyses are not likely comparable, considering your response. I am still confused about interpretation of interactions within an anova() with an incomplete design, as mine is. Is the interaction term still informative? - Justin Montemarano Graduate Student Kent State University - Biological Sciences http://www.montegraphia.com On Sat, Jun 16, 2012 at 9:20 PM, John Fox wrote: > Dear Justin, > > anova() and Anova() are entirely different functions; the former is part > of the standard R distribution and the second part of the car package. By > default, Anova() produces an error for type-III tests conducted on > rank-deficient models because the hypotheses tested aren't generally > sensible. > > From ?Anova: > > "singular.ok > defaults to TRUE for type-II tests, and FALSE for type-III tests (where > the tests for models with aliased coefficients will not be > straightforwardly interpretable); if FALSE, a model with aliased > coefficients produces an error." > > and > > "The designations "type-II" and "type-III" are borrowed from SAS, but the > definitions used here do not correspond precisely to those employed by SAS. > Type-II tests are calculated according to the principle of marginality, > testing each term after all others, except ignoring the term's higher-order > relatives; so-called type-III tests violate marginality, testing each term > in the model after all of the others. This definition of Type-II tests > corresponds to the tests produced by SAS for analysis-of-variance models, > where all of the predictors are factors, but not more generally (i.e., when > there are quantitative predictors). Be very careful in formulating the > model for type-III tests, or the hypotheses tested will not make sense." > > I hope this helps, > John > > > John Fox > Sen. William McMaster Prof. of Social Statistics > Department of Sociology > McMaster University > Hamilton, Ontario, Canada > http://socserv.mcmaster.ca/jfox/ > > > On Fri, 15 Jun 2012 15:01:27 -0400 > Justin Montemarano wrote: > > Hello all: > > > > I am confused about the output from a lm() model with an incomplete > > design/missing level. > > > > I have two categorical predictors and a continuous covariate (day) that > > I am using to model larval mass (l.mass): > > > > leaf.species has three levels - map, syc, and oak > > > > cond.time has two levels - 30 and 150. > > > > There are no response values for Map-150, so that entire, two-way, level > > is missing. > > > > When running anova() on the model with Type I SS, the full factorial > > design does not return errors; however, using package:car Anova() and > > Type III SS, I receive an singularity error unless I used the argument > > 'singular.ok = T' (it is defaulted to F). > > > > So, why don't I receive an error with anova() when I do with Anova(type > > = "III")? How do anova() and Anova() handle incomplete designs, and how > > can interactions of variables with missing levels be interpreted? > > > > I realize these are fairly broad questions, but any insight would be > > helpful. Thanks, all. > > > > Below is code to illustrate my question(s): > > > > > lmMass <- lm(log(l.mass) ~ day*leaf.species + cond.time, data = > > growth.data) #lm() without cond.time interactions > > > lmMassInt <- lm(log(l.mass) ~ day*leaf.species*cond.time, data = > > growth.data) #lm() with cond.time interactions > > > anova(lmMass); anova(lmMassInt) #ANOVA summary of both models > > with Type I SS > > Analysis of Variance Table > > > > Response: log(l.mass) > >Df Sum Sq Mean Sq F valuePr(>F) > > day1 51.373 51.373 75.7451 2.073e-15 > > leaf.species 2 0.340 0.170 0.25060.7786 > > cond.time 1 0.161 0.161 0.23690.6271 > > day:leaf.species 2 1.296 0.648 0.95510.3867 > > Residuals179 121.404 0.678 > > Analysis of Variance Table > > > > Response: log(l.mass) > > Df Sum Sq Mean Sq F value Pr(>F) > > day 1 51.373 51.373 76.5651 1.693e-15 > > leaf
[R] How do anova() and Anova(type="III") handle incomplete designs?
Hello all: I am confused about the output from a lm() model with an incomplete design/missing level. I have two categorical predictors and a continuous covariate (day) that I am using to model larval mass (l.mass): leaf.species has three levels - map, syc, and oak cond.time has two levels - 30 and 150. There are no response values for Map-150, so that entire, two-way, level is missing. When running anova() on the model with Type I SS, the full factorial design does not return errors; however, using package:car Anova() and Type III SS, I receive an singularity error unless I used the argument 'singular.ok = T' (it is defaulted to F). So, why don't I receive an error with anova() when I do with Anova(type = "III")? How do anova() and Anova() handle incomplete designs, and how can interactions of variables with missing levels be interpreted? I realize these are fairly broad questions, but any insight would be helpful. Thanks, all. Below is code to illustrate my question(s): > lmMass <- lm(log(l.mass) ~ day*leaf.species + cond.time, data = growth.data) #lm() without cond.time interactions > lmMassInt <- lm(log(l.mass) ~ day*leaf.species*cond.time, data = growth.data) #lm() with cond.time interactions > anova(lmMass); anova(lmMassInt) #ANOVA summary of both models with Type I SS Analysis of Variance Table Response: log(l.mass) Df Sum Sq Mean Sq F valuePr(>F) day1 51.373 51.373 75.7451 2.073e-15 leaf.species 2 0.340 0.170 0.25060.7786 cond.time 1 0.161 0.161 0.23690.6271 day:leaf.species 2 1.296 0.648 0.95510.3867 Residuals179 121.404 0.678 Analysis of Variance Table Response: log(l.mass) Df Sum Sq Mean Sq F value Pr(>F) day 1 51.373 51.373 76.5651 1.693e-15 leaf.species 2 0.340 0.170 0.2533 0.77654 cond.time1 0.161 0.161 0.2394 0.62523 day:leaf.species 2 1.296 0.648 0.9655 0.38281 day:cond.time1 0.080 0.080 0.1198 0.72965 leaf.species:cond.time 1 1.318 1.318 1.9642 0.16282 day:leaf.species:cond.time 1 1.915 1.915 2.8539 0.09293 Residuals 176 118.091 0.671 > Anova(lmMass, type = 'III'); Anova(lmMassInt, type = 'III') #ANOVA summary of both models with Type III SS Anova Table (Type III tests) Response: log(l.mass) Sum Sq Df F value Pr(>F) (Intercept) 39.789 1 58.6653 1.13e-12 day3.278 1 4.8336 0.02919 leaf.species 0.934 2 0.6888 0.50352 cond.time 0.168 1 0.2472 0.61968 day:leaf.species 1.296 2 0.9551 0.38672 Residuals121.404 179 Error in Anova.III.lm(mod, error, singular.ok = singular.ok, ...) : there are aliased coefficients in the model > Anova(lmMassInt, type = 'III', singular.ok = T) #Given the error in Anova() above, set singular.ok = T Anova Table (Type III tests) Response: log(l.mass) Sum Sq Df F value Pr(>F) (Intercept) 39.789 1 59.3004 9.402e-13 day 3.278 1 4.8860 0.02837 leaf.species 1.356 2 1.0103 0.36623 cond.time0.124 1 0.1843 0.66822 day:leaf.species 2.783 2 2.0738 0.12877 day:cond.time0.805 1 1.1994 0.27493 leaf.species:cond.time 0.568 1 0.8462 0.35888 day:leaf.species:cond.time 1.915 1 2.8539 0.09293 Residuals 118.091 176 > - Justin Montemarano Graduate Student Kent State University - Biological Sciences http://www.montegraphia.com <http://www.montegraphia.com/> -- Justin Montemarano Graduate Student Kent State University - Biological Sciences http://www.montegraphia.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Violation of sample independence in Pearson's product-moment correlation
Hi all: There was a concern raised by reviewers of a manuscript of mine over the proper execution of a Pearson's correlation. In brief, this was undertaken in order to determine the relationship between the extent of wheel running (y axis) and ethanol intake (x axis) across three, separate 10 day periods in 7 animals. In the paper, the correlational plots for each 10 day-period had 70 data points: One point for each day and each animal across 10 days of experimentation. The reviewers, however, appropriately pointed out that this is a violation of the assumption of sample independence for Pearson's test, and I should have had only 7 points, which would reflect the means of my two variables for each individual animal across 10 days. Is this appropriate or is there a means of accounting for repeated sampling with a correlation test? - Justin Montemarano Graduate Student Kent State University - Biological Sciences http://www.montegraphia.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Unable to specify order of a factor
Hi all: I've got it... it appears that total.density was also defined in two separate data frames (se.predict.data and dc.predict.data) with levels order 16, 32, 8. Using relevel(), I moved 8 to the first position and it's solved the plotting problem. Ista's 'minimal' reproducible code request prompted me to discover my error; thanks all. - Justin Montemarano Graduate Student Kent State University - Biological Sciences http://www.montegraphia.com On Wed, Mar 21, 2012 at 12:42 PM, R. Michael Weylandt < michael.weyla...@gmail.com> wrote: > You'll also want to use dput() to send us an exact encoding of your > data when making that reproducible example: there might be something > subtle at play here that print methods won't show. > > Michael > > On Wed, Mar 21, 2012 at 12:28 PM, Ista Zahn wrote: > > On Wed, Mar 21, 2012 at 12:00 PM, Justin Montemarano > wrote: > >> Ista: > >> > >> Your attached code did work for me; moreover, the facets were presented > in > >> the desired order with facet_wrap() and facet_grid(), which is what I'm > >> using because I have a second factor used in facet_grid(). > >> > >> Still, my plots with total.density as a facet are coming out in 16, 32, > 8, > >> and I'm not seeing why. Below is my plot code - > >> > >>> ggplot(ag.tab[ag.tab$plant.sp == 'EC',], aes(x = days.out, y = > >>> per.remain)) + facet_grid(total.density ~ prop.ec) + > >>> #add point and error bar data > >>> theme_set(theme_bw()) + > >>> geom_point() + geom_errorbar(aes(ymin = per.remain - se, ymax = > >>> per.remain + se), width = 3) + > >>> #add predicted model data > >>> geom_line(data = se.predict.data[se.predict.data$plant.sp == > 'EC',], > >>> aes(x = x.values, y = predicted.values), colour = c('red')) + > >>> geom_line(data = dc.predict.data[dc.predict.data$plant.sp == > 'EC',], > >>> aes(x = x.values, y = predicted.values), colour = c('blue'), linetype = > >>> c('dashed')) + > >>> > >>> xlab('Day') + ylab('Percent Mass Remaining') + > opts(panel.grid.major = > >>> theme_blank(), panel.grid.minor = theme_blank()) > >> > >> Is there anything odd about it that might be producing the odd ordering > >> problem? FYI, avoiding subsetting ag.tab doesn't do the trick. > > > > I don't know. Please create a minimal example that isolates the > > problem. You can start with > > > > levels(ag.tab$total.density) > > > > ggplot(ag.tab[ag.tab$plant.sp == 'EC',], aes(x = days.out, y = > per.remain)) + > >facet_grid(total.density ~ prop.ec) + > >geom_point() > > > > Best, > > Ista > > > >> - > >> Justin Montemarano > >> Graduate Student > >> Kent State University - Biological Sciences > >> > >> http://www.montegraphia.com > >> > >> > >> On Wed, Mar 21, 2012 at 11:42 AM, Ista Zahn wrote: > >>> > >>> Hi Justin, > >>> > >>> this gives the correct order (8, 16, 32) on my machine: > >>> > >>> total.density <- > >>> > >>> > c(8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32) > >>> total.density <- factor(total.density, levels=c(8, 16, 32), > ordered=TRUE) > >>> str(total.density) > >>> > >>> order(levels(total.density)) > >>> > >>> dat <- data.frame(td = total.density, v1 = > rnorm(1:length(total.density))) > >>> > >>> ggplot(dat, aes(x = v1)) + > >>> geom_density() + > >>> facet_wrap(~td) > >>> > >>> Does it work for you? If yes, then you need to tell us what you're > >>> doing that is different from this example. If no, please give use the > >>> output of sessionInfo(). > >>> > >>> best, > >>> Ista > >>> > >>> On Wed, Mar 21, 2012 at 11:16 AM, Justin Montemarano < > jmont...@kent.edu> > >>> wrote: > >>> > I think I understand, but I believe my original interest is in the > order > >>> > of > >>> > levels(total.density), since ggplot appears to be using that to order > >>> &
Re: [R] Unable to specify order of a factor
Ista: Your attached code did work for me; moreover, the facets were presented in the desired order with facet_wrap() and facet_grid(), which is what I'm using because I have a second factor used in facet_grid(). Still, my plots with total.density as a facet are coming out in 16, 32, 8, and I'm not seeing why. Below is my plot code - ggplot(ag.tab[ag.tab$plant.sp == 'EC',], aes(x = days.out, y = per.remain)) > + facet_grid(total.density ~ prop.ec) + > #add point and error bar data > theme_set(theme_bw()) + > geom_point() + geom_errorbar(aes(ymin = per.remain - se, ymax = > per.remain + se), width = 3) + > #add predicted model data > geom_line(data = se.predict.data[se.predict.data$plant.sp == 'EC',], > aes(x = x.values, y = predicted.values), colour = c('red')) + > geom_line(data = dc.predict.data[dc.predict.data$plant.sp == 'EC',], > aes(x = x.values, y = predicted.values), colour = c('blue'), linetype = > c('dashed')) + > > xlab('Day') + ylab('Percent Mass Remaining') + opts(panel.grid.major = > theme_blank(), panel.grid.minor = theme_blank()) Is there anything odd about it that might be producing the odd ordering problem? FYI, avoiding subsetting ag.tab doesn't do the trick. - Justin Montemarano Graduate Student Kent State University - Biological Sciences http://www.montegraphia.com On Wed, Mar 21, 2012 at 11:42 AM, Ista Zahn wrote: > Hi Justin, > > this gives the correct order (8, 16, 32) on my machine: > > total.density <- > > c(8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32) > total.density <- factor(total.density, levels=c(8, 16, 32), ordered=TRUE) > str(total.density) > > order(levels(total.density)) > > dat <- data.frame(td = total.density, v1 = rnorm(1:length(total.density))) > > ggplot(dat, aes(x = v1)) + > geom_density() + > facet_wrap(~td) > > Does it work for you? If yes, then you need to tell us what you're > doing that is different from this example. If no, please give use the > output of sessionInfo(). > > best, > Ista > > On Wed, Mar 21, 2012 at 11:16 AM, Justin Montemarano > wrote: > > I think I understand, but I believe my original interest is in the order > of > > levels(total.density), since ggplot appears to be using that to order the > > facets. Thus, I'm still getting three graphs, ordered (and displayed as) > > 16 to 32 to 8, rather than the more intuitive, 8 to 16 to 32. I'm sorry > if > > I wasn't clear and/or I've missed your message. > > - > > Justin Montemarano > > Graduate Student > > Kent State University - Biological Sciences > > > > http://www.montegraphia.com > > > >[[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Unable to specify order of a factor
I think I understand, but I believe my original interest is in the order of levels(total.density), since ggplot appears to be using that to order the facets. Thus, I'm still getting three graphs, ordered (and displayed as) 16 to 32 to 8, rather than the more intuitive, 8 to 16 to 32. I'm sorry if I wasn't clear and/or I've missed your message. - Justin Montemarano Graduate Student Kent State University - Biological Sciences http://www.montegraphia.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Unable to specify order of a factor
Actually I've try that too, Sarah The test is to run order(levels(total.density)), which I need to be 1 2 3, not 2 3 1, and your solution still gives me 2 3 1. I also don't know how to reply to this thread with the previous message below... - Justin Montemarano Graduate Student Kent State University - Biological Sciences http://www.montegraphia.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Unable to specify order of a factor
Hi all: I'm attempting to create a faceted plot with ggplot2 and I'm having issues with a factor's order that is used to define the facet_grid(). The factor (named total.density) has three levels - 8, 16, and 32 - and I would like them presented in that order. Running order(levels(total.density)) yields the incorrect order of the facet grid - 2 3 1, corresponding with 16, 32, and 8. I have attempted correcting the order with the following solutions (of course, not run at once): #total.density <- relevel(total.density, '8') #total.density <- as.numeric(levels(total.density)[total.density]) #total.density <- factor(total.density, levels = c('8','16','32')) #total.density <- factor(total.density, levels = levels(total.density)[c(3,1,2)]) #library(gregmisc) #total.density <- reorder.factor(total.density, c('8', '16', '32'), order = T) The data are as follows: total.density <- c(8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32) I'm running R 2.14.2 with all packages up-to-date as of 21.3.2012. Any help would be greatly appreciated. - Justin Montemarano Graduate Student Kent State University - Biological Sciences http://www.montegraphia.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Find the first values in vector
Use which() vec_out <- which(vec == T) - Justin Montemarano Graduate Student Kent State University - Biological Sciences http://www.montegraphia.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Testing treatment effects on exponential decay models
Hello all: I would like to test whether there are treatment effects on decomposition rate, and I would like to inquire about the best, most appropriate means using R. I have plant decomposition data that is generally considered to follow an exponential decay model as follows: Wt = Wi * exp(-k * t) Where Wt and Wi are the weights of the plant material at time t and 0, respectively. k is a constant describing decomposition rate and is usually reported in the literature. Is it possible to fit a non-linear model as above and test for effects of two, independent, treatment factors + the interaction? I have used nls() to fit the above model for each treatment and approximate k, but I would like to examine treatment effects on these fits as I might using glm(). If I can get passed this hurdle, I may also inquire about adding a random effect as well. If it helps, I can post a subset of data. Thanks for any help. - Justin Montemarano Graduate Student Kent State University - Biological Sciences http://www.montegraphia.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] coxme() random effect syntax
Hello: I would like to run a Cox proportional hazards regression on crayfish dislodgement at different water velocities by crayfish size class and substrate (rock) type. Additionally, there is a covariate variable, rock movement that may be influencing crayfish dislodgment. So... I have crayfish size class (CFSZCL) and substrate type (SUBSZ) as fixed factors influencing the dislodgment of crayfish at different water velocities (SLIPVEL). Thus, I am currently using the call: coxph(Surv(SLIPVEL, FREEDOM) ~ CFSZCL + SUBSZ + CFSZCL:SUBSZ, data = data.table) However, I would like to add rock movement (BEDLOAD) as a random co-variate, which is not possible with the coxph() function. Is it possible to do so with coxme() in the kinship package? If so, what is the proper syntax? Thanks for any help. Justin Montemarano __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] coxph() command design and data setup
Hello all: I'm attempting to run a Cox proportional hazards function on survival data from insects and I have a few questions. My current command that I'm using to call the model is as follows (using coxph() from the survival library): coxph(Surv(day, censor) ~ treatment + room + chamber %in% treatment, data = data.table) Day indicates which day a particular observation occurred, and censor indicates the type of observation; that is, mortality or emergence. My first question is: how do I code my censor variable? I am interested in treatment effects on mortality, and the mortality observations are currently coded as a 1 in the censor variable. Emergence is another mechanism by which the insects in my study may leave the system, similar to a patient opting out in the middle of a clinical study, and are coded as a 0. I also have a nested and block variable that I'm unsure of how to deal with. Chamber is nested within treatment, where each chamber has a maximum of 4 observations or insects, all within 1 treatment. The block variable is room, where each room contains ~40 chambers each of which has been randomly assigned a treatment. I believe that I have chamber %in% treatment right in representing my data, but how do I account for room as a block? Additionally, I have a frequency column in data.table that indicates how many of a particular observation occurred in each row. Is it appropriate to assign this column as a weigh argument in cosph? If so, how is that done? I can send sample data if necessary. Thanks for any help! Justin Montemarano [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Post-hoc tests for Cox proportional hazards
Hello all: I'm having some difficulty interpreting the results of a Cox proportional hazards analysis that I recently performed. I am running the fit with a fixed categorical variable (Treatment) and a nested categorical variable (Chamber) within Treatment. The likelihood ratio tests of both of these effects have a low p (<0.001), but I am unable to discern which specific treatments (or chambers) are statistically different from one another. I was wondering if there are any aposteriori or post-hoc tests that are performed in relation to proportional hazards analysis and can help pinpoint which treatments are responsible for these results. My experience with proportional hazards tests is minimal, so if I need to provide more details, please let me know. Thanks for any help. Justin [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.