[R] Strange behavior using subset
Dear R Gurus,

Let's get the technical details out of the way first:
Computer: 1.83 GHz MacBook
R version 2.5.1

I have a data set that contains the following variables: site, species, total.vines. I need to partition the main data set by site, then further select only those species that occurred at each site. When I select by site (site.name <- subset(data frame, Site == Site name)), the resulting data frame is normal, containing all the records for that particular site. When I then further select by species (site.name1 <- subset(site.name, Species == c(species 1, species 2, species 4, species 7, species 8))), I get warning messages:

Warning messages:
1: longer object length is not a multiple of shorter object length in: is.na(e1) | is.na(e2)
2: longer object length is not a multiple of shorter object length in: `==.default`(Species, c("ACNE", "ACSA2", "JUNI", "PLOC", "ULAM"))

If I then select only two species instead of five, the warnings disappear. HOWEVER, the data are cut in half, so the new data frame contains only 13 records of species 1 (instead of 26 as in the original) and 12 records of species 2 (instead of the original 24).

This is the first time I have run into this problem, although I have used subset on this data several times in the past month. Any ideas on where I'm going wrong?

Thanks for your help.

Jim Milks
Graduate Student
Environmental Sciences Ph.D. Program
136 Biological Sciences
Wright State University
3640 Colonel Glenn Hwy
Dayton, OH 45435
http://www.wright.edu/academics/envsci/

[[alternative HTML version deleted]]

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
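A minimal sketch of what is probably happening, using made-up data (the data frame `vines` and its values are hypothetical, not the poster's). `==` compares element by element and recycles the shorter vector along the column, so it keeps only rows whose species happens to line up with the recycled position. `%in%` instead tests, for every row, whether its value is a member of the set.

```r
# Hypothetical data: 4 rows each of three species at one site
vines <- data.frame(
  Site        = "A",
  Species     = rep(c("ACNE", "PLOC", "ULAM"), each = 4),
  total.vines = 1:12,
  stringsAsFactors = FALSE
)

wanted <- c("ACNE", "PLOC")

# '==' recycles 'wanted' down the column: row 1 vs "ACNE", row 2 vs "PLOC",
# row 3 vs "ACNE", ... so only rows that match by position are kept.
by_equals <- subset(vines, Species == wanted)

# '%in%' asks, for every row, whether its species is in the wanted set.
by_in <- subset(vines, Species %in% wanted)

table(by_equals$Species)  # only half of each species' rows survive
table(by_in$Species)      # all rows for both species
```

With two species and an even number of rows the recycling is silent, which would match the "cut in half" result; with five species the column length is no longer a multiple of the vector length, which is when R issues the recycling warning.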
Re: [R] Error message when using zero-inflated count regression model in package zicounts
No, zeroinfl() in pscl did not give me any errors when I ran the model. So there may be a bug in zicounts. The only problem now is how to interpret the zeroinfl model. Am I correct in my understanding that zeroinfl runs both the Poisson and binomial models without interactions? I'm thinking along those lines since no interaction terms appear in either model. If so, how do I check for any interactions?

Thanks.

Jim Milks

On Aug 16, 2007, at 11:19 AM, Achim Zeileis wrote:

On Thu, 16 Aug 2007, James R. Milks wrote:

Dr. Stevens,

I've double-checked my variable lengths. All of my variables (Total.vines, Site, Species, and DBH) came in at 549. I did correct one problem in the data entry that had escaped my previous notice: somehow the undergrad who entered all the data managed to split the Acer negundo data into two separate categories while still appearing to use the same ACNE abbreviation. When I made that correction and re-ran zicounts, R gave me the following error messages:

Hmm, I don't know about the error messages in zicounts, but you could try to use the zeroinfl() implementation in package pscl:

vines.zip <- zeroinfl(Total.vines ~ Site + Species + DBH | Site + Species + DBH, data = sycamores.1)

and see whether this produces a similar error.
Z

vines.zip <- zicounts(resp=Total.vines~., x=~Site+Species+DBH, z=~Site+Species+DBH, distname="ZIP", data=sycamores.1)
Error in ifelse(y == 0, 1, y/mu) : dim<- : dims [product 12] do not match the length of object [549]
In addition: Warning messages:
1: longer object length is not a multiple of shorter object length in: eta + offset
2: longer object length is not a multiple of shorter object length in: y/mu

In addition, zicounts would not run a normal Poisson regression on the data, giving me the same error messages as the ZIP regression.
Doing a Poisson regression with glm did not show any error messages. However, the glm model with full interactions was still overdispersed. Could the zicounts problem be that the individual sites and species had different population sizes? For instance, site A had 149 trees, site B had 55 trees, site C had 270 trees, and site D had 75 trees. The species had similar discrepancies in population sizes, with Platanus occidentalis and Acer negundo forming the majority of the trees.

Thanks for your help.

Jim Milks
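For the overdispersion mentioned above, one common rough check is the Pearson chi-square statistic divided by the residual degrees of freedom. The sketch below uses simulated stand-in data (the variables `dbh` and `y` are invented, not the sycamore data) and base R only; it illustrates the check and is not a diagnosis of the poster's model.

```r
set.seed(1)

# Simulated stand-in data: negative binomial counts are overdispersed
# relative to a Poisson model with the same mean
n   <- 300
dbh <- runif(n, 5, 100)
y   <- rnbinom(n, mu = exp(-1 + 0.02 * dbh), size = 0.5)

fit <- glm(y ~ dbh, family = poisson)

# Pearson chi-square divided by residual degrees of freedom;
# values well above 1 suggest overdispersion
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
dispersion

# A quasi-Poisson fit keeps the same coefficients but estimates the
# dispersion and scales the standard errors accordingly
fit_q <- glm(y ~ dbh, family = quasipoisson)
summary(fit_q)$dispersion
```

Excess zeros are only one possible source of a ratio well above 1, so a large value does not by itself point to a zero-inflated model over, say, a negative binomial one.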
[R] Error message when using zero-inflated count regression model in package zicounts
I have data on the number of vines per tree for ~550 trees. Over half of the trees did not have any vines and the data are fairly skewed (median = 0, mean = 1.158, 3rd qu. = 1.000). I am attempting to investigate whether plot location (four sites), species (I'm using only the four most common species), or tree DBH has a significant influence on the number of vines per tree. When I attempted to use the zicounts function, R gave me the following error message:

vines.zip <- zicounts(resp=Total.vines~., x=~Site+Species+DBH, z=~Site+Species+DBH, distrname="ZIP", data=sycamores.1)
Error in ifelse(y == 0, 1, y/mu) : dim<- : dims [product 12] do not match the length of object [549]
In addition: Warning messages:
1: longer object length is not a multiple of shorter object length in: x[good, ] * w
2: longer object length is not a multiple of shorter object length in: eta + offset
3: longer object length is not a multiple of shorter object length in: y/mu

I do not know enough about the calculations done in the function to interpret the error messages. Is there a glitch in my data and, if so, what is it?

Thanks for your help.

Jim Milks
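One kind of data glitch worth ruling out before fitting is a factor with stray or duplicated levels. The sketch below uses invented values and assumes, purely for illustration, that a duplicate level comes from trailing whitespace; the cause in any real file may be different, and this is not claimed to explain the zicounts error.

```r
# Invented example: "ACNE " (trailing space) looks like "ACNE" when printed
species <- factor(c("ACNE", "ACNE ", "PLOC", "ULAM", "ACNE", "PLOC"))

nlevels(species)   # one more level than the number of intended species
table(species)

# Printing the levels with quotes makes hidden whitespace visible
print(levels(species), quote = TRUE)

# Strip surrounding whitespace and rebuild the factor
species_clean <- factor(trimws(as.character(species)))
nlevels(species_clean)
table(species_clean)
```

Running `str()` on the data frame and `table()` on each categorical column is a quick way to spot levels that should not be there.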
[R] choosing between Poisson regression models: no interactions vs. interactions
R gurus,

I'm working on data analysis for a small project. My response variable is total vines per tree (median = 0, mean = 1.65, min = 0, max = 24). My predictors are two categorical variables (four sites and four species) and one continuous variable (tree diameter at breast height (DBH)). The main question I'm attempting to answer is whether or not the species identity of a tree has any effect on the number of vines clinging to the trunk. Given that the response variable is count data, I decided to use Poisson regression, even though I'm not as familiar with it as I am with linear or logit regression.

My problem is deciding which model to use. I have created several: one without interaction terms (Total.vines~Site+Species+DBH), one with an interaction term between Site and Species (Total.vines~Site*Species+DBH), and one with interactions between all variables (Total.vines~Site*Species*DBH). Here is my output from R for the first two models (the last model has the same number (and identity) of significant variables as the second model, even though the last model had more interaction terms overall):

Call:
glm(formula = Total.vines ~ Site + Species + DBH, family = poisson)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-5.2067  -1.2915  -0.7095  -0.3525   6.3756

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)     -2.987695   0.231428 -12.910   <2e-16 ***
SiteHuffman Dam  2.725193   0.249423  10.926   <2e-16 ***
SiteNarrows      1.902987   0.227599   8.361   <2e-16 ***
SiteSugar Creek  1.752754   0.242186   7.237 4.58e-13 ***
SpeciesFRAM      0.955468   0.157423   6.069 1.28e-09 ***
SpeciesPLOC      1.187903   0.141707   8.383   <2e-16 ***
SpeciesULAM      0.340792   0.184615   1.846   0.0649 .
DBH              0.020708   0.001292  16.026   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1972.3  on 544  degrees of freedom
Residual deviance: 1290.0  on 537  degrees of freedom
AIC: 1796.0

Number of Fisher Scoring iterations: 6

Call:
glm(formula = Total.vines ~ Site * Species + DBH, family = poisson, data = sycamores.1)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-4.9815  -1.2370  -0.6339  -0.3403   6.5664

Coefficients: (3 not defined because of singularities)
                              Estimate Std. Error z value Pr(>|z|)
(Intercept)                  -2.788243   0.303064  -9.200   <2e-16 ***
SiteHuffman Dam               1.838952   0.354127   5.193 2.07e-07 ***
SiteNarrows                   2.252716   0.323184   6.970 3.16e-12 ***
SiteSugar Creek             -12.961519 519.152077  -0.025 0.980082
SpeciesFRAM                  13.938716 519.152230   0.027 0.978580
SpeciesPLOC                   0.240223   0.540676   0.444 0.656824
SpeciesULAM                   1.919586   0.540246   3.553 0.000381 ***
DBH                           0.019984   0.001337  14.946   <2e-16 ***
SiteHuffman Dam:SpeciesFRAM -11.513823 519.152294  -0.022 0.982306
SiteNarrows:SpeciesFRAM     -13.593127 519.152268  -0.026 0.979111
SiteSugar Creek:SpeciesFRAM         NA         NA      NA       NA
SiteHuffman Dam:SpeciesPLOC         NA         NA      NA       NA
SiteNarrows:SpeciesPLOC       0.397503   0.555218   0.716 0.474028
SiteSugar Creek:SpeciesPLOC  15.640450 519.152277   0.030 0.975966
SiteHuffman Dam:SpeciesULAM  -0.102841   0.610027  -0.169 0.866124
SiteNarrows:SpeciesULAM      -2.809092   0.606804  -4.629 3.67e-06 ***
SiteSugar Creek:SpeciesULAM         NA         NA      NA       NA
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1972.3  on 544  degrees of freedom
Residual deviance: 1178.7  on 531  degrees of freedom
AIC: 1696.6

Number of Fisher Scoring iterations: 13

As you can see, the two models give very different output, especially regarding whether or not the individual species are significant. In the no-interaction model, the only species that was not significant was ULAM. In the model with the Site*Species interaction, ULAM was the only significant species.

My question is this: which model should I use when I present this analysis? I know that the model with the Site*Species interaction has the lower AIC. Should I base my choice solely on AIC? The reason I'm asking is that the second model has only one significant interaction term, fewer significant terms overall, and three undefined terms.

Thanks for any guidance you can give to someone running his first Poisson regression.

Jim Milks
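A rough sketch of two checks that may help with this kind of choice, run on simulated stand-in data (variable names mirror the post, but every value here is invented). First, a Site by Species cross-tabulation shows which combinations have no trees at all; such empty cells are one reason an interaction coefficient comes back as NA. Second, because the models are nested, a likelihood-ratio test via `anova()` can be read alongside AIC.

```r
set.seed(42)

# Simulated stand-in data: names mirror the post, values are invented
n <- 400
d <- data.frame(
  Site    = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  Species = factor(sample(c("ACNE", "PLOC", "ULAM"), n, replace = TRUE)),
  DBH     = runif(n, 5, 100)
)

# Remove one Site x Species combination to mimic an empty cell
d <- d[!(d$Site == "C" & d$Species == "ULAM"), ]
d$Total.vines <- rpois(nrow(d), lambda = exp(-1 + 0.02 * d$DBH))

# Empty cells here correspond to interaction terms that cannot be estimated
cell_counts <- table(d$Site, d$Species)
cell_counts

m_add <- glm(Total.vines ~ Site + Species + DBH, family = poisson, data = d)
m_int <- glm(Total.vines ~ Site * Species + DBH, family = poisson, data = d)

# Coefficients reported as NA (aliased) in the interaction model
na_terms <- names(coef(m_int))[is.na(coef(m_int))]
na_terms

# Likelihood-ratio test of the nested models, alongside AIC
anova(m_add, m_int, test = "Chisq")
AIC(m_add, m_int)
```

Separately, estimates paired with very large standard errors (such as the values near 519 in the output above) often indicate a cell in which every observed count is zero, so individual Wald p-values for those terms are not very informative; comparing whole models is usually safer than counting significant coefficients.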
[R] Generating artificial datasets with a specific correlation coefficient.
I need to create artificial datasets with specific correlation coefficients (i.e. a dataset that returns r = 0.30, etc.) as examples for a lab I am teaching this summer. Is there a way to do that in R?

Thanks.

Jim Milks
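One possible approach, sketched with base R only: draw two random vectors, take the part of the second that is uncorrelated with the first in this particular sample (the residuals from a regression), and mix the two standardized pieces so that the sample correlation comes out at the target value. The function name `make_correlated` is made up for this illustration.

```r
set.seed(123)

# Return x and y whose *sample* correlation equals r
# (up to floating-point rounding)
make_correlated <- function(n, r) {
  x  <- rnorm(n)
  y0 <- rnorm(n)
  # Part of y0 that is uncorrelated with x in this sample
  e  <- residuals(lm(y0 ~ x))
  xs <- as.numeric(scale(x))   # mean 0, sd 1
  es <- as.numeric(scale(e))   # mean 0, sd 1, orthogonal to xs
  y  <- r * xs + sqrt(1 - r^2) * es
  data.frame(x = x, y = y)
}

lab_data <- make_correlated(n = 50, r = 0.30)
cor(lab_data$x, lab_data$y)
```

Because correlation is unchanged by shifting or rescaling either variable, `x` and `y` can afterwards be transformed linearly (with positive scale factors) to realistic units without altering r. An alternative is `mvrnorm()` from the MASS package with `empirical = TRUE`, which rescales the draws so that the sample covariance matrix matches the one supplied.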