[R] Strange behavior using subset

2007-08-31 Thread James Milks
Dear R Gurus,

Let's get the technical details out of the way first:

Computer: 1.83 GHz MacBook
R version 2.5.1

I have a data set that contains the following variables: site,
species, total.vines.  I need to partition the main data set by site,
then further select only those species that occurred at each site.
When I select by site (site.name <- subset(data frame, Site == "Site
name")), the resulting data frame is normal, containing all the
records for that particular site.

When I then further select by species (site.name1 <- subset(site.name,
Species == c("species 1", "species 2", "species 4", "species 7",
"species 8"))), I get an error message:

 Warning messages:
 1: longer object length
   is not a multiple of shorter object length in: is.na(e1) | is.na(e2)
 2: longer object length
   is not a multiple of shorter object length in: `==.default`(Species, c("ACNE", "ACSA2", "JUNI", "PLOC", "ULAM"))

If I then select only two species instead of five, the error
messages disappear; however, the data are cut in half, so the new
data frame contains only 13 records of species 1 (instead of 26 as in
the original) and 12 records of species 2 (instead of the original
24).  This is the first time I'm experiencing this problem, as I have
used subset on this data several times in the past month.  Any ideas  
on where I'm going wrong?
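
One possible explanation, offered as a sketch rather than a diagnosis of the actual data: `==` compares element by element and recycles the shorter vector, so `Species == c(...)` tests row 1 against the first name, row 2 against the second, and so on. With five names and a row count that is not a multiple of five, that would produce the length warnings; with two names it would silently keep roughly every other matching row. `%in%` tests set membership instead. The data frame below is invented for illustration; only `subset()`, `==` and `%in%` are taken from the post.

```r
# Hypothetical toy data standing in for the site-level data frame
site.name <- data.frame(
  Species = rep(c("ACNE", "PLOC", "ULAM"), times = c(4, 4, 2)),
  total.vines = 1:10,
  stringsAsFactors = FALSE
)

# `==` recycles c("ACNE", "PLOC") down the rows, so a row is kept only
# when its species happens to line up with the recycled position
wrong <- subset(site.name, Species == c("ACNE", "PLOC"))

# `%in%` keeps every row whose species is a member of the set
right <- subset(site.name, Species %in% c("ACNE", "PLOC"))

table(wrong$Species)  # half of the ACNE and PLOC rows survive
table(right$Species)  # all ACNE and PLOC rows survive
```

In this toy case the recycled comparison keeps 2 of 4 rows for each species, which mirrors the 13-of-26 and 12-of-24 pattern described above.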

Thanks for your help.

Jim Milks

Graduate Student
Environmental Sciences Ph.D. Program
136 Biological Sciences
Wright State University
3640 Colonel Glenn Hwy
Dayton, OH 45435
http://www.wright.edu/academics/envsci/




__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error message when using zero-inflated count regression model in package zicounts

2007-08-16 Thread James Milks
No, zeroinfl() in pscl did not give me any errors when I ran the  
model.  So there may be a bug in zicounts.  Only problem now is how  
to interpret the zeroinfl model.  Am I correct in my understanding  
that zeroinfl runs both the poisson and binomial models without  
interactions?  I'm thinking along those lines since no interaction  
terms appear in either model.  If so, how do I check for any  
interactions?
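
A hedged sketch of one way to check: in pscl's `zeroinfl()` the formula part before `|` is the count model and the part after it is the zero-inflation model, and only the terms written in the formula are fitted. Interactions therefore have to be requested explicitly (for example with `*`), and nested fits can be compared with a likelihood-ratio test. The data below are simulated stand-ins, not the vine data, and the pscl part only runs if that package is installed.

```r
# Simulated stand-in data; variable names mirror the post, values are made up
set.seed(1)
n <- 400
d <- data.frame(
  Site    = factor(sample(c("A", "B"), n, replace = TRUE)),
  Species = factor(sample(c("ACNE", "PLOC"), n, replace = TRUE)),
  DBH     = runif(n, 5, 100)
)
# Counts with extra zeros: a Poisson count multiplied by a 0/1 indicator
d$Total.vines <- rbinom(n, 1, 0.6) * rpois(n, exp(-0.5 + 0.01 * d$DBH))

# Terms implied by each count-model formula (base R, no fitting needed)
main.terms <- attr(terms(Total.vines ~ Site + Species + DBH), "term.labels")
int.terms  <- attr(terms(Total.vines ~ Site * Species + DBH), "term.labels")
setdiff(int.terms, main.terms)  # the extra term added by `*`

if (requireNamespace("pscl", quietly = TRUE)) {
  # Main effects only in both parts of the model
  m0 <- pscl::zeroinfl(Total.vines ~ Site + Species + DBH |
                         Site + Species + DBH, data = d)
  # Site-by-Species interaction added to the count part only
  m1 <- pscl::zeroinfl(Total.vines ~ Site * Species + DBH |
                         Site + Species + DBH, data = d)
  # Likelihood-ratio test of the nested models
  lr   <- 2 * (as.numeric(logLik(m1)) - as.numeric(logLik(m0)))
  df   <- attr(logLik(m1), "df") - attr(logLik(m0), "df")
  pval <- pchisq(lr, df = df, lower.tail = FALSE)
  print(c(LR = lr, df = df, p = pval))
}
```

The same idea applies to the zero-inflation part: put the interaction after the `|` and compare against the model without it.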

Thanks.

Jim Milks

Graduate Student
Environmental Sciences Ph.D. Program
136 Biological Sciences
Wright State University
3640 Colonel Glenn Hwy
Dayton, OH 45435


On Aug 16, 2007, at 11:19 AM, Achim Zeileis wrote:

 On Thu, 16 Aug 2007, James R. Milks wrote:

 Dr. Stevens,

 I've double-checked my variable lengths.  All of my variables
 (Total.vines, Site, Species, and DBH) came in at 549.  I did correct
 one problem in the data entry that had escaped my previous notice:
 somehow the undergrad who entered all the data managed to make the
 Acer negundo data split into two separate categories while still
 appearing to use the same ACNE abbreviation.  When I made that
 correction and re-ran zicounts, R gave me the following error  
 messages:

 Hmm, I don't know about the error messages in zicounts, but you  
 could try to use the zeroinfl() implementation in package pscl:
   vines.zip <- zeroinfl(Total.vines ~ Site + Species + DBH | Site +
 Species + DBH, data = sycamores.1)
 and see whether this produces a similar error.
 Z

  vines.zip <- zicounts(resp=Total.vines~.,x=~Site+Species+DBH,z=~Site
 +Species+DBH,distname="ZIP",data=sycamores.1)

 Error in ifelse(y == 0, 1, y/mu) : dim<- : dims [product 12] do not
 match the length of object [549]
 In addition: Warning messages:
 1: longer object length is not a multiple of shorter object length
 in: eta + offset
 2: longer object length is not a multiple of shorter object length
 in: y/mu

 In addition, zicounts would not run a normal poisson regression on
 the data, giving me the same error messages as the ZIP regression.
 Doing a poisson regression with glm did not show any error messages.
 However, the glm model with full interactions was still
 over-dispersed.

 Could the zicounts problem be that the individual sites and species
 had different population sizes?  For instance, Site A had 149 trees,
 site B had 55 trees, site C had 270 trees, and site D had 75 trees.
 The species had similar discrepancies in population sizes, with
 Platanus occidentalis and Acer negundo forming the majority of the
 trees.

 Thanks for your help.

 Jim Milks

 Graduate Student
 Environmental Sciences Ph.D. Program
 136 Biological Sciences
 Wright State University
 3640 Colonel Glenn Hwy
 Dayton, OH 45435





[R] Error message when using zero-inflated count regression model in package zicounts

2007-08-13 Thread James Milks
I have data on number of vines per tree for ~550 trees.  Over half of  
the trees did not have any vines and the data is fairly skewed  
(median = 0, mean = 1.158, 3rd qu. = 1.000).  I am attempting to  
investigate whether plot location (four sites), species (I'm using  
only the four most common species), or tree dbh has a significant  
influence on the number of vines per tree.  When I attempted to use  
the zicounts function, R gave me the following error message:

  vines.zip <- zicounts(resp=Total.vines~.,x=~Site+Species+DBH,z=~Site+Species+DBH,distrname="ZIP",data=sycamores.1)
Error in ifelse(y == 0, 1, y/mu) : dim<- : dims [product 12] do not
match the length of object [549]
In addition: Warning messages:
1: longer object length
is not a multiple of shorter object length in: x[good, ] * w
2: longer object length
is not a multiple of shorter object length in: eta + offset
3: longer object length
is not a multiple of shorter object length in: y/mu

I do not know enough about the calculations done in the function to  
interpret the error messages.  Is there a glitch in my data and if  
yes, what is it?
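
The error is raised inside zicounts, so the message alone does not say whether the data are at fault. As a rough sketch of the kind of data checks that can be run first, the example below counts missing values, lists the factor levels exactly as stored, and rebuilds a factor after stripping stray whitespace. The data frame `sycamores.demo` and its mistyped "ACNE " level are invented for illustration; they are an assumption, not a description of sycamores.1.

```r
# Hypothetical stand-in for the real data, with one missing count and
# one species code that carries a trailing space
sycamores.demo <- data.frame(
  Total.vines = c(0, 2, 0, 1, NA, 3),
  Species     = factor(c("ACNE", "ACNE ", "PLOC", "PLOC", "ULAM", "ACNE")),
  DBH         = c(12.1, 30.5, 44.0, 8.3, 19.9, 51.2)
)

# 1. Missing values per column
na.counts <- colSums(is.na(sycamores.demo))

# 2. Factor levels exactly as stored; near-duplicates show up here
raw.levels <- levels(sycamores.demo$Species)

# 3. Strip leading/trailing whitespace and rebuild the factor
sycamores.demo$Species <- factor(trimws(as.character(sycamores.demo$Species)))
clean.levels <- levels(sycamores.demo$Species)

# 4. Keep only rows with no missing values in any column
complete.demo <- sycamores.demo[complete.cases(sycamores.demo), ]

print(na.counts)
print(raw.levels)
print(clean.levels)
```

If the levels and row counts look right after checks like these, the problem is more likely in the function call or the package than in the data.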

Thanks for your help.

Jim Milks

Graduate Student
Environmental Sciences Ph.D. Program
136 Biological Sciences
Wright State University
3640 Colonel Glenn Hwy
Dayton, OH 45435





[R] choosing between Poisson regression models: no interactions vs. interactions

2007-07-31 Thread James Milks
R gurus,

I'm working on data analysis for a small project.  My response  
variable is total vines per tree (median = 0, mean = 1.65, min = 0,  
max = 24).  My predictors are two categorical variables (four sites  
and four species) and one continuous (tree diameter at breast height  
(DBH)).  The main question I'm attempting to answer is whether or not  
the species identity of a tree has any effects on the number of vines  
clinging to the trunk.  Given that the response variable is count  
data, I decided to use Poisson regression, even though I'm not as  
familiar with it as linear or logit regression.

My problem is deciding which model to use.  I have created several,  
one without interaction terms (Total.vines~Site+Species+DBH), one  
with an interaction term between Site and Species  
(Total.vines~Site*Species+DBH), and one with interactions between all  
variables (Total.vines~Site*Species*DBH).  Here is my output from R  
for the first two models (the last model has the same number (and  
identity) of significant variables as the second model, even though  
the last model had more interaction terms overall):

Call:
glm(formula = Total.vines ~ Site + Species + DBH, family = poisson)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-5.2067  -1.2915  -0.7095  -0.3525   6.3756

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)     -2.987695   0.231428 -12.910  < 2e-16 ***
SiteHuffman Dam  2.725193   0.249423  10.926  < 2e-16 ***
SiteNarrows      1.902987   0.227599   8.361  < 2e-16 ***
SiteSugar Creek  1.752754   0.242186   7.237 4.58e-13 ***
SpeciesFRAM      0.955468   0.157423   6.069 1.28e-09 ***
SpeciesPLOC      1.187903   0.141707   8.383  < 2e-16 ***
SpeciesULAM      0.340792   0.184615   1.846   0.0649 .
DBH              0.020708   0.001292  16.026  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

 Null deviance: 1972.3  on 544  degrees of freedom
Residual deviance: 1290.0  on 537  degrees of freedom
AIC: 1796.0

Number of Fisher Scoring iterations: 6


Call:
glm(formula = Total.vines ~ Site * Species + DBH, family = poisson,
 data = sycamores.1)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-4.9815  -1.2370  -0.6339  -0.3403   6.5664

Coefficients: (3 not defined because of singularities)
                              Estimate Std. Error z value Pr(>|z|)
(Intercept)                  -2.788243   0.303064  -9.200  < 2e-16 ***
SiteHuffman Dam               1.838952   0.354127   5.193 2.07e-07 ***
SiteNarrows                   2.252716   0.323184   6.970 3.16e-12 ***
SiteSugar Creek             -12.961519 519.152077  -0.025 0.980082
SpeciesFRAM                  13.938716 519.152230   0.027 0.978580
SpeciesPLOC                   0.240223   0.540676   0.444 0.656824
SpeciesULAM                   1.919586   0.540246   3.553 0.000381 ***
DBH                           0.019984   0.001337  14.946  < 2e-16 ***
SiteHuffman Dam:SpeciesFRAM -11.513823 519.152294  -0.022 0.982306
SiteNarrows:SpeciesFRAM     -13.593127 519.152268  -0.026 0.979111
SiteSugar Creek:SpeciesFRAM         NA         NA      NA       NA
SiteHuffman Dam:SpeciesPLOC         NA         NA      NA       NA
SiteNarrows:SpeciesPLOC       0.397503   0.555218   0.716 0.474028
SiteSugar Creek:SpeciesPLOC  15.640450 519.152277   0.030 0.975966
SiteHuffman Dam:SpeciesULAM  -0.102841   0.610027  -0.169 0.866124
SiteNarrows:SpeciesULAM      -2.809092   0.606804  -4.629 3.67e-06 ***
SiteSugar Creek:SpeciesULAM         NA         NA      NA       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

 Null deviance: 1972.3  on 544  degrees of freedom
Residual deviance: 1178.7  on 531  degrees of freedom
AIC: 1696.6

Number of Fisher Scoring iterations: 13


As you can see, the two models give very different output, especially
with regard to whether or not the individual species are significant.
In the no-interaction model, the only species that was not
significant was ULAM.  In the Site-by-Species interaction model, ULAM
was the only significant species.  My question is this: which model
should I use when I present this analysis?  I know that the
interaction model has the lower AIC.  Should I base my choice solely
on AIC?  The reason I'm asking is that the second model has only one
significant interaction term, fewer significant terms overall, and
three undefined terms.
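
A few checks may help frame the choice; the sketch below uses simulated data, not the vine data, so its numbers mean nothing for the real analysis. A Site-by-Species cross-tabulation shows whether some combinations are empty, which is one common reason for NA coefficients and very large standard errors in an interaction model. Nested Poisson models can be compared with a likelihood-ratio test via `anova()` alongside AIC. The ratio of the Pearson statistic to the residual degrees of freedom gives a rough overdispersion check; in the output above the residual deviance of 1290.0 on 537 degrees of freedom already hints in that direction.

```r
# Simulated stand-in data; all values are made up for illustration
set.seed(42)
n <- 300
demo <- data.frame(
  Site    = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  Species = factor(sample(c("ACNE", "PLOC", "ULAM"), n, replace = TRUE)),
  DBH     = runif(n, 5, 100)
)
demo$Total.vines <- rpois(n, exp(-1 + 0.02 * demo$DBH))

# Cell counts: empty Site-by-Species cells cannot support an interaction term
cell.counts <- table(demo$Site, demo$Species)

# Additive model and model with the Site-by-Species interaction
m.add <- glm(Total.vines ~ Site + Species + DBH, family = poisson, data = demo)
m.int <- glm(Total.vines ~ Site * Species + DBH, family = poisson, data = demo)

# Likelihood-ratio (analysis of deviance) test for the nested models
lrt <- anova(m.add, m.int, test = "Chisq")

# AIC for both models side by side
aic.table <- AIC(m.add, m.int)

# Rough overdispersion check: values well above 1 suggest that the
# Poisson standard errors are too small
dispersion <- sum(residuals(m.int, type = "pearson")^2) / df.residual(m.int)

print(cell.counts)
print(lrt)
print(aic.table)
print(dispersion)
```

Significance of individual coefficients also changes meaning once an interaction is present, since each main-effect row then refers to the reference level of the other factor, so the two coefficient tables are not directly comparable row by row.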

Thanks for any guidance you can give to someone running his first  
Poisson regression.

Jim Milks

Graduate Student
Environmental Sciences Ph.D. Program
136 Biological Sciences
Wright State University
3640 Colonel Glenn Hwy
Dayton, OH 45435




[R] Generating artificial datasets with a specific correlation coefficient.

2007-06-12 Thread James Milks
I need to create artificial datasets with specific correlation  
coefficients (i.e. a dataset that returns r = 0.30, etc.) as examples  
for a lab I am teaching this summer.  Is there a way to do that in R?
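
One way this could be done, as a minimal sketch in base R: standardise a random x, make a second random vector exactly uncorrelated with it by taking regression residuals, then mix the two with weights r and sqrt(1 - r^2). The sample correlation then equals r up to floating-point error. The function name `make.correlated` is made up for this example.

```r
# Hypothetical helper: returns n (x, y) pairs whose sample correlation is r
make.correlated <- function(n, r) {
  x <- rnorm(n)
  e <- rnorm(n)
  # Standardise x to sample mean 0 and sample sd 1
  x.std <- as.numeric(scale(x))
  # Residuals of e on x are exactly uncorrelated with x in the sample
  e.res <- residuals(lm(e ~ x.std))
  e.std <- as.numeric(scale(e.res))
  # Mix the two unit-variance, uncorrelated components
  y <- r * x.std + sqrt(1 - r^2) * e.std
  data.frame(x = x.std, y = y)
}

set.seed(123)
lab.data <- make.correlated(50, 0.30)
cor(lab.data$x, lab.data$y)
```

The columns can be shifted and rescaled afterwards (for example `10 + 2 * lab.data$x`) without changing the correlation. `MASS::mvrnorm()` with `empirical = TRUE` is another route to the same result.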

Thanks.

Jim Milks

Graduate Student
Environmental Sciences Ph.D. Program
136 Biological Sciences
Wright State University
3640 Colonel Glenn Hwy
Dayton, OH 45435


