I have been able to isolate the problem, though I still do not understand why this is occurring. Consider the following code:
#####
library(foreign)
muscatine <- read.dta('http://www.hsph.harvard.edu/fitzmaur/ala2e/muscatine.dta')
muscatine$gender <- as.factor(muscatine$gender)
muscatine$y.fac <- as.factor(muscatine$y)   # Make the response a factor
muscatine$cage <- muscatine$age - 12
muscatine$cage2 <- muscatine$cage^2
muscatine2 <- na.omit(muscatine)            # Remove missing data

> str(muscatine2)
'data.frame':   9856 obs. of  9 variables:
 $ id      : num  1 1 1 2 2 2 3 3 3 4 ...
 $ gender  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ baseage : num  6 6 6 6 6 6 6 6 6 6 ...
 $ age     : num  6 8 10 6 8 10 6 8 10 6 ...
 $ occasion: num  1 2 3 1 2 3 1 2 3 1 ...
 $ y       : num  1 1 1 1 1 1 1 1 1 1 ...
 $ y.fac   : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
 $ cage    : num  -6 -4 -2 -6 -4 -2 -6 -4 -2 -6 ...
 $ cage2   : num  36 16 4 36 16 4 36 16 4 36 ...

# This model works and is fairly close to the Fitzmaurice book results
f1 <- geeglm(y ~ gender*cage + gender*cage2, id=id, data=muscatine2,
             family=binomial(link=logit),
             waves=muscatine2$occasions, corstr='unstructured')

# This model does not work; the only difference is that the response is a factor
f2 <- geeglm(y.fac ~ gender*cage + gender*cage2, id=id, data=muscatine2,
             family=binomial(link=logit),
             waves=muscatine2$occasions, corstr='unstructured')
> ...
Error in lm.fit(zsca, qlf(pr2), offset = soffset) : NA/NaN/Inf in 'y'
In addition: Warning messages:
1: In model.response(mf, "numeric") :
  using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, mu) : - not meaningful for factors

# These models do not fix the problem and give a new error message
f3 <- geeglm(as.numeric(y.fac) ~ gender*cage + gender*cage2, id=id, data=muscatine2,
             family=binomial(link=logit),
             waves=muscatine2$occasions, corstr='unstructured')
f4 <- geeglm(as.integer(y.fac) ~ gender*cage + gender*cage2, id=id, data=muscatine2,
             family=binomial(link=logit),
             waves=muscatine2$occasions, corstr='unstructured')
> ...
Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1

# This model works and is really, really close to the Fitzmaurice book
f5 <- ordgee(ordered(y.fac) ~ gender*cage + gender*cage2, id=id, data=muscatine2,
             mean.link='logit',
             waves=muscatine2$occasions, corstr='unstructured')
###

Bottom line: Something goes wrong when I change the response variable "y" into the factor "y.fac": geeglm spits out an error and occasionally even crashes R (according to some respondents who were trying to help me). The error is not reversed by converting y.fac back into a numeric variable with as.numeric() or as.integer(). Interestingly, the ordgee function from the same geepack package handles the factor response variable without issue and appears to give the results that best match the textbook example.

Thanks to all who helped. I hope this summary helps debug the geeglm function and helps others. I have cc'd the geepack maintainer, as some of you suggested.

Brant

On Mar 1, 2014, at 8:31 PM, Brant Inman <brant.in...@me.com> wrote:

> Duncan,
>
> Thank you for your reply. The example is in fact not ordinal (the response
> variable Y is an indicator of the presence or absence of obesity). I too saw
> their code snippet online where they use an ordinal GEE, but the outcome
> variable is binary as can be seen from the imported data from the link I
> provided. I thought that since Y is a dichotomous outcome, the model
> I proposed would be appropriate, but somehow the geeglm function thinks there
> is missing data and I don't see how that can be.
>
> Any other ideas?
>
>
> Brant
>
> On Mar 1, 2014, at 8:13 PM, Duncan Mackay <dulca...@bigpond.com> wrote:
>
>> Hi Brant
>>
>> I have not got Fitzmaurice etal but from their web site it seems that you
>> are trying to do ordinal GEE
>>
>> With GEE models particularly ordinal models you MUST get your data structure
>> correct otherwise it can fail or even R can crash
>>
>> try
>>
>> f1 =
>> ordgee(ordered(y) ~ factor(gender) + cage + cage2 +
>>        factor(gender):cage + factor(gender):cage2, id = id,
>>        data = muscatine2,
>>        waves=muscatine2$occasion, mean.link="logit",
>>        corstr=("unstructured"))
>>
>>> summary(f1)
>>
>> Call:
>> ordgee(formula = ordered(y) ~ factor(gender) + cage + cage2 +
>>     factor(gender):cage + factor(gender):cage2, id = id,
>>     waves = muscatine2$occasion,
>>     data = muscatine2, mean.link = "logit", corstr = ("unstructured"))
>>
>> Mean Model:
>>  Mean Link:                 logit
>>  Variance to Mean Relation: binomial
>>
>>  Coefficients:
>>                           estimate      san.se        wald            p
>> Inter:0               -1.214613103 0.050571150 576.8597850 0.000000e+00
>> factor(gender)1        0.115330450 0.071158497   2.6268450 1.050703e-01
>> cage                   0.037419375 0.013263832   7.9589357 4.785054e-03
>> cage2                 -0.017437692 0.003378786  26.6352422 2.457205e-07
>> factor(gender)1:cage   0.007510802 0.018268075   0.1690390 6.809673e-01
>> factor(gender)1:cage2  0.003860069 0.004632095   0.6944407 4.046580e-01
>>
>> Scale is fixed.
>>
>> Correlation Model:
>>  Correlation Structure:     unstructured
>>  Correlation Link:          log
>>
>>  Estimated Correlation Parameters:
>>         estimate    san.se     wald p
>> alpha.1 3.130702 0.1535950 415.4599 0
>> alpha.2 2.408103 0.1455606 273.6921 0
>> alpha.3 2.793549 0.1351264 427.3978 0
>>
>> Returned Error Value: 0
>> Number of clusters:   4856   Maximum cluster size: 3
>>
>> I presume that you may have a dataset in mind to work on later
>>
>> you may want to check out the repolr and multgee packages as well
>>
>> Duncan
>>
>> Duncan Mackay
>> Department of Agronomy and Soil Science
>> University of New England
>> Armidale NSW 2351
>> Email: home: mac...@northnet.com.au
>>
>>
>>
>> -----Original Message-----
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
>> Behalf Of Brant Inman
>> Sent: Sunday, 2 March 2014 03:52
>> To: r-help@r-project.org
>> Subject: [R] geeglm error NA/NaN/Inf in 'y'
>>
>> R-helpers:
>>
>> I am getting an error when trying to fit a GEE model. Below is code
>> reproducing the error.
>>
>> ###
>> library(foreign)
>> muscatine <- read.dta('http://www.hsph.harvard.edu/fitzmaur/ala2e/muscatine.dta')
>> muscatine$gender <- as.factor(muscatine$gender)
>> muscatine$y <- as.factor(muscatine$y)
>> muscatine$cage <- muscatine$age - 12
>> muscatine$cage2 <- muscatine$cage^2
>> head(muscatine); summary(muscatine)
>> muscatine2 <- na.omit(muscatine); summary(muscatine2) # Remove missing data
>>
>> # GEE model to reproduce example in Fitzmaurice, Laird, Ware book
>> library(geepack)
>>
>> f1 <- geeglm(y ~ gender*cage + gender*cage2, id=id, data=muscatine2,
>>              family=binomial(link=logit),
>>              waves=occasion, corstr='unstructured')
>> ###
>>
>> This gives me the following error
>>
>>> f1 <- geeglm(y ~ gender*cage + gender*cage2, id=id, data=muscatine2,
>> + family=binomial(link=logit),
>> + waves=occasion, corstr='unstructured')
>> Error in lm.fit(zsca, qlf(pr2), offset = soffset) : NA/NaN/Inf in 'y'
>> In addition: Warning messages:
>> 1: In model.response(mf, "numeric") :
>>   using type = "numeric" with a factor response will be ignored
>> 2: In Ops.factor(y, mu) : - not meaningful for factors
>>
>> ###
>>
>> I would tremendously appreciate any help that could explain why I am getting
>> this error as I am not understanding this.
>>
>> Brant
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
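
A note on the second error in this thread ("y values must be 0 <= y <= 1" from the as.numeric(y.fac) and as.integer(y.fac) fits), offered as a likely explanation and not a confirmed diagnosis of geeglm itself: in base R, as.numeric() and as.integer() applied to a factor return the internal level codes (1 and 2 for a two-level factor), not the labels "0" and "1". A response coded 1/2 would fall outside the range the binomial family accepts. The sketch below uses a small made-up vector as a stand-in for the obesity indicator (it is not the muscatine data, and the variable names are illustrative) to show the difference and two standard ways to recover the original 0/1 values.

```r
# Toy stand-in for a 0/1 response (assumption: NOT the muscatine data)
y <- c(0, 1, 1, 0, 1)
y.fac <- as.factor(y)

# as.numeric() on a factor gives the internal level codes, here 1 and 2
codes <- as.numeric(y.fac)

# Going through as.character() recovers the original labels as numbers
recovered <- as.numeric(as.character(y.fac))

# Equivalent idiom: look up the numeric levels by code
recovered2 <- as.numeric(levels(y.fac))[y.fac]

print(codes)       # level codes, not the original values
print(recovered)   # original 0/1 values
print(range(codes))
```

If this is what is happening, a formula response written as as.numeric(as.character(y.fac)) would hand the model the same 0/1 values as the original numeric y; that has not been verified against geeglm here.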