LNadler <lauren.e.nadler <at> gmail.com> writes: > > Hi there, > > I have been having trouble running negative binomial regression (glm.nb) > using library MASS in R v2.15.0 on Mac OSX. > > I am running multiple models on the variables influencing the group size of > damselfish in coral reefs (count data). For total group size and two of my > species, glm.nb is working great to deal with overdispersion in my count > data. For two of my species, I am getting a mixture of warning and error > messages. These species are different from the others in that they have > slightly smaller group sample size (so there are more 0s) and the group size > varies more widely (from 1 to 45 fish versus 1 to 11 fish for the other > species for which the model is working). Here is a sample of my output and > the error messages:
> > > model1<-glm.nb(X...of.C..viridis~PC1+Average.Branch.Diameter+ > Average.Branch.Spacing+PC1*Average.Branch.Diameter+ > Average.Branch.Diameter*Average.Branch.Spacing+PC1*Average.Branch.Spacing) By the way, this model specification is equivalent to X...of.C..viridis~(PC1+Average.Branch.Diameter+Average.Branch.Spacing)^2 (i.e. main effects plus all pairwise interactions); also, A*B stands for interactions plus all main effects (A + B+ A:B) so your formula is slightly (harmlessly) redundant > There were 50 or more warnings (use warnings() to see the first 50) > > warnings() > Warning messages: > 1: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace = > control$trace > ... : > iteration limit reached > 2: In sqrt(1/i) : NaNs produced > 3: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace = > control$trace > ... : > iteration limit reached (ETC...CONTINUES ON THE SAME FOR 50 WARNINGS) > > (Dispersion parameter for Negative Binomial(61302.24) family taken to be 1) This estimated dispersion parameter is ridiculously large ... either your data are effectively Poisson, or the the theta estimate is questionable. > Error in prettyNum(.Internal(format(x, trim, digits, nsmall, width, 3L, : > invalid 'nsmall' argument This error actually derives not from the model fitting itself, but from R's attempt to print the summary -- the print.summary method is trying to use the standard error of theta to determine the format, but that's NA (or NaN, I forget) in this case. Here's a reproducible example I came up with several years ago that shows a similar pathology; I was able to get around it using bbmle::mle2 ... ############################################# library(MASS) tadT <- c(0,0,0,450,0,0,0,315,233,0,200,300) Distance <- rep(c(10,100,500),rep(4,3)) d <- data.frame(tadT,Distance) plot(tadT) plot(Distance,jitter(tadT)) library(ggplot2) ggplot(d,aes(Distance,tadT))+stat_sum(aes(size=factor(..n..)))+ theme_bw()+geom_smooth(method="glm",family="quasipoisson") ## constant model tad.con <- glm.nb(tadT ~ 1, control=glm.control(trace=10)) log(mean(tadT)) ## OK, it got the mean right at least ... tad.con$theta ss <- summary(tad.con) print(ss) ## error ## Error in prettNum(...) invalid 'nsmall' argument ## debug(MASS:::print.summary.negbin) ## x$SE.theta is NaN so dp is NaN so format(...,nsmall=dp) ## gives an error ## given more iterations to get where it's going, it goes ## out of control (and crashes) glm.nb(tadT ~ 1, control=glm.control(trace=10,maxit=100)) ## doesn't help to make an identity link glm.nb(tadT ~ 1,link=identity, control=glm.control(trace=10)) glm.nb(tadT ~ 1,link=identity, control=glm.control(trace=10,maxit=100)) library(bbmle) ## can also use starting values of 0,0, but get warnings m1 <- mle2(tadT~dnbinom(mu=exp(logmu),size=exp(logk)), data=d, start=list(logmu=4,logk=-1)) confint(m1) ## works OK p1 <- profile(m1) plot(p1) ## also try as a function of distance tad.dist <- glm.nb(tadT ~ Distance) ## with trace/maxit set glm.nb(tadT ~ Distance,control=glm.control(trace=10,maxit=100)) ## does setting init.theta <<1 help? glm.nb(tadT ~ Distance,init.theta=1e-6,control=glm.control(trace=10,maxit=100)) glm.nb(tadT ~ Distance,init.theta=1e-1,control=glm.control(trace=10,maxit=100)) m2 <- mle2(tadT~dnbinom(mu=exp(logmu),size=exp(logk)), data=d, parameters=list(logmu~Distance), start=list(logmu=4,logk=-1)) summary(m2) p2 <- profile(m2) plot(p2) confint(p2) anova(m1,m2) ## setting all zeros to 1 gives reasonable answers but??? tadT1 <- tadT tadT1[tadT1==0] <- 1 glm.nb(tadT1 ~ Distance,control=glm.control(trace=10,maxit=100)) ## quasipoisson does work ... but no likelihood/AIC available glm.q1 <- glm(tadT ~ 1,family=quasipoisson()) glm.qd <- glm(tadT ~ Distance,family=quasipoisson()) ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.