Re: [R] mgcv: distribution of dev with Poisson data

Simon Wood Wed, 05 Feb 2014 06:07:59 -0800

Hi Greg,

Yes, this sounds right - with quasipoisson gam uses `extendedquasi-likelihood' (see McCullagh and Nelder's GLM book) to allowestimation of the scale parameter along with the smoothing parametersvia (RE)ML, and it could well be that this gives a biased scale estimatewith low counts (although the shape of the mean variance relationship isunaffected by this).

It might well be more sensible to default to a Pearson estimate of thescale parameter for 'gam' ('bam' and 'gamm' already do this), to avoidthis issue with quasipoisson and low counts. Your email reminded me thatsomeone had reported rather too high p-values for quasipoisson withthese sort of data, and looking back at my list of open issues I seethat `someone' was you as well! It's possible that moving to a Pearsonestimator of the scale will solve that problem too.


Thanks for this... very helpful.

best,
Simon


On 05/02/14 12:56, Greg Dropkin wrote:

thanks Simon

also, it appears at least with ML that the default scale estimate with
quasipoisson (i.e. using dev) is the scale which minimises the ML value of
the fitted model. So it is the "best" model but doesn't actually give the
correct mean-variance relation. Is that right?

thanks again

Greg

Greg,

The deviance being chi^2 distributed on the residual degrees of freedom
works in some cases (mostly where the response itself can be reasonably
approximated as Gaussian), but rather poorly in others (noteably low
count data). This is what you are seeing in your simulations - in the
first case the Poisson mean is relatively high - so the  normal
approximation to the Poisson is not too bad and the deviance is approx.
chi.sq on residual df. In the second case the Poisson mean is low, there
are lots of zeroes, and the approximation breaks down. So yes, the
Pearson estimator is probably a better bet in the latter case.

best,
Simom

ps. Note that the chi squared approximation for the *difference* in
deviance between two nested models does not suffer from this problem.


On 04/02/14 09:25, Greg Dropkin wrote:

mgcv: distribution of dev

hi

I can't tell if this is a simple error.
I'm puzzled by the distribution of dev when fitting a gam to Poisson
generated data.
I expected dev to be approximately chi-squared on residual d.f., i.e.
about 1000 in each case below.
In particular, the low values in the 3rd and 4th versions would suggest
scale < 1, yet the data is Poisson generated.
The problem isn't caused by insufficient k values in the smoother.
Does this mean that with sparse data the distribution of dev is no
longer
approx chi-sq on residual df?
Does it mean the scale estimate when fitting quasipoisson should be the
Pearson version?

thanks

greg

library(mgcv)
set.seed(1)
x1<-runif(1000)
linp<-2+exp(-2*x1)*sin(8*x1)
sum(exp(linp))
dev1<-dev2<-sums<-rep(0,20)
for (j in 1:20)
{
y<-rpois(1000,exp(linp))
sums[j]<-sum(y)
m1<-gam(y~s(x1),family="poisson")
m2<-gam(y~s(x1,k=20),family="poisson")
dev1[j]=m1$dev
dev2[j]=m2$dev
}
mean(sums)
sd(sums)
mean(dev1)
sd(dev1)
mean(dev2)
sd(dev2)

#dev slighly higher than expected

linpa<--1+exp(-2*x1)*sin(8*x1)
sum(exp(linpa))
dev1a<-dev2a<-sumsa<-rep(0,20)
for (j in 1:20)
{
y<-rpois(1000,exp(linpa))
sumsa[j]<-sum(y)
m1<-gam(y~s(x1),family="poisson")
m2<-gam(y~s(x1,k=20),family="poisson")
dev1a[j]=m1$dev
dev2a[j]=m2$dev
}
mean(sumsa)
sd(sumsa)
mean(dev1a)
sd(dev1a)
mean(dev2a)
sd(dev2a)

#dev a bit lower than expected

linpb<--1.5+exp(-2*x1)*sin(8*x1)
sum(exp(linpb))
dev1b<-dev2b<-sumsb<-rep(0,20)
for (j in 1:20)
{
y<-rpois(1000,exp(linpb))
sumsb[j]<-sum(y)
m1<-gam(y~s(x1),family="poisson")
m2<-gam(y~s(x1,k=20),family="poisson")
dev1b[j]=m1$dev
dev2b[j]=m2$dev
}
mean(sumsb)
sd(sumsb)
mean(dev1b)
sd(dev1b)
mean(dev2b)
sd(dev2b)

#dev much lower than expected

m1q<-gam(y~s(x1),family="quasipoisson",scale=-1)
m1q$scale
m1q$dev/(1000-sum(m1q$edf))

#scale estimate < 1

sum((y-fitted(m1q))^2/fitted(m1q))/(1000-sum(m1q$edf))

#Pearson estimate of scale closer, but also < 1


linpc<--2+exp(-2*x1)*sin(8*x1)
sum(exp(linpc))
dev1c<-dev2c<-sumsc<-rep(0,20)
for (j in 1:20)
{
y<-rpois(1000,exp(linpc))
sumsc[j]<-sum(y)
m1<-gam(y~s(x1),family="poisson")
m2<-gam(y~s(x1,k=20),family="poisson")
dev1c[j]=m1$dev
dev2c[j]=m2$dev
}
mean(sumsc)
sd(sumsc)
mean(dev1c)
sd(dev1c)
mean(dev2c)
sd(dev2c)

#dev much lower than expected

m1q<-gam(y~s(x1),family="quasipoisson",scale=-1)
m1q$scale
m1q$sig2
m1q$dev/(1000-sum(m1q$edf))

#scale estimate < 1

sum((y-fitted(m1q))^2/fitted(m1q))/(1000-sum(m1q$edf))

#Pearson estimate of scale ok



--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283



--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] mgcv: distribution of dev with Poisson data

Reply via email to