Following up on Duncan's answer...

You can see that the model doesn't fit with a little bit of residual checking. For example

gam.check(Model)

shows that the binomial assumption is just wrong here: there is massive overdispersion, for example. Even the crudest response to this, of using quasibinomial, then gives a non-significant s(x1) in the examples I've tried.

best,
Simon




On 10/10/15 12:40, Duncan Murdoch wrote:
On 09/10/2015 8:00 PM, Erin Conlisk wrote:
Hello,

I am having trouble testing for the significance using a binomial model in
gam{mgcv}.  Have I stumbled on a bug?  I doubt I would be so lucky, so
could someone tell me what I am doing wrong?

Please see the following code:
________________________________

# PROBLEM USING cbind

x1 <- runif(500, 0, 100)  # Create 500 random variables to use as my
explanatory variable

y1 <- floor(runif(500, 0, 100)) # Create 500 random counts to serve as
binomial "successes"

y2 <- 100-y1 # Create 500 binomial "failures", assuming a total of 100
trials and the successes recorded in y1

Model <- gam(cbind(y1, y2) ∼ 1 + s(x1), family=binomial)
summary(Model)
________________________________

The result is that my random variable, x1, is highly significant.  This
can't be right...
The problem is that statistical significance of a test doesn't mean that
the alternative you have in mind is right, it just means that the null
is wrong (or you were unlucky, but let's ignore that).

The null hypothesis here is that all of the y1 values are independent
binomial values with a common n=100 and common unspecifed probability of
success.

Since in fact y1 comes from a discrete uniform distribution, that's
false, and the p-value is not anywhere near being a random Unif(0,1) value.

If you wanted the null hypothesis to be true, you'd need to choose a
success probability p somehow, then set y1 <- rbinom(500, size=100, prob
= p).  The random uniform has a far larger variance, and that leads to a
larger deviance in gam, hence significance.

Duncan Murdoch

So what happens when I change the observations from a "batch" of 100 trials
to individual successes and failures?
________________________________

# NOW MAKE ALL THESE DATA 0 and 1

r01<-rep(0,500)
data01<-cbind(x1, y1, y2, r01)
rownames(data01)<-seq(1,500, 1)
colnames(data01)<-c('x1', 'y1', 'y2', 'r01')
data01<-data.frame(data01)

expanded0 <- data01[rep(row.names(data01), data01$y1), 1:4]  # Creates a
replicate of the      #  explanatory variables for each individual "success"

r01<-rep(1,500)
data01<-cbind(x1, y1, y2, r01)
rownames(data01)<-seq(1,500, 1)
colnames(data01)<-c('x1', 'y1', 'y2', 'r01')
data01<-data.frame(data01)

expanded1 <- data01[rep(row.names(data01), data01$y2), 1:4]  # Creates a
replicate of the      #  explanatory variables for each individual "failure"

data01<-rbind(expanded0,expanded1)

Model2 <- gam(r01 ∼ 1 + s(x1), family=binomial)
summary(Model2)
___________________________________

The result is what I expect.  Now my random variable, x1, is NOT
significant.

What is going on here?

I should say that I didn't just make this up.  My question arose playing
with my real data, where I was getting high significance, but a terrible
proportion of deviance explained.

My apologies if this is explained elsewhere, but I have spent hours
searching for an answer online.

Thank you kindly,
Erin Conlisk

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to