Re: [R] Problem with binomial gam{mgcv}

2015-10-13 Thread Simon Wood

Following up on Duncan's answer...

You can see that the model doesn't fit with a little bit of residual 
checking. For example


gam.check(Model)

shows that the binomial assumption is just wrong here: there is massive 
overdispersion, for example. Even the crudest response to this, of using 
quasibinomial, then gives a non-significant  s(x1) in the examples I've 
tried.


best,
Simon




On 10/10/15 12:40, Duncan Murdoch wrote:

On 09/10/2015 8:00 PM, Erin Conlisk wrote:

Hello,

I am having trouble testing for the significance using a binomial model in
gam{mgcv}.  Have I stumbled on a bug?  I doubt I would be so lucky, so
could someone tell me what I am doing wrong?

Please see the following code:


# PROBLEM USING cbind

x1 <- runif(500, 0, 100)  # Create 500 random variables to use as my
explanatory variable

y1 <- floor(runif(500, 0, 100)) # Create 500 random counts to serve as
binomial "successes"

y2 <- 100-y1 # Create 500 binomial "failures", assuming a total of 100
trials and the successes recorded in y1

Model <- gam(cbind(y1, y2) ∼ 1 + s(x1), family=binomial)
summary(Model)


The result is that my random variable, x1, is highly significant.  This
can't be right...

The problem is that statistical significance of a test doesn't mean that
the alternative you have in mind is right, it just means that the null
is wrong (or you were unlucky, but let's ignore that).

The null hypothesis here is that all of the y1 values are independent
binomial values with a common n=100 and common unspecifed probability of
success.

Since in fact y1 comes from a discrete uniform distribution, that's
false, and the p-value is not anywhere near being a random Unif(0,1) value.

If you wanted the null hypothesis to be true, you'd need to choose a
success probability p somehow, then set y1 <- rbinom(500, size=100, prob
= p).  The random uniform has a far larger variance, and that leads to a
larger deviance in gam, hence significance.

Duncan Murdoch


So what happens when I change the observations from a "batch" of 100 trials
to individual successes and failures?


# NOW MAKE ALL THESE DATA 0 and 1

r01<-rep(0,500)
data01<-cbind(x1, y1, y2, r01)
rownames(data01)<-seq(1,500, 1)
colnames(data01)<-c('x1', 'y1', 'y2', 'r01')
data01<-data.frame(data01)

expanded0 <- data01[rep(row.names(data01), data01$y1), 1:4]  # Creates a
replicate of the  #  explanatory variables for each individual "success"

r01<-rep(1,500)
data01<-cbind(x1, y1, y2, r01)
rownames(data01)<-seq(1,500, 1)
colnames(data01)<-c('x1', 'y1', 'y2', 'r01')
data01<-data.frame(data01)

expanded1 <- data01[rep(row.names(data01), data01$y2), 1:4]  # Creates a
replicate of the  #  explanatory variables for each individual "failure"

data01<-rbind(expanded0,expanded1)

Model2 <- gam(r01 ∼ 1 + s(x1), family=binomial)
summary(Model2)
___

The result is what I expect.  Now my random variable, x1, is NOT
significant.

What is going on here?

I should say that I didn't just make this up.  My question arose playing
with my real data, where I was getting high significance, but a terrible
proportion of deviance explained.

My apologies if this is explained elsewhere, but I have spent hours
searching for an answer online.

Thank you kindly,
Erin Conlisk


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603   http://people.bath.ac.uk/sw283

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Problem with binomial gam{mgcv}

2015-10-10 Thread Duncan Murdoch
On 09/10/2015 8:00 PM, Erin Conlisk wrote:
> Hello,
> 
> I am having trouble testing for the significance using a binomial model in
> gam{mgcv}.  Have I stumbled on a bug?  I doubt I would be so lucky, so
> could someone tell me what I am doing wrong?
> 
> Please see the following code:
> 
> 
> # PROBLEM USING cbind
> 
> x1 <- runif(500, 0, 100)  # Create 500 random variables to use as my
> explanatory variable
> 
> y1 <- floor(runif(500, 0, 100)) # Create 500 random counts to serve as
> binomial "successes"
> 
> y2 <- 100-y1 # Create 500 binomial "failures", assuming a total of 100
> trials and the successes recorded in y1
> 
> Model <- gam(cbind(y1, y2) ∼ 1 + s(x1), family=binomial)
> summary(Model)
> 
> 
> The result is that my random variable, x1, is highly significant.  This
> can't be right...

The problem is that statistical significance of a test doesn't mean that
the alternative you have in mind is right, it just means that the null
is wrong (or you were unlucky, but let's ignore that).

The null hypothesis here is that all of the y1 values are independent
binomial values with a common n=100 and common unspecifed probability of
success.

Since in fact y1 comes from a discrete uniform distribution, that's
false, and the p-value is not anywhere near being a random Unif(0,1) value.

If you wanted the null hypothesis to be true, you'd need to choose a
success probability p somehow, then set y1 <- rbinom(500, size=100, prob
= p).  The random uniform has a far larger variance, and that leads to a
larger deviance in gam, hence significance.

Duncan Murdoch

> 
> So what happens when I change the observations from a "batch" of 100 trials
> to individual successes and failures?
> 
> 
> # NOW MAKE ALL THESE DATA 0 and 1
> 
> r01<-rep(0,500)
> data01<-cbind(x1, y1, y2, r01)
> rownames(data01)<-seq(1,500, 1)
> colnames(data01)<-c('x1', 'y1', 'y2', 'r01')
> data01<-data.frame(data01)
> 
> expanded0 <- data01[rep(row.names(data01), data01$y1), 1:4]  # Creates a
> replicate of the  #  explanatory variables for each individual "success"
> 
> r01<-rep(1,500)
> data01<-cbind(x1, y1, y2, r01)
> rownames(data01)<-seq(1,500, 1)
> colnames(data01)<-c('x1', 'y1', 'y2', 'r01')
> data01<-data.frame(data01)
> 
> expanded1 <- data01[rep(row.names(data01), data01$y2), 1:4]  # Creates a
> replicate of the  #  explanatory variables for each individual "failure"
> 
> data01<-rbind(expanded0,expanded1)
> 
> Model2 <- gam(r01 ∼ 1 + s(x1), family=binomial)
> summary(Model2)
> ___
> 
> The result is what I expect.  Now my random variable, x1, is NOT
> significant.
> 
> What is going on here?
> 
> I should say that I didn't just make this up.  My question arose playing
> with my real data, where I was getting high significance, but a terrible
> proportion of deviance explained.
> 
> My apologies if this is explained elsewhere, but I have spent hours
> searching for an answer online.
> 
> Thank you kindly,
> Erin Conlisk
>

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Problem with binomial gam{mgcv}

2015-10-09 Thread Erin Conlisk
Hello,

I am having trouble testing for the significance using a binomial model in
gam{mgcv}.  Have I stumbled on a bug?  I doubt I would be so lucky, so
could someone tell me what I am doing wrong?

Please see the following code:


# PROBLEM USING cbind

x1 <- runif(500, 0, 100)  # Create 500 random variables to use as my
explanatory variable

y1 <- floor(runif(500, 0, 100)) # Create 500 random counts to serve as
binomial "successes"

y2 <- 100-y1 # Create 500 binomial "failures", assuming a total of 100
trials and the successes recorded in y1

Model <- gam(cbind(y1, y2) ∼ 1 + s(x1), family=binomial)
summary(Model)


The result is that my random variable, x1, is highly significant.  This
can't be right...

So what happens when I change the observations from a "batch" of 100 trials
to individual successes and failures?


# NOW MAKE ALL THESE DATA 0 and 1

r01<-rep(0,500)
data01<-cbind(x1, y1, y2, r01)
rownames(data01)<-seq(1,500, 1)
colnames(data01)<-c('x1', 'y1', 'y2', 'r01')
data01<-data.frame(data01)

expanded0 <- data01[rep(row.names(data01), data01$y1), 1:4]  # Creates a
replicate of the  #  explanatory variables for each individual "success"

r01<-rep(1,500)
data01<-cbind(x1, y1, y2, r01)
rownames(data01)<-seq(1,500, 1)
colnames(data01)<-c('x1', 'y1', 'y2', 'r01')
data01<-data.frame(data01)

expanded1 <- data01[rep(row.names(data01), data01$y2), 1:4]  # Creates a
replicate of the  #  explanatory variables for each individual "failure"

data01<-rbind(expanded0,expanded1)

Model2 <- gam(r01 ∼ 1 + s(x1), family=binomial)
summary(Model2)
___

The result is what I expect.  Now my random variable, x1, is NOT
significant.

What is going on here?

I should say that I didn't just make this up.  My question arose playing
with my real data, where I was getting high significance, but a terrible
proportion of deviance explained.

My apologies if this is explained elsewhere, but I have spent hours
searching for an answer online.

Thank you kindly,
Erin Conlisk

-- 
Postdoctoral Researcher
UC Berkeley
Energy and Resources Group
310 Barrows Hall
Berkeley, CA 94720

cell: 858-776-2939

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.