Re: [R] Problem with binomial gam{mgcv}
Following up on Duncan's answer... You can see that the model doesn't fit with a little bit of residual checking. For example gam.check(Model) shows that the binomial assumption is just wrong here: there is massive overdispersion, for example. Even the crudest response to this, of using quasibinomial, then gives a non-significant s(x1) in the examples I've tried. best, Simon On 10/10/15 12:40, Duncan Murdoch wrote: On 09/10/2015 8:00 PM, Erin Conlisk wrote: Hello, I am having trouble testing for the significance using a binomial model in gam{mgcv}. Have I stumbled on a bug? I doubt I would be so lucky, so could someone tell me what I am doing wrong? Please see the following code: # PROBLEM USING cbind x1 <- runif(500, 0, 100) # Create 500 random variables to use as my explanatory variable y1 <- floor(runif(500, 0, 100)) # Create 500 random counts to serve as binomial "successes" y2 <- 100-y1 # Create 500 binomial "failures", assuming a total of 100 trials and the successes recorded in y1 Model <- gam(cbind(y1, y2) ∼ 1 + s(x1), family=binomial) summary(Model) The result is that my random variable, x1, is highly significant. This can't be right... The problem is that statistical significance of a test doesn't mean that the alternative you have in mind is right, it just means that the null is wrong (or you were unlucky, but let's ignore that). The null hypothesis here is that all of the y1 values are independent binomial values with a common n=100 and common unspecifed probability of success. Since in fact y1 comes from a discrete uniform distribution, that's false, and the p-value is not anywhere near being a random Unif(0,1) value. If you wanted the null hypothesis to be true, you'd need to choose a success probability p somehow, then set y1 <- rbinom(500, size=100, prob = p). The random uniform has a far larger variance, and that leads to a larger deviance in gam, hence significance. Duncan Murdoch So what happens when I change the observations from a "batch" of 100 trials to individual successes and failures? # NOW MAKE ALL THESE DATA 0 and 1 r01<-rep(0,500) data01<-cbind(x1, y1, y2, r01) rownames(data01)<-seq(1,500, 1) colnames(data01)<-c('x1', 'y1', 'y2', 'r01') data01<-data.frame(data01) expanded0 <- data01[rep(row.names(data01), data01$y1), 1:4] # Creates a replicate of the # explanatory variables for each individual "success" r01<-rep(1,500) data01<-cbind(x1, y1, y2, r01) rownames(data01)<-seq(1,500, 1) colnames(data01)<-c('x1', 'y1', 'y2', 'r01') data01<-data.frame(data01) expanded1 <- data01[rep(row.names(data01), data01$y2), 1:4] # Creates a replicate of the # explanatory variables for each individual "failure" data01<-rbind(expanded0,expanded1) Model2 <- gam(r01 ∼ 1 + s(x1), family=binomial) summary(Model2) ___ The result is what I expect. Now my random variable, x1, is NOT significant. What is going on here? I should say that I didn't just make this up. My question arose playing with my real data, where I was getting high significance, but a terrible proportion of deviance explained. My apologies if this is explained elsewhere, but I have spent hours searching for an answer online. Thank you kindly, Erin Conlisk __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Simon Wood, Mathematical Science, University of Bath BA2 7AY UK +44 (0)1225 386603 http://people.bath.ac.uk/sw283 __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem with binomial gam{mgcv}
On 09/10/2015 8:00 PM, Erin Conlisk wrote: > Hello, > > I am having trouble testing for the significance using a binomial model in > gam{mgcv}. Have I stumbled on a bug? I doubt I would be so lucky, so > could someone tell me what I am doing wrong? > > Please see the following code: > > > # PROBLEM USING cbind > > x1 <- runif(500, 0, 100) # Create 500 random variables to use as my > explanatory variable > > y1 <- floor(runif(500, 0, 100)) # Create 500 random counts to serve as > binomial "successes" > > y2 <- 100-y1 # Create 500 binomial "failures", assuming a total of 100 > trials and the successes recorded in y1 > > Model <- gam(cbind(y1, y2) ∼ 1 + s(x1), family=binomial) > summary(Model) > > > The result is that my random variable, x1, is highly significant. This > can't be right... The problem is that statistical significance of a test doesn't mean that the alternative you have in mind is right, it just means that the null is wrong (or you were unlucky, but let's ignore that). The null hypothesis here is that all of the y1 values are independent binomial values with a common n=100 and common unspecifed probability of success. Since in fact y1 comes from a discrete uniform distribution, that's false, and the p-value is not anywhere near being a random Unif(0,1) value. If you wanted the null hypothesis to be true, you'd need to choose a success probability p somehow, then set y1 <- rbinom(500, size=100, prob = p). The random uniform has a far larger variance, and that leads to a larger deviance in gam, hence significance. Duncan Murdoch > > So what happens when I change the observations from a "batch" of 100 trials > to individual successes and failures? > > > # NOW MAKE ALL THESE DATA 0 and 1 > > r01<-rep(0,500) > data01<-cbind(x1, y1, y2, r01) > rownames(data01)<-seq(1,500, 1) > colnames(data01)<-c('x1', 'y1', 'y2', 'r01') > data01<-data.frame(data01) > > expanded0 <- data01[rep(row.names(data01), data01$y1), 1:4] # Creates a > replicate of the # explanatory variables for each individual "success" > > r01<-rep(1,500) > data01<-cbind(x1, y1, y2, r01) > rownames(data01)<-seq(1,500, 1) > colnames(data01)<-c('x1', 'y1', 'y2', 'r01') > data01<-data.frame(data01) > > expanded1 <- data01[rep(row.names(data01), data01$y2), 1:4] # Creates a > replicate of the # explanatory variables for each individual "failure" > > data01<-rbind(expanded0,expanded1) > > Model2 <- gam(r01 ∼ 1 + s(x1), family=binomial) > summary(Model2) > ___ > > The result is what I expect. Now my random variable, x1, is NOT > significant. > > What is going on here? > > I should say that I didn't just make this up. My question arose playing > with my real data, where I was getting high significance, but a terrible > proportion of deviance explained. > > My apologies if this is explained elsewhere, but I have spent hours > searching for an answer online. > > Thank you kindly, > Erin Conlisk > __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Problem with binomial gam{mgcv}
Hello, I am having trouble testing for the significance using a binomial model in gam{mgcv}. Have I stumbled on a bug? I doubt I would be so lucky, so could someone tell me what I am doing wrong? Please see the following code: # PROBLEM USING cbind x1 <- runif(500, 0, 100) # Create 500 random variables to use as my explanatory variable y1 <- floor(runif(500, 0, 100)) # Create 500 random counts to serve as binomial "successes" y2 <- 100-y1 # Create 500 binomial "failures", assuming a total of 100 trials and the successes recorded in y1 Model <- gam(cbind(y1, y2) ∼ 1 + s(x1), family=binomial) summary(Model) The result is that my random variable, x1, is highly significant. This can't be right... So what happens when I change the observations from a "batch" of 100 trials to individual successes and failures? # NOW MAKE ALL THESE DATA 0 and 1 r01<-rep(0,500) data01<-cbind(x1, y1, y2, r01) rownames(data01)<-seq(1,500, 1) colnames(data01)<-c('x1', 'y1', 'y2', 'r01') data01<-data.frame(data01) expanded0 <- data01[rep(row.names(data01), data01$y1), 1:4] # Creates a replicate of the # explanatory variables for each individual "success" r01<-rep(1,500) data01<-cbind(x1, y1, y2, r01) rownames(data01)<-seq(1,500, 1) colnames(data01)<-c('x1', 'y1', 'y2', 'r01') data01<-data.frame(data01) expanded1 <- data01[rep(row.names(data01), data01$y2), 1:4] # Creates a replicate of the # explanatory variables for each individual "failure" data01<-rbind(expanded0,expanded1) Model2 <- gam(r01 ∼ 1 + s(x1), family=binomial) summary(Model2) ___ The result is what I expect. Now my random variable, x1, is NOT significant. What is going on here? I should say that I didn't just make this up. My question arose playing with my real data, where I was getting high significance, but a terrible proportion of deviance explained. My apologies if this is explained elsewhere, but I have spent hours searching for an answer online. Thank you kindly, Erin Conlisk -- Postdoctoral Researcher UC Berkeley Energy and Resources Group 310 Barrows Hall Berkeley, CA 94720 cell: 858-776-2939 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.