Re: [R] Lambert (1992) simulation

2012-05-06 Thread Achim Zeileis

On Sat, 5 May 2012, Christopher Desjardins wrote:


Hi,
I am a little confused at the output from predict() for a zeroinfl object.

Here's my confusion:

## From zeroinfl package
fm_zinb2 - zeroinfl(art ~ . | ., data = bioChemists, dist = negbin)


## The raw zero-inflated overdispersed data
 table(bioChemists$art)

  0   1   2   3   4   5   6   7   8   9  10  11  12  16  19
275 246 178  84  67  27  17  12   1   2   1   1   2   1   1

## The default output from predict. It looks like it is doing a horrible
job. Does it really predict 7 zeros?


No, see also this R-help post on Zero-inflated regression models: 
predicting no 0s:

https://stat.ethz.ch/pipermail/r-help/2011-June/279765.html

The predicted _mean_ of a negative binomial distribution is not the most 
likely outcome (i.e., the _mode_) of the distribution. The post above 
presents some hands on examples.




 table(round(predict(fm_zinb2)) )

  0   1   2   3   4   5   6  10
  7 354 487  45  12   6   3   1

##  The output from predict using count
 table(round(predict(fm_zinb2,type=count)))

  1   2   3   4   5   6  10
312 536  45  12   6   3   1

## The output from predict using zero, but here it predicts 24
structural zeros?
 table(round(predict(fm_zinb2,type=zero)))

  0   1
891  24


So my question is how do I interpret these different outputs from the
zeroinf object? What are the differences? The help page just left me
confused. I would expect that table(round(predict(fm_zinb2))) would be E(Y)
and would most accurately track table(bioChemists$art) but I am wrong. How
can I find the E(Y) that would most closely track the raw data?

Thanks,
Chris

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Lambert (1992) simulation

2012-04-27 Thread Achim Zeileis

On Thu, 26 Apr 2012, Christopher Desjardins wrote:


Hi,
I am trying to replicate Lambert (1992)'s simulation with zero-inflated
Poisson models. The citation is here:

@article{lambert1992zero,
Author = {Lambert, D.},
Journal = {Technometrics},
Pages = {1--14},
Publisher = {JSTOR},
Title = {Zero-inflated {P}oisson regression, with an application to defects
in manufacturing},
Year = {1992}}

Specifically I am trying to recreate Table 2. But my estimates for Gamma
are not working out. Any ideas why?


Your implementation of the inverse logit link is wrong, it should be 
exp(x)/(1 + exp(x)) and not exp(x)/exp(1 + x). In R I never code this by 
hand but either use qlogis()/plogis() or make.link(logit).


Your setting resulting in an almost constant probability of zero inflation 
and hence gamma was completely off.


Below is my cleaned up code which nicely replicates the results for n = 
100. The other two would require additional work because one would need to 
catch non-convergence etc.


hth,
Z

## data-generating process
dgp - function(nobs = 100) {
  gamma - c(-1.5, 2)
  beta - c(1.5, -2)
  x - seq(0, 1, length.out = nobs)
  lambda - exp(beta[1] + beta[2] * x)
  p - plogis(gamma[1] + gamma[2] * x)
  y - ifelse(runif(nobs) = p, 0, rpois(nobs, lambda = lambda))
  data.frame(y = y, x = x)
}

## simulation of coefficients and standard errors
sim - function(nrep = 2000, ...) {
  onesim - function(i) {
d - dgp(...)
m - zeroinfl(y ~ x, data = d)
c(coef(m), sqrt(diag(vcov(m
  }
  t(sapply(1:nrep, onesim))
}

## run simulation #3
library(pscl)
set.seed(1)
cfse - sim(nobs = 100)

## mean coefficient estimates
apply(cfse[, 1:4], 2, mean)

## median coefficient estimates
apply(cfse[, 1:4], 2, median)

## sd of coefficient estimates
apply(cfse[, 1:4], 2, sd)

## mean of the standard deviation estimates from observed information
apply(cfse[, 5:8], 2, mean)




Please cc me on a reply!
Thanks,
Chris

## ZIP simulations based on Lambert (1992)'s conditions

# Empty workspace
rm(list=ls())

# Load zero-inflation package
library(pscl)

# Simulation conditions #3
NumSimulations=2000
X=seq(from=0,to=1,length.out=N)
Model.Matrix=cbind(rep(1,length(X)),X)
Gamma=c(-1.5,2)
Beta=c(1.5,-2)
P=exp(Model.Matrix%*%Gamma)/exp(1+Model.Matrix%*%Gamma)
Lambda=exp(Model.Matrix%*%Beta)
CoefSimulations=matrix(nrow=NumSimulations,ncol=2*dim(Model.Matrix)[2])

for(i in 1 : NumSimulations){
Lambda.Draw=rpois(N,Lambda)
U=runif(N)
Y=ifelse(U=P,0,Lambda.Draw)
CoefSimulations[i,]=coef(zeroinfl(Y~X|X))
}

# What were the estimates?
colMeans(CoefSimulations) # My gamma estimates aren't even close ...

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Lambert (1992) simulation

2012-04-27 Thread Christopher Desjardins
On Fri, Apr 27, 2012 at 1:53 AM, Achim Zeileis achim.zeil...@uibk.ac.atwrote:

 On Thu, 26 Apr 2012, Christopher Desjardins wrote:

  Hi,
 I am trying to replicate Lambert (1992)'s simulation with zero-inflated
 Poisson models. The citation is here:

 @article{lambert1992zero,
 Author = {Lambert, D.},
 Journal = {Technometrics},
 Pages = {1--14},
 Publisher = {JSTOR},
 Title = {Zero-inflated {P}oisson regression, with an application to
 defects
 in manufacturing},
 Year = {1992}}

 Specifically I am trying to recreate Table 2. But my estimates for Gamma
 are not working out. Any ideas why?


 Your implementation of the inverse logit link is wrong, it should be
 exp(x)/(1 + exp(x)) and not exp(x)/exp(1 + x). In R I never code this by
 hand but either use qlogis()/plogis() or make.link(logit).



Doh! So obvious. I really should have noticed that. Thanks! I'll have a
look at the rest of your code too.
Cheers,
Chris



 Your setting resulting in an almost constant probability of zero inflation
 and hence gamma was completely off.

 Below is my cleaned up code which nicely replicates the results for n =
 100. The other two would require additional work because one would need to
 catch non-convergence etc.

 hth,
 Z

 ## data-generating process
 dgp - function(nobs = 100) {
  gamma - c(-1.5, 2)
  beta - c(1.5, -2)
  x - seq(0, 1, length.out = nobs)
  lambda - exp(beta[1] + beta[2] * x)
  p - plogis(gamma[1] + gamma[2] * x)
  y - ifelse(runif(nobs) = p, 0, rpois(nobs, lambda = lambda))
  data.frame(y = y, x = x)
 }

 ## simulation of coefficients and standard errors
 sim - function(nrep = 2000, ...) {
  onesim - function(i) {
d - dgp(...)
m - zeroinfl(y ~ x, data = d)
c(coef(m), sqrt(diag(vcov(m
  }
  t(sapply(1:nrep, onesim))
 }

 ## run simulation #3
 library(pscl)
 set.seed(1)
 cfse - sim(nobs = 100)

 ## mean coefficient estimates
 apply(cfse[, 1:4], 2, mean)

 ## median coefficient estimates
 apply(cfse[, 1:4], 2, median)

 ## sd of coefficient estimates
 apply(cfse[, 1:4], 2, sd)

 ## mean of the standard deviation estimates from observed information
 apply(cfse[, 5:8], 2, mean)



  Please cc me on a reply!
 Thanks,
 Chris

 ## ZIP simulations based on Lambert (1992)'s conditions

 # Empty workspace
 rm(list=ls())

 # Load zero-inflation package
 library(pscl)

 # Simulation conditions #3
 NumSimulations=2000
 X=seq(from=0,to=1,length.out=**N)
 Model.Matrix=cbind(rep(1,**length(X)),X)
 Gamma=c(-1.5,2)
 Beta=c(1.5,-2)
 P=exp(Model.Matrix%*%Gamma)/**exp(1+Model.Matrix%*%Gamma)
 Lambda=exp(Model.Matrix%*%**Beta)
 CoefSimulations=matrix(nrow=**NumSimulations,ncol=2*dim(**
 Model.Matrix)[2])

 for(i in 1 : NumSimulations){
 Lambda.Draw=rpois(N,Lambda)
 U=runif(N)
 Y=ifelse(U=P,0,Lambda.Draw)
 CoefSimulations[i,]=coef(**zeroinfl(Y~X|X))
 }

 # What were the estimates?
 colMeans(CoefSimulations) # My gamma estimates aren't even close ...

[[alternative HTML version deleted]]

 __**
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/**listinfo/r-helphttps://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/**
 posting-guide.html http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Lambert (1992) simulation

2012-04-26 Thread Christopher Desjardins
Hi,
I am trying to replicate Lambert (1992)'s simulation with zero-inflated
Poisson models. The citation is here:

@article{lambert1992zero,
Author = {Lambert, D.},
Journal = {Technometrics},
Pages = {1--14},
Publisher = {JSTOR},
Title = {Zero-inflated {P}oisson regression, with an application to defects
in manufacturing},
Year = {1992}}

Specifically I am trying to recreate Table 2. But my estimates for Gamma
are not working out. Any ideas why? Please cc me on a reply!
Thanks,
Chris

## ZIP simulations based on Lambert (1992)'s conditions

# Empty workspace
rm(list=ls())

# Load zero-inflation package
library(pscl)

# Simulation conditions #3
NumSimulations=2000
X=seq(from=0,to=1,length.out=N)
Model.Matrix=cbind(rep(1,length(X)),X)
Gamma=c(-1.5,2)
Beta=c(1.5,-2)
P=exp(Model.Matrix%*%Gamma)/exp(1+Model.Matrix%*%Gamma)
Lambda=exp(Model.Matrix%*%Beta)
CoefSimulations=matrix(nrow=NumSimulations,ncol=2*dim(Model.Matrix)[2])

for(i in 1 : NumSimulations){
Lambda.Draw=rpois(N,Lambda)
U=runif(N)
Y=ifelse(U=P,0,Lambda.Draw)
CoefSimulations[i,]=coef(zeroinfl(Y~X|X))
}

# What were the estimates?
colMeans(CoefSimulations) # My gamma estimates aren't even close ...

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.