Re: [R] Modelling an incomplete Poisson distribution ?

2009-04-19 Thread Emmanuel Charpentier
Dear Ben,

Le samedi 18 avril 2009 à 23:37 +, Ben Bolker a écrit :
 Emmanuel Charpentier charpent at bacbuc.dyndns.org writes:
 
  
  I forgot to add that yes, I've done my homework, and that it seems to me
  that answers pointing to zero-inflated Poisson (and negative binomial)
  are irrelevant ; I do not have a mixture of distributions but only part
  of one distribution, or, if you'll have it, a zero-deflated Poisson.
  
  An answer by Brian Ripley
  (http://finzi.psych.upenn.edu/R/Rhelp02/archive/11029.html) to a similar
  question leaves me a bit dismayed : if it is easy to compute the
  probability function of this zero-deflated RV (off the top of my head,
  Pr(X=x)=e^-lambda.lambda^x/(x!.(1-e^-lambda))), and if I think that I'm
  (still) able to use optim to ML-estimate lambda, using this to
  (correctly) model my problem set and test it amounts to re-writing some
  (large) part of glm. Furthermore, I'd be a bit embarrassed to test it
  cleanly (i. e. justifiably) : out of the top of my head, only the
  likelihood ration test seems readily applicable to my problem. Testing
  contrasts in my covariates ... hum !
  
  So if someone has written something to that effect, I'd be awfully glad
  to use it. A not-so-cursory look at the existing packages did not ring
  any bells to my (admittedly untrained) ears...
  
  Of course, I could also bootstrap the damn thing and study the
  distribution of my contrasts. I'd still been hard pressed to formally
  test hypotheses on factors...
  
 
   I would call this a truncated Poisson distribution, related
 to hurdle models.  You could probably use the hurdle function
 in the pscl package to do this, by ignoring the fitting to
 the zero part of the model.  On the other hand, it might break
 if there are no zeros at all (adding some zeros would be a
 pretty awful hack/workaround).

Indeed ... 

   If you defined a dtpoisson() for the distribution of the
 truncated Poisson model, you could probably also use bbmle
 with the formula interface and the parameters argument.

This I was not aware of... Thank you for the tip !

By the way, I never delved into stats4, relying (erroneously) on its
one-liner description in CRAN, which is (more or less) an implementation
of stats with S4 classes. Therefore mle escaped me also... May I suggest
a better one-liner ? (This goes also for bbmle...)

   The likelihood ratio test seems absolutely appropriate for
 this case.  Why not?

Few data, therefore a bit far from the asymptotic condition...

Anyway, a better look at my data after discovering a bag'o bugs in the
original files led me to reconsider my model : I wanted to assess, among
other things, the effect of a boolean (really a two-class variable).
After the *right* graph, I now tend to think that my distribution is
indeed zero-deflated Poisson in one group and ... zero-deflated negative
binomial in the other (still might be zero-deflated Poisson with very
small mean, hard to say graphically...). Which gives me some insights on
the mechanisms at work (yippeee !!) but will require some nonparametric
beast for assessing central value differences (yes, this still has a
meaning...), such as bootstrap.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Modelling an incomplete Poisson distribution ?

2009-04-18 Thread Emmanuel Charpentier
Dear list,

I have the following problem : I want to model a series of observations
of a given hospital activity on various days under various conditions.
among my outcomes (dependent variables) is the number of patients for
which a certain procedure is done. The problem is that, when no relevant
patient is hospitalized on said day, there is no observation (for which
the number of patients item would be 0). 

My goal is to model this number of patients as a function of the
various conditions described by my independant variables, mosty of
them observed but uncontrolled, some of them unobservable (random
effects). I am tempted to model them along the lines of :

glm(NoP~X+Y+..., data=MyData, family=poisson(link=log))

or (accounting for some random effects) :

lmer(NoP~X+Y+(X|Center)), data=Mydata, family=poisson(link=log))

While the preliminary analysis suggest that (the right part of) a
Poisson distribution might be reasonable for all real observations, the
lack of observations with count==0 bothers me.

Is there a way to cajole glm (and lmer, by the way) into modelling these
data to an incomplete Poisson model, i. e. with unobserved 0
values ?

Sincerely,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Modelling an incomplete Poisson distribution ?

2009-04-18 Thread Emmanuel Charpentier
I forgot to add that yes, I've done my homework, and that it seems to me
that answers pointing to zero-inflated Poisson (and negative binomial)
are irrelevant ; I do not have a mixture of distributions but only part
of one distribution, or, if you'll have it, a zero-deflated Poisson.

An answer by Brian Ripley
(http://finzi.psych.upenn.edu/R/Rhelp02/archive/11029.html) to a similar
question leaves me a bit dismayed : if it is easy to compute the
probability function of this zero-deflated RV (off the top of my head,
Pr(X=x)=e^-lambda.lambda^x/(x!.(1-e^-lambda))), and if I think that I'm
(still) able to use optim to ML-estimate lambda, using this to
(correctly) model my problem set and test it amounts to re-writing some
(large) part of glm. Furthermore, I'd be a bit embarrassed to test it
cleanly (i. e. justifiably) : out of the top of my head, only the
likelihood ration test seems readily applicable to my problem. Testing
contrasts in my covariates ... hum !

So if someone has written something to that effect, I'd be awfully glad
to use it. A not-so-cursory look at the existing packages did not ring
any bells to my (admittedly untrained) ears...

Of course, I could also bootstrap the damn thing and study the
distribution of my contrasts. I'd still been hard pressed to formally
test hypotheses on factors...

Any ideas ?

Emmanuel Charpentier

Le samedi 18 avril 2009 à 19:28 +0200, Emmanuel Charpentier a écrit :
 Dear list,
 
 I have the following problem : I want to model a series of observations
 of a given hospital activity on various days under various conditions.
 among my outcomes (dependent variables) is the number of patients for
 which a certain procedure is done. The problem is that, when no relevant
 patient is hospitalized on said day, there is no observation (for which
 the number of patients item would be 0). 
 
 My goal is to model this number of patients as a function of the
 various conditions described by my independant variables, mosty of
 them observed but uncontrolled, some of them unobservable (random
 effects). I am tempted to model them along the lines of :
 
 glm(NoP~X+Y+..., data=MyData, family=poisson(link=log))
 
 or (accounting for some random effects) :
 
 lmer(NoP~X+Y+(X|Center)), data=Mydata, family=poisson(link=log))
 
 While the preliminary analysis suggest that (the right part of) a
 Poisson distribution might be reasonable for all real observations, the
 lack of observations with count==0 bothers me.
 
 Is there a way to cajole glm (and lmer, by the way) into modelling these
 data to an incomplete Poisson model, i. e. with unobserved 0
 values ?
 
 Sincerely,
 
   Emmanuel Charpentier
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Modelling an incomplete Poisson distribution ?

2009-04-18 Thread Ben Bolker
Emmanuel Charpentier charpent at bacbuc.dyndns.org writes:

 
 I forgot to add that yes, I've done my homework, and that it seems to me
 that answers pointing to zero-inflated Poisson (and negative binomial)
 are irrelevant ; I do not have a mixture of distributions but only part
 of one distribution, or, if you'll have it, a zero-deflated Poisson.
 
 An answer by Brian Ripley
 (http://finzi.psych.upenn.edu/R/Rhelp02/archive/11029.html) to a similar
 question leaves me a bit dismayed : if it is easy to compute the
 probability function of this zero-deflated RV (off the top of my head,
 Pr(X=x)=e^-lambda.lambda^x/(x!.(1-e^-lambda))), and if I think that I'm
 (still) able to use optim to ML-estimate lambda, using this to
 (correctly) model my problem set and test it amounts to re-writing some
 (large) part of glm. Furthermore, I'd be a bit embarrassed to test it
 cleanly (i. e. justifiably) : out of the top of my head, only the
 likelihood ration test seems readily applicable to my problem. Testing
 contrasts in my covariates ... hum !
 
 So if someone has written something to that effect, I'd be awfully glad
 to use it. A not-so-cursory look at the existing packages did not ring
 any bells to my (admittedly untrained) ears...
 
 Of course, I could also bootstrap the damn thing and study the
 distribution of my contrasts. I'd still been hard pressed to formally
 test hypotheses on factors...
 

  I would call this a truncated Poisson distribution, related
to hurdle models.  You could probably use the hurdle function
in the pscl package to do this, by ignoring the fitting to
the zero part of the model.  On the other hand, it might break
if there are no zeros at all (adding some zeros would be a
pretty awful hack/workaround).

  If you defined a dtpoisson() for the distribution of the
truncated Poisson model, you could probably also use bbmle
with the formula interface and the parameters argument.

  The likelihood ratio test seems absolutely appropriate for
this case.  Why not?

  Ben Bolker

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.