Re: [R] Predicted values from glm() when linear predictor is NA.

2022-07-28 Thread John Fox

Dear Jeff,

On 2022-07-28 11:12 a.m., Jeff Newmiller wrote:

No, in this case I think I needed the "obvious" breakdown. Still digesting, 
though... I would prefer that if an arbitrary selection had been made that it be explicit 
.. the NA should be replaced with zero if the singular.ok argument is TRUE, rather than 
making that interpretation in predict.glm.


That's one way to think about, but another is that the model matrix X 
has 10 columns but is of rank 9. Thus 9 basis vectors are needed to span 
the column space of X, and a simple way to provide a basis is to 
eliminate a redundant column, hence the NA. The fitted values y-hat in a 
linear model are the orthogonal projection of y onto the space spanned 
by the columns of X, and are thus independent of the basis chosen. A GLM 
is a little more complicated, but it's still the column space of X 
that's important.


Best,
 John



On July 28, 2022 5:45:35 AM PDT, John Fox  wrote:

Dear Jeff,

On 2022-07-28 1:31 a.m., Jeff Newmiller wrote:

But "disappearing" is not what NA is supposed to do normally. Why is it being 
treated that way here?


NA has a different meaning here than in data.

By default, in glm() the argument singular.ok is TRUE, and so estimates are 
provided even when there are singularities, and even though the singularities 
are resolved arbitrarily.

In this model, the columns of the model matrix labelled LifestageL1 and 
TrtTime:LifestageL1 are perfectly collinear -- the second is 12 times the first 
(both have 0s in the same rows and either 1 or 12 in three of the rows) -- and 
thus both can't be estimated simultaneously, but the model can be estimated by 
eliminating one or the other (effectively setting its coefficient to 0), or by 
taking any linear combination of the two regressors (i.e., using any regressor 
with 0s and some other value). The fitted values under the model are invariant 
with respect to this arbitrary choice.

My apologies if I'm stating the obvious and misunderstand your objection.

Best,
John



On July 27, 2022 7:04:20 PM PDT, John Fox  wrote:

Dear Rolf,

The coefficient of TrtTime:LifestageL1 isn't estimable (as you explain) and by 
setting it to NA, glm() effectively removes it from the model. An equivalent 
model is therefore


fit2 <- glm(cbind(Dead,Alive) ~ TrtTime + Lifestage +

+   I((Lifestage == "Egg + L1")*TrtTime) +
+   I((Lifestage == "L1 + L2")*TrtTime) +
+   I((Lifestage == "L3")*TrtTime),
+ family=binomial, data=demoDat)
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred


cbind(coef(fit, complete=FALSE), coef(fit2))

   [,1] [,2]
(Intercept)-0.91718302  -0.91718302
TrtTime 0.88846195   0.88846195
LifestageEgg + L1 -45.36420974 -45.36420974
LifestageL114.27570572  14.27570572
LifestageL1 + L2   -0.30332697  -0.30332697
LifestageL3-3.58672631  -3.58672631
TrtTime:LifestageEgg + L1   8.10482459   8.10482459
TrtTime:LifestageL1 + L20.05662651   0.05662651
TrtTime:LifestageL3 1.66743472   1.66743472

There is no problem computing fitted values for the model, specified either way. That the 
fitted values when Lifestage == "L1" all round to 1 on the probability scale is 
coincidental -- that is, a consequence of the data.

I hope this helps,
John

On 2022-07-27 8:26 p.m., Rolf Turner wrote:


I have a data frame with a numeric ("TrtTime") and a categorical
("Lifestage") predictor.

Level "L1" of Lifestage occurs only with a single value of TrtTime,
explicitly 12, whence it is not possible to estimate a TrtTime "slope"
when Lifestage is "L1".

Indeed, when I fitted the model

   fit <- glm(cbind(Dead,Alive) ~ TrtTime*Lifestage, family=binomial,
  data=demoDat)

I got:


as.matrix(coef(fit))
 [,1]
(Intercept)-0.91718302
TrtTime 0.88846195
LifestageEgg + L1 -45.36420974
LifestageL114.27570572
LifestageL1 + L2   -0.30332697
LifestageL3-3.58672631
TrtTime:LifestageEgg + L1   8.10482459
TrtTime:LifestageL1 NA
TrtTime:LifestageL1 + L20.05662651
TrtTime:LifestageL3 1.66743472


That is, TrtTime:LifestageL1 is NA, as expected.

I would have thought that fitted or predicted values corresponding to
Lifestage = "L1" would thereby be NA, but this is not the case:


predict(fit)[demoDat$Lifestage=="L1"]
 26   65  131
24.02007 24.02007 24.02007

fitted(fit)[demoDat$Lifestage=="L1"]
26  65 131
 1   1   1


That is, the predicted values on the scale of the linear predictor are
large and positive, rather than being NA.

What this amounts to, it seems to me, is saying that if the linear
predictor in a Binomial glm is NA, then "success" is a certainty.
This strikes me as being a dubious proposition.  My gut feeling is that

Re: [R] Predicted values from glm() when linear predictor is NA.

2022-07-28 Thread Jeff Newmiller
No, in this case I think I needed the "obvious" breakdown. Still digesting, 
though... I would prefer that if an arbitrary selection had been made that it 
be explicit .. the NA should be replaced with zero if the singular.ok argument 
is TRUE, rather than making that interpretation in predict.glm.

On July 28, 2022 5:45:35 AM PDT, John Fox  wrote:
>Dear Jeff,
>
>On 2022-07-28 1:31 a.m., Jeff Newmiller wrote:
>> But "disappearing" is not what NA is supposed to do normally. Why is it 
>> being treated that way here?
>
>NA has a different meaning here than in data.
>
>By default, in glm() the argument singular.ok is TRUE, and so estimates are 
>provided even when there are singularities, and even though the singularities 
>are resolved arbitrarily.
>
>In this model, the columns of the model matrix labelled LifestageL1 and 
>TrtTime:LifestageL1 are perfectly collinear -- the second is 12 times the 
>first (both have 0s in the same rows and either 1 or 12 in three of the rows) 
>-- and thus both can't be estimated simultaneously, but the model can be 
>estimated by eliminating one or the other (effectively setting its coefficient 
>to 0), or by taking any linear combination of the two regressors (i.e., using 
>any regressor with 0s and some other value). The fitted values under the model 
>are invariant with respect to this arbitrary choice.
>
>My apologies if I'm stating the obvious and misunderstand your objection.
>
>Best,
> John
>
>> 
>> On July 27, 2022 7:04:20 PM PDT, John Fox  wrote:
>>> Dear Rolf,
>>> 
>>> The coefficient of TrtTime:LifestageL1 isn't estimable (as you explain) and 
>>> by setting it to NA, glm() effectively removes it from the model. An 
>>> equivalent model is therefore
>>> 
 fit2 <- glm(cbind(Dead,Alive) ~ TrtTime + Lifestage +
>>> +   I((Lifestage == "Egg + L1")*TrtTime) +
>>> +   I((Lifestage == "L1 + L2")*TrtTime) +
>>> +   I((Lifestage == "L3")*TrtTime),
>>> + family=binomial, data=demoDat)
>>> Warning message:
>>> glm.fit: fitted probabilities numerically 0 or 1 occurred
>>> 
 cbind(coef(fit, complete=FALSE), coef(fit2))
>>>   [,1] [,2]
>>> (Intercept)-0.91718302  -0.91718302
>>> TrtTime 0.88846195   0.88846195
>>> LifestageEgg + L1 -45.36420974 -45.36420974
>>> LifestageL114.27570572  14.27570572
>>> LifestageL1 + L2   -0.30332697  -0.30332697
>>> LifestageL3-3.58672631  -3.58672631
>>> TrtTime:LifestageEgg + L1   8.10482459   8.10482459
>>> TrtTime:LifestageL1 + L20.05662651   0.05662651
>>> TrtTime:LifestageL3 1.66743472   1.66743472
>>> 
>>> There is no problem computing fitted values for the model, specified either 
>>> way. That the fitted values when Lifestage == "L1" all round to 1 on the 
>>> probability scale is coincidental -- that is, a consequence of the data.
>>> 
>>> I hope this helps,
>>> John
>>> 
>>> On 2022-07-27 8:26 p.m., Rolf Turner wrote:
 
 I have a data frame with a numeric ("TrtTime") and a categorical
 ("Lifestage") predictor.
 
 Level "L1" of Lifestage occurs only with a single value of TrtTime,
 explicitly 12, whence it is not possible to estimate a TrtTime "slope"
 when Lifestage is "L1".
 
 Indeed, when I fitted the model
 
   fit <- glm(cbind(Dead,Alive) ~ TrtTime*Lifestage, family=binomial,
  data=demoDat)
 
 I got:
 
> as.matrix(coef(fit))
> [,1]
> (Intercept)-0.91718302
> TrtTime 0.88846195
> LifestageEgg + L1 -45.36420974
> LifestageL114.27570572
> LifestageL1 + L2   -0.30332697
> LifestageL3-3.58672631
> TrtTime:LifestageEgg + L1   8.10482459
> TrtTime:LifestageL1 NA
> TrtTime:LifestageL1 + L20.05662651
> TrtTime:LifestageL3 1.66743472
 
 That is, TrtTime:LifestageL1 is NA, as expected.
 
 I would have thought that fitted or predicted values corresponding to
 Lifestage = "L1" would thereby be NA, but this is not the case:
 
> predict(fit)[demoDat$Lifestage=="L1"]
> 26   65  131
> 24.02007 24.02007 24.02007
> 
> fitted(fit)[demoDat$Lifestage=="L1"]
>26  65 131
> 1   1   1
 
 That is, the predicted values on the scale of the linear predictor are
 large and positive, rather than being NA.
 
 What this amounts to, it seems to me, is saying that if the linear
 predictor in a Binomial glm is NA, then "success" is a certainty.
 This strikes me as being a dubious proposition.  My gut feeling is that
 misleading results could be produced.
 
 Can anyone explain to me a rationale for this behaviour pattern?
 Is there some justification for it that I am not currently seeing?

Re: [R] Predicted values from glm() when linear predictor is NA.

2022-07-28 Thread John Fox

Dear Jeff,

On 2022-07-28 1:31 a.m., Jeff Newmiller wrote:

But "disappearing" is not what NA is supposed to do normally. Why is it being 
treated that way here?


NA has a different meaning here than in data.

By default, in glm() the argument singular.ok is TRUE, and so estimates 
are provided even when there are singularities, and even though the 
singularities are resolved arbitrarily.


In this model, the columns of the model matrix labelled LifestageL1 and 
TrtTime:LifestageL1 are perfectly collinear -- the second is 12 times 
the first (both have 0s in the same rows and either 1 or 12 in three of 
the rows) -- and thus both can't be estimated simultaneously, but the 
model can be estimated by eliminating one or the other (effectively 
setting its coefficient to 0), or by taking any linear combination of 
the two regressors (i.e., using any regressor with 0s and some other 
value). The fitted values under the model are invariant with respect to 
this arbitrary choice.


My apologies if I'm stating the obvious and misunderstand your objection.

Best,
 John



On July 27, 2022 7:04:20 PM PDT, John Fox  wrote:

Dear Rolf,

The coefficient of TrtTime:LifestageL1 isn't estimable (as you explain) and by 
setting it to NA, glm() effectively removes it from the model. An equivalent 
model is therefore


fit2 <- glm(cbind(Dead,Alive) ~ TrtTime + Lifestage +

+   I((Lifestage == "Egg + L1")*TrtTime) +
+   I((Lifestage == "L1 + L2")*TrtTime) +
+   I((Lifestage == "L3")*TrtTime),
+ family=binomial, data=demoDat)
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred


cbind(coef(fit, complete=FALSE), coef(fit2))

  [,1] [,2]
(Intercept)-0.91718302  -0.91718302
TrtTime 0.88846195   0.88846195
LifestageEgg + L1 -45.36420974 -45.36420974
LifestageL114.27570572  14.27570572
LifestageL1 + L2   -0.30332697  -0.30332697
LifestageL3-3.58672631  -3.58672631
TrtTime:LifestageEgg + L1   8.10482459   8.10482459
TrtTime:LifestageL1 + L20.05662651   0.05662651
TrtTime:LifestageL3 1.66743472   1.66743472

There is no problem computing fitted values for the model, specified either way. That the 
fitted values when Lifestage == "L1" all round to 1 on the probability scale is 
coincidental -- that is, a consequence of the data.

I hope this helps,
John

On 2022-07-27 8:26 p.m., Rolf Turner wrote:


I have a data frame with a numeric ("TrtTime") and a categorical
("Lifestage") predictor.

Level "L1" of Lifestage occurs only with a single value of TrtTime,
explicitly 12, whence it is not possible to estimate a TrtTime "slope"
when Lifestage is "L1".

Indeed, when I fitted the model

  fit <- glm(cbind(Dead,Alive) ~ TrtTime*Lifestage, family=binomial,
 data=demoDat)

I got:


as.matrix(coef(fit))
[,1]
(Intercept)-0.91718302
TrtTime 0.88846195
LifestageEgg + L1 -45.36420974
LifestageL114.27570572
LifestageL1 + L2   -0.30332697
LifestageL3-3.58672631
TrtTime:LifestageEgg + L1   8.10482459
TrtTime:LifestageL1 NA
TrtTime:LifestageL1 + L20.05662651
TrtTime:LifestageL3 1.66743472


That is, TrtTime:LifestageL1 is NA, as expected.

I would have thought that fitted or predicted values corresponding to
Lifestage = "L1" would thereby be NA, but this is not the case:


predict(fit)[demoDat$Lifestage=="L1"]
26   65  131
24.02007 24.02007 24.02007

fitted(fit)[demoDat$Lifestage=="L1"]
   26  65 131
1   1   1


That is, the predicted values on the scale of the linear predictor are
large and positive, rather than being NA.

What this amounts to, it seems to me, is saying that if the linear
predictor in a Binomial glm is NA, then "success" is a certainty.
This strikes me as being a dubious proposition.  My gut feeling is that
misleading results could be produced.

Can anyone explain to me a rationale for this behaviour pattern?
Is there some justification for it that I am not currently seeing?
Any other comments?  (Please omit comments to the effect of "You are as
thick as two short planks!". :-) )

I have attached the example data set in a file "demoDat.txt", should
anyone want to experiment with it.  The file was created using dput() so
you should access it (if you wish to do so) via something like

  demoDat <- dget("demoDat.txt")

Thanks for any enlightenment.

cheers,

Rolf Turner


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, 

Re: [R] Predicted values from glm() when linear predictor is NA.

2022-07-27 Thread Jeff Newmiller
But "disappearing" is not what NA is supposed to do normally. Why is it being 
treated that way here?

On July 27, 2022 7:04:20 PM PDT, John Fox  wrote:
>Dear Rolf,
>
>The coefficient of TrtTime:LifestageL1 isn't estimable (as you explain) and by 
>setting it to NA, glm() effectively removes it from the model. An equivalent 
>model is therefore
>
>> fit2 <- glm(cbind(Dead,Alive) ~ TrtTime + Lifestage +
>+   I((Lifestage == "Egg + L1")*TrtTime) +
>+   I((Lifestage == "L1 + L2")*TrtTime) +
>+   I((Lifestage == "L3")*TrtTime),
>+ family=binomial, data=demoDat)
>Warning message:
>glm.fit: fitted probabilities numerically 0 or 1 occurred
>
>> cbind(coef(fit, complete=FALSE), coef(fit2))
>  [,1] [,2]
>(Intercept)-0.91718302  -0.91718302
>TrtTime 0.88846195   0.88846195
>LifestageEgg + L1 -45.36420974 -45.36420974
>LifestageL114.27570572  14.27570572
>LifestageL1 + L2   -0.30332697  -0.30332697
>LifestageL3-3.58672631  -3.58672631
>TrtTime:LifestageEgg + L1   8.10482459   8.10482459
>TrtTime:LifestageL1 + L20.05662651   0.05662651
>TrtTime:LifestageL3 1.66743472   1.66743472
>
>There is no problem computing fitted values for the model, specified either 
>way. That the fitted values when Lifestage == "L1" all round to 1 on the 
>probability scale is coincidental -- that is, a consequence of the data.
>
>I hope this helps,
> John
>
>On 2022-07-27 8:26 p.m., Rolf Turner wrote:
>> 
>> I have a data frame with a numeric ("TrtTime") and a categorical
>> ("Lifestage") predictor.
>> 
>> Level "L1" of Lifestage occurs only with a single value of TrtTime,
>> explicitly 12, whence it is not possible to estimate a TrtTime "slope"
>> when Lifestage is "L1".
>> 
>> Indeed, when I fitted the model
>> 
>>  fit <- glm(cbind(Dead,Alive) ~ TrtTime*Lifestage, family=binomial,
>> data=demoDat)
>> 
>> I got:
>> 
>>> as.matrix(coef(fit))
>>>[,1]
>>> (Intercept)-0.91718302
>>> TrtTime 0.88846195
>>> LifestageEgg + L1 -45.36420974
>>> LifestageL114.27570572
>>> LifestageL1 + L2   -0.30332697
>>> LifestageL3-3.58672631
>>> TrtTime:LifestageEgg + L1   8.10482459
>>> TrtTime:LifestageL1 NA
>>> TrtTime:LifestageL1 + L20.05662651
>>> TrtTime:LifestageL3 1.66743472
>> 
>> That is, TrtTime:LifestageL1 is NA, as expected.
>> 
>> I would have thought that fitted or predicted values corresponding to
>> Lifestage = "L1" would thereby be NA, but this is not the case:
>> 
>>> predict(fit)[demoDat$Lifestage=="L1"]
>>>26   65  131
>>> 24.02007 24.02007 24.02007
>>> 
>>> fitted(fit)[demoDat$Lifestage=="L1"]
>>>   26  65 131
>>>1   1   1
>> 
>> That is, the predicted values on the scale of the linear predictor are
>> large and positive, rather than being NA.
>> 
>> What this amounts to, it seems to me, is saying that if the linear
>> predictor in a Binomial glm is NA, then "success" is a certainty.
>> This strikes me as being a dubious proposition.  My gut feeling is that
>> misleading results could be produced.
>> 
>> Can anyone explain to me a rationale for this behaviour pattern?
>> Is there some justification for it that I am not currently seeing?
>> Any other comments?  (Please omit comments to the effect of "You are as
>> thick as two short planks!". :-) )
>> 
>> I have attached the example data set in a file "demoDat.txt", should
>> anyone want to experiment with it.  The file was created using dput() so
>> you should access it (if you wish to do so) via something like
>> 
>>  demoDat <- dget("demoDat.txt")
>> 
>> Thanks for any enlightenment.
>> 
>> cheers,
>> 
>> Rolf Turner
>> 
>> 
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

-- 
Sent from my phone. Please excuse my brevity.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Predicted values from glm() when linear predictor is NA.

2022-07-27 Thread Rolf Turner


On Thu, 28 Jul 2022 00:42:51 +
"Ebert,Timothy Aaron"  wrote:

> Time is often used in this sort of problem, but really time is not
> relevant. A better choice is accumulated thermal units. The insect
> will molt when X thermal units have been accumulated. This is often
> expressed as degree days, but could as easily be other units like
> degree seconds. However, I suspect that fine time units exceeds the
> accuracy of the measurement system. A growth chamber might maintain
> 28 C, but the temperature the insect experiences might be somewhat
> different thereby introducing additional variability in the outcome.
> No thermal units accumulated, no development, so that is not an
> issue. This approach allows one to predict life stage over a large
> temperature range. Accuracy can be improved if one knows the lower
> development threshold. At high temperatures development stops, and a
> mortality function can be added.

Very cogent comments in respect of dealing with the underlying
practical problem, but I am not so much concerned with the practical
problem at the moment but rather with the workings of the software that
I am using.

cheers,

Rolf

P.S.  I am at several removes from the data set(s) that I am messing
about with.  But if my understanding is correct (always an assumption
of which to be sceptical!) these data were collected with the
temperature being held *constant*, whence time and accumulated thermal
units would be equivalent.  Is it not so?

R.

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Predicted values from glm() when linear predictor is NA.

2022-07-27 Thread John Fox

Dear Rolf,

The coefficient of TrtTime:LifestageL1 isn't estimable (as you explain) 
and by setting it to NA, glm() effectively removes it from the model. An 
equivalent model is therefore


> fit2 <- glm(cbind(Dead,Alive) ~ TrtTime + Lifestage +
+   I((Lifestage == "Egg + L1")*TrtTime) +
+   I((Lifestage == "L1 + L2")*TrtTime) +
+   I((Lifestage == "L3")*TrtTime),
+ family=binomial, data=demoDat)
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

> cbind(coef(fit, complete=FALSE), coef(fit2))
  [,1] [,2]
(Intercept)-0.91718302  -0.91718302
TrtTime 0.88846195   0.88846195
LifestageEgg + L1 -45.36420974 -45.36420974
LifestageL114.27570572  14.27570572
LifestageL1 + L2   -0.30332697  -0.30332697
LifestageL3-3.58672631  -3.58672631
TrtTime:LifestageEgg + L1   8.10482459   8.10482459
TrtTime:LifestageL1 + L20.05662651   0.05662651
TrtTime:LifestageL3 1.66743472   1.66743472

There is no problem computing fitted values for the model, specified 
either way. That the fitted values when Lifestage == "L1" all round to 1 
on the probability scale is coincidental -- that is, a consequence of 
the data.


I hope this helps,
 John

On 2022-07-27 8:26 p.m., Rolf Turner wrote:


I have a data frame with a numeric ("TrtTime") and a categorical
("Lifestage") predictor.

Level "L1" of Lifestage occurs only with a single value of TrtTime,
explicitly 12, whence it is not possible to estimate a TrtTime "slope"
when Lifestage is "L1".

Indeed, when I fitted the model

 fit <- glm(cbind(Dead,Alive) ~ TrtTime*Lifestage, family=binomial,
data=demoDat)

I got:


as.matrix(coef(fit))
   [,1]
(Intercept)-0.91718302
TrtTime 0.88846195
LifestageEgg + L1 -45.36420974
LifestageL114.27570572
LifestageL1 + L2   -0.30332697
LifestageL3-3.58672631
TrtTime:LifestageEgg + L1   8.10482459
TrtTime:LifestageL1 NA
TrtTime:LifestageL1 + L20.05662651
TrtTime:LifestageL3 1.66743472


That is, TrtTime:LifestageL1 is NA, as expected.

I would have thought that fitted or predicted values corresponding to
Lifestage = "L1" would thereby be NA, but this is not the case:


predict(fit)[demoDat$Lifestage=="L1"]
   26   65  131
24.02007 24.02007 24.02007

fitted(fit)[demoDat$Lifestage=="L1"]
  26  65 131
   1   1   1


That is, the predicted values on the scale of the linear predictor are
large and positive, rather than being NA.

What this amounts to, it seems to me, is saying that if the linear
predictor in a Binomial glm is NA, then "success" is a certainty.
This strikes me as being a dubious proposition.  My gut feeling is that
misleading results could be produced.

Can anyone explain to me a rationale for this behaviour pattern?
Is there some justification for it that I am not currently seeing?
Any other comments?  (Please omit comments to the effect of "You are as
thick as two short planks!". :-) )

I have attached the example data set in a file "demoDat.txt", should
anyone want to experiment with it.  The file was created using dput() so
you should access it (if you wish to do so) via something like

 demoDat <- dget("demoDat.txt")

Thanks for any enlightenment.

cheers,

Rolf Turner


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Predicted values from glm() when linear predictor is NA.

2022-07-27 Thread David Winsemius



On 7/27/22 17:26, Rolf Turner wrote:

I have a data frame with a numeric ("TrtTime") and a categorical
("Lifestage") predictor.

Level "L1" of Lifestage occurs only with a single value of TrtTime,
explicitly 12, whence it is not possible to estimate a TrtTime "slope"
when Lifestage is "L1".

Indeed, when I fitted the model

 fit <- glm(cbind(Dead,Alive) ~ TrtTime*Lifestage, family=binomial,
data=demoDat)

I got:


as.matrix(coef(fit))
   [,1]
(Intercept)-0.91718302
TrtTime 0.88846195
LifestageEgg + L1 -45.36420974
LifestageL114.27570572
LifestageL1 + L2   -0.30332697
LifestageL3-3.58672631
TrtTime:LifestageEgg + L1   8.10482459
TrtTime:LifestageL1 NA
TrtTime:LifestageL1 + L20.05662651
TrtTime:LifestageL3 1.66743472

That is, TrtTime:LifestageL1 is NA, as expected.

I would have thought that fitted or predicted values corresponding to
Lifestage = "L1" would thereby be NA, but this is not the case:


predict(fit)[demoDat$Lifestage=="L1"]
   26   65  131
24.02007 24.02007 24.02007

fitted(fit)[demoDat$Lifestage=="L1"]
  26  65 131
   1   1   1

That is, the predicted values on the scale of the linear predictor are
large and positive, rather than being NA.

What this amounts to, it seems to me, is saying that if the linear
predictor in a Binomial glm is NA, then "success" is a certainty.
This strikes me as being a dubious proposition.  My gut feeling is that
misleading results could be produced.


The NA is most likely caused by aliasing, so some other combination of 
factors a perfect surrogate for every case with that level of the 
interaction. The `predict.glm` function always requires a complete set 
of values to construct a case. Whether apparent incremental linear 
prediction of that interaction term is large or small will depend on the 
degree of independent contribution of the surrogate levels of other 
variables..



David.



Can anyone explain to me a rationale for this behaviour pattern?
Is there some justification for it that I am not currently seeing?
Any other comments?  (Please omit comments to the effect of "You are as
thick as two short planks!". :-) )

I have attached the example data set in a file "demoDat.txt", should
anyone want to experiment with it.  The file was created using dput() so
you should access it (if you wish to do so) via something like

 demoDat <- dget("demoDat.txt")

Thanks for any enlightenment.

cheers,

Rolf Turner


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.