Re: [R] car::deltaMethod() fails when a particular combination of categorical variables is not present

2023-10-07 Thread Michael Cohn
Thank you very much, John. This has allowed us to move forward on several
fronts and better understand our data.

- Michael Cohn

On Tue, Sep 26, 2023 at 8:39 AM John Fox  wrote:

> Dear Michael,
>
> My previous response was inaccurate: First, linearHypothesis() *is* able
> to accommodate aliased coefficients by setting the argument singular.ok
> = TRUE:
>
>  > linearHypothesis(minimal_model, "bt2 + csent + bt2:csent = 0",
> +  singular.ok=TRUE)
>
> Linear hypothesis test:
> bt2  + csent  + bt2:csent = 0
>
> Model 1: restricted model
> Model 2: a ~ b * c
>
>Res.DfRSS Df Sum of Sq  F Pr(>F)
> 1 16 9392.1
> 2 15 9266.4  1125.67 0.2034 0.6584
>
> Moreover, when there is an empty cell, this F-test is (for a reason that
> I haven't worked out, but is almost surely due to how the rank-deficient
> model is parametrized) *not* equivalent to the t-test for the
> corresponding coefficient in the raveled version of the two factors:
>
>  > df$bc <- factor(with(df, paste(b, c, sep=":")))
>  > m <- lm(a ~ bc, data=df)
>  > summary(m)
>
> Call:
> lm(formula = a ~ bc, data = df)
>
> Residuals:
>  Min  1Q  Median  3Q Max
> -57.455 -11.750   0.439  14.011  37.545
>
> Coefficients:
>  Estimate Std. Error t value Pr(>|t|)
> (Intercept)20.50  17.57   1.166   0.2617
> bct1:unsent37.50  24.85   1.509   0.1521
> bct2:other 32.00  24.85   1.287   0.2174
> bct2:sent  17.17  22.69   0.757   0.4610  <<< cf. F = 0.2034, p
> = 0.6584
> bct2:unsent38.95  19.11   2.039   0.0595
>
> Residual standard error: 24.85 on 15 degrees of freedom
> Multiple R-squared:  0.2613,Adjusted R-squared:  0.06437
> F-statistic: 1.327 on 4 and 15 DF,  p-value: 0.3052
>
> In the full-rank case, however, what I said is correct -- that is, the
> F-test for the 1 df hypothesis on the three coefficients is equivalent
> to the t-test for the corresponding coefficient when the two factors are
> raveled:
>
>  > linearHypothesis(minimal_model_fixed, "bt2 + csent + bt2:csent = 0")
>
> Linear hypothesis test:
> bt2  + csent  + bt2:csent = 0
>
> Model 1: restricted model
> Model 2: a ~ b * c
>
>Res.DfRSS Df Sum of Sq  F Pr(>F)
> 1 15 9714.5
> 2 14 9194.4  1520.08 0.7919 0.3886
>
>  > df_fixed$bc <- factor(with(df_fixed, paste(b, c, sep=":")))
>  > m <- lm(a ~ bc, data=df_fixed)
>  > summary(m)
>
> Call:
> lm(formula = a ~ bc, data = df_fixed)
>
> Residuals:
>  Min  1Q  Median  3Q Max
> -57.455 -11.750   0.167  14.011  37.545
>
> Coefficients:
>  Estimate Std. Error t value Pr(>|t|)
> (Intercept)   64.000 25.627   2.497   0.0256
> bct1:sent-43.500 31.387  -1.386   0.1874
> bct1:unsent  -12.000 36.242  -0.331   0.7455
> bct2:other   -11.500 31.387  -0.366   0.7195
> bct2:sent-26.333 29.591  -0.890   0.3886 << cf.
> bct2:unsent   -4.545 26.767  -0.170   0.8676
>
> Residual standard error: 25.63 on 14 degrees of freedom
> Multiple R-squared:  0.2671,Adjusted R-squared:  0.005328
> F-statistic:  1.02 on 5 and 14 DF,  p-value: 0.4425
>
> So, to summarize:
>
> (1) You can use linearHypothesis() with singular.ok=TRUE to test the
> hypothesis that you specified, though I suspect that this hypothesis
> probably isn't testing what you think in the rank-deficient case. I
> suspect that the hypothesis that you want to test is obtained by
> raveling the two factors.
>
> (2) There is no reason to use deltaMethod() for a linear hypothesis, but
> there is also no intrinsic reason that deltaMethod() shouldn't be able
> to handle a rank-deficient model. We'll probably fix that.
>
> My apologies for the confusion,
>   John
>
> --
> John Fox, Professor Emeritus
> McMaster University
> Hamilton, Ontario, Canada
> web: https://www.john-fox.ca/
>
> On 2023-09-26 9:49 a.m., John Fox wrote:
> > Caution: External email.
> >
> >
> > Dear Michael,
> >
> > You're testing a linear hypothesis, so there's no need to use the delta
> > method, but the linearHypothesis() function in the car package also
> > fails in your case:
> >
> >  > linearHypothesis(minimal_model, "bt2 + csent + bt2:csent = 0")
> > Error in linearHypothesis.lm(minimal_model, "bt2 + csent + bt2:csent =
> > 0") :
> > there are aliased coefficients in the model.
> >
> > One work-around is to ravel the two factors into a single factor with 5
> > levels:
> >
> >  > df$bc <- factor(with(df, paste(b, c, sep=":")))
> >  > df$bc
> >   [1] t2:unsent t2:unsent t2:unsent t2:unsent t2:sent   t2:unsent
> >   [7] t2:unsent t1:sent   t2:unsent t2:unsent t2:other  t2:unsent
> > [13] t1:unsent t1:sent   t2:unsent t2:other  t1:unsent t2:sent
> > [19] t2:sent   t2:unsent
> > Levels: t1:sent t1:unsent t2:other t2:sent t2:unsent
> >
> >  > m <- lm(a ~ bc, data=df)
> >  > summary(m)
> >
> > Call:
> > lm(formula = a ~ bc, data = df)
> >
> > Residuals:
> >  Min  1Q  Median  3Q Max
> > -57.455 -11.750   0.439  14.011  37.545
> >
> > 

Re: [R] car::deltaMethod() fails when a particular combination of categorical variables is not present

2023-09-26 Thread John Fox

Dear Michael,

My previous response was inaccurate: First, linearHypothesis() *is* able 
to accommodate aliased coefficients by setting the argument singular.ok 
= TRUE:


> linearHypothesis(minimal_model, "bt2 + csent + bt2:csent = 0",
+  singular.ok=TRUE)

Linear hypothesis test:
bt2  + csent  + bt2:csent = 0

Model 1: restricted model
Model 2: a ~ b * c

  Res.DfRSS Df Sum of Sq  F Pr(>F)
1 16 9392.1
2 15 9266.4  1125.67 0.2034 0.6584

Moreover, when there is an empty cell, this F-test is (for a reason that 
I haven't worked out, but is almost surely due to how the rank-deficient 
model is parametrized) *not* equivalent to the t-test for the 
corresponding coefficient in the raveled version of the two factors:


> df$bc <- factor(with(df, paste(b, c, sep=":")))
> m <- lm(a ~ bc, data=df)
> summary(m)

Call:
lm(formula = a ~ bc, data = df)

Residuals:
Min  1Q  Median  3Q Max
-57.455 -11.750   0.439  14.011  37.545

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)20.50  17.57   1.166   0.2617
bct1:unsent37.50  24.85   1.509   0.1521
bct2:other 32.00  24.85   1.287   0.2174
bct2:sent  17.17  22.69   0.757   0.4610  <<< cf. F = 0.2034, p 
= 0.6584

bct2:unsent38.95  19.11   2.039   0.0595

Residual standard error: 24.85 on 15 degrees of freedom
Multiple R-squared:  0.2613,Adjusted R-squared:  0.06437
F-statistic: 1.327 on 4 and 15 DF,  p-value: 0.3052

In the full-rank case, however, what I said is correct -- that is, the 
F-test for the 1 df hypothesis on the three coefficients is equivalent 
to the t-test for the corresponding coefficient when the two factors are 
raveled:


> linearHypothesis(minimal_model_fixed, "bt2 + csent + bt2:csent = 0")

Linear hypothesis test:
bt2  + csent  + bt2:csent = 0

Model 1: restricted model
Model 2: a ~ b * c

  Res.DfRSS Df Sum of Sq  F Pr(>F)
1 15 9714.5
2 14 9194.4  1520.08 0.7919 0.3886

> df_fixed$bc <- factor(with(df_fixed, paste(b, c, sep=":")))
> m <- lm(a ~ bc, data=df_fixed)
> summary(m)

Call:
lm(formula = a ~ bc, data = df_fixed)

Residuals:
Min  1Q  Median  3Q Max
-57.455 -11.750   0.167  14.011  37.545

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   64.000 25.627   2.497   0.0256
bct1:sent-43.500 31.387  -1.386   0.1874
bct1:unsent  -12.000 36.242  -0.331   0.7455
bct2:other   -11.500 31.387  -0.366   0.7195
bct2:sent-26.333 29.591  -0.890   0.3886 << cf.
bct2:unsent   -4.545 26.767  -0.170   0.8676

Residual standard error: 25.63 on 14 degrees of freedom
Multiple R-squared:  0.2671,Adjusted R-squared:  0.005328
F-statistic:  1.02 on 5 and 14 DF,  p-value: 0.4425

So, to summarize:

(1) You can use linearHypothesis() with singular.ok=TRUE to test the 
hypothesis that you specified, though I suspect that this hypothesis 
probably isn't testing what you think in the rank-deficient case. I 
suspect that the hypothesis that you want to test is obtained by 
raveling the two factors.


(2) There is no reason to use deltaMethod() for a linear hypothesis, but 
there is also no intrinsic reason that deltaMethod() shouldn't be able 
to handle a rank-deficient model. We'll probably fix that.


My apologies for the confusion,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/

On 2023-09-26 9:49 a.m., John Fox wrote:

Caution: External email.


Dear Michael,

You're testing a linear hypothesis, so there's no need to use the delta
method, but the linearHypothesis() function in the car package also
fails in your case:

 > linearHypothesis(minimal_model, "bt2 + csent + bt2:csent = 0")
Error in linearHypothesis.lm(minimal_model, "bt2 + csent + bt2:csent = 
0") :

there are aliased coefficients in the model.

One work-around is to ravel the two factors into a single factor with 5
levels:

 > df$bc <- factor(with(df, paste(b, c, sep=":")))
 > df$bc
  [1] t2:unsent t2:unsent t2:unsent t2:unsent t2:sent   t2:unsent
  [7] t2:unsent t1:sent   t2:unsent t2:unsent t2:other  t2:unsent
[13] t1:unsent t1:sent   t2:unsent t2:other  t1:unsent t2:sent
[19] t2:sent   t2:unsent
Levels: t1:sent t1:unsent t2:other t2:sent t2:unsent

 > m <- lm(a ~ bc, data=df)
 > summary(m)

Call:
lm(formula = a ~ bc, data = df)

Residuals:
     Min  1Q  Median  3Q Max
-57.455 -11.750   0.439  14.011  37.545

Coefficients:
     Estimate Std. Error t value Pr(>|t|)
(Intercept)    20.50  17.57   1.166   0.2617
bct1:unsent    37.50  24.85   1.509   0.1521
bct2:other 32.00  24.85   1.287   0.2174
bct2:sent  17.17  22.69   0.757   0.4610
bct2:unsent    38.95  19.11   2.039   0.0595

Residual standard error: 24.85 on 15 degrees of freedom
Multiple R-squared:  0.2613,    Adjusted R-squared:  0.06437
F-statistic: 1.327 on 4 and 15 DF,  p-value: 0.3052

Then the hypothesis is tested 

Re: [R] car::deltaMethod() fails when a particular combination of categorical variables is not present

2023-09-26 Thread John Fox

Dear Michael,

You're testing a linear hypothesis, so there's no need to use the delta 
method, but the linearHypothesis() function in the car package also 
fails in your case:


> linearHypothesis(minimal_model, "bt2 + csent + bt2:csent = 0")
Error in linearHypothesis.lm(minimal_model, "bt2 + csent + bt2:csent = 0") :
there are aliased coefficients in the model.

One work-around is to ravel the two factors into a single factor with 5 
levels:


> df$bc <- factor(with(df, paste(b, c, sep=":")))
> df$bc
 [1] t2:unsent t2:unsent t2:unsent t2:unsent t2:sent   t2:unsent
 [7] t2:unsent t1:sent   t2:unsent t2:unsent t2:other  t2:unsent
[13] t1:unsent t1:sent   t2:unsent t2:other  t1:unsent t2:sent
[19] t2:sent   t2:unsent
Levels: t1:sent t1:unsent t2:other t2:sent t2:unsent

> m <- lm(a ~ bc, data=df)
> summary(m)

Call:
lm(formula = a ~ bc, data = df)

Residuals:
Min  1Q  Median  3Q Max
-57.455 -11.750   0.439  14.011  37.545

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)20.50  17.57   1.166   0.2617
bct1:unsent37.50  24.85   1.509   0.1521
bct2:other 32.00  24.85   1.287   0.2174
bct2:sent  17.17  22.69   0.757   0.4610
bct2:unsent38.95  19.11   2.039   0.0595

Residual standard error: 24.85 on 15 degrees of freedom
Multiple R-squared:  0.2613,Adjusted R-squared:  0.06437
F-statistic: 1.327 on 4 and 15 DF,  p-value: 0.3052

Then the hypothesis is tested directly by the t-value for the 
coefficient bct2:sent.


I hope that this helps,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/

On 2023-09-26 1:12 a.m., Michael Cohn wrote:

Caution: External email.


I'm running a linear regression with two categorical predictors and their
interaction. One combination of levels does not occur in the data, and as
expected, no parameter is estimated for it. I now want to significance test
a particular combination of levels that does occur in the data (ie, I want
to get a confidence interval for the total prediction at given levels of
each variable).

In the past I've done this using car::deltaMethod() but in this dataset
that does not work, as shown in the example below: The regression model
gives the expected output, but deltaMethod() gives this error:

error in t(gd) %*% vcov. : non-conformable arguments

I believe this is because there is no parameter estimate for when the
predictors have the values 't1' and 'other'. In the df_fixed dataframe,
putting one person into that combination of categories causes deltaMethod()
to work as expected.

I don't know of any theoretical reason that missing one interaction
parameter estimate should prevent getting a confidence interval for a
different combination of predictors. Is there a way to use deltaMethod() or
some other function to do this without changing my data?

Thank you,

- Michael Cohn
Vote Rev (http://voterev.org)


Demonstration:
--

library(car)
# create dataset with outcome and two categorical predictors
outcomes <- c(91,2,60,53,38,78,48,33,97,41,64,84,64,8,66,41,52,18,57,34)
persontype <-
c("t2","t2","t2","t2","t2","t2","t2","t1","t2","t2","t2","t2","t1","t1","t2","t2","t1","t2","t2","t2")
arm_letter <-
c("unsent","unsent","unsent","unsent","sent","unsent","unsent","sent","unsent","unsent","other","unsent","unsent","sent","unsent","other","unsent","sent","sent","unsent")
df <- data.frame(a = outcomes, b=persontype, c=arm_letter)

# note: there are no records with the combination 't1' + 'other'
table(df$b,df$c)


#regression works as expected
minimal_formula <- formula("a ~ b*c")
minimal_model <- lm(minimal_formula, data=df)
summary(minimal_model)

#use deltaMethod() to get a prediction for individuals with the combination
'b2' and 'sent'
# deltaMethod() fails with "error in t(gd) %*% vcov. : non-conformable
arguments."
deltaMethod(minimal_model, "bt2 + csent + `bt2:csent`", rhs=0)

# duplicate the dataset and change one record to be in the previously empty
cell
df_fixed <- df
df_fixed[c(13),"c"] <- 'other'
table(df_fixed$b,df_fixed$c)

#deltaMethod() now works
minimal_model_fixed <- lm(minimal_formula, data=df_fixed)
deltaMethod(minimal_model_fixed, "bt2 + csent + `bt2:csent`", rhs=0)

 [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] car::deltaMethod() fails when a particular combination of categorical variables is not present

2023-09-26 Thread Michael Cohn
I'm running a linear regression with two categorical predictors and their
interaction. One combination of levels does not occur in the data, and as
expected, no parameter is estimated for it. I now want to significance test
a particular combination of levels that does occur in the data (ie, I want
to get a confidence interval for the total prediction at given levels of
each variable).

In the past I've done this using car::deltaMethod() but in this dataset
that does not work, as shown in the example below: The regression model
gives the expected output, but deltaMethod() gives this error:

error in t(gd) %*% vcov. : non-conformable arguments

I believe this is because there is no parameter estimate for when the
predictors have the values 't1' and 'other'. In the df_fixed dataframe,
putting one person into that combination of categories causes deltaMethod()
to work as expected.

I don't know of any theoretical reason that missing one interaction
parameter estimate should prevent getting a confidence interval for a
different combination of predictors. Is there a way to use deltaMethod() or
some other function to do this without changing my data?

Thank you,

- Michael Cohn
Vote Rev (http://voterev.org)


Demonstration:
--

library(car)
# create dataset with outcome and two categorical predictors
outcomes <- c(91,2,60,53,38,78,48,33,97,41,64,84,64,8,66,41,52,18,57,34)
persontype <-
c("t2","t2","t2","t2","t2","t2","t2","t1","t2","t2","t2","t2","t1","t1","t2","t2","t1","t2","t2","t2")
arm_letter <-
c("unsent","unsent","unsent","unsent","sent","unsent","unsent","sent","unsent","unsent","other","unsent","unsent","sent","unsent","other","unsent","sent","sent","unsent")
df <- data.frame(a = outcomes, b=persontype, c=arm_letter)

# note: there are no records with the combination 't1' + 'other'
table(df$b,df$c)


#regression works as expected
minimal_formula <- formula("a ~ b*c")
minimal_model <- lm(minimal_formula, data=df)
summary(minimal_model)

#use deltaMethod() to get a prediction for individuals with the combination
'b2' and 'sent'
# deltaMethod() fails with "error in t(gd) %*% vcov. : non-conformable
arguments."
deltaMethod(minimal_model, "bt2 + csent + `bt2:csent`", rhs=0)

# duplicate the dataset and change one record to be in the previously empty
cell
df_fixed <- df
df_fixed[c(13),"c"] <- 'other'
table(df_fixed$b,df_fixed$c)

#deltaMethod() now works
minimal_model_fixed <- lm(minimal_formula, data=df_fixed)
deltaMethod(minimal_model_fixed, "bt2 + csent + `bt2:csent`", rhs=0)

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.