[R] missing level of a nested factor results in an NA in lm output

Timothy Clough Mon, 21 Sep 2009 11:20:47 -0700

Dear All,

I have posted to this list before regarding the same issue so Iapologize for the multiple e-mails. I am still struggling with thisissue so I thought I'd give it another try. This time I have includedreproducible code and a subset of the data I am analyzing.

I am running an ANOVA with three factors: GROUP (5 levels), FEATURE(2 levels), and PATIENT (2 levels), where PATIENT is nested withinGROUP. I am interested in estimating various linear functions of themodel coefficients (which I sometimes refer to as 'contrasts' below).An example of the data can be set up using the following code:


example <- data.frame(
                ABUNDANCE = rnorm(30, 12),
                FEATURE = factor(rep(c(3218, 4227, 6374), 10)),
                GROUP = factor(rep(c(0, 1, 2, 3, 4), 6)),
                PATIENT = factor(rep(c(1, 2), 15))
)

I am using the lm function to run the model as shown.

fit <- lm(ABUNDANCE ~ FEATURE + GROUP + FEATURE:GROUP + GROUP/PATIENT,example)

summary(fit)

The output of this code is below.

> fit <- lm(ABUNDANCE ~ FEATURE + GROUP + FEATURE:GROUP + GROUP/PATIENT, example)

> summary(fit)

Call:

lm(formula = ABUNDANCE ~ FEATURE + GROUP + FEATURE:GROUP + GROUP/PATIENT,

   data = example)

Residuals:
      Min         1Q     Median         3Q        Max
-5.510e-01 -1.961e-01  3.469e-17  1.961e-01  5.510e-01

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)         12.6487     0.4037  31.331 2.58e-11 ***
FEATURE4227         -0.3271     0.4944  -0.661  0.52325
FEATURE6374         -1.0743     0.4944  -2.173  0.05492 .
GROUP1              -1.2641     0.5709  -2.214  0.05120 .
GROUP2              -0.2359     0.5709  -0.413  0.68825
GROUP3              -1.3081     0.5709  -2.291  0.04492 *
GROUP4              -0.5867     0.5709  -1.028  0.32836
FEATURE4227:GROUP1   1.5651     0.6993   2.238  0.04915 *
FEATURE6374:GROUP1   1.1435     0.6993   1.635  0.13302
FEATURE4227:GROUP2   0.4372     0.6993   0.625  0.54577
FEATURE6374:GROUP2   0.6728     0.6993   0.962  0.35867
FEATURE4227:GROUP3   1.0135     0.6993   1.449  0.17786
FEATURE6374:GROUP3   2.2665     0.6993   3.241  0.00885 **
FEATURE4227:GROUP4   1.3278     0.6993   1.899  0.08679 .
FEATURE6374:GROUP4   0.5610     0.6993   0.802  0.44103
GROUP0:PATIENT2     -0.5569     0.4037  -1.379  0.19785
GROUP1:PATIENT2     -0.1104     0.4037  -0.273  0.79014
GROUP2:PATIENT2     -0.9702     0.4037  -2.403  0.03712 *
GROUP3:PATIENT2     -0.1400     0.4037  -0.347  0.73586
GROUP4:PATIENT2     -0.5947     0.4037  -1.473  0.17147
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4944 on 10 degrees of freedom
Multiple R-squared: 0.8004,     Adjusted R-squared: 0.4211
F-statistic:  2.11 on 19 and 10 DF,  p-value: 0.1133

I then use the estimable function to estimate a linear combination ofthe parameter estimates.


myEstimate <- cbind(
                        '(Intercept)' = 1,
                        'GROUP1' = 1,
                        'FEATURE4227:GROUP1' = 0.5,
                        'FEATURE6374:GROUP1' = 0.5,
                        'GROUP0:PATIENT2' = 1
)
rownames(myEstimate) <- "test"

> estimable(fit, myEstimate)
    Estimate Std. Error  t value DF     Pr(>|t|)
test 12.18198  0.6694812 18.19615 10 5.395944e-09

I am able to get the t-statistic and associated p-value for thecontrast as desired. However, I sometimes have a case where there isa missing patient within one of the groups. To give an example ofthis situation, I remove patient 2 from group 4 and perform the sameanalysis.


example2 <- example[!(example$GROUP == 4 & example$PATIENT == 2),]

> fit <- lm(ABUNDANCE ~ FEATURE + GROUP + FEATURE:GROUP + GROUP/PATIENT, example2)

> summary(fit)

Call:

lm(formula = ABUNDANCE ~ FEATURE + GROUP + FEATURE:GROUP + GROUP/PATIENT,

   data = example2)

Residuals:
      Min         1Q     Median         3Q        Max
-5.510e-01 -2.084e-01  6.099e-20  2.084e-01  5.510e-01

Coefficients: (1 not defined because of singularities)
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)         12.6487     0.4442  28.475  2.5e-09 ***
FEATURE4227         -0.3271     0.5440  -0.601   0.5644
FEATURE6374         -1.0743     0.5440  -1.975   0.0837 .
GROUP1              -1.2641     0.6282  -2.012   0.0790 .
GROUP2              -0.2359     0.6282  -0.375   0.7171
GROUP3              -1.3081     0.6282  -2.082   0.0709 .
GROUP4              -0.4570     0.7023  -0.651   0.5335
FEATURE4227:GROUP1   1.5651     0.7694   2.034   0.0764 .
FEATURE6374:GROUP1   1.1435     0.7694   1.486   0.1755
FEATURE4227:GROUP2   0.4372     0.7694   0.568   0.5854
FEATURE6374:GROUP2   0.6728     0.7694   0.874   0.4074
FEATURE4227:GROUP3   1.0135     0.7694   1.317   0.2242
FEATURE6374:GROUP3   2.2665     0.7694   2.946   0.0185 *
FEATURE4227:GROUP4   1.2146     0.9423   1.289   0.2334
FEATURE6374:GROUP4   0.2850     0.9423   0.302   0.7700
GROUP0:PATIENT2     -0.5569     0.4442  -1.254   0.2454
GROUP1:PATIENT2     -0.1104     0.4442  -0.248   0.8101
GROUP2:PATIENT2     -0.9702     0.4442  -2.184   0.0605 .
GROUP3:PATIENT2     -0.1400     0.4442  -0.315   0.7606
GROUP4:PATIENT2          NA         NA      NA       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.544 on 8 degrees of freedom
Multiple R-squared: 0.7852,     Adjusted R-squared: 0.3019
F-statistic: 1.625 on 18 and 8 DF,  p-value: 0.2462

As you can see there are now NAs in the row corresponding to theremoved patient. I still want to estimate my contrast so I run thesame code as before, but I receive the following error messages:


myEstimate <- cbind(
                        '(Intercept)' = 1,
                        'GROUP1' = 1,
                        'FEATURE4227:GROUP1' = 0.5,
                        'FEATURE6374:GROUP1' = 0.5,
                        'GROUP0:PATIENT2' = 1
)
rownames(myEstimate) <- "test"

> estimable(fit, myEstimate)
Error in estimable.default(fit, myEstimate) :

Dimension of structure(c(1, 0, 0, 1, 0, 0, 0, 0.5, 0.5, 0, 0, 0, 0, 0,0, : 1x20, not compatible with no of parameters in fit: 19Dimension of 1, 0, 0, 0, 0), .Dim = c(1L, 20L), .Dimnames =list("test", c("(Intercept)", : 1x20, not compatible with no ofparameters in fit: 19Dimension of "FEATURE4227", "FEATURE6374", "GROUP1", "GROUP2","GROUP3", "GROUP4", : 1x20, not compatible with no of parameters infit: 19Dimension of "FEATURE4227:GROUP1", "FEATURE6374:GROUP1","FEATURE4227:GROUP2", : 1x20, not compatible with no of parameters infit: 19Dimension of "FEATURE6374:GROUP2", "FEATURE4227:GROUP3","FEATURE6374:GROUP3", : 1x20, not compatible with no of parameters infit: 19Dimension of "FEATURE4227:GROUP4", "FEATURE6374:GROUP4","GROUP0:PATIENT2", : 1x20, not compatible with no of parameters infit: 19Dimension of "GROUP1:PATIENT2", "GROUP2:PATIENT2", "GROUP3:PATIENT2","GROUP4:PATIENT2": 1x20, not compatible with no of parameters in fit: 19

Dimension of ))): 1x2

Apparently the NA in the parameter estimate table is preventing theestimation of the contrast. I have tried multiple alternatives in thecoding of the estimable statement, but all give me errors similar tothe one above.

I think the problem is that R expects to see each patient within eachgroup and when it doesn't see patient 2 in group 4, it yields an NAparameter estimate for this combination. That's fine - the results ofthe other parameter estimates are still correct. However it'spreventing me from estimating a linear function of the modelcoefficients, and I know that there must be a way to do this even inthe presence of the missing combination.


Has anyone run into this problem before?

Tim

--
Timothy Clough
Ph.D. Student
Department of Statistics
Purdue University
765-496-7263

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] missing level of a nested factor results in an NA in lm output

Reply via email to