Re: [R] summary.manova rank deficiency error + data

Peter Dalgaard Wed, 13 Aug 2008 09:07:50 -0700

Pedro Mardones wrote:

Thanks for the reply. The SAS output is attached, but it seems to me that
it doesn't correspond to the within-row contrasts you suggested. By
the way, yes, the data are highly correlated; in fact each row
corresponds to the first part of a signal vector. Thanks anyway....
PM

Agreed. I tried disabling the check that causes R to protest, and then
it gives similar DF but not quite the same statistics, quite possibly
due to numerical instabilities in one or both systems. (You can easily
try it yourself: just do anova.mlm <- stats:::anova.mlm and edit the
qr() call inside.)


> anova(lm(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test), test = "Wilks")
Analysis of Variance Table

            Df    Wilks approx F num Df den Df Pr(>F)
(Intercept)  1 0.002537  1887.24      5     24 <2e-16 ***
GROUP        2 0.62         1.29     10     48 0.2616
Residuals   28
---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
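For anyone who would rather not edit the function: as far as I know,
current versions of R give summary.manova() (and anova.mlm()) a tol
argument that is handed to qr(), so the rank check can be relaxed
without patching anything. The sketch below is only an illustration
under that assumption. It uses simulated, nearly collinear data (the
names G, s, Y and the 1e-6 noise level are made up here, not taken from
the data in this thread), and the statistics obtained with tol = 0 may
be numerically fragile.

```r
# Simulated stand-in data (an assumption, not the thread's data):
# five responses that differ only by tiny noise, so the residual
# SSCP matrix is numerically close to rank 1.
set.seed(1)
n <- 31
G <- factor(rep(1:3, c(3, 18, 10)))
s <- rnorm(n)                                   # shared signal
Y <- sapply(1:5, function(k) s + 1e-6 * rnorm(n))
colnames(Y) <- paste0("Y", 1:5)
fit <- manova(Y ~ G)

# Default tolerance (1e-7): the rank check is expected to fail,
# so capture the error message instead of stopping.
default_try <- tryCatch(summary(fit, test = "Wilks"),
                        error = function(e) conditionMessage(e))

# tol = 0 effectively switches the rank check off.
relaxed <- summary(fit, test = "Wilks", tol = 0)

print(default_try)
print(relaxed$stats)
```

Whether results obtained this way deserve to be trusted is a separate
question; with data this close to singular, small numerical differences
between systems are to be expected.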


                                      The GLM Procedure
                              Multivariate Analysis of Variance
                                    E = Error SSCP Matrix
                   y1                y2                y3                y4                y5
 y1      0.0353518799       0.035256904      0.0351327804      0.0349749601      0.0347868018
 y2       0.035256904      0.0351627227      0.0350395053      0.0348827098      0.0346956744
 y3      0.0351327804      0.0350395053      0.0349173343      0.0347617352      0.0345760232
 y4      0.0349749601      0.0348827098      0.0347617352      0.0346075203      0.0344233531
 y5      0.0347868018      0.0346956744      0.0345760232      0.0344233531      0.0342409225

Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|
      DF = 28           y1          y2          y3          y4          y5
      y1          1.000000    0.999992    0.999967    0.999921    0.999852
                                <.0001      <.0001      <.0001      <.0001
      y2          0.999992    1.000000    0.999991    0.999963    0.999911
                    <.0001                  <.0001      <.0001      <.0001
      y3          0.999967    0.999991    1.000000    0.999990    0.999958
                    <.0001      <.0001                  <.0001      <.0001
      y4          0.999921    0.999963    0.999990    1.000000    0.999989
                    <.0001      <.0001      <.0001                  <.0001
      y5          0.999852    0.999911    0.999958    0.999989    1.000000
                    <.0001      <.0001      <.0001      <.0001

The SAS System     10:33 Wednesday, August 13, 2008   8
                                      The GLM Procedure
                              Multivariate Analysis of Variance
                              H = Type III SSCP Matrix for group
                   y1                y2                y3                y4                y5
 y1      0.0023822408       0.002365848      0.0023471328      0.0023261249      0.0023030993
 y2       0.002365848      0.0023495679      0.0023309816      0.0023101183      0.0022872511
 y3      0.0023471328      0.0023309816      0.0023125426      0.0022918453      0.0022691608
 y4      0.0023261249      0.0023101183      0.0022918453      0.0022713359      0.0022488593
 y5      0.0023030993      0.0022872511      0.0022691608      0.0022488593      0.0022266141

                 Characteristic Roots and Vectors of: E Inverse * H, where
                              H = Type III SSCP Matrix for group
                                    E = Error SSCP Matrix
Characteristic           Characteristic Vector  V'EV=1
          Root  Percent            y1            y2            y3            y4            y5
    0.41840103    71.72     -7542.628     17131.814      5347.394    -31627.317     16700.100
    0.16496011    28.28     -4180.854     -4413.446     32096.035    -35545.204     12040.697
    0.00000001     0.00    -41004.875    107291.004    -95905.664     32641.189     -3028.470
    0.00000000     0.00      -416.226      -111.206       410.721       295.193      -171.953
    0.00000000     0.00    -14678.651      5787.997     54718.250    -69055.249     23218.580

   MANOVA Test Criteria and F Approximations for the Hypothesis of No Overall group Effect
                             H = Type III SSCP Matrix for group
                                    E = Error SSCP Matrix
                                     S=2    M=1    N=11
       Statistic                        Value    F Value    Num DF    Den DF    Pr > F
       Wilks' Lambda               0.60518744       1.37        10        48    0.2227
       Pillai's Trace              0.43658228       1.40        10        50    0.2095
       Hotelling-Lawley Trace      0.58336114       1.37        10    33.362    0.2385
       Roy's Greatest Root         0.41840103       2.09         5        25    0.1000

On Wed, Aug 13, 2008 at 4:34 AM, Peter Dalgaard
<[EMAIL PROTECTED]> wrote:

Pedro Mardones wrote:

Dear R-users;

Previously I posted a question about the problem of rank deficiency in
summary.manova. As somebody suggested, I'm attaching a small part of
the data set.

#***************************************************

"test" <-

structure(.Data = list(structure(.Data = c(rep(1,3),rep(2,18),rep(3,10)),
levels = c("1", "2", "3"),
class = "factor")


,c(0.181829,0.090159,0.115824,0.112804,0.134650,0.249136,0.163144,0.122012,0.157554,0.126283,

0.105344,0.125125,0.126232,0.084317,0.092836,0.108546,0.159165,0.121620,0.142326,0.122770,

0.117480,0.153762,0.156551,0.185058,0.161651,0.182331,0.139531,0.188101,0.103196,0.116877,0.113733)


,c(0.181445,0.090254,0.115840,0.112863,0.134610,0.249003,0.163116,0.122135,0.157206,0.126129,

0.105302,0.124917,0.126243,0.084455,0.092818,0.108458,0.158769,0.121244,0.141981,0.122595,

0.117556,0.153507,0.156308,0.184644,0.161421,0.181999,0.139376,0.187708,0.103126,0.116615,0.113746)


,c(0.181058,0.090426,0.115926,0.113022,0.134632,0.248845,0.163140,0.122331,0.156871,0.126023,

0.105335,0.124757,0.126325,0.084690,0.092885,0.108455,0.158386,0.120913,0.141676,0.122492,

0.117707,0.153293,0.156095,0.184242,0.161214,0.181670,0.139271,0.187318,0.103129,0.116421,0.113826)


,c(0.180692,0.090704,0.116110,0.113319,0.134745,0.248678,0.163256,0.122637,0.156581,0.125998,

0.105479,0.124686,0.126514,0.085066,0.093088,0.108587,0.158040,0.120674,0.141446,0.122488,

0.117972,0.153150,0.155954,0.183885,0.161063,0.181383,0.139251,0.186956,0.103232,0.116351,0.114001)


,c(0.180353,0.091088,0.116392,0.113753,0.134965,0.248520,0.163475,0.123046,0.156354,0.126067,

0.105726,0.124713,0.126821,0.085584,0.093432,0.108858,0.157742,0.120533,0.141309,0.122595,

0.118340,0.153088,0.155897,0.183582,0.160975,0.181143,0.139314,0.186636,0.103449,0.116415,0.114275)
)
,names = c("GROUP", "Y1", "Y2", "Y3", "Y4","Y5")
,row.names = seq(1:31)
,class = "data.frame"
)

summary(manova(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test), test = "Wilks")

#Error in summary.manova(manova(cbind(Y1, Y2, Y3, Y4, Y5) ~ GROUP, test), :
#  residuals have rank 3 < 5

#***************************************************

What I don't understand is why SAS returns no errors using PROC GLM
for the same data set. Is it because PROC GLM doesn't take problems of
rank deficiency into account? So, should I trust manova instead of the
PROC GLM output? I know it can be a touchy question, but I would like to
receive some insights.
Thanks
PM

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

What you have here is extremely correlated data:

(V <- estVar(lm(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test)))

            Y1          Y2          Y3          Y4          Y5
Y1 0.001262567 0.001259177 0.001254746 0.001249106 0.001242385
Y2 0.001259177 0.001255814 0.001251416 0.001245812 0.001239132
Y3 0.001254746 0.001251416 0.001247055 0.001241494 0.001234861
Y4 0.001249106 0.001245812 0.001241494 0.001235983 0.001229405
Y5 0.001242385 0.001239132 0.001234861 0.001229405 0.001222889

eigen(V)

$values
[1] 6.224077e-03 2.313066e-07 3.499837e-10 4.259125e-12 1.334146e-12

$vectors
          [,1]        [,2]       [,3]       [,4]       [,5]
[1,] 0.4503756  0.61213579  0.5204920 -0.3485941  0.1732681
[2,] 0.4491807  0.32333236 -0.1873653  0.5929444 -0.5540795
[3,] 0.4476157  0.01442094 -0.5498688  0.1272921  0.6934503
[4,] 0.4456201 -0.31202109 -0.3198606 -0.6557557 -0.4144143
[5,] 0.4432397 -0.65052351  0.5378809  0.2840428  0.1017918

Notice the more than 9 orders of magnitude between the eigenvalues.
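To put a number on that spread, here is a small sketch that works only
from the eigenvalues printed above (the variable names ev and spread are
introduced here for illustration). It also shows how much of the total
variance sits on the first eigenvector, whose entries are all close to
0.45, i.e. nearly a constant vector.

```r
# Eigenvalues of the estimated residual covariance, copied from the
# eigen(V) output above.
ev <- c(6.224077e-03, 2.313066e-07, 3.499837e-10, 4.259125e-12, 1.334146e-12)

# Orders of magnitude separating the largest eigenvalue from each one.
spread <- log10(ev[1] / ev)
print(round(spread, 2))

# Share of the total variance carried by the first eigenvector.
print(ev[1] / sum(ev))
```

Since essentially all the variance lies along a near-constant direction,
anything that looks only at within-row contrasts throws that component
away.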

I think that what is happening is that what SAS calls MANOVA is actually
looking at within-row contrasts, which effectively removes the largest
eigenvalue. In R, the equivalent would be

anova(lm(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test), X=~1, test = "Wilks")

Analysis of Variance Table


Contrasts orthogonal to
~1

            Df Wilks approx F num Df den Df Pr(>F)
(Intercept)  1 0.037  164.873      4     25 <2e-16 ***
GROUP        2 0.701    1.215      8     50 0.3098
Residuals   28
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

or (this could be computationally more precise, but in fact it gives the
same result)

anova(lm(cbind(Y2,Y3,Y4,Y5)-Y1~GROUP, test), test = "Wilks")
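That the two formulations agree can be checked on data that are easier
on the numerics. The sketch below uses simulated data (an assumption
for illustration: a shared row-level signal plus independent noise, in
a data frame called d, not the data from this thread) and compares the
Wilks statistic for GROUP from the X = ~1 form with the one from
explicit differences against Y1. They should agree up to rounding,
since both sets of contrasts span the same space orthogonal to the
constant.

```r
# Simulated stand-in data (not the thread's data): correlated responses
# whose residual covariance is comfortably full rank.
set.seed(42)
n <- 31
d <- data.frame(GROUP = factor(rep(1:3, c(3, 18, 10))))
s <- rnorm(n)                                   # shared row-level signal
for (k in 1:5) d[[paste0("Y", k)]] <- s + rnorm(n, sd = 0.1)

# (a) contrasts orthogonal to the constant, via X = ~1
a <- anova(lm(cbind(Y1, Y2, Y3, Y4, Y5) ~ GROUP, d), X = ~1, test = "Wilks")

# (b) explicit differences from Y1
b <- anova(lm(cbind(Y2, Y3, Y4, Y5) - Y1 ~ GROUP, d), test = "Wilks")

# Wilks statistic for GROUP under each formulation.
print(c(a["GROUP", "Wilks"], b["GROUP", "Wilks"]))
```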

--
 O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
 c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])              FAX: (+45) 35327907



