Manuel,
The problem you describe does not sound like it is due to
multicollinearity. I say this because your variance inflation
factor is modest (1.1) and, more importantly, the correlation
between your independent variables (x1 and x2) is modest, -0.25.
I suspect the problem is due to one or more observations having a
disproportionately large influence on your coefficients. I suggest
you plot your residuals vs. predicted values. I would also do a
formal analysis of the influence each observation has on the
reported coefficients; you might consider computing Cook's
distance for each observation.
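
For example, in R (a minimal sketch, assuming your fitted model
object is A, as in the output you posted):

plot(fitted(A), resid(A))  # residuals vs. predicted values
abline(h = 0, lty = 2)     # reference line at zero
cd <- cooks.distance(A)    # Cook's distance for each observation
plot(cd, type = "h")       # influential observations stand out
which(cd > 4/length(cd))   # one common rule-of-thumb cutoff

influence.measures(A) will also give you the standard influence
diagnostics (DFBETAS, DFFITS, hat values, and Cook's distance)
in one call.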
 
I hope this has helped.
 
John
 
John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC and
University of Maryland School of Medicine Claude Pepper OAIC
 
University of Maryland School of Medicine
Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
 
410-605-7119 
- NOTE NEW EMAIL ADDRESS:
[EMAIL PROTECTED]

>>> Manuel Gutierrez <[EMAIL PROTECTED]> 4/11/2005
6:22:55 AM >>>



I have a linear model y ~ x1 + x2 of some data where the
coefficient for x1 is higher than I would have expected
from theory (0.88 fitted vs. 0.7 expected).
I wondered whether this could be an artifact of x1 and x2
being correlated, even though the variance inflation
factor is not too high (1.065).
I used perturbation analysis to evaluate collinearity:
library(perturb)
# refit the model repeatedly, adding normal(0,1) noise to x1 and x2
P <- perturb(A, pvars = c("x1", "x2"), prange = c(1, 1))
> summary(P)
Perturb variables:
x1         normal(0,1) 
x2         normal(0,1) 

Impact of perturbations on coefficients:
            mean     s.d.     min      max     
(Intercept)  -26.067    0.270  -27.235  -25.481
x1             0.726    0.025    0.672    0.882
x2             0.060    0.011    0.037    0.082

I get a mean for x1 of 0.726, which is closer to what is
expected.
I am not a statistical expert, so I'd like to know whether my
evaluation of the effects of collinearity is correct and, if
so, what I can do to obtain a reliable linear model.
Thanks,
Manuel

Some more detailed information:

> A <- lm(y ~ x1 + x2)
> summary(A)

Call:
lm(formula = y ~ x1 + x2)

Residuals:
      Min        1Q    Median        3Q       Max 
-4.221946 -0.484055 -0.004762  0.397508  2.542769 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -27.23472    0.27996 -97.282  < 2e-16 ***
x1            0.88202    0.02475  35.639  < 2e-16 ***
x2            0.08180    0.01239   6.604 2.53e-10 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.'
0.1 ` ' 1 

Residual standard error: 0.823 on 241 degrees of freedom
Multiple R-Squared: 0.8411,    Adjusted R-squared: 0.8398
F-statistic: 637.8 on 2 and 241 DF,  p-value: < 2.2e-16

> cor.test(x1,x2)

    Pearson's product-moment correlation

data:  x1 and x2 
t = -3.9924, df = 242, p-value = 8.678e-05
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.3628424 -0.1269618
sample estimates:
      cor 
-0.248584
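
For reference, with only two predictors the VIF follows directly
from this correlation, so the 1.065 figure above can be checked
by hand:

r <- -0.248584  # cor(x1, x2), from the output above
1/(1 - r^2)     # VIF for either predictor; gives about 1.066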

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html