> Is it possible that multicollinearity can force a correlation that
> does not exist?
>
> I have a very large sample of n=5,000
> and have found that
>
> disease= exposure + exposure + exposure + exposure R^2=0.45
>
> where all 4 exposures are the exact same exposure in different units
> like ug/dL or mg/dL or molar units.
I liked Rich Ulrich's reply, but let me state it in more gentle terms. If you are confused by a complex model, take a step or two or three back and fit a simpler model.
Have you drawn a graph of disease versus exposure? Does it have the same shape when exposure is measured in different units?
Have you run a regression model with just one measure of exposure? What was your R^2?
When you tried a different measure of exposure, how did R^2 change? How did the other parameter estimates change?
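Here is a rough sketch of that one-predictor comparison, using simulated data rather than yours. The variable names, the simulated effect size, and the mg/dL to ug/dL factor of 1000 are all assumptions made for illustration. If two units differ only by a multiplicative constant, R^2 and the intercept should not move at all, and the slope should simply be rescaled by that constant.

```python
import numpy as np

def fit_simple(x, y):
    """Least-squares fit of y = a + b*x; returns (intercept, slope, R^2)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return coef[0], coef[1], 1.0 - ss_res / ss_tot

# Simulated stand-in data (hypothetical, not the poster's data).
rng = np.random.default_rng(0)
n = 5000
exposure_mg = rng.uniform(0.0, 1.0, n)        # exposure in mg/dL
disease = 2.0 + 3.0 * exposure_mg + rng.normal(0.0, 1.0, n)
exposure_ug = exposure_mg * 1000.0            # same exposure in ug/dL

a_mg, b_mg, r2_mg = fit_simple(exposure_mg, disease)
a_ug, b_ug, r2_ug = fit_simple(exposure_ug, disease)

print("R^2 (mg/dL):", r2_mg, " R^2 (ug/dL):", r2_ug)
print("slope (mg/dL):", b_mg, " slope (ug/dL) * 1000:", b_ug * 1000.0)
```

If your own R^2 does change between units, that suggests the conversion is not a pure rescaling (rounding, a different assay, or a nonlinear transformation), and that is worth tracking down before fitting anything bigger.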
Have you run a model with two measures of exposure in different units? Usually, you will get a result that is comparable to tossing a spoon down a garbage disposal, because the two measures are perfectly (or almost perfectly) collinear and the software cannot separate their effects. But the parts of the output that you can follow, what do they tell you? Has R^2 changed? Have any of the other parameter estimates changed?
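To make the two-measure case concrete, here is a minimal sketch, again on simulated data with made-up names and an assumed conversion factor of 1000. It assumes the second measure is an exact rescaling of the first. Under that assumption the design matrix loses a rank, R^2 matches the one-predictor model, and the individual coefficients are not uniquely determined: you can shift weight from one measure to the other without changing the fitted values.

```python
import numpy as np

# Simulated stand-in data (hypothetical, not the poster's data).
rng = np.random.default_rng(1)
n = 5000
exposure_mg = rng.uniform(0.0, 1.0, n)        # exposure in mg/dL
exposure_ug = exposure_mg * 1000.0            # same exposure in ug/dL
disease = 2.0 + 3.0 * exposure_mg + rng.normal(0.0, 1.0, n)

def fit(X, y):
    """Least-squares fit; returns (coefficients, R^2)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

ones = np.ones(n)
X_one = np.column_stack([ones, exposure_mg])
X_two = np.column_stack([ones, exposure_mg, exposure_ug])

# Three columns, but only two independent directions.
rank_two = np.linalg.matrix_rank(X_two)

coef_one, r2_one = fit(X_one, disease)
coef_two, r2_two = fit(X_two, disease)   # lstsq picks one of many solutions

# Move c units of slope onto mg/dL and take c/1000 off ug/dL:
# a different set of coefficients, the same fitted values.
c = 5.0
coef_alt = coef_two + np.array([0.0, c, -c / 1000.0])
same_fit = np.allclose(X_two @ coef_two, X_two @ coef_alt)

print("rank of 3-column design:", rank_two)
print("R^2 one measure:", r2_one, " R^2 two measures:", r2_two)
print("alternative coefficients give same fit:", same_fit)
```

Different packages handle this differently (some drop a column, some report huge standard errors, some refuse to run), so the coefficients you see in your output may look nothing like these. The R^2 is the part you can trust to stay put.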
Ideally, you should fit a model based on a theoretical understanding of the context of the problem. But if you want to just play around with data and randomly fit regression models, then your best bet is to start simple and then gradually work your way to more complex models.
That's actually a good rule to follow when you are not just messing around with data. Always go from the simple to the complex and try to identify and then understand what changes (or doesn't change) along the way.
Steve Simon, [EMAIL PROTECTED], Standard Disclaimer.
The STATS web page has moved to
http://www.childrens-mercy.org/stats