On Tue, Aug 23, 2011 at 7:54 AM, JC Matthews <j.c.matth...@bristol.ac.uk> wrote:
> Thank you for your replies, you've answered my question and given me more to
> think on. I guess it is unwise to draw any conclusions from the
> standardised results for these reasons.
No, by all means try to draw conclusions! Isn't that the point of the
analysis in the first place? All I am (we are?) saying is that you
need to do your homework and learn how to draw _appropriate_
conclusions from the analysis.

Best,
Ista

>
> James.
>
> --On 22 August 2011 17:30 +0100 ted.hard...@wlandres.net wrote:
>
>> On 22-Aug-11 15:37:40, JC Matthews wrote:
>>>
>>> Hello,
>>>
>>> I have a statistical problem that I am using R for, but I am
>>> not making sense of the results. I am trying to use multiple
>>> regression to explore which variables (weather conditions)
>>> have the greater effect on a local atmospheric variable.
>>> The data is taken from a database that has 20391 data points (Z1).
>>>
>>> A simplified version of the data I'm looking at is given below,
>>> but I have a problem in that there is a disagreement in sign
>>> between the regression coefficients and the standardised regression
>>> coefficients. Intuitively I would expect both to be the same sign,
>>> but in many of the parameters, they are not.
>>>
>>> I am aware that there is a strong opinion that using standardised
>>> correlation coefficients is highly discouraged by some people,
>>> but I would nevertheless like to see the results. Not least
>>> because it has made me doubt the non-standardised values of B
>>> that R has given me.
>>>
>>> The code I have used, and some of the data, is as follows (once
>>> the database has been imported from SQL, and outliers removed).
>>>
>>> Z1sub <- Z1[, c(2, 5, 7,11, 12, 13, 15, 16)]
>>> colnames(Z1sub) <- c("temp", "hum", "wind", "press", "rain", "s.rad",
>>>                      "mean1", "sd1" )
>>>
>>> attach(Z1sub)
>>> names(Z1sub)
>>>
>>>
>>> Model1d <- lm(mean1 ~ hum*wind*rain + I(hum^2) + I(wind^2) + I(rain^2) )
>>>
>>> summary(Model1d)
>>>
>>> Call:
>>> lm(formula = mean1 ~ hum * wind * rain + I(hum^2) + I(wind^2) +
>>>     I(rain^2))
>>>
>>> Residuals:
>>>      Min       1Q   Median       3Q      Max
>>> -1230.64   -63.17    18.51    97.85  1275.73
>>>
>>> Coefficients:
>>>                 Estimate Std. Error t value  Pr(>|t|)
>>> (Intercept)   -9.243e+02  5.689e+01 -16.246   < 2e-16 ***
>>> hum            2.835e+01  1.468e+00  19.312   < 2e-16 ***
>>> wind           1.236e+02  4.832e+00  25.587   < 2e-16 ***
>>> rain          -3.144e+03  7.635e+02  -4.118  3.84e-05 ***
>>> I(hum^2)      -1.953e-01  9.393e-03 -20.793   < 2e-16 ***
>>> I(wind^2)      6.914e-01  2.174e-01   3.181   0.00147 **
>>> I(rain^2)      2.730e+02  3.265e+01   8.362   < 2e-16 ***
>>> hum:wind      -1.782e+00  5.448e-02 -32.706   < 2e-16 ***
>>> hum:rain       2.798e+01  8.410e+00   3.327   0.00088 ***
>>> wind:rain      6.018e+02  2.146e+02   2.805   0.00504 **
>>> hum:wind:rain -6.606e+00  2.401e+00  -2.751   0.00594 **
>>> ---
>>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>>
>>> Residual standard error: 180.5 on 20337 degrees of freedom
>>> Multiple R-squared: 0.2394, Adjusted R-squared: 0.239
>>> F-statistic: 640.2 on 10 and 20337 DF, p-value: < 2.2e-16
>>>
>>>
>>>
>>>
>>>
>>> To calculate the standardised coefficients, I used the following:
>>>
>>> Z1sub.scaled <- data.frame(scale( Z1sub[,c('temp', 'hum', 'wind',
>>>                 'press', 'rain', 's.rad', 'mean1', 'sd1' ) ] ) )
>>>
>>> attach(Z1sub.scaled)
>>> names(Z1sub.scaled)
>>>
>>>
>>> Model1d.sc <- lm(mean1 ~ hum*wind*rain + I(hum^2) + I(wind^2) +
>>>                  I(rain^2) )
>>>
>>> summary(Model1d.scaled)
>>>
>>> Call:
>>> lm(formula = mean1 ~ hum * wind * rain + I(hum^2) + I(wind^2) +
>>>     I(rain^2))
>>>
>>> Residuals:
>>>      Min       1Q   Median       3Q      Max
>>> -5.94713 -0.30527  0.08946  0.47287  6.16503
>>>
>>> Coefficients:
>>>                 Estimate Std. Error t value  Pr(>|t|)
>>> (Intercept)    0.0806858  0.0096614   8.351   < 2e-16 ***
>>> hum           -0.4581509  0.0073456 -62.371   < 2e-16 ***
>>> wind          -0.1995316  0.0073767 -27.049   < 2e-16 ***
>>> rain          -0.1806894  0.0158037 -11.433   < 2e-16 ***
>>> I(hum^2)      -0.1120435  0.0053885 -20.793   < 2e-16 ***
>>> I(wind^2)      0.0172870  0.0054346   3.181   0.00147 **
>>> I(rain^2)      0.0040575  0.0004853   8.362   < 2e-16 ***
>>> hum:wind      -0.2188729  0.0066659 -32.835   < 2e-16 ***
>>> hum:rain       0.0267420  0.0146201   1.829   0.06740 .
>>> wind:rain      0.0365615  0.0122335   2.989   0.00281 **
>>> hum:wind:rain -0.0438790  0.0159479  -2.751   0.00594 **
>>> ---
>>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>>
>>> Residual standard error: 0.8723 on 20337 degrees of freedom
>>> Multiple R-squared: 0.2394, Adjusted R-squared: 0.239
>>> F-statistic: 640.2 on 10 and 20337 DF, p-value: < 2.2e-16
>>>
>>>
>>>
>>> So having, for instance for humidity (hum), B = 28.35 +/- 1.468, while
>>> Beta = -0.4581509 +/- 0.0073456 is concerning. Is this normal, or is
>>> there an error in my code that has caused this contradiction?
>>>
>>> Many thanks,
>>>
>>> James.
>>> ----------------------
>>> JC Matthews
>>> School of Chemistry
>>> Bristol University
>>
>> Hi,
>> without having your data, so unable to check, I would not be
>> surprised if the changes of sign were the outcome of your model
>> formula, in particular the 3-variable (2nd-order) interaction,
>> i.e. you are using a model which is non-linear in the variables
>> themselves. Let's just take that part of the model:
>>
>>   lm(formula = mean1 ~ hum * wind * rain
>>
>> This, in its quantitative expression, expands to:
>>
>>   mean1 = C0 + C11*hum + C12*wind + C13*rain
>>              + C21*hum*wind + C22*hum*rain + C23*wind*rain
>>              + C31*hum*wind*rain
>>
>> Suppose that is for the unstandardised variables. Now express
>> it in terms of standardised variables (initial capital letters):
>>
>>   mean1 = C0 + C11*sd(hum)*(Hum + mean(hum)/sd(hum))
>>              + C12*sd(wind)*(Wind + mean(wind)/sd(wind))
>>              + C13*sd(rain)*(Rain + mean(rain)/sd(rain))
>>
>>              + C21*sd(hum)*sd(wind)*
>>                  (Hum + mean(hum)/sd(hum))*(Wind + mean(wind)/sd(wind))
>>
>>              + C22*sd(hum)*sd(rain)*
>>                  (Hum + mean(hum)/sd(hum))*(Rain + mean(rain)/sd(rain))
>>
>>              + C23*sd(wind)*sd(rain)*
>>                  (Wind + mean(wind)/sd(wind))*
>>                  (Rain + mean(rain)/sd(rain))
>>
>>              + C31*sd(hum)*sd(wind)*sd(rain)*
>>                  (Hum + mean(hum)/sd(hum))*
>>                  (Wind + mean(wind)/sd(wind))*
>>                  (Rain + mean(rain)/sd(rain))
>>
>> Now pick out, say, the coefficient of 'Hum' in this latter expression
>> (i.e. all the terms which involve 'Hum' but neither 'Wind' nor 'Rain'):
>>
>>     C11*sd(hum)
>>   + C21*sd(hum)*sd(wind)*mean(wind)/sd(wind)
>>   + C22*sd(hum)*sd(rain)*mean(rain)/sd(rain)
>>   + C31*sd(hum)*sd(wind)*sd(rain)*
>>       (mean(wind)/sd(wind))*(mean(rain)/sd(rain))
>>
>>   = C11*sd(hum)
>>   + C21*sd(hum)*mean(wind)
>>   + C22*sd(hum)*mean(rain)
>>   + C31*sd(hum)*mean(wind)*mean(rain)
>>
>> So there is no reason to expect this to have even the same sign
>> as the original C11, the coefficient of 'hum', let alone any more
>> specific relationship with it!
>>
>> Hoping this helps,
>> Ted.
>>
>>
>>
>> --------------------------------------------------------------------
>> E-Mail: (Ted Harding) <ted.hard...@wlandres.net>
>> Fax-to-email: +44 (0)870 094 0861
>> Date: 22-Aug-11                                       Time: 17:30:29
>> ------------------------------ XFMail ------------------------------
>
>
>
> ----------------------
> JC Matthews
> Atmospheric Chemistry Research Group
> School of Chemistry
> Bristol University
> j.c.matth...@bristol.ac.uk
>

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
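
For anyone who wants to check Ted's algebra numerically, here is a minimal sketch. It uses simulated data, not James's database, and cuts the model down to two predictors with one interaction so the formula is short. The variable names, means, standard deviations and true coefficients are all made up for illustration. Because the response is also passed through scale(), Ted's expression for the coefficient of 'Hum' is additionally divided by sd(y); that divisor is positive, so it does not affect the sign.

```r
## Simulated example only: all numbers below are assumptions for illustration.
set.seed(1)
n    <- 2000
hum  <- rnorm(n, mean = 80, sd = 10)   # hypothetical humidity-like predictor
wind <- rnorm(n, mean = 5,  sd = 2)    # hypothetical wind-like predictor
y    <- 10 + 2 * hum + 30 * wind - 1 * hum * wind + rnorm(n, sd = 5)

d <- data.frame(y = y, hum = hum, wind = wind)

## Fit on the raw variables: the true coefficient of 'hum' is +2.
raw <- lm(y ~ hum * wind, data = d)

## Fit on the standardised variables (response and predictors all scaled).
ds  <- data.frame(scale(d))
std <- lm(y ~ hum * wind, data = ds)

## Two-variable version of Ted's formula for the coefficient of 'Hum':
##   (C11 + C21 * mean(wind)) * sd(hum), divided by sd(y) because y is scaled too.
b    <- coef(raw)
pred <- (b["hum"] + b["hum:wind"] * mean(d$wind)) * sd(d$hum) / sd(d$y)

## Compare signs and check that the formula reproduces the refitted value.
print(c(raw = unname(b["hum"]), standardised = unname(coef(std)["hum"])))
print(all.equal(unname(pred), unname(coef(std)["hum"])))
```

The raw coefficient of hum should come out positive and the standardised one negative, since 2 - 1*mean(wind) is roughly -3 here. Passing data = to lm() instead of using attach() also avoids any doubt about which copy of hum, wind and mean1 a formula picks up when two attached data frames share column names.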