Hi Holger,

I would also highly recommend you look at the ?boxcox and ?logtrans functions in the MASS package. There is also a very illuminating, concise discussion of their use, with examples, on pages 170-172 of

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition.

Hope that helps,
Bill

On Sun, May 16, 2010 at 13:01, Peter Ehlers <ehl...@ucalgary.ca> wrote:
> On 2010-05-16 6:22, Holger Steinmetz wrote:
>>
>> Dear experts,
>>
>> I tried to learn about the Box-Cox transformation but found the following
>> thing:
>>
>> When I had to add a constant to make all values of the original variable
>> positive, I found that the lambda estimates (box.cox.powers function)
>> differed dramatically depending on the specific constant chosen.
>
> Let's say that x is such that 1/x has a Normal distribution,
> i.e. lambda = -1.
> Then y = (1/x) + b also has a Normal distribution.
> But you're expecting 1/(x+b) to also have a Normal distribution.
>
>>
>> In addition, the correlation between the transformed variable and the
>> original was not 1 (as I think it should be to use the transformed
>> variable meaningfully) but much lower.
>
> Again, your expectation is faulty. The relationship between the
> original and transformed variables is not linear (otherwise,
> why do the transformation?), but cor() computes the Pearson
> correlation coefficient by default. Try method='spearman'.
> Better yet, plot the transformed variables vs the original
> variable for further enlightenment.
>
> -Peter Ehlers
>
>>
>> With higher added values (and a right-skewed variable) the lambda estimate
>> was even negative and the correlation between the transformed variable and
>> the original variable was -.91!!?
>>
>> I guess there is something fundamental missing in my current thinking about
>> Box-Cox...
>>
>> Best,
>> Holger
>>
>>
>> P.S.
>> Here is what I did:
>>
>> # Creating a skewed variable X (mixture of two normals)
>> x1 = rnorm(120,0,.5)
>> x2 = rnorm(40,2.5,2)
>> X = c(x1,x2)
>>
>> # Adding a small constant
>> Xnew1 = X +abs(min(X))+ .1
>> box.cox.powers(Xnew1)
>> Xtrans1 = Xnew1^.2682 #(the value of the lambda estimate)
>>
>> # Adding a larger constant
>> Xnew2 = X +abs(min(X)) + 1
>> box.cox.powers(Xnew2)
>> Xtrans2 = Xnew2^-.2543 #(the value of the lambda estimate)
>>
>> #Plotting it all
>> par(mfrow=c(3,2))
>> hist(X)
>> qqnorm(X)
>> qqline(X,lty=2)
>> hist(Xtrans1)
>> qqnorm(Xtrans1)
>> qqline(Xtrans1,lty=2)
>> hist(Xtrans2)
>> qqnorm(Xtrans2)
>> qqline(Xtrans2,lty=2)
>>
>> #correlation among original and transformed variables
>> round(cor(cbind(X,Xtrans1,Xtrans2)),2)
>
> --
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
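To make the MASS suggestion concrete, here is a minimal sketch, assuming MASS is installed (it ships with R as a recommended package). It uses boxcox() to profile the log-likelihood over lambda for the two shifts tried in the P.S. This shows that the "best" lambda depends on the constant added. It then uses logtrans(), which treats the shift alpha in log(x + alpha) as a parameter and profiles it directly. The seed, the lambda and alpha grids, and the helper lambda_for_shift() are illustrative choices of mine, not something from the thread.

```r
library(MASS)  # provides boxcox() and logtrans()

set.seed(123)  # arbitrary seed, only so the sketch is reproducible
# A right-skewed mixture like the one in the P.S.
X <- c(rnorm(120, 0, 0.5), rnorm(40, 2.5, 2))

# Hypothetical helper: Box-Cox needs a positive response, so shift X first,
# then return the lambda with the highest profile log-likelihood.
lambda_for_shift <- function(shift) {
  Y <- X + shift
  fit <- lm(Y ~ 1, y = TRUE)  # intercept-only model; keep y (qr is kept by default)
  bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)
  bc$x[which.max(bc$y)]
}

lam_small <- lambda_for_shift(-min(X) + 0.1)  # small added constant
lam_large <- lambda_for_shift(-min(X) + 1)    # larger added constant
print(c(lam_small, lam_large))  # the estimate generally moves with the shift

# logtrans(): estimate the shift itself for log(X + alpha).
fit0 <- lm(X ~ 1, y = TRUE)
alphas <- -min(X) + seq(0.01, 10, length.out = 200)  # every X + alpha stays > 0
lt <- logtrans(fit0, alpha = alphas, plotit = FALSE)
alpha_hat <- lt$x[which.max(lt$y)]
print(alpha_hat)
```

With plotit = TRUE both functions draw the profile log-likelihood, which is often more informative than the single maximizing value.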
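The negative correlation and Peter's point about cor() can be checked in base R. This sketch assumes the usual scaled Box-Cox form (x^lambda - 1)/lambda, with log(x) at lambda = 0. The helper box_cox() and the exponent -0.25 are illustrative choices. The P.S. uses the plain power x^lambda, which is decreasing in x when lambda < 0, so its Pearson correlation with x is negative. The scaled form is a linear rescaling of that power with positive slope in x. It therefore preserves order: the Spearman correlation is exactly 1, while the Pearson correlation stays below 1 because the relationship is not linear.

```r
set.seed(123)  # arbitrary seed
X <- c(rnorm(120, 0, 0.5), rnorm(40, 2.5, 2))
Xpos <- X - min(X) + 1  # shift so every value is positive

# Scaled Box-Cox transform; log(x) is the limit as lambda -> 0.
box_cox <- function(x, lambda) {
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

raw_power <- Xpos^-0.25            # plain power, negative exponent (as in the P.S.)
scaled    <- box_cox(Xpos, -0.25)  # same exponent, Box-Cox scaling

r_raw      <- cor(Xpos, raw_power)                    # negative: x^-0.25 decreases
r_scaled   <- cor(Xpos, scaled)                       # same size, sign flipped
rho_scaled <- cor(Xpos, scaled, method = "spearman")  # 1: ranks are unchanged
print(round(c(r_raw, r_scaled, rho_scaled), 3))
```

Plotting scaled against Xpos, as Peter suggests, shows the monotone but curved relationship directly.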