Re: [R] Box-cox transformation
Dear Ravi,

In my previous example, I used the residuals, so:

sum [ (r_i / scaling)^2 ]

If you want to use the deviance from glm, that gives you:

sum [ r_i^2 ]

and since the scaling factor is just a constant for any given lambda, the modification would be:

sum [ r_i^2 ] / ( scaling^2 )

which is given in the modified code below (posted back to R-help in case anyone else has this question).

Hope this helps,

Josh

##
require(MASS)

myp <- function(y, lambda) (y^lambda - 1) / lambda

lambda <- seq(-0.05, 0.45, len = 20)
N <- nrow(quine)
res <- matrix(numeric(0), nrow = length(lambda), 2,
              dimnames = list(NULL, c("Lambda", "LL")))

# scaling constant
C <- exp(mean(log(quine$Days + 1)))

for (i in seq_along(lambda)) {
  SS <- deviance(glm(myp(Days + 1, lambda[i]) ~ Eth*Sex*Age*Lrn, data = quine))
  LL <- (-(N/2) * log(SS / ((C^lambda[i])^2)))
  res[i, ] <- c(lambda[i], LL)
}

# box cox
boxcox(Days + 1 ~ Eth*Sex*Age*Lrn, data = quine, lambda = lambda)
# add our points on top to verify match
points(res[, 1], res[, 2], pch = 16)
##

On Mon, Jul 7, 2014 at 11:57 PM, Ravi Varadhan ravi.varad...@jhu.edu wrote:
> Dear Josh,
> Thank you very much. I knew that the scaling had to be adjusted, but was not sure how to do this. Can you please show me how to do this scaling with `glm'? In other words, how would I scale the deviance from glm?
> Thanks,
> Ravi
>
> -----Original Message-----
> From: Joshua Wiley [mailto:jwiley.ps...@gmail.com]
> Sent: Sunday, July 06, 2014 11:34 PM
> To: Ravi Varadhan
> Cc: r-help@r-project.org
> Subject: Re: [R] Box-cox transformation
>
> Hi Ravi,
> Deviance is the SS in this case, but you need a normalizing constant adjusted by the lambda to put them on the same scale. I modified your example below to simplify slightly and use the normalization (see the LL line).
> [...]

--
Joshua F. Wiley
Ph.D. Student, UCLA Department of Psychology
http://joshuawiley.com/
Senior Analyst, Elkhart Group Ltd.
http://elkhartgroup.com
Office: 260.673.5518

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
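A minimal sketch of why dividing SS by (C^lambda)^2 gives the right curve, assuming the usual form of the Box-Cox profile log-likelihood with a Jacobian term, -(N/2) log(SS) + (lambda - 1) sum(log(y)), up to additive constants. With C the geometric mean of y, the scaled version differs from that form only by the constant sum(log(y)), so both peak at the same lambda. The names ll_scaled and ll_jacobian are illustrative and do not come from the thread.

```r
library(MASS)  # provides the quine data

y      <- quine$Days + 1
N      <- length(y)
C      <- exp(mean(log(y)))            # geometric mean of y
lambda <- seq(-0.05, 0.45, len = 20)   # same grid as in the thread (no exact 0)

# residual sum of squares of the transformed response at each lambda
SS <- sapply(lambda, function(l) {
  yt <- (y^l - 1) / l
  deviance(lm(yt ~ Eth*Sex*Age*Lrn, data = quine))
})

# scaled form used in the thread
ll_scaled <- -(N / 2) * log(SS / C^(2 * lambda))

# Jacobian form (assumed textbook expression, constants dropped)
ll_jacobian <- -(N / 2) * log(SS) + (lambda - 1) * sum(log(y))

# the two curves differ by the same constant at every lambda
offset <- ll_scaled - ll_jacobian
```

Because the offset does not depend on lambda, either curve can be used to locate the maximum.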
Re: [R] Box-cox transformation
Thank you. It is very helpful.

Ravi

-----Original Message-----
From: Joshua Wiley [mailto:jwiley.ps...@gmail.com]
Sent: Monday, July 07, 2014 4:15 PM
To: Ravi Varadhan
Cc: r-help@r-project.org
Subject: Re: [R] Box-cox transformation

> Dear Ravi,
> In my previous example, I used the residuals, so: sum [ (r_i / scaling)^2 ]
> [...]
Re: [R] Box-cox transformation
Hi Ravi,

Deviance is the SS in this case, but you need a normalizing constant adjusted by the lambda to put them on the same scale. I modified your example below to simplify slightly and use the normalization (see the LL line).

Cheers,

Josh

##
require(MASS)

myp <- function(y, lambda) (y^lambda - 1) / lambda

lambda <- seq(-0.05, 0.45, len = 20)
N <- nrow(quine)
res <- matrix(numeric(0), nrow = length(lambda), 2,
              dimnames = list(NULL, c("Lambda", "LL")))

# scaling constant
C <- exp(mean(log(quine$Days + 1)))

for (i in seq_along(lambda)) {
  r <- resid(lm(myp(Days + 1, lambda[i]) ~ Eth*Sex*Age*Lrn, data = quine))
  LL <- (-(N/2) * log(sum((r / (C^lambda[i]))^2)))
  res[i, ] <- c(lambda[i], LL)
}

# box cox
boxcox(Days + 1 ~ Eth*Sex*Age*Lrn, data = quine, lambda = lambda)
# add our points on top to verify match
points(res[, 1], res[, 2], pch = 16)
##

On Mon, Jul 7, 2014 at 11:33 AM, Ravi Varadhan ravi.varad...@jhu.edu wrote:
> Hi,
> I am trying to do a Box-Cox transformation, but I am not sure how to do it correctly. Here is an example showing what I am trying:
>
> # example from MASS
> require(MASS)
> boxcox(Days + 1 ~ Eth*Sex*Age*Lrn, data = quine, lambda = seq(-0.05, 0.45, len = 20))
>
> # Here is my attempt at getting the profile likelihood for the Box-Cox parameter
> lam <- seq(-0.05, 0.45, len = 20)
> dev <- rep(NA, length = 20)
> for (i in 1:20) {
>   a <- lam[i]
>   ans <- glm(((Days + 1)^a - 1)/a ~ Eth*Sex*Age*Lrn, family = gaussian, data = quine)
>   dev[i] <- ans$deviance
> }
> plot(lam, dev, type = "b", xlab = "lambda", ylab = "deviance")
>
> I am trying to create the profile likelihood for the Box-Cox parameter, but obviously I am not getting it right. I am not sure that ans$deviance is the right thing to use. I would appreciate any guidance.
> Thanks
> Best,
> Ravi

--
Joshua F. Wiley
Ph.D. Student, UCLA Department of Psychology
http://joshuawiley.com/
Senior Analyst, Elkhart Group Ltd.
http://elkhartgroup.com
Office: 260.673.5518
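The statement "deviance is the SS in this case" can be checked directly: for a Gaussian glm with the default identity link, the deviance should equal the residual sum of squares of the corresponding lm fit. The sketch below uses lambda = 0.2, an arbitrary value picked only for illustration, and the hypothetical names dat, fit_glm and fit_lm.

```r
library(MASS)  # provides the quine data

lam <- 0.2  # arbitrary illustrative lambda, not an estimate

# Box-Cox transformed response stored in a copy of the data
dat <- transform(quine, yt = ((Days + 1)^lam - 1) / lam)

fit_glm <- glm(yt ~ Eth*Sex*Age*Lrn, family = gaussian, data = dat)
fit_lm  <- lm(yt ~ Eth*Sex*Age*Lrn, data = dat)

dev_glm <- deviance(fit_glm)        # what ans$deviance returns
rss_lm  <- sum(resid(fit_lm)^2)     # residual sum of squares
```

Since the two quantities agree, the only piece missing from a raw deviance plot is the lambda-dependent scaling.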
Re: [R] Box-Cox transformation in R
On 05/04/2011 12:53 PM, FMH wrote:
> Hi,
> Could anyone please help me transform data using Box-Cox transformations in R? Any help will be much appreciated.
> thanks,
> Kagba

See the boxcox function in the MASS package.

___
Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky
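As a rough illustration of that suggestion, the sketch below calls boxcox from MASS on the quine example used in the package documentation, picks the lambda with the highest profile log-likelihood on the grid, and applies the transformation. The names lambda_grid, lambda_hat and y_bc are made up for this example, and the grid is only one possible choice.

```r
library(MASS)

lambda_grid <- seq(-0.05, 0.45, len = 20)

# plotit = FALSE returns the grid (x) and profile log-likelihood (y) without plotting
bc <- boxcox(Days + 1 ~ Eth*Sex*Age*Lrn, data = quine,
             lambda = lambda_grid, plotit = FALSE)

# grid value with the largest profile log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]

# apply the transformation; log() is the limiting case at lambda = 0
y    <- quine$Days + 1
y_bc <- if (abs(lambda_hat) < 1e-8) log(y) else (y^lambda_hat - 1) / lambda_hat
```

In practice a rounded, interpretable value near lambda_hat (for example a log or square-root transformation) is often preferred over the exact grid maximum.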
Re: [R] Box-Cox transformation in R
Hi:

Start here:

library(sos)   # Install first if necessary
findFn('Box-Cox')

This search finds 131 matches; the basic Box-Cox transformations for regression are found in the MASS and car packages. For other situations, consult the packages and functions identified from the sos search.

HTH,
Dennis

On Wed, May 4, 2011 at 9:53 AM, FMH kagba2...@yahoo.com wrote:
> Hi,
> Could any one please help how I can transform data based on Box-Cox Transformations in R. Any helps will be much appreciated.
> thanks,
> Kagba
Re: [R] Box-Cox Transformation: Drastic differences when varying added constants
Have you read the Box-Cox paper? It has the theory in there for dealing with an offset parameter (though I don't know of any existing functions that help in estimating both lambdas at the same time).

Another important point (also in the paper) is that the lambda values used should be based on sound science, not just on what fits best.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Holger Steinmetz
Sent: Sunday, May 16, 2010 6:22 AM
To: r-help@r-project.org
Subject: [R] Box-Cox Transformation: Drastic differences when varying added constants

> Dear experts,
>
> I tried to learn about the Box-Cox transformation but found the following: when I had to add a constant to make all values of the original variable positive, I found that the lambda estimates (box.cox.powers function) differed dramatically depending on the specific constant chosen.
>
> In addition, the correlation between the transformed variable and the original was not 1 (as I think it should be to use the transformed variable meaningfully) but much lower.
>
> With higher added values (and a right-skewed variable) the lambda estimate was even negative, and the correlation between the transformed variable and the original variable was -.91!!?
>
> I guess there is something fundamental missing in my current thinking about Box-Cox...
>
> Best,
> Holger
>
> P.S. Here is what I did:
>
> library(car)   # box.cox.powers() came from car; newer car versions use powerTransform()
>
> # Creating a skewed variable X (mixture of two normals)
> x1 = rnorm(120, 0, .5)
> x2 = rnorm(40, 2.5, 2)
> X = c(x1, x2)
>
> # Adding a small constant
> Xnew1 = X + abs(min(X)) + .1
> box.cox.powers(Xnew1)
> Xtrans1 = Xnew1^.2682    # (the value of the lambda estimate)
>
> # Adding a larger constant
> Xnew2 = X + abs(min(X)) + 1
> box.cox.powers(Xnew2)
> Xtrans2 = Xnew2^-.2543   # (the value of the lambda estimate)
>
> # Plotting it all
> par(mfrow = c(3, 2))
> hist(X)
> qqnorm(X)
> qqline(X, lty = 2)
> hist(Xtrans1)
> qqnorm(Xtrans1)
> qqline(Xtrans1, lty = 2)
> hist(Xtrans2)
> qqnorm(Xtrans2)
> qqline(Xtrans2, lty = 2)
>
> # correlation among original and transformed variables
> round(cor(cbind(X, Xtrans1, Xtrans2)), 2)
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Box-Cox-Transformation-Drastic-differences-when-varying-added-constants-tp2218490p2218490.html
> Sent from the R help mailing list archive at Nabble.com.
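The offset idea can be sketched for a single sample as follows. This assumes the shifted power transformation ((x + shift)^lambda - 1) / lambda with a normal model for the transformed values, so that the profile log-likelihood is, up to a constant, -(n/2) log(sigma2_hat) + (lambda - 1) sum(log(x + shift)). The function shifted_bc_loglik and the grid are hypothetical illustrations, not an existing package API, and the simulated data only mimic the mixture in the original post.

```r
# Profile log-likelihood (up to an additive constant) of a shifted
# Box-Cox transformation for one sample; an illustrative sketch only.
shifted_bc_loglik <- function(x, lambda, shift) {
  z <- x + shift
  stopifnot(all(z > 0))                      # the power transform needs positive values
  zt <- if (abs(lambda) < 1e-8) log(z) else (z^lambda - 1) / lambda
  n  <- length(z)
  sigma2 <- mean((zt - mean(zt))^2)          # ML estimate of the variance
  -(n / 2) * log(sigma2) + (lambda - 1) * sum(log(z))
}

set.seed(1)
X <- c(rnorm(120, 0, 0.5), rnorm(40, 2.5, 2))  # skewed mixture of two normals

shifts  <- abs(min(X)) + c(0.1, 1)             # the two added constants from the post
lambdas <- seq(-2, 2, by = 0.05)

# grid-maximizing lambda for each fixed shift
best_lambda <- sapply(shifts, function(s) {
  ll <- sapply(lambdas, function(l) shifted_bc_loglik(X, l, s))
  lambdas[which.max(ll)]
})
```

Treating the shift as a second parameter amounts to maximizing shifted_bc_loglik over both arguments, for instance on a two-dimensional grid, rather than fixing the shift in advance.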
Re: [R] Box-Cox Transformation: Drastic differences when varying added constants
On 05/18/2010 10:41 PM, Greg Snow wrote:
> Have you read the BoxCox paper? It has the theory in there for dealing with an offset parameter (though I don't know of any existing functions that help in estimating both lambdas at the same time).
>
> Though another important point (in the paper as well) is that the lambda values used should be based on sound science, not just what fits best.

Sensitivity of log-like and exponential transformations to the choice of origin is a significant limitation in my view. If there is no subject matter theory to back up a particular origin, then either the origin should be a parameter to be estimated or you should consider a nonparametric transformation of Y. avas is one such approach, based on variance stabilization. The areg.boot function in the Hmisc package gives you confidence bands for various quantities using avas and ace transform-both-sides nonparametric regression approaches.

Frank

--
Frank E Harrell Jr
Professor and Chairman
School of Medicine
Department of Biostatistics
Vanderbilt University
Re: [R] Box-Cox Transformation: Drastic differences when varying added constants
Hi Holger,

I would also highly recommend you look at ?boxcox and ?logtrans in the MASS package. There is also a very illuminating, concise discussion of their use, with an example, on pages 170-172 of Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S, fourth edition.

Hope that helps,

Bill

On Sun, May 16, 2010 at 13:01, Peter Ehlers ehl...@ucalgary.ca wrote:
> On 2010-05-16 6:22, Holger Steinmetz wrote:
> [...]
Re: [R] Box-Cox Transformation: Drastic differences when varying added constants
On 2010-05-16 6:22, Holger Steinmetz wrote:
> Dear experts,
>
> I tried to learn about Box-Cox-transformation but found the following thing: When I had to add a constant to make all values of the original variable positive, I found that the lambda estimates (box.cox.powers-function) differed dramatically depending on the specific constant chosen.

Let's say that x is such that 1/x has a Normal distribution, i.e. lambda = -1. Then y = (1/x) + b also has a Normal distribution. But you're expecting 1/(x+b) to also have a Normal distribution.

> In addition, the correlation between the transformed variable and the original were not 1 (as I think it should be to use the transformed variable meaningfully) but much lower.

Again, your expectation is faulty. The relationship between the original and transformed variables is not linear (otherwise, why do the transformation?), but cor() computes the Pearson correlation coefficient by default. Try method='spearman'. Better yet, plot the transformed variables vs the original variable for further enlightenment.

-Peter Ehlers

> With higher added values (and a right skewed variable) the lambda estimate was even negative and the correlation between the transformed variable and the original variable was -.91!!?
>
> I guess that is something fundamental missing in my current thinking about box-cox...
>
> Best,
> Holger
>
> P.S. Here is what I did:
> [...]
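A small sketch of the correlation point, using simulated positive data rather than the data from the post (the exponents 0.27 and -0.25 only roughly echo the estimates quoted there). A power transformation is monotone, so the Spearman correlation with the original variable is exactly 1 or -1, while the Pearson correlation is not. It also shows that a raw negative power such as x^(-0.25) reverses the ordering, whereas the Box-Cox form (x^lambda - 1) / lambda divides by the negative lambda and keeps the ordering.

```r
set.seed(42)
x <- rexp(200) + 0.1                      # positive, right-skewed illustrative data

raw_pos <- x^0.27                         # increasing in x
raw_neg <- x^(-0.25)                      # decreasing in x: ordering reversed
bc_neg  <- (x^(-0.25) - 1) / (-0.25)      # Box-Cox form: increasing again

# Pearson measures linear association, so it is not +/-1 here
pearson <- c(raw_pos = cor(x, raw_pos),
             raw_neg = cor(x, raw_neg),
             bc_neg  = cor(x, bc_neg))

# Spearman works on ranks, so monotone transformations give exactly +/-1
spearman <- c(raw_pos = cor(x, raw_pos, method = "spearman"),
              raw_neg = cor(x, raw_neg, method = "spearman"),
              bc_neg  = cor(x, bc_neg,  method = "spearman"))
```

Plotting raw_neg and bc_neg against x makes the sign flip visible directly.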