Re: [R] Reproducibility issue in gbm (32 vs 64 bit)
On Sat, 26-Feb-2011 at 08:46AM -0800, Ridgeway, Greg wrote:

| I have heard about this before happening on other platforms.
| Frankly I'm not positive how this happens. My best guess is that
| there's a tiny bit of numeric instability in the 9+ decimal place
| so that on a given iteration one variable choice, at random, looks
| better than the other. Any other ideas?
|
| Greg

I played around with this some time ago and noticed that it happens only when there's perfect or very nearly perfect correlation. I even tried a third variable and it was ignored almost completely.

I concluded it's highly unlikely to cause a problem, since real data wouldn't have perfectly correlated variables, or if they did, they'd be easy enough to detect.

--
Patrick Connolly

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
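On the point that perfectly correlated predictors are easy to detect: here is a minimal base-R sketch of one way to check for them before fitting. It reuses the `xc` matrix from the sample code in this thread; the cutoff `1 - 1e-8` and the name `dup_pairs` are assumptions made for illustration, not anything gbm does itself.

```r
# Rebuild the predictors from the sample code, including the duplicated column
set.seed(12345)
xc <- matrix(rnorm(100 * 20), 100, 20)
xc[, 2] <- xc[, 1]

# Absolute pairwise correlations; zero out the diagonal and lower triangle
# so each pair of columns is reported only once
r <- abs(cor(xc))
r[lower.tri(r, diag = TRUE)] <- 0

# Column pairs whose correlation is 1 up to a small tolerance (assumed cutoff)
dup_pairs <- which(r > 1 - 1e-8, arr.ind = TRUE)
print(dup_pairs)
```

Any pair listed in `dup_pairs` is a candidate for dropping one of the two columns, or for reporting their relative influence as a combined figure.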
Re: [R] Reproducibility issue in gbm (32 vs 64 bit)
I'm guessing this has something to do with numerical precision on the two platforms.

Leo.
Re: [R] Reproducibility issue in gbm (32 vs 64 bit)
I have heard about this before happening on other platforms. Frankly, I'm not positive how this happens. My best guess is that there's a tiny bit of numeric instability in the 9th decimal place or beyond, so that on a given iteration one variable choice looks better than the other essentially at random. Any other ideas?

Greg

----- Original Message -----
From: Joshua Wiley jwiley.ps...@gmail.com
To: Axel Urbiz axel.ur...@gmail.com
Cc: R-help@r-project.org R-help@r-project.org; Ridgeway, Greg
Sent: Fri Feb 25 22:16:02 2011
Subject: Re: [R] Reproducibility issue in gbm (32 vs 64 bit)

Hi Axel,

I do not have a nice explanation why the results differ off the top of my head. I can say I can replicate what you get on 32- and 64-bit (both Windows 7) with the development version of R and gbm_1.6-3.1.

Here is an even simpler example that shows the difference:

gbmfit <- gbm(1:50 ~ I(50:1) + I(60:11), distribution = "gaussian")
summary(gbmfit)

I copied the package maintainer.

Cheers,

Josh
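To make the "tiny bit of numeric instability" guess concrete, here is a small base-R sketch of how a difference in the last bits can flip which of two otherwise tied candidates wins. This is only an illustration of the floating-point effect; it does not reproduce gbm's split search, and `pick_best` and the `x1`/`x2` gains are hypothetical names and values.

```r
# Hypothetical rule: choose the candidate with the largest "gain";
# which.max() returns the first maximum when there is an exact tie
pick_best <- function(gains) names(which.max(gains))

# Exact tie: both candidates carry the same double, so the first one wins
exact <- c(x1 = 0.3, x2 = 0.3)

# Same values in exact arithmetic, but x2 is computed by a different route
# and ends up one rounding step larger than the stored 0.3
rounded <- c(x1 = 0.3, x2 = 0.1 + 0.2)

print(pick_best(exact))
print(pick_best(rounded))

# The gap that decides the winner is far below any meaningful precision
gap <- rounded[["x2"]] - rounded[["x1"]]
print(gap)
```

If two platforms round an intermediate result differently, the same kind of flip could in principle happen for perfectly correlated predictors, which would be consistent with the behaviour reported in this thread. Whether that is what actually happens inside gbm is not established here.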
Re: [R] Reproducibility issue in gbm (32 vs 64 bit)
Hi Axel,

I do not have a nice explanation why the results differ off the top of my head. I can say I can replicate what you get on 32- and 64-bit (both Windows 7) with the development version of R and gbm_1.6-3.1.

Here is an even simpler example that shows the difference:

gbmfit <- gbm(1:50 ~ I(50:1) + I(60:11), distribution = "gaussian")
summary(gbmfit)

I copied the package maintainer.

Cheers,

Josh

On Fri, Feb 25, 2011 at 7:29 PM, Axel Urbiz axel.ur...@gmail.com wrote:

Dear List,

The gbm package on Win 7 produces different results for the relative importance of input variables in R 32-bit relative to R 64-bit. Any idea why? Any idea which one is correct?

Based on this example, it looks like the relative importance of 2 perfectly correlated predictors is diluted by half in 32-bit, whereas in 64-bit, one of these predictors gets all the importance and the other gets none. I found this interesting.

### Sample code

library(gbm)
set.seed(12345)
xc=matrix(rnorm(100*20),100,20)
y=sample(1:2,100,replace=TRUE)
xc[,2] <- xc[,1]
gbmfit <- gbm(y~xc[,1]+xc[,2] +xc[,3], distribution="gaussian")
summary(gbmfit)

### Results on R 2.12.0 (32-bit)

      var  rel.inf
1 xc[, 3] 49.76143
2 xc[, 1] 27.27432
3 xc[, 2] 22.96425

### Results on R 2.12.0 (64-bit)

summary(gbmfit)
      var  rel.inf
1 xc[, 1] 50.23857
2 xc[, 3] 49.76143
3 xc[, 2]      0.0

Thanks,
Axel.

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
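A quick arithmetic check on the two result tables: if the relative influence of the two identical predictors is pooled, the 32-bit and 64-bit runs agree. The sketch below uses only the numbers posted above; the grouping labels `x1_x2` and `x3` are made up for illustration.

```r
# Relative influence as posted for each build
inf32 <- c("xc[, 3]" = 49.76143, "xc[, 1]" = 27.27432, "xc[, 2]" = 22.96425)
inf64 <- c("xc[, 1]" = 50.23857, "xc[, 3]" = 49.76143, "xc[, 2]" = 0.0)

# Hypothetical grouping: xc[, 1] and xc[, 2] are the same variable
group <- c("xc[, 1]" = "x1_x2", "xc[, 2]" = "x1_x2", "xc[, 3]" = "x3")

# Sum the influence within each group
pooled32 <- tapply(inf32, group[names(inf32)], sum)
pooled64 <- tapply(inf64, group[names(inf64)], sum)

print(pooled32)
print(pooled64)
print(all.equal(as.numeric(pooled32), as.numeric(pooled64)))
```

So on these figures the two builds differ only in how the shared influence is divided between the duplicated columns, not in the total assigned to them or to `xc[, 3]`.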