Re: [Rd] Change in the RNG implementation?
Hi Duncan, Martin,

Thanks for your answers.

For my real case I was generating millions of random positions on a
genome. I compared sample.int() performance between R-2.15.1 and
R-devel, and, for me, it performs better in R-2.15.1 (almost 3x faster
and also uses slightly less memory):

With R-2.15.1:

  > set.seed(33)
  > system.time(random_chrom_pos <- sample(199000666L, 95000777L))
     user  system elapsed
    4.964   0.268   5.242
  > gc()
             used  (Mb) gc trigger   (Mb)  max used   (Mb)
  Ncells   137285   7.4     350000   18.7    350000   18.7
  Vcells 47633785 363.5  154735917 1180.6 147135703 1122.6
  > sessionInfo()
  R version 2.15.1 (2012-06-22)
  Platform: x86_64-unknown-linux-gnu (64-bit)

  locale:
   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
   [7] LC_PAPER=C                 LC_NAME=C
   [9] LC_ADDRESS=C               LC_TELEPHONE=C
  [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

  attached base packages:
  [1] stats     graphics  grDevices utils     datasets  methods   base

With R-devel:

  > set.seed(33)
  > system.time(random_chrom_pos <- sample(199000666L, 95000777L))
     user  system elapsed
   14.532   0.296  14.854
  > gc()
             used  (Mb) gc trigger   (Mb)  max used   (Mb)
  Ncells   145525   7.8     350000   18.7    350000   18.7
  Vcells 47644082 363.5  152959996 1167.0 182023372 1388.8
  > sessionInfo()
  R Under development (unstable) (2012-10-02 r60861)
  Platform: x86_64-unknown-linux-gnu (64-bit)

  locale:
   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
   [7] LC_PAPER=C                 LC_NAME=C
   [9] LC_ADDRESS=C               LC_TELEPHONE=C
  [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

  attached base packages:
  [1] stats     graphics  grDevices utils     datasets  methods   base

FWIW my R-2.15.1 and R-devel were configured with
--disable-byte-compiled-packages; otherwise, I use all the defaults.
Also my system is a standard Ubuntu 12.04 installation with no fancy
settings/tweakings/customizations.

Thanks,
H.

On 10/20/2012 12:50 PM, Martin Maechler wrote:
Duncan Murdoch murdoch.dun...@gmail.com
on Fri, 19 Oct 2012 19:26:39 -0400 writes:

On 12-10-19 7:04 PM, Hervé Pagès wrote:

Hi,

Looks like the implementation of random number generation changed in
R-devel with respect to R-2.15.1.

With R-2.15.1:

  > set.seed(33)
  > sample(49821115, 10)
   [1] 22217252 19661919 24099911 45779422 42043111 25774933 21778053 17098516
   [9]   773073  5878451

With recent R-devel:

  > set.seed(33)
  > sample(49821115, 10)
   [1] 22217252 19661919 24099912 45779425 42043115 25774935 21778056 17098518
   [9]   773073  5878452

This is on a 64-bit Ubuntu system.

Is this change intended? I didn't see anything in the NEWS file.

A potential problem with this is that it will break unit tests for
algorithms that make use of the RNG.

Another more practical problem (at least for me) is the following:
Bioconductor package maintainers are sometimes working hard on the
development version of their package to improve the performance of
some key functions. Comparing performance between BioC release (based
on R-2.15) and devel (based on R-devel) often requires big input data
that is randomly generated, because it's easier than working with real
data. Typically a small script is written that takes care of loading
the required packages, generating the input data, and running a simple
analysis. The same script is sourced in R-2.15 and R-devel, and
performance and results are compared. Not being able to generate
exactly the same input in the script is a problem. It can be worked
around by generating the input once, serializing it, and using load()
in the script, but that makes things more complicated and the script is
not a standalone script anymore (it cannot be passed around without
also passing around the big .rda file).

Thanks,
H.

I think it was mentioned in the NEWS:

  \code{sample.int()} has some support for
  \eqn{n \ge 2^{31}}{n >= 2^31}: see its help for the limitations.

  A different algorithm is used for
  \code{(n, size, replace = FALSE, prob = NULL)} for \code{n > 1e7}
  and \code{size <= n/2}. This is much faster and uses less memory,
  but does give different results.

So, to reiterate: The RNG has not been changed at all, but sample()
has, for extreme cases (large n) like yours.

I don't think the old algorithm is available, but perhaps it could be
made available by an optional parameter.

I do think we should ideally add

Re: [Rd] suppress *specific* warnings?
On Sun, 21 Oct 2012, Martin Morgan wrote:

On 10/21/2012 12:28 PM, Ben Bolker wrote:

Not desperately important, but nice to have and possibly of use to
others, is the ability to suppress specific warnings rather than
suppressing warnings indiscriminately. I often know of a specific
warning that I want to ignore (because I know that it's a false
positive/ignorable), but the current design of suppressWarnings()
forces me to ignore *any* warnings coming from the expression.

I started to write a new version that would check and, if supplied with
a regular expression, would only block matching warnings and otherwise
would produce the warnings as usual, but I don't quite know enough
about what I'm doing: see ??? in expression below. Can anyone help, or
suggest pointers to relevant examples/documentation (I've looked at
demo(error.catching), which isn't helping me ... ?)

  suppressWarnings2 <- function(expr,regex=NULL) {
      opts <- options(warn = -1)
      on.exit(options(opts))

I'm not really sure what the options(warn=-1) is doing there, maybe
it's for efficiency to avoid generating a warning message (as distinct
from signalling

The sources in src/library/base/conditions.R have

  suppressWarnings <- function(expr) {
      ops <- options(warn = -1) ## FIXME: temporary hack until R_tryEval
      on.exit(options(ops))     ## calls are removed from methods code
      withCallingHandlers(expr,
                          warning=function(w)
                              invokeRestart("muffleWarning"))
  }

I suspect we have still not entirely eliminated R_tryEval in this
context but I'm not sure. Will check when I get a chance.

a warning). I think you're after something like

  suppressWarnings2 <- function(expr, regex=character()) {
      withCallingHandlers(expr, warning=function(w) {
          if (length(regex) == 1 &&
              length(grep(regex, conditionMessage(w)))) {
              invokeRestart("muffleWarning")
          }
      })
  }

A problem with using expression matching is of course that this fails
with internationalized messages.
Ideally warnings should be signaled as warning conditions of a
particular class, and that class can be used to discriminate.
Unfortunately very few warnings are designed this way.

Best,

luke

If the restart isn't invoked, then the next handler is called and the
warning is handled as normal. So with

  f <- function() {
      warning("oops")
      2
  }

there is

  > suppressWarnings2(f())
  [1] 2
  Warning message:
  In f() : oops
  > suppressWarnings2(f(), "oops")
  [1] 2

For your own code I think a better strategy is to create a sub-class of
warnings that can be handled differently

  mywarn <- function(..., call.=TRUE, immediate.=FALSE, domain=NULL) {
      msg <- .makeMessage(..., domain=domain, appendLF=FALSE)
      call <- NULL
      if (call.)
          call <- sys.call(1L)
      class <- c("silencable", "simpleWarning", "warning", "condition")
      cond <- structure(list(message=msg, call=call), class=class)
      warning(cond)
  }

  suppressWarnings3 <- function(expr) {
      withCallingHandlers(expr, silencable=function(w) {
          invokeRestart("muffleWarning")
      })
  }

then with

  g <- function() {
      mywarn("oops")
      3
  }

  > suppressWarnings3(f())
  [1] 2
  Warning message:
  In f() : oops
  > g()
  [1] 3
  Warning message:
  In g() : oops
  > suppressWarnings3(g())
  [1] 3

      withCallingHandlers(expr, warning = function(w) {
          ## browser()
          if (is.null(regex) || grepl(w[["message"]],regex)) {
              invokeRestart("muffleWarning")
          } else {
              ## ? what do I here to get the warning issued?
              ## browser()
              ## computeRestarts() shows browser,
              ##   muffleWarning, and abort ...
              options(opts)
              warning(w$message)
              ## how can I get back from here to the calling point
              ##   *without* muffling warnings ... ?
          }
      })
  }

  suppressWarnings2(sqrt(-1))
  suppressWarnings2(sqrt(-1),"abc")

It seems to me I'd like to have a restart option that just returns to
the point where the warning was caught, *without* muffling warnings
... ? But I don't quite understand how to set one up ...

Ben Bolker

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

--
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
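
The two approaches discussed in this thread (matching on the message
text and matching on a condition class) can be combined in one handler.
What follows is only a minimal sketch, not anything from base R: the
names suppress_some_warnings, collect_warnings, h and k are hypothetical
and introduced purely for illustration. It relies on the behaviour
described above, namely that a calling handler which returns without
invoking the "muffleWarning" restart lets the warning continue to the
next handler.

```r
## Hypothetical helper (not part of base R): muffle only the warnings whose
## message matches `regex` or which inherit from the condition class `cls`.
suppress_some_warnings <- function(expr, regex = NULL, cls = NULL) {
    withCallingHandlers(expr, warning = function(w) {
        by_regex <- !is.null(regex) && grepl(regex, conditionMessage(w))
        by_class <- !is.null(cls) && inherits(w, cls)
        if (by_regex || by_class)
            invokeRestart("muffleWarning")
        ## Otherwise return normally: the warning is passed on to the
        ## next handler and is reported as usual.
    })
}

## Hypothetical helper for the demonstration: evaluate `expr` and collect
## the messages of all warnings that were not muffled further in.
collect_warnings <- function(expr) {
    seen <- character()
    withCallingHandlers(expr, warning = function(w) {
        seen <<- c(seen, conditionMessage(w))
        invokeRestart("muffleWarning")
    })
    seen
}

## Signals two plain warnings; only the first matches the regex used below.
h <- function() {
    warning("oops")
    warning("something else")
    2
}

## Signals one warning of class "silencable" and one plain warning.
k <- function() {
    cond <- structure(list(message = "classed oops", call = NULL),
                      class = c("silencable", "simpleWarning",
                                "warning", "condition"))
    warning(cond)
    warning("plain")
    3
}

remaining_regex <- collect_warnings(suppress_some_warnings(h(), regex = "^oops$"))
remaining_class <- collect_warnings(suppress_some_warnings(k(), cls = "silencable"))
print(remaining_regex)
print(remaining_class)
```

In this sketch only the non-matching warning should survive in each
case. As noted above, the class-based route does not depend on the
message text and so is unaffected by internationalized messages, but it
only works for warnings that are signaled with a dedicated class.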