Re: [Rd] Change in the RNG implementation?

2012-10-22 Thread Hervé Pagès

Hi Duncan, Martin,

Thanks for your answers.

For my real case I was generating millions of random positions
on a genome.

I compared sample.int() performance between R-2.15.1 and R-devel,
and, for me, it performs better in R-2.15.1 (almost 3x faster and
also uses slightly less memory):

With R-2.15.1:

   set.seed(33)

   system.time(random_chrom_pos <- sample(199000666L, 95000777L))
    user  system elapsed
   4.964   0.268   5.242

   gc()
             used  (Mb) gc trigger   (Mb)  max used   (Mb)
  Ncells   137285   7.4     350000   18.7    350000   18.7
  Vcells 47633785 363.5  154735917 1180.6 147135703 1122.6

   sessionInfo()
  R version 2.15.1 (2012-06-22)
  Platform: x86_64-unknown-linux-gnu (64-bit)

  locale:
   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
   [7] LC_PAPER=C                 LC_NAME=C
   [9] LC_ADDRESS=C               LC_TELEPHONE=C
  [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

  attached base packages:
  [1] stats graphics  grDevices utils datasets  methods   base

With R-devel:

   set.seed(33)

   system.time(random_chrom_pos <- sample(199000666L, 95000777L))
    user  system elapsed
  14.532   0.296  14.854

   gc()
             used  (Mb) gc trigger   (Mb)  max used   (Mb)
  Ncells   145525   7.8     350000   18.7    350000   18.7
  Vcells 47644082 363.5  152959996 1167.0 182023372 1388.8

   sessionInfo()
  R Under development (unstable) (2012-10-02 r60861)
  Platform: x86_64-unknown-linux-gnu (64-bit)

  locale:
   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
   [7] LC_PAPER=C                 LC_NAME=C
   [9] LC_ADDRESS=C               LC_TELEPHONE=C
  [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

  attached base packages:
  [1] stats graphics  grDevices utils datasets  methods   base

FWIW my R-2.15.1 and R-devel were configured with
--disable-byte-compiled-packages, otherwise, I use all the
defaults. Also my system is a standard Ubuntu 12.04 installation
with no fancy settings/tweakings/customizations.

Thanks,
H.


On 10/20/2012 12:50 PM, Martin Maechler wrote:

Duncan Murdoch murdoch.dun...@gmail.com
 on Fri, 19 Oct 2012 19:26:39 -0400 writes:


  On 12-10-19 7:04 PM, Hervé Pagès wrote:
  Hi,
 
  Looks like the implementation of random number generation changed in
  R-devel with respect to R-2.15.1.
 
  With R-2.15.1:
 
   set.seed(33)
   sample(49821115, 10)
  [1] 22217252 19661919 24099911 45779422 42043111 25774933 21778053 17098516
  [9]   773073  5878451
 
  With recent R-devel:
 
   set.seed(33)
   sample(49821115, 10)
  [1] 22217252 19661919 24099912 45779425 42043115 25774935 21778056 17098518
  [9]   773073  5878452
 
  This is on a 64-bit Ubuntu system.
 
  Is this change intended? I didn't see anything in the NEWS file.
 
  A potential problem with this is that it will break unit tests
  for algorithms that make use of RNG.
 
  Another more practical problem (at least for me) is the following:
  Bioconductor package maintainers are sometimes working hard on the
  development version of their package to improve the performance of
  some key functions. Comparing performance between BioC release
  (based on R-2.15) and devel (based on R-devel) often requires big
  input data that is randomly generated, because it's easier than
  working with real data. Typically a small script is written that
  takes care of loading the required packages, generating the input
  data, and running a simple analysis. The same script is sourced in
  R-2.15 and R-devel, and performance and results are compared.
 
  Not being able to generate exactly the same input in the script is
  a problem. It can be worked around by generating the input once,
  serializing it, and using load() in the script, but that makes things
  more complicated and the script is not a standalone script anymore
  (cannot be passed around without also passing around the big .rda
  file).
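
  The serialize-once workaround described above might look roughly like
  the sketch below. The file name and the small input size are
  placeholders chosen for illustration, not taken from the original
  script.

```r
## Hypothetical file name; a small input stands in for the real, much larger one.
input_file <- file.path(tempdir(), "random_input.rda")

## Run once, in a single version of R: generate and serialize the input.
set.seed(33)
random_chrom_pos <- sample(1000L, 100L)
save(random_chrom_pos, file = input_file)
rm(random_chrom_pos)

## Run in every version being compared: restore the identical object.
load(input_file)
length(random_chrom_pos)
```

  The cost is the one mentioned above: the script now depends on a file
  that has to travel with it.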
 
  Thanks,
  H.
 

  I think it was mentioned in the NEWS:

  \code{sample.int()} has some support for  \eqn{n \ge
  2^{31}}{n >= 2^31}: see its help for the limitations.

  A different algorithm is used for \code{(n, size, replace = FALSE,
  prob = NULL)} for \code{n > 1e7} and \code{size <= n/2}.  This
  is much faster and uses less memory, but does give different results.

So, to reiterate: the RNG has not been changed at all,
but sample() has, for extreme cases (large n) like yours.

  I don't think the old algorithm is available, but perhaps it could be
  made available by an optional parameter.
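
Since the change is in sample() and not in the generator, a benchmark
script could stay standalone by building its sampling step directly on
runif(). The following is only a sketch: stable_sample() is a
hypothetical helper, not part of R, and it assumes that runif() yields
the same stream in both versions for a given seed and RNG kind, which
is worth verifying before relying on it. It also assumes n is below
2^31.

```r
## Hypothetical helper: draw 'size' distinct values from 1..n using only
## runif(), so the result does not depend on the algorithm inside sample().
stable_sample <- function(n, size) {
    stopifnot(size <= n)
    out <- integer(0)
    while (length(out) < size) {
        need <- size - length(out)
        ## runif() returns values strictly between 0 and 1, so this
        ## yields integers in 1..n.
        cand <- as.integer(floor(runif(need) * n)) + 1L
        ## Drop duplicates; the loop tops up whatever was lost.
        out <- unique(c(out, cand))
    }
    out
}

set.seed(33)
x <- stable_sample(49821115L, 10L)
```

This will not reproduce the values that sample() returns in either
version, and it is not tuned for speed or memory. It only aims to give
the same input in both.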

I do think we should ideally add 

Re: [Rd] suppress *specific* warnings?

2012-10-22 Thread luke-tierney

On Sun, 21 Oct 2012, Martin Morgan wrote:


On 10/21/2012 12:28 PM, Ben Bolker wrote:


   Not desperately important, but nice to have and possibly of use to
others, is the ability to suppress specific warnings rather than
suppressing warnings indiscriminately.  I often know of a specific
warning that I want to ignore (because I know that it's a false
positive/ignorable), but the current design of suppressWarnings() forces
me to ignore *any* warnings coming from the expression.

   I started to write a new version that would check and, if supplied
with a regular expression, would only block matching warnings and
otherwise would produce the warnings as usual, but I don't quite know
enough about what I'm doing: see ??? in expression below.

   Can anyone help, or suggest pointers to relevant
examples/documentation (I've looked at demo(error.catching), which isn't
helping me ... ?)

suppressWarnings2 <- function(expr,regex=NULL) {
  opts <- options(warn = -1)
  on.exit(options(opts))


I'm not really sure what the options(warn=-1) is doing there, maybe it's for 
efficiency to avoid generating a warning message (as distinct from signalling


The sources in src/library/base/conditions.R have

suppressWarnings <- function(expr) {
    ops <- options(warn = -1) ## FIXME: temporary hack until R_tryEval
    on.exit(options(ops))     ## calls are removed from methods code
    withCallingHandlers(expr,
                        warning=function(w)
                            invokeRestart("muffleWarning"))
}

I suspect we have still not entirely eliminated R_tryEval in this context
but I'm not sure. Will check when I get a chance.


a warning). I think you're after something like

 suppressWarnings2 <-
     function(expr, regex=character())
 {
     withCallingHandlers(expr, warning=function(w) {
         if (length(regex) == 1 && length(grep(regex, conditionMessage(w)))) {
             invokeRestart("muffleWarning")
         }
     })
 }


A problem with using expression matching is of course that this fails
with internationalized messages. Ideally warnings should be signaled as
warning conditions of a particular class, and that class can be used
to discriminate. Unfortunately very few warnings are designed this way.

Best,

luke



If the  restart isn't invoked, then the next handler is called and the 
warning is handled as normal. So with


 f <- function() {
     warning("oops")
     2
 }

there is


suppressWarnings2(f())

[1] 2
Warning message:
In f() : oops

suppressWarnings2(f(), "oops")

[1] 2
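
The fall-through behaviour also covers the case where a single
expression raises several warnings and only some should be silenced:
returning from the handler without invoking a restart passes the
warning on. A small sketch, with suppress_matching() and
two_warnings() as made-up names for illustration:

```r
## Hypothetical variant of suppressWarnings2: muffle only warnings whose
## message matches 'regex'; everything else is left for the next handler.
suppress_matching <- function(expr, regex) {
    withCallingHandlers(expr, warning = function(w) {
        if (grepl(regex, conditionMessage(w)))
            invokeRestart("muffleWarning")
        ## No restart invoked here: the warning continues as usual.
    })
}

## Hypothetical function that signals two different warnings.
two_warnings <- function() {
    warning("oops")
    warning("something else")
    2
}

## Only "something else" should be reported.
suppress_matching(two_warnings(), "oops")
```

As noted elsewhere in this thread, matching on the message text is
fragile when messages are translated.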

For your own code I think a better strategy is to create a sub-class of 
warnings that can be handled differently


 mywarn <-
     function(..., call.=TRUE, immediate.=FALSE, domain=NULL)
 {
     msg <- .makeMessage(..., domain=domain, appendLF=FALSE)
     call <- NULL
     if (call.)
         call <- sys.call(1L)
     class <- c("silencable", "simpleWarning",  "warning", "condition")
     cond <- structure(list(message=msg, call=call), class=class)
     warning(cond)
 }

 suppressWarnings3 <-
     function(expr)
 {
     withCallingHandlers(expr, silencable=function(w) {
         invokeRestart("muffleWarning")
     })
 }

then with

 g <- function() {
     mywarn("oops")
     3
 }


suppressWarnings3(f())

[1] 2
Warning message:
In f() : oops

g()

[1] 3
Warning message:
In g() : oops

suppressWarnings3(g())

[1] 3
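
On versions of R that provide warningCondition() (added in R 3.6.0, as
far as I know, so well after this thread), the same classed-warning
idea can be written without building the condition object by hand. A
sketch under that assumption; warn_silencable(), muffle_silencable()
and h() are made-up names:

```r
## Assumes warningCondition() is available in the running version of R.
## Hypothetical helper: signal a warning carrying the extra class "silencable".
warn_silencable <- function(msg) {
    warning(warningCondition(msg, class = "silencable"))
}

## Hypothetical helper: muffle only warnings of class "silencable".
muffle_silencable <- function(expr) {
    withCallingHandlers(expr, silencable = function(w) {
        invokeRestart("muffleWarning")
    })
}

h <- function() {
    warn_silencable("oops")
    4
}

muffle_silencable(h())
```

Because the handler is selected by class and not by message text, it
keeps working when messages are translated.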


  withCallingHandlers(expr, warning = function(w) {
    ## browser()
    if (is.null(regex) || grepl(w[["message"]],regex)) {
      invokeRestart("muffleWarning")
    } else {
      ## ? what do I here to get the warning issued?
      ## browser()
      ## computeRestarts() shows "browser",
      ##    "muffleWarning", and "abort" ...
      options(opts)
      warning(w$message)
      ## how can I get back from here to the calling point
      ##   *without* muffling warnings ... ?
    }
  })
}

suppressWarnings2(sqrt(-1))
suppressWarnings2(sqrt(-1),"abc")

   It seems to me I'd like to have a restart option that just returns to
the point where the warning was caught, *without* muffling warnings ...
?  But I don't quite understand how to set one up ...

   Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel







--
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
