Re: [Rd] [External] Re: rpois(9, 1e10)

2020-01-23 Thread Martin Maechler
> Benjamin Tyner 
> on Thu, 23 Jan 2020 08:16:03 -0500 writes:

> On 1/20/20 12:33 PM, Martin Maechler wrote:
>> 
>> It's really something that should be discussed (possibly not
>> here, .. but then I've started it here ...).
>> 
>> The  NEWS  for R 3.0.0 contain (in NEW FEATURES) :
>> 
>> * Functions rbinom(), rgeom(), rhyper(), rpois(), rnbinom(),
>> rsignrank() and rwilcox() now return integer (not double)
>> vectors.  This halves the storage requirements for large
>> simulations.
>> 
>> and what I've been suggesting is to revert this change
>> (svn rev r60225-6) which was purposefully and diligently done by
>> a fellow R core member, so indeed must be debatable.
>> 
>> Martin

> For the record, I don't personally object to the change here (as my 
> philosophy tends toward treating most warnings as errors anyway) but for 
> the sake of other useRs who may get bitten, perhaps we should be more 
> explicit that backwards-compatibility won't be preserved under certain 
> use patterns, for example:

> # works (with warning) in R 3.6.2 but fails (with error) in R-devel:
> vapply(list(1e9, 1e10),
>    function(lambda) {
>   rpois(1L, lambda)
>    },
>    FUN.VALUE = integer(1L)
>    )

Well, some people are too picky...
Use numeric(), not integer(), in such cases:

> vapply(1:10, function(i) if(runif(1) < 0.5) 1L else 2, FUN.VALUE=pi)
 [1] 1 1 2 2 2 2 1 1 2 1
> 
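Applied to Benjamin's example, a minimal sketch of this advice (assuming the caller only needs numeric values, not integer storage specifically): with a numeric(1L) template, vapply() promotes an integer result to double, so the same call is accepted whichever type rpois() returns for a given lambda.

```r
# Declare a double template: vapply() promotes integer results to double,
# so it accepts rpois() output whatever its type for a given lambda.
set.seed(42)
draws <- vapply(list(1e9, 1e10),
                function(lambda) rpois(1L, lambda),
                FUN.VALUE = numeric(1L))
typeof(draws)  # "double"
```

On R versions where rpois() still returns NA beyond the integer range (R 3.x), the second element will be NA with a warning; the numeric template only removes the type error.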

No, really, I don't plan to spend time "bloating" the
documentation any further, given that only "a few parts in a billion"
of people carefully read our help pages, while the remaining 99.999%
rather try things in the R console and (often) draw wrong conclusions...

I *am* glad and grateful for careful and accurate R users and
bug-squashing helpers such as you or Suharto or ...

Martin

> # in R-devel, a little extra work to achieve a warning as before:
> vapply(list(1e9, 1e10),
>    function(lambda) {
>   tmp <- rpois(1L, lambda)
>   if (!is.integer(tmp)) {
>  warning("NAs produced")
>  tmp <- NA_integer_
>   }
>   tmp
>    },
>    FUN.VALUE = integer(1L)
>    )

> (and yes I realize that rpois() vectorizes on lambda, so vapply is 
> re-inventing the wheel in this toy example, but there could be (?) a 
> justified use for it in more complicated simulations).

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [External] Re: rpois(9, 1e10)

2020-01-23 Thread Benjamin Tyner

On 1/20/20 12:33 PM, Martin Maechler wrote:


It's really something that should be discussed (possibly not
here, .. but then I've started it here ...).

The  NEWS  for R 3.0.0 contain (in NEW FEATURES) :

 * Functions rbinom(), rgeom(), rhyper(), rpois(), rnbinom(),
   rsignrank() and rwilcox() now return integer (not double)
   vectors.  This halves the storage requirements for large
   simulations.

and what I've been suggesting is to revert this change
(svn rev r60225-6) which was purposefully and diligently done by
a fellow R core member, so indeed must be debatable.

Martin


For the record, I don't personally object to the change here (as my 
philosophy tends toward treating most warnings as errors anyway) but for 
the sake of other useRs who may get bitten, perhaps we should be more 
explicit that backwards-compatibility won't be preserved under certain 
use patterns, for example:


   # works (with warning) in R 3.6.2 but fails (with error) in R-devel:
   vapply(list(1e9, 1e10),
   function(lambda) {
  rpois(1L, lambda)
   },
   FUN.VALUE = integer(1L)
   )

   # in R-devel, a little extra work to achieve a warning as before:
   vapply(list(1e9, 1e10),
   function(lambda) {
  tmp <- rpois(1L, lambda)
  if (!is.integer(tmp)) {
 warning("NAs produced")
 tmp <- NA_integer_
  }
  tmp
   },
   FUN.VALUE = integer(1L)
   )

(and yes I realize that rpois() vectorizes on lambda, so vapply is 
re-inventing the wheel in this toy example, but there could be (?) a 
justified use for it in more complicated simulations).




Re: [Rd] [External] Re: rpois(9, 1e10)

2020-01-22 Thread Avraham Adler
Fantastic!!

Thanks,

Avi

On Wed, Jan 22, 2020 at 11:14 AM Spencer Graves 
wrote:

>
>
> On 2020-01-22 02:54, Martin Maechler wrote:
> >> Martin Maechler
> >>  on Tue, 21 Jan 2020 09:25:19 +0100 writes:
> >> Ben Bolker
> >>  on Mon, 20 Jan 2020 12:54:52 -0500 writes:
> >  >> Ugh, sounds like competing priorities.
> >
> >  > indeed.
> >
> >  >> * maintain type consistency
> >  >> * minimize storage (= current version, since 3.0.0)
> >  >> * maximize utility for large lambda (= proposed change)
> >  >> * keep user interface, and code, simple (e.g., it would be easy enough
> >  >> to add a switch that provided user control of int vs double return value)
> >  >> * backward compatibility
> >
> >  > Last night, it came to my mind that we should do what we have
> >  > been doing in quite a few places in R, the last couple of years:
> >
> >  > Return integer when possible, and switch to return double when
> >  > integers don't fit.
> >
> >  > We've been doing so even for  1:N  (well, now with additional ALTREP wrapper),
> >  > seq(), and even the fundamental  length()  function.
> >
> >  > So I sat down and implemented it .. and it seemed to work
> >  > perfectly:  Returning the same random numbers as now, but
> >  > switching to use double (instead of returning NAs) when the
> >  > values are too large.
> >
> >  > I'll probably commit that to R-devel quite soonish.
> >  > Martin
> >
> > Committed in svn rev 77690; this is really very advantageous, as
> > in some cases / applications or even just limit cases, you'd
> > easily get into overflow situations.
> >
> > The new R 4.0.0 behavior is IMO  "the best of" being memory
> > efficient (integer storage) in most cases (back compatible to R 3.x.x) and
> > returning desired random numbers in large cases (compatible to R <= 2.x.x).
> >
> > Martin
>
>
> Wunderbar!  Sehr gut gemacht!  ("Wonderful!  Very well done!") Thanks,
> Spencer
> >
> >  >> On 2020-01-20 12:33 p.m., Martin Maechler wrote:
> >   Benjamin Tyner
> >   on Mon, 20 Jan 2020 08:10:49 -0500 writes:
> >  >>>
> >  >>> > On 1/20/20 4:26 AM, Martin Maechler wrote:
> >  >>> >> Coming late here -- after enjoying a proper weekend ;-) --
> >  >>> >> I have been agreeing (with Spencer, IIUC) on this for a long
> >  >>> >> time (~ 3 yrs, or more?), namely that I've come to see it as a
> >  >>> >> "design bug" that  rpois() {and similar} must return typeof() "integer".
> >  >>> >>
> >  >>> >> More strongly, I'm actually pretty convinced they should return
> >  >>> >> (integer-valued) double instead of NA_integer_   and for that
> >  >>> >> reason should always return double:
> >  >>> >> Even if we have (hopefully) a native 64bit integer in R,
> >  >>> >> 2^64 is still teeny tiny compared to .Machine$double.max
> >  >>> >>
> >  >>> >> (and then maybe we'd have .Machine$longdouble.max  which would
> >  >>> >> be considerably larger than double.max unless on Windows, where
> >  >>> >> the wise men at Microsoft decided to keep their workload simple
> >  >>> >> by defining "long double := double" - as 'long double'
> >  >>> >> unfortunately is not well defined by C standards)
> >  >>> >>
> >  >>> >> Martin
> >  >>> >>
> >  >>> > Martin if you are in favor, then certainly no objection from me! ;-)
> >  >>>
> >  >>> > So now what about other discrete distributions e.g. could a similar
> >  >>> > enhancement apply here?
> >  >>>
> >  >>>
> >  >>> >> rgeom(10L, 1e-10)
> >  >>> >  [1] NA 1503061294 NA NA 1122447583 NA
> >  >>> >  [7] NA NA NA NA
> >  >>> > Warning message:
> >  >>> > In rgeom(10L, 1e-10) : NAs produced
> >  >>>
> >  >>> yes, of course there are several such distributions.
> >  >>>
> >  >>> It's really something that should be discussed (possibly not
> >  >>> here, .. but then I've started it here ...).
> >  >>>
> >  >>> The  NEWS  for R 3.0.0 contain (in NEW FEATURES) :
> >  >>>
> >  >>> * Functions rbinom(), rgeom(), rhyper(), rpois(), rnbinom(),
> >  >>> rsignrank() and rwilcox() now return integer (not double)
> >  >>> vectors.  This halves the storage requirements for large
> >  >>> simulations.
> >  >>>
> >  >>> and what I've been suggesting is to revert this change
> >  >>> (svn rev r60225-6) which was purposefully and diligently done by
> >  >>> a fellow R core member, so indeed must be debatable.
> >  >>>
> >  >>> Martin
> >  >>>

Re: [Rd] [External] Re: rpois(9, 1e10)

2020-01-22 Thread Spencer Graves




On 2020-01-22 02:54, Martin Maechler wrote:

Martin Maechler
 on Tue, 21 Jan 2020 09:25:19 +0100 writes:
Ben Bolker
 on Mon, 20 Jan 2020 12:54:52 -0500 writes:

 >> Ugh, sounds like competing priorities.

 > indeed.

 >> * maintain type consistency
 >> * minimize storage (= current version, since 3.0.0)
 >> * maximize utility for large lambda (= proposed change)
 >> * keep user interface, and code, simple (e.g., it would be easy enough
 >> to add a switch that provided user control of int vs double return value)
 >> * backward compatibility

 > Last night, it came to my mind that we should do what we have
 > been doing in quite a few places in R, the last couple of years:

 > Return integer when possible, and switch to return double when
 > integers don't fit.

 > We've been doing so even for  1:N  (well, now with additional ALTREP wrapper),
 > seq(), and even the fundamental  length()  function.

 > So I sat down and implemented it .. and it seemed to work
 > perfectly:  Returning the same random numbers as now, but
 > switching to use double (instead of returning NAs) when the
 > values are too large.

 > I'll probably commit that to R-devel quite soonish.
 > Martin

Committed in svn rev 77690; this is really very advantageous, as
in some cases / applications or even just limit cases, you'd
easily get into overflow situations.

The new R 4.0.0 behavior is IMO  "the best of" being memory
efficient (integer storage) in most cases (back compatible to R 3.x.x) and
returning desired random numbers in large cases (compatible to R <= 2.x.x).

Martin



Wunderbar!  Sehr gut gemacht!  ("Wonderful!  Very well done!") Thanks, 
Spencer


 >> On 2020-01-20 12:33 p.m., Martin Maechler wrote:
  Benjamin Tyner
  on Mon, 20 Jan 2020 08:10:49 -0500 writes:
 >>>
 >>> > On 1/20/20 4:26 AM, Martin Maechler wrote:
 >>> >> Coming late here -- after enjoying a proper weekend ;-) --
 >>> >> I have been agreeing (with Spencer, IIUC) on this for a long
 >>> >> time (~ 3 yrs, or more?), namely that I've come to see it as a
 >>> >> "design bug" that  rpois() {and similar} must return return typeof() 
"integer".
 >>> >>
 >>> >> More strongly, I'm actually pretty convinced they should return
 >>> >> (integer-valued) double instead of NA_integer_   and for that
 >>> >> reason should always return double:
 >>> >> Even if we have (hopefully) a native 64bit integer in R,
 >>> >> 2^64 is still teeny tiny compared .Machine$double.max
 >>> >>
 >>> >> (and then maybe we'd have .Machine$longdouble.max  which would
 >>> >> be considerably larger than double.max unless on Windows, where
 >>> >> the wise men at Microsoft decided to keep their workload simple
 >>> >> by defining "long double := double" - as 'long double'
 >>> >> unfortunately is not well defined by C standards)
 >>> >>
 >>> >> Martin
 >>> >>
 >>> > Martin if you are in favor, then certainly no objection from me! ;-)
 >>>
 >>> > So now what about other discrete distributions e.g. could a similar
 >>> > enhancement apply here?
 >>>
 >>>
 >>> >> rgeom(10L, 1e-10)
 >>> >  [1] NA 1503061294 NA NA 1122447583 NA
 >>> >  [7] NA NA NA NA
 >>> > Warning message:
 >>> > In rgeom(10L, 1e-10) : NAs produced
 >>>
 >>> yes, of course there are several such distributions.
 >>>
 >>> It's really something that should be discussed (possibly not
 >>> here, .. but then I've started it here ...).
 >>>
 >>> The  NEWS  for R 3.0.0 contain (in NEW FEATURES) :
 >>>
 >>> * Functions rbinom(), rgeom(), rhyper(), rpois(), rnbinom(),
 >>> rsignrank() and rwilcox() now return integer (not double)
 >>> vectors.  This halves the storage requirements for large
 >>> simulations.
 >>>
 >>> and what I've been suggesting is to revert this change
 >>> (svn rev r60225-6) which was purposefully and diligently done by
 >>> a fellow R core member, so indeed must be debatable.
 >>>
 >>> Martin
 >>>


Re: [Rd] [External] Re: rpois(9, 1e10)

2020-01-22 Thread Martin Maechler
> Martin Maechler 
> on Tue, 21 Jan 2020 09:25:19 +0100 writes:

> Ben Bolker 
> on Mon, 20 Jan 2020 12:54:52 -0500 writes:

>> Ugh, sounds like competing priorities.

> indeed.

>> * maintain type consistency
>> * minimize storage (= current version, since 3.0.0)
>> * maximize utility for large lambda (= proposed change)
>> * keep user interface, and code, simple (e.g., it would be easy enough
>> to add a switch that provided user control of int vs double return value)
>> * backward compatibility

> Last night, it came to my mind that we should do what we have
> been doing in quite a few places in R, the last couple of years:

> Return integer when possible, and switch to return double when
> integers don't fit.

> We've been doing so even for  1:N  (well, now with additional ALTREP wrapper),
> seq(), and even the fundamental  length()  function.

> So I sat down and implemented it .. and it seemed to work
> perfectly:  Returning the same random numbers as now, but
> switching to use double (instead of returning NAs) when the
> values are too large.

> I'll probably commit that to R-devel quite soonish.
> Martin

Committed in svn rev 77690; this is really very advantageous, as
in some cases / applications or even just limit cases, you'd
easily get into overflow situations.

The new R 4.0.0 behavior is IMO  "the best of" being memory
efficient (integer storage) in most cases (back compatible to R 3.x.x) and
returning desired random numbers in large cases (compatible to R <= 2.x.x).

Martin

>> On 2020-01-20 12:33 p.m., Martin Maechler wrote:
 Benjamin Tyner 
 on Mon, 20 Jan 2020 08:10:49 -0500 writes:
>>> 
>>> > On 1/20/20 4:26 AM, Martin Maechler wrote:
>>> >> Coming late here -- after enjoying a proper weekend ;-) --
>>> >> I have been agreeing (with Spencer, IIUC) on this for a long
>>> >> time (~ 3 yrs, or more?), namely that I've come to see it as a
>>> >> "design bug" that  rpois() {and similar} must return return typeof() 
"integer".
>>> >> 
>>> >> More strongly, I'm actually pretty convinced they should return
>>> >> (integer-valued) double instead of NA_integer_   and for that
>>> >> reason should always return double:
>>> >> Even if we have (hopefully) a native 64bit integer in R,
>>> >> 2^64 is still teeny tiny compared .Machine$double.max
>>> >> 
>>> >> (and then maybe we'd have .Machine$longdouble.max  which would
>>> >> be considerably larger than double.max unless on Windows, where
>>> >> the wise men at Microsoft decided to keep their workload simple
>>> >> by defining "long double := double" - as 'long double'
>>> >> unfortunately is not well defined by C standards)
>>> >> 
>>> >> Martin
>>> >> 
>>> > Martin if you are in favor, then certainly no objection from me! ;-)
>>> 
>>> > So now what about other discrete distributions e.g. could a similar 
>>> > enhancement apply here?
>>> 
>>> 
>>> >> rgeom(10L, 1e-10)
>>> >  [1] NA 1503061294 NA NA 1122447583 NA
>>> >  [7] NA NA NA NA
>>> > Warning message:
>>> > In rgeom(10L, 1e-10) : NAs produced
>>> 
>>> yes, of course there are several such distributions.
>>> 
>>> It's really something that should be discussed (possibly not
>>> here, .. but then I've started it here ...).
>>> 
>>> The  NEWS  for R 3.0.0 contain (in NEW FEATURES) :
>>> 
>>> * Functions rbinom(), rgeom(), rhyper(), rpois(), rnbinom(),
>>> rsignrank() and rwilcox() now return integer (not double)
>>> vectors.  This halves the storage requirements for large
>>> simulations.
>>> 
>>> and what I've been suggesting is to revert this change
>>> (svn rev r60225-6) which was purposefully and diligently done by
>>> a fellow R core member, so indeed must be debatable. 
>>> 
>>> Martin
>>> 



Re: [Rd] [External] Re: rpois(9, 1e10)

2020-01-21 Thread Martin Maechler
> Ben Bolker 
> on Mon, 20 Jan 2020 12:54:52 -0500 writes:

> Ugh, sounds like competing priorities.

indeed.

> * maintain type consistency
> * minimize storage (= current version, since 3.0.0)
> * maximize utility for large lambda (= proposed change)
> * keep user interface, and code, simple (e.g., it would be easy enough
>   to add a switch that provided user control of int vs double return value)
> * backward compatibility

Last night, it came to my mind that we should do what we have
been doing in quite a few places in R, the last couple of years:

  Return integer when possible, and switch to return double when
  integers don't fit.

We've been doing so even for  1:N  (well, now with additional ALTREP wrapper),
seq(), and even the fundamental  length()  function.

So I sat down and implemented it .. and it seemed to work
perfectly:  Returning the same random numbers as now, but
switching to use double (instead of returning NAs) when the
values are too large.
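The idea can be illustrated at the R level with a small sketch. The helper int_if_fits() below is hypothetical (it is not the C code committed to R); it assumes the draws have already been generated as integer-valued doubles and only decides the storage type afterwards, so in this sketch the values never change, only their type.

```r
# Hypothetical helper (not part of R): given integer-valued doubles, return an
# integer vector when every non-NA value fits in R's integer range, otherwise
# return the doubles unchanged.
int_if_fits <- function(x) {
  stopifnot(is.double(x))
  fits <- all(is.na(x) | abs(x) <= .Machine$integer.max)
  if (fits) as.integer(x) else x
}

typeof(int_if_fits(c(3, 7, 12)))   # "integer"
typeof(int_if_fits(c(3, 7, 3e9)))  # "double": 3e9 > .Machine$integer.max
```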

I'll probably commit that to R-devel quite soonish.
Martin

> On 2020-01-20 12:33 p.m., Martin Maechler wrote:
>>> Benjamin Tyner 
>>> on Mon, 20 Jan 2020 08:10:49 -0500 writes:
>> 
>> > On 1/20/20 4:26 AM, Martin Maechler wrote:
>> >> Coming late here -- after enjoying a proper weekend ;-) --
>> >> I have been agreeing (with Spencer, IIUC) on this for a long
>> >> time (~ 3 yrs, or more?), namely that I've come to see it as a
>> >> "design bug" that  rpois() {and similar} must return return typeof() 
"integer".
>> >> 
>> >> More strongly, I'm actually pretty convinced they should return
>> >> (integer-valued) double instead of NA_integer_   and for that
>> >> reason should always return double:
>> >> Even if we have (hopefully) a native 64bit integer in R,
>> >> 2^64 is still teeny tiny compared .Machine$double.max
>> >> 
>> >> (and then maybe we'd have .Machine$longdouble.max  which would
>> >> be considerably larger than double.max unless on Windows, where
>> >> the wise men at Microsoft decided to keep their workload simple
>> >> by defining "long double := double" - as 'long double'
>> >> unfortunately is not well defined by C standards)
>> >> 
>> >> Martin
>> >> 
>> > Martin if you are in favor, then certainly no objection from me! ;-)
>> 
>> > So now what about other discrete distributions e.g. could a similar 
>> > enhancement apply here?
>> 
>> 
>> >> rgeom(10L, 1e-10)
>> >  [1] NA 1503061294 NA NA 1122447583 NA
>> >  [7] NA NA NA NA
>> > Warning message:
>> > In rgeom(10L, 1e-10) : NAs produced
>> 
>> yes, of course there are several such distributions.
>> 
>> It's really something that should be discussed (possibly not
>> here, .. but then I've started it here ...).
>> 
>> The  NEWS  for R 3.0.0 contain (in NEW FEATURES) :
>> 
>> * Functions rbinom(), rgeom(), rhyper(), rpois(), rnbinom(),
>> rsignrank() and rwilcox() now return integer (not double)
>> vectors.  This halves the storage requirements for large
>> simulations.
>> 
>> and what I've been suggesting is to revert this change
>> (svn rev r60225-6) which was purposefully and diligently done by
>> a fellow R core member, so indeed must be debatable. 
>> 
>> Martin
>> 


Re: [Rd] [External] Re: rpois(9, 1e10)

2020-01-20 Thread Ben Bolker


 Ugh, sounds like competing priorities.

  * maintain type consistency
  * minimize storage (= current version, since 3.0.0)
  * maximize utility for large lambda (= proposed change)
  * keep user interface, and code, simple (e.g., it would be easy enough
to add a switch that provided user control of int vs double return value)
  * backward compatibility
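The storage point in the list above is easy to check: an R integer takes 4 bytes and a double 8, so for long vectors integer storage needs roughly half the memory, which is what the R 3.0.0 NEWS entry refers to. A small sketch using object.size() (the byte counts include a small vector header and may differ slightly across platforms):

```r
# Compare the memory footprint of one million integers and one million doubles.
n <- 1e6
int_bytes <- as.numeric(object.size(integer(n)))
dbl_bytes <- as.numeric(object.size(double(n)))
dbl_bytes / int_bytes  # close to 2
```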



On 2020-01-20 12:33 p.m., Martin Maechler wrote:
>> Benjamin Tyner 
>> on Mon, 20 Jan 2020 08:10:49 -0500 writes:
> 
> > On 1/20/20 4:26 AM, Martin Maechler wrote:
> >> Coming late here -- after enjoying a proper weekend ;-) --
> >> I have been agreeing (with Spencer, IIUC) on this for a long
> >> time (~ 3 yrs, or more?), namely that I've come to see it as a
> >> "design bug" that  rpois() {and similar} must return return typeof() 
> "integer".
> >> 
> >> More strongly, I'm actually pretty convinced they should return
> >> (integer-valued) double instead of NA_integer_   and for that
> >> reason should always return double:
> >> Even if we have (hopefully) a native 64bit integer in R,
> >> 2^64 is still teeny tiny compared .Machine$double.max
> >> 
> >> (and then maybe we'd have .Machine$longdouble.max  which would
> >> be considerably larger than double.max unless on Windows, where
> >> the wise men at Microsoft decided to keep their workload simple
> >> by defining "long double := double" - as 'long double'
> >> unfortunately is not well defined by C standards)
> >> 
> >> Martin
> >> 
> > Martin if you are in favor, then certainly no objection from me! ;-)
> 
> > So now what about other discrete distributions e.g. could a similar 
> > enhancement apply here?
> 
> 
> >> rgeom(10L, 1e-10)
> >  [1] NA 1503061294 NA NA 1122447583 NA
> >  [7] NA NA NA NA
> > Warning message:
> > In rgeom(10L, 1e-10) : NAs produced
> 
> yes, of course there are several such distributions.
> 
> It's really something that should be discussed (possibly not
> here, .. but then I've started it here ...).
> 
> The  NEWS  for R 3.0.0 contain (in NEW FEATURES) :
> 
> * Functions rbinom(), rgeom(), rhyper(), rpois(), rnbinom(),
>   rsignrank() and rwilcox() now return integer (not double)
>   vectors.  This halves the storage requirements for large
>   simulations.
> 
> and what I've been suggesting is to revert this change
> (svn rev r60225-6) which was purposefully and diligently done by
> a fellow R core member, so indeed must be debatable. 
> 
> Martin
> 


Re: [Rd] [External] Re: rpois(9, 1e10)

2020-01-20 Thread Martin Maechler
> Benjamin Tyner 
> on Mon, 20 Jan 2020 08:10:49 -0500 writes:

> On 1/20/20 4:26 AM, Martin Maechler wrote:
>> Coming late here -- after enjoying a proper weekend ;-) --
>> I have been agreeing (with Spencer, IIUC) on this for a long
>> time (~ 3 yrs, or more?), namely that I've come to see it as a
>> "design bug" that  rpois() {and similar} must return return typeof() 
"integer".
>> 
>> More strongly, I'm actually pretty convinced they should return
>> (integer-valued) double instead of NA_integer_   and for that
>> reason should always return double:
>> Even if we have (hopefully) a native 64bit integer in R,
>> 2^64 is still teeny tiny compared .Machine$double.max
>> 
>> (and then maybe we'd have .Machine$longdouble.max  which would
>> be considerably larger than double.max unless on Windows, where
>> the wise men at Microsoft decided to keep their workload simple
>> by defining "long double := double" - as 'long double'
>> unfortunately is not well defined by C standards)
>> 
>> Martin
>> 
> Martin if you are in favor, then certainly no objection from me! ;-)

> So now what about other discrete distributions e.g. could a similar 
> enhancement apply here?


>> rgeom(10L, 1e-10)
>  [1] NA 1503061294 NA NA 1122447583 NA
>  [7] NA NA NA NA
> Warning message:
> In rgeom(10L, 1e-10) : NAs produced

yes, of course there are several such distributions.

It's really something that should be discussed (possibly not
here, .. but then I've started it here ...).

The  NEWS  for R 3.0.0 contain (in NEW FEATURES) :

* Functions rbinom(), rgeom(), rhyper(), rpois(), rnbinom(),
  rsignrank() and rwilcox() now return integer (not double)
  vectors.  This halves the storage requirements for large
  simulations.

and what I've been suggesting is to revert this change
(svn rev r60225-6) which was purposefully and diligently done by
a fellow R core member, so indeed must be debatable. 

Martin



Re: [Rd] [External] Re: rpois(9, 1e10)

2020-01-20 Thread Benjamin Tyner

On 1/20/20 4:26 AM, Martin Maechler wrote:

Coming late here -- after enjoying a proper weekend ;-) --
I have been agreeing (with Spencer, IIUC) on this for a long
time (~ 3 yrs, or more?), namely that I've come to see it as a
"design bug" that  rpois() {and similar} must return return typeof() "integer".

More strongly, I'm actually pretty convinced they should return
(integer-valued) double instead of NA_integer_   and for that
reason should always return double:
Even if we have (hopefully) a native 64bit integer in R,
2^64 is still teeny tiny compared to .Machine$double.max

(and then maybe we'd have .Machine$longdouble.max  which would
  be considerably larger than double.max unless on Windows, where
  the wise men at Microsoft decided to keep their workload simple
  by defining "long double := double" - as 'long double'
  unfortunately is not well defined by C standards)

Martin


Martin if you are in favor, then certainly no objection from me! ;-)

So now what about other discrete distributions e.g. could a similar 
enhancement apply here?


> rgeom(10L, 1e-10)
 [1] NA 1503061294 NA NA 1122447583 NA
 [7] NA NA NA NA
Warning message:
In rgeom(10L, 1e-10) : NAs produced
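Why so many NAs: for rgeom() with prob = 1e-10 the mean number of failures is (1 - p)/p, about 1e10, far beyond .Machine$integer.max (2^31 - 1, about 2.1e9), so most draws cannot be stored as R integers. A short sketch of that limit; note that R reserves -2^31 for NA_integer_, so the usable integer range is -(2^31 - 1) to 2^31 - 1.

```r
p <- 1e-10
(1 - p) / p                  # mean of rgeom(n, p): about 1e10
.Machine$integer.max         # 2147483647, i.e. 2^31 - 1

# Integer arithmetic that leaves the range overflows to NA (with a warning) ...
.Machine$integer.max + 1L
# ... while adding a double keeps the exact value.
.Machine$integer.max + 1     # 2147483648
# -2^31 is not a valid R integer: it is the bit pattern used for NA_integer_.
as.integer(-2^31)            # NA, with a coercion warning
```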



Re: [Rd] [External] Re: rpois(9, 1e10)

2020-01-20 Thread Martin Maechler
> Spencer Graves 
> on Sun, 19 Jan 2020 21:35:04 -0600 writes:

> Thanks to Luke and Avi for their comments.  I wrapped "round" around the 
> call to "rnorm" inside my "rpois.".  For "lambda" really big, that 
> "round" won't do anything.  However, it appears to give integers in 
> floating point representation that are larger than 
> .Machine$integer.max.  That sounds very much like what someone would 
> want.  Spencer
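Spencer's own rpois. is not shown in the thread. The following is only a sketch of the idea he describes, assuming it uses the normal approximation to the Poisson (mean and variance both equal to lambda), which is reasonable only for large lambda. The name rpois_normal_approx() is hypothetical. Because round() returns a double, values above .Machine$integer.max stay as integer-valued doubles instead of becoming NA.

```r
# Hypothetical sketch: normal approximation to Poisson draws for large lambda.
# Poisson(lambda) has mean and variance lambda, so draw from N(lambda, sqrt(lambda)).
rpois_normal_approx <- function(n, lambda) {
  x <- round(rnorm(n, mean = lambda, sd = sqrt(lambda)))
  pmax(x, 0)  # counts cannot be negative (matters only for small lambda)
}

set.seed(1)
y <- rpois_normal_approx(9, 1e10)
typeof(y)                      # "double"
all(y > .Machine$integer.max)  # TRUE: no NAs, values kept as doubles
```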

Coming late here -- after enjoying a proper weekend ;-) --

I have been agreeing (with Spencer, IIUC) on this for a long
time (~ 3 yrs, or more?), namely that I've come to see it as a
"design bug" that  rpois() {and similar} must return return typeof() "integer".

More strongly, I'm actually pretty convinced they should return
(integer-valued) double instead of NA_integer_   and for that
reason should always return double: 
Even if we have (hopefully) a native 64bit integer in R,
2^64 is still teeny tiny compared to .Machine$double.max

(and then maybe we'd have .Machine$longdouble.max  which would
 be considerably larger than double.max unless on Windows, where
 the wise men at Microsoft decided to keep their workload simple
 by defining "long double := double" - as 'long double'
 unfortunately is not well defined by C standards) 
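The size comparison can be checked directly in R. One caveat worth keeping in mind with "integer-valued double": a double represents every integer exactly only up to 2^53, so beyond that point simulated counts are exact only to within the spacing of doubles at that magnitude.

```r
2^64                          # about 1.8e19
.Machine$double.max           # about 1.8e308
2^64 / .Machine$double.max    # about 1e-289
# Integer-valued doubles are exact only up to 2^53; beyond that, adjacent
# integers can round to the same double.
(2^53 + 1) - 2^53             # 0, not 1
(2^53 - 1) + 1 == 2^53        # TRUE: still exact below 2^53
```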

Martin

Re: [Rd] [External] Re: rpois(9, 1e10)

2020-01-19 Thread Spencer Graves
Thanks to Luke and Avi for their comments.  I wrapped "round" around the 
call to "rnorm" inside my "rpois.".  For "lambda" really big, that 
"round" won't do anything.  However, it appears to give integers in 
floating point representation that are larger than 
.Machine$integer.max.  That sounds very much like what someone would 
want.  Spencer
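Spencer's own rpois. is not shown in this thread, so the following is only a sketch of the approach he describes (exact rpois() below a threshold, a rounded normal approximation above it), assuming the threshold of 0.5*.Machine$integer.max he mentions elsewhere. The name rpois_big() is hypothetical. Because the output vector is double throughout, draws above .Machine$integer.max come back as numbers rather than NA.

```r
# Hypothetical helper, not part of base R. Exact Poisson draws below
# `threshold`; above it, a rounded normal approximation N(lambda, lambda).
rpois_big <- function(n, lambda, threshold = 0.5 * .Machine$integer.max) {
  lambda <- rep_len(as.numeric(lambda), n)   # recycle lambda like rpois() does
  out <- rep(NA_real_, n)                    # double storage throughout
  ok <- !is.na(lambda)
  small <- ok & lambda < threshold
  big <- ok & lambda >= threshold
  # Exact draws where the result comfortably fits in an R integer
  out[small] <- rpois(sum(small), lambda[small])
  # Normal approximation; Poisson skewness 1/sqrt(lambda) is tiny here
  out[big] <- round(rnorm(sum(big), mean = lambda[big], sd = sqrt(lambda[big])))
  out
}

set.seed(1)
rpois_big(3, c(5, 1e10, 2e22))
```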




Re: [Rd] [External] Re: rpois(9, 1e10)

2020-01-19 Thread Tierney, Luke
R uses the C 'int' type for its integer data and that is pretty much
universally 32 bit these days. In fact R won't compile if it is not.
That means the range for integer data is the integers in [-2^31,
+2^31).
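The half-open interval above is the range of a 32-bit C int. In R itself the most negative value, -2^31, is the bit pattern used for NA_integer_, so the usable range is symmetric, from -(2^31 - 1) to 2^31 - 1. A quick check:

```r
.Machine$integer.max                          # 2147483647, i.e. 2^31 - 1
.Machine$integer.max == 2^31 - 1              # TRUE
# Exceeding the range gives NA (with a warning), not a larger integer:
suppressWarnings(.Machine$integer.max + 1L)   # NA
# -2^31 is reserved for NA_integer_, so it cannot be stored as a value:
suppressWarnings(as.integer(-2^31))           # NA
-.Machine$integer.max                         # -2147483647, smallest non-NA value
```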

It would be good to allow for a larger integer range for R integer
objects, and several of us are thinking about how we might get there.
But it isn't easy to get right, so it may take some time. I doubt
anything can happen for R 4.0.0 this year, but 2021 may be possible.

A few notes inline below:

On Sun, 19 Jan 2020, Spencer Graves wrote:

> On my Mac:
>
>
> str(.Machine)
> ...
>  $ integer.max       : int 2147483647
>  $ sizeof.long       : int 8
>  $ sizeof.longlong   : int 8
>  $ sizeof.longdouble : int 16
>  $ sizeof.pointer    : int 8
>
>
>   On a Windows 10 machine I have, $ sizeof.long : int 4; otherwise
> the same as on my Mac.

One of many annoyances of Windows -- done for compatibility with
ancient Windows apps.

>   Am I correct that $ sizeof.long = 4 means 4 bytes = 32 bits?
> log2(.Machine$integer.max) = 31.  Then 8 bytes is what used to be called
> double precision (2 words of 4 bytes each)?  And $ sizeof.longdouble =
> 16 = 4 words of 4 bytes each?

double precision is a floating point concept, not related to integers.

If you want to figure out whether you are running a 32-bit or 64-bit R,
look at sizeof.pointer: 4 means 32 bits, 8 means 64 bits.
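A small sketch of that check; .Machine$sizeof.pointer is the size of a C pointer in bytes, so multiplying by 8 gives the bit width of the build. As the Windows example above shows, sizeof.long is not a reliable indicator.

```r
# Size of a C pointer in bytes: 4 on a 32-bit build of R, 8 on a 64-bit build.
bits <- 8L * .Machine$sizeof.pointer
sprintf("This is a %d-bit build of R", bits)
# Note: sizeof.long can be 4 even on 64-bit Windows, so don't use it for this.
```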

Best,

luke


>
>
>   Spencer
>
>
> On 2020-01-19 15:41, Avraham Adler wrote:
>> Floor (maybe round) of non-negative numerics, though. Poisson should
>> never have anything after decimal.
>>
>> Still think it’s worth allowing long long for 64-bit R, just for purity’s
>> sake.
>>
>> Avi
>>
>> On Sun, Jan 19, 2020 at 4:38 PM Spencer Graves
>> <spencer.gra...@prodsyse.com> wrote:
>>
>>
>>
>> On 2020-01-19 13:01, Avraham Adler wrote:
>>> Crazy thought, but since a sum of independent Poissons is Poisson with
>>> the summed mean, can you break your “big” simulation into the sum of a few
>>> smaller ones? Or is the order of magnitude difference just too great?
>>
>>
>>   I don't perceive that as feasible.  Once I found what was
>> generating NAs, it was easy to code a function to return
>> pseudo-random numbers using the standard normal approximation to
>> the Poisson for those extreme cases.  [For a Poisson with mean =
>> 1e6, for example, the skewness (third standardized moment) is
>> 0.001.  At least for my purposes, that should be adequate.][1]
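Avi's splitting idea relies on the fact that a sum of k independent Poisson(lambda/k) draws is exactly Poisson(lambda). A minimal sketch for a single scalar lambda, with a hypothetical rpois_split() that keeps each piece's mean at or below 1e9 (an assumed cutoff safely under .Machine$integer.max) and adds the pieces as doubles, also shows why Spencer judged it infeasible for his case: a mean of 2e22 would need on the order of 2e13 pieces.

```r
# Hypothetical helper: one Poisson(lambda) draw as a sum of k smaller Poissons.
rpois_split <- function(lambda, piece = 1e9) {
  k <- max(1, ceiling(lambda / piece))   # number of pieces, each mean <= piece
  draws <- rpois(k, lambda / k)          # each draw fits in an R integer
  sum(as.numeric(draws))                 # sum as double: integer sum could overflow
}

set.seed(1)
rpois_split(1e10)        # 10 pieces of mean 1e9
ceiling(2e22 / 1e9)      # about 2e13 pieces for a mean of 2e22
```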
>>
>>
>>   What are the negative consequences of having rpois return
>> numerics that are always nonnegative?
>>
>>
>>   Spencer
>>
>>
>> [1]  In the code I reported before, I just changed the threshold
>> of 1e6 to 0.5*.Machine$integer.max.  On my Mac,
>> .Machine$integer.max = 2147483647 = 2^31 - 1 > 1e9. That still means
>> that a Poisson distributed pseudo-random number just under that
>> would have to be over 23000 standard deviations above the mean to
>> exceed .Machine$integer.max.
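The figures in this paragraph and its footnote can be reproduced in a few lines. Poisson skewness is 1/sqrt(lambda), and with the threshold at half of .Machine$integer.max, the distance to .Machine$integer.max in standard deviations works out to sqrt(lambda), roughly 32768, so "over 23000" is, if anything, conservative.

```r
M <- .Machine$integer.max            # 2147483647 = 2^31 - 1
1 / sqrt(1e6)                        # Poisson skewness at mean 1e6: 0.001
lambda <- 0.5 * M                    # the threshold from the footnote
z <- (M - lambda) / sqrt(lambda)     # distance to M in standard deviations
z                                    # about 32768
```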
>>
>>>
>>> On Sun, Jan 19, 2020 at 1:58 PM Spencer Graves
>>> wrote:
>>>
>>>   This issue arose for me in simulations to estimate
>>> confidence, prediction, and tolerance intervals from glm(.,
>>> family=poisson) fits embedded in a BMA::bic.glm fit using a
>>> simulate.bic.glm function I added to the development version
>>> of Ecfun, available at "https://github.com/sbgraves237/Ecfun".
>>> This is part of a vignette I'm developing, available at
>>> "https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd".
>>> This includes a simulated mean of a mixture of Poissons that
>>> exceeds 2e22.  It doesn't seem unreasonable to me to have
>>> rpois output numerics rather than integers when a number
>>> simulated exceeds .Machine$integer.max.  And it does seem to
>>> make less sense in such cases to return NAs.
>>>
>>>
>>>    Alternatively, might it make sense to add another
>>> argument to rpois to give the user the choice?  E.g., an
>>> argument "bigOutput" with (I hope) default = "numeric" and
>>> "NA" as a second option.  Or NA could be the default, so that no code
>>> that relied on that feature of the current code would be broken
>>> by the change.  If someone wanted to use arbitrary precision
>>> arithmetic, they could write their own version of this
>>> function with "arbitraryPrecision" as an optional value for
>>> the "bigOutput" argument.
>>>
>>>
>>>   Comments?
>>>   Thanks,
>>>   Spencer Graves
>>>
>>>
>>>
>>> On 2020-01-19 10:28, Avraham Adler wrote: