R uses the C 'int' type for its integer data and that is pretty much universally 32 bit these days. In fact R wont' compile if it is not. That means the range for integer data is the integers in [-2^31, +2^31).
It would be good to allow for a larger integer range for R integer objects, and several of us are thinking about how me might get there. But it isn't easy to get right, so it may take some time. I doubt anything can happen for R 4.0.0 this year, but 2021 may be possible. I few notes inline below: On Sun, 19 Jan 2020, Spencer Graves wrote: > On my Mac: > > > str(.Machine) > ... > $ integer.max : int 2147483647 > $ sizeof.long : int 8 > $ sizeof.longlong : int 8 > $ sizeof.longdouble : int 16 > $ sizeof.pointer : int 8 > > > On a Windows 10 machine I have, $ sizeof.long : int 4; otherwise > the same as on my Mac. One of many annoyances of Windows -- done for compatibility with ancient Window apps. > Am I correct that $ sizeof.long = 4 means 4 bytes = 32 bits? > log2(.Machine$integer.max) = 31. Then 8 bytes is what used to be called > double precision (2 words of 4 bytes each)? And $ sizeof.longdouble = > 16 = 4 words of 4 bytes each? double precision is a floating point concept, not related to integers. If you want to figure out whether you are running a 32 bit or 64 bit R look at sizeof.pointer -- 4 means 32 bits, 8 64 bits. Best, luke > > > Spencer > > > On 2020-01-19 15:41, Avraham Adler wrote: >> Floor (maybe round) of non-negative numerics, though. Poisson should >> never have anything after decimal. >> >> Still think it’s worth allowing long long for R64 bit, just for purity >> sake. >> >> Avi >> >> On Sun, Jan 19, 2020 at 4:38 PM Spencer Graves >> <spencer.gra...@prodsyse.com <mailto:spencer.gra...@prodsyse.com>> wrote: >> >> >> >> On 2020-01-19 13:01, Avraham Adler wrote: >>> Crazy thought, but being that a sum of Poissons is Poisson in the >>> sum, can you break your “big” simulation into the sum of a few >>> smaller ones? Or is the order of magnitude difference just too great? >> >> >> I don't perceive that as feasible. Once I found what was >> generating NAs, it was easy to code a function to return >> pseudo-random numbers using the standard normal approximation to >> the Poisson for those extreme cases. [For a Poisson with mean = >> 1e6, for example, the skewness (third standardized moment) is >> 0.001. At least for my purposes, that should be adequate.][1] >> >> >> What are the negative consequences of having rpois return >> numerics that are always nonnegative? >> >> >> Spencer >> >> >> [1] In the code I reported before, I just changed the threshold >> of 1e6 to 0.5*.Machine$integer.max. On my Mac, >> .Machine$integer.max = 2147483647 = 2^31 > 1e9. That still means >> that a Poisson distributed pseudo-random number just under that >> would have to be over 23000 standard deviations above the mean to >> exceed .Machine$integer.max. >> >>> >>> On Sun, Jan 19, 2020 at 1:58 PM Spencer Graves >>> <spencer.gra...@prodsyse.com >>> <mailto:spencer.gra...@prodsyse.com>> wrote: >>> >>> This issue arose for me in simulations to estimate >>> confidence, prediction, and tolerance intervals from glm(., >>> family=poisson) fits embedded in a BMA::bic.glm fit using a >>> simulate.bic.glm function I added to the development version >>> of Ecfun, available at "https://github.com/sbgraves237/Ecfun" >>> <https://github.com/sbgraves237/Ecfun>. This is part of a >>> vignette I'm developing, available at >>> >>> "https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd" >>> >>> <https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd>. >>> This includes a simulated mean of a mixture of Poissons that >>> exceeds 2e22. It doesn't seem unreasonable to me to have >>> rpois output a numerics rather than integers when a number >>> simulated exceeds .Machine$integer.max. And it does seem to >>> make less sense in such cases to return NAs. >>> >>> >>> Alternatively, might it make sense to add another >>> argument to rpois to give the user the choice? E.g., an >>> argument "bigOutput" with (I hope) default = "numeric" and >>> "NA" as a second option. Or NA is the default, so no code >>> that relied that feature of the current code would be broken >>> by the change. If someone wanted to use arbitrary precision >>> arithmetic, they could write their own version of this >>> function with "arbitraryPrecision" as an optional value for >>> the "bigOutput" argument. >>> >>> >>> Comments? >>> Thanks, >>> Spencer Graves >>> >>> >>> >>> On 2020-01-19 10:28, Avraham Adler wrote: >>>> Technically, lambda can always be numeric. It is the >>>> observations which must be integral. >>>> >>>> Would hitting everything larger than maxint or maxlonglong >>>> with floor or round fundamentally change the distribution? >>>> Well, yes, but enough that it would matter over process risk? >>>> >>>> Avi >>>> >>>> On Sun, Jan 19, 2020 at 11:20 AM Benjamin Tyner >>>> <bty...@gmail.com <mailto:bty...@gmail.com>> wrote: >>>> >>>> So imagine rpois is changed, such that the storage mode >>>> of its return >>>> value is sometimes integer and sometimes numeric. Then >>>> imagine the case >>>> where lambda is itself a realization of a random >>>> variable. Do we really >>>> want the storage mode to inherit that randomness? >>>> >>>> >>>> On 1/19/20 10:47 AM, Avraham Adler wrote: >>>> > Maybe there should be code for 64 bit R to use long >>>> long or the like? >>>> > >>>> > On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves >>>> > <spencer.gra...@prodsyse.com >>>> <mailto:spencer.gra...@prodsyse.com> >>>> <mailto:spencer.gra...@prodsyse.com >>>> <mailto:spencer.gra...@prodsyse.com>>> wrote: >>>> > >>>> > >>>> > >>>> > On 2020-01-19 09:34, Benjamin Tyner wrote: >>>> > >> >>>> > >>>> >>>> ------------------------------------------------------------------------ >>>> > >> Hello, All: >>>> > >> >>>> > >> >>>> > >> Consider: >>>> > >> >>>> > >> >>>> > >> Browse[2]> set.seed(1) >>>> > >> Browse[2]> rpois(9, 1e10) >>>> > >> NAs produced[1] NA NA NA NA NA NA NA NA NA >>>> > >> >>>> > >> >>>> > >> Should this happen? >>>> > >> >>>> > >> >>>> > >> I think that for, say, lambda>1e6, >>>> rpois should return >>>> > rnorm(., >>>> > >> lambda, sqrt(lambda)). >>>> > > But need to implement carefully; rpois should >>>> always return a >>>> > > non-negative integer, whereas rnorm always >>>> returns numeric... >>>> > > >>>> > >>>> > Thanks for the reply. >>>> > >>>> > >>>> > However, I think it's not acceptable to get >>>> an NA from a >>>> > number >>>> > that cannot be expressed as an integer. Whenever >>>> a randomly >>>> > generated >>>> > number would exceed .Machine$integer.max, the >>>> choice is between >>>> > returning NA or a non-integer numeric. Consider: >>>> > >>>> > >>>> > > 2*.Machine$integer.max >>>> > [1] 4294967294 >>>> > > as.integer(2*.Machine$integer.max) >>>> > [1] NA >>>> > Warning message: >>>> > NAs introduced by coercion to integer range >>>> > >>>> > >>>> > I'd rather have the non-integer numeric. >>>> > >>>> > >>>> > Spencer >>>> > >>>> > ______________________________________________ >>>> > R-devel@r-project.org <mailto:R-devel@r-project.org> >>>> <mailto:R-devel@r-project.org >>>> <mailto:R-devel@r-project.org>> mailing list >>>> > https://stat.ethz.ch/mailman/listinfo/r-devel >>>> > >>>> > -- >>>> > Sent from Gmail Mobile >>>> >>>> -- >>>> Sent from Gmail Mobile >>> >>> -- >>> Sent from Gmail Mobile >> >> -- >> Sent from Gmail Mobile > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics and Fax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel