On Tue, 10 Aug 2010, Martin Maechler wrote:

{Hijacking the thread from R-help to R-devel -- as I am
consciously shifting the focus away from the original question
...
}

David Winsemius <dwinsem...@comcast.net>
    on Tue, 10 Aug 2010 08:42:12 -0400 writes:

   > On Aug 9, 2010, at 2:45 PM, Theo Tannen wrote:

   >> Are integers strictly a signed 32-bit number in R even if
   >> I am running a 64-bit version of R on an x86_64
   >> machine?
   >>
   >> I ask because I have integers stored in an HDF5 file where
   >> some of the data consists of 64-bit integers. When I read those
   >> into R using the hdf5 library, it seems any integer
   >> greater than 2**31 comes back as NA.

   > That's the limit. It's hard coded and not affected by the
   > memory pointer size.
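
Indeed -- a quick illustration at the R prompt (the messages are the
usual coercion / overflow warnings):

  .Machine$integer.max        # 2147483647  ==  2^31 - 1, on all platforms
  as.integer(2^31 - 1)        # still fine
  as.integer(2^31)            # NA, with a warning (out of integer range)
  .Machine$integer.max + 1L   # NA, with an integer overflow warning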

   >>
   >> Any solutions?

   > I have heard of packages that handle "big numbers". A bit
   > of searching produces suggestions to look at gmp on CRAN
   > and Rmpfr on R-Forge.

Note that Rmpfr has been on CRAN, too, for a while now.
If you only need large integers (and rationals), 'gmp' is
enough, though.
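
For example (a small sketch; the exact printed form may differ
slightly between gmp versions):

  library(gmp)
  x <- as.bigz(2)^63               # well beyond .Machine$integer.max
  x                                # 9223372036854775808
  x + 1                            # exact arbitrary-precision arithmetic
  as.bigq(1, 3) + as.bigq(1, 6)    # rationals: 1/2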

*However*, note that the gmp or Rmpfr (or any other
arbitrary-precision) implementation will be considerably slower
in use than native 64-bit integer support would be.
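
A rough sketch of the kind of overhead to expect (purely
illustrative -- the actual factor depends on the machine and the
gmp version):

  library(gmp)
  i <- 1:100000                        # native 32-bit integers
  z <- as.bigz(i)                      # the same values, arbitrary precision
  system.time(replicate(100, sum(i)))  # essentially instantaneous
  system.time(replicate(100, sum(z)))  # markedly slower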

Introducing 64-bit integers natively into "base R" is an
"interesting" project, notably if we also allowed using them for
indices and changed the internal structures to use them instead
of 32-bit ones.
That would free us from the increasingly relevant
maximum-atomic-object-length = 2^31 - 1 problem.
The latter is something we have planned to address, possibly for
R 3.0.
However, for that, using 64-bit integers is just one
possibility; another is to use "double precision integers".
Personally, I'd much prefer the "long long" (64-bit) integers,
but there are other considerations. One big challenge will be
to get there in a way that does not force all R packages using
compiled code to be patched extensively; another aspect is how
the BLAS / LAPACK developers will address the problem.
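
To make the "double precision integers" alternative concrete
(just an illustration of the well-known representation limits):
an IEEE double represents every integer exactly up to 2^53, so
lengths and indices carried in doubles could go far beyond 2^31:

  2^53     == 2^53 - 1          # FALSE -- still exactly representable
  2^53 + 1 == 2^53              # TRUE  -- first integer a double loses
  2^53 / .Machine$integer.max   # ~ 4.2 million times the current limit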

At the moment, all the following are the same type:
 length of an R vector
 R integer type
 C int type
 Fortran INTEGER type

The last two are fixed at 32 bits (in practice for C, by standard for Fortran),
and we would like the first and perhaps the second to become 64-bit.
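
The R-visible part of that identity is easy to check (the C and
Fortran parts are simply how .C() and .Fortran() pass these
arguments today):

  typeof(1L)            # "integer"
  typeof(length(1:10))  # "integer" -- a vector's length is itself an R integer
  .Machine$integer.max  # 2147483647, the cap for both values and lengths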

If both the R length type and the R integer type become the same 64-bit type and
replace the current integer type, then every compiled package has to change to
declare the arguments as int64 (or long, on most 64-bit systems) and INTEGER*8.
That should be all that is needed for most code, since C compilers nowadays
already complain if you do unclean things like stuffing an int into a pointer.

If the R length type changes to something /different/ from the integer type,
then any compiled code has to be checked to see whether its C int arguments are
lengths or integers, which is more work and more error-prone.

On the other hand, changing the integer type to 64-bit will presumably make
integer code run noticeably more slowly on 32-bit systems.

In both cases, the changes could be postponed by having an option to .C/.Call 
forcing lengths and integers to be passed as 32-bit. This would mean that the 
code couldn't use large integers or large vectors, but it would keep working 
indefinitely.

    -thomas

Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
