Re: [R] rounding down with as.integer

Mike Miller Thu, 01 Jan 2015 10:33:05 -0800

On Thu, 1 Jan 2015, Duncan Murdoch wrote:

On 31/12/2014 8:44 PM, David Winsemius wrote:
On Dec 31, 2014, at 3:24 PM, Mike Miller wrote:
This is probably a FAQ, and I don't really have a question about it, but I just 
ran across this in something I was working on:
as.integer(1000*1.003)
[1] 1002

I didn't expect it, but maybe I should have.  I guess it's about the machine 
precision added to the fact that as.integer always rounds down:
as.integer(1000*1.003 + 255 * .Machine$double.eps)
[1] 1002
as.integer(1000*1.003 + 256 * .Machine$double.eps)
[1] 1003


This does it right...
as.integer( round( 1000*1.003 ) )
[1] 1003

...but this seems to always give the same answer and it is a little faster in 
my application:
as.integer( 1000*1.003 + .1 )
[1] 1003


FYI - I'm reading in a long vector of numbers from a text file with no more 
than three digits to the right of the decimal.  I'm converting them to integers 
and saving them in binary format.
So just add 0.0001 or even .0000001 to all of them and coerce to integer.
I don't think the original problem was stated clearly, so I'm not surewhether this is a solution, but it looks wrong to me. If you want toround to the nearest integer, why not use round() (without theas.integer afterwards)? Or if you really do want an integer, why add0.1 or 0.0001, why not add 0.5 before calling as.integer()? This is theclassical way to implement round().
To state the problem clearly, I'd like to know what result is expectedfor any real number x. Since R's numeric type only approximates thereal numbers we might not be able to get a perfect match, but at leastwe could quantify how close we get. Or is the input really characterdata? The original post mentioned reading numbers from a text file.

Maybe you'd like to know what I'm really doing. I have 1600 text fileseach with up to 16,000 lines with 3100 numbers per line, delimited by asingle space. The numbers are between 0 and 2, inclusive, and they haveup to three digits to the right of the decimal. Every possible value inthat range will occur in the data. Some examples numbers: 0 1 2 0.3251.12 1.9. I want to multiply by 1000 and store them as 16-bit integers(uint16).


I've been reading in the data like so:

data <- scan( file=FILE, what=double(), nmax=3100*16000)


At first I tried making the integers like so:

ptm <- proc.time() ; ints <- as.integer( 1000 * data ) ; proc.time()-ptm

   user  system elapsed
  0.187   0.387   0.574

I decided I should compare with the result I got using round():

ptm <- proc.time() ; ints2 <- as.integer( round( 1000 * data ) ) ; 
proc.time()-ptm

   user  system elapsed
  1.595   0.757   2.352

It is a curious fact that only a few of the values from 0 to 2000 disagreebetween the two methods:

table( ints2[ ints2 != ints ] )


 1001  1003  1005  1007  1009  1011  1013  1015  1017  1019  1021  1023
35651 27020 15993 11505  8967  7549  6885  6064  5512  4828  4533  4112

I understand that it's all about the problem of representing digitalnumbers in binary, but I still find some of the results a littlesurprising, like that list of numbers from the table() output. Foranother example:

1000+3 - 1000*(1+3/1000)

[1] 1.136868e-13

3 - 1000*(0+3/1000)

[1] 0

2000+3 - 1000*(2+3/1000)

[1] 0

See what I mean? So there is something special about the numbers around1000.

Back to the quesion at hand: I can avoid use of round() and speed thingsup a little bit by just adding a small number after multiplying by 1000:

ptm <- proc.time() ; R3 <- as.integer( 1000 * data + .1 ) ; proc.time()-ptm

   user  system elapsed
  0.224   0.594   0.818

You point out that adding .5 makes sense. That is probably a better ideaand I should take that approach under most conditions, but in this case wecan add anything between 2e-13 and about 0.99999999999 and always get thesame answer. We also have to remember that if a number might be negative(not a problem for me in this application), we need to subtract 0.5instead of adding it.


Anyway, right now this is what I'm actually doing:

con <- file( paste0(FILE, ".uint16"), "wb" )
ptm <- proc.time() ; writeBin( as.integer( 1000 * scan( file=FILE, 
what=double(), nmax=3100*16000 ) + .1 ), con, size=2 ) ; proc.time()-ptm

Read 48013406 items
   user  system elapsed
 10.263   0.733  10.991

close(con)

By the way, writeBin() is something that I learned about here, from you,Duncan. Thanks for that, too.


Mike

--
Michael B. Miller, Ph.D.
University of Minnesota
http://scholar.google.com/citations?user=EV_phq4AAAAJ

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] rounding down with as.integer

Reply via email to