I've been following this little tour round the murkier bistros in the back-streets of R with interest! Then it occurred to me: What is wrong with [using example data]:

  x0 <- c(0,1,2,0.325,1.12,1.9,1.003)
  x1 <- as.integer(as.character(1000*x0))
  n1 <- c(0,1000,2000,325,1120,1900,1003)
  x1 - n1
  ## [1] 0 0 0 0 0 0 0

  ## But, of course:
  1000*x0 - n1
  ## [1]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
  ## [5]  0.000000e+00  0.000000e+00 -1.136868e-13

Or am I missing something else in what Mike Miller is seeking to do?
Ted.

On 01-Jan-2015 19:58:02 Mike Miller wrote:
> I'd have to say thanks, but no thanks, to that one!  ;-)  The problem is
> that it will take a long time and it will give the same answer.
>
> The first time I did this kind of thing, a year or two ago, I manipulated
> the text data to produce integers before putting the data into R.  The
> data were a little different -- already zero padded with three digits to
> the right of the decimal and one to the left, so all I had to do was drop
> the decimal point.  The as.integer(1000*x+.5) method is very fast and it
> works great.
>
> I could have done that this time, but I was also saving to other formats,
> so I had the data already in the format I described.
>
> Mike
>
>
> On Thu, 1 Jan 2015, Richard M. Heiberger wrote:
>
>> Interesting.  Following someone on this list today, the goal is to input
>> the data correctly.
>> My inclination would be to read the file as text, pad each number to
>> the right, drop the decimal point,
>> and then read it as an integer.
>> 0 1 2 0.325 1.12 1.9
>> 0.000 1.000 2.000 0.325 1.120 1.900
>> 0000 1000 2000 0325 1120 1900
>>
>> The pad step is the interesting step.
>>
>> ## 0 1 2 0.325 1.12 1.9
>> ## 0.000 1.000 2.000 0.325 1.120 1.900
>> ## 0000 1000 2000 0325 1120 1900
>>
>> x.in <- scan(text="
>> 0 1 2 0.325 1.12 1.9 1.
>> ", what="")
>>
>> padding <- c(".000", "000", "00", "0", "")
>>
>> x.pad <- paste(x.in, padding[nchar(x.in)], sep="")
>>
>> x.nodot <- sub(".", "", x.pad, fixed=TRUE)
>>
>> x <- as.integer(x.nodot)
>>
>>
>> Rich
>>
>>
>> On Thu, Jan 1, 2015 at 1:21 PM, Mike Miller <mbmille...@gmail.com> wrote:
>>> On Thu, 1 Jan 2015, Duncan Murdoch wrote:
>>>
>>>> On 31/12/2014 8:44 PM, David Winsemius wrote:
>>>>>
>>>>>
>>>>> On Dec 31, 2014, at 3:24 PM, Mike Miller wrote:
>>>>>
>>>>>> This is probably a FAQ, and I don't really have a question about it, but
>>>>>> I just ran across this in something I was working on:
>>>>>>
>>>>>>> as.integer(1000*1.003)
>>>>>>
>>>>>> [1] 1002
>>>>>>
>>>>>> I didn't expect it, but maybe I should have.  I guess it's about the
>>>>>> machine precision added to the fact that as.integer always rounds down:
>>>>>>
>>>>>>
>>>>>>> as.integer(1000*1.003 + 255 * .Machine$double.eps)
>>>>>>
>>>>>> [1] 1002
>>>>>>
>>>>>>> as.integer(1000*1.003 + 256 * .Machine$double.eps)
>>>>>>
>>>>>> [1] 1003
>>>>>>
>>>>>>
>>>>>> This does it right...
>>>>>>
>>>>>>> as.integer( round( 1000*1.003 ) )
>>>>>>
>>>>>> [1] 1003
>>>>>>
>>>>>> ...but this seems to always give the same answer and it is a little
>>>>>> faster in my application:
>>>>>>
>>>>>>> as.integer( 1000*1.003 + .1 )
>>>>>>
>>>>>> [1] 1003
>>>>>>
>>>>>>
>>>>>> FYI - I'm reading in a long vector of numbers from a text file with no
>>>>>> more than three digits to the right of the decimal.  I'm converting them
>>>>>> to integers and saving them in binary format.
>>>>>>
>>>>>
>>>>> So just add 0.0001 or even .0000001 to all of them and coerce to integer.
>>>>
>>>>
>>>> I don't think the original problem was stated clearly, so I'm not sure
>>>> whether this is a solution, but it looks wrong to me.  If you want to
>>>> round to the nearest integer, why not use round() (without the as.integer
>>>> afterwards)?
>>>> Or if you really do want an integer, why add 0.1 or 0.0001,
>>>> why not add 0.5 before calling as.integer()?  This is the classical way to
>>>> implement round().
>>>>
>>>> To state the problem clearly, I'd like to know what result is expected for
>>>> any real number x.  Since R's numeric type only approximates the real
>>>> numbers we might not be able to get a perfect match, but at least we could
>>>> quantify how close we get.  Or is the input really character data?  The
>>>> original post mentioned reading numbers from a text file.
>>>
>>>
>>>
>>> Maybe you'd like to know what I'm really doing.  I have 1600 text files,
>>> each with up to 16,000 lines with 3100 numbers per line, delimited by a
>>> single space.  The numbers are between 0 and 2, inclusive, and they have up
>>> to three digits to the right of the decimal.  Every possible value in that
>>> range will occur in the data.  Some example numbers: 0 1 2 0.325 1.12 1.9.
>>> I want to multiply by 1000 and store them as 16-bit integers (uint16).
>>>
>>> I've been reading in the data like so:
>>>
>>>> data <- scan( file=FILE, what=double(), nmax=3100*16000)
>>>
>>>
>>> At first I tried making the integers like so:
>>>
>>>> ptm <- proc.time() ; ints <- as.integer( 1000 * data ) ; proc.time()-ptm
>>>
>>>    user  system elapsed
>>>   0.187   0.387   0.574
>>>
>>> I decided I should compare with the result I got using round():
>>>
>>>> ptm <- proc.time() ; ints2 <- as.integer( round( 1000 * data ) ) ; proc.time()-ptm
>>>
>>>    user  system elapsed
>>>   1.595   0.757   2.352
>>>
>>> It is a curious fact that only a few of the values from 0 to 2000 disagree
>>> between the two methods:
>>>
>>>> table( ints2[ ints2 != ints ] )
>>>
>>>
>>>  1001  1003  1005  1007  1009  1011  1013  1015  1017  1019  1021  1023
>>> 35651 27020 15993 11505  8967  7549  6885  6064  5512  4828  4533  4112
>>>
>>> I understand that it's all about the problem of representing decimal
>>> numbers in binary, but I still find some of the results a little
>>> surprising, like that list of numbers from the table() output.  For
>>> another example:
>>>
>>>> 1000+3 - 1000*(1+3/1000)
>>>
>>> [1] 1.136868e-13
>>>
>>>> 3 - 1000*(0+3/1000)
>>>
>>> [1] 0
>>>
>>>> 2000+3 - 1000*(2+3/1000)
>>>
>>> [1] 0
>>>
>>> See what I mean?  So there is something special about the numbers around
>>> 1000.
>>>
>>> Back to the question at hand:  I can avoid use of round() and speed things
>>> up a little bit by just adding a small number after multiplying by 1000:
>>>
>>>> ptm <- proc.time() ; R3 <- as.integer( 1000 * data + .1 ) ; proc.time()-ptm
>>>
>>>    user  system elapsed
>>>   0.224   0.594   0.818
>>>
>>> You point out that adding .5 makes sense.  That is probably a better idea
>>> and I should take that approach under most conditions, but in this case we
>>> can add anything between 2e-13 and about 0.99999999999 and always get the
>>> same answer.
>>> We also have to remember that if a number might be negative
>>> (not a problem for me in this application), we need to subtract 0.5 instead
>>> of adding it.
>>>
>>> Anyway, right now this is what I'm actually doing:
>>>
>>>> con <- file( paste0(FILE, ".uint16"), "wb" )
>>>> ptm <- proc.time() ; writeBin( as.integer( 1000 * scan( file=FILE,
>>>> what=double(), nmax=3100*16000 ) + .1 ), con, size=2 ) ; proc.time()-ptm
>>>
>>> Read 48013406 items
>>>    user  system elapsed
>>>  10.263   0.733  10.991
>>>>
>>>> close(con)
>>>
>>>
>>> By the way, writeBin() is something that I learned about here, from you,
>>> Duncan.  Thanks for that, too.
>>>
>>> Mike
>>>
>>> --
>>> Michael B. Miller, Ph.D.
>>> University of Minnesota
>>> http://scholar.google.com/citations?user=EV_phq4AAAAJ
>>>
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.

-------------------------------------------------
E-Mail: (Ted Harding) <ted.hard...@wlandres.net>
Date: 01-Jan-2015  Time: 21:28:22
This message was sent by XFMail
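
For anyone who wants to check the approaches discussed in this thread side by side, here is a minimal sketch. It assumes the input values are exactly the grid 0.000, 0.001, ..., 2.000 described above and builds them as k/1000 instead of reading them from a file. The names k, x, trunc_ints, round_ints, half_ints and bad are made up for the illustration and are not from anyone's actual code. as.integer() truncates toward zero, which is why the plain conversion can come out one too low, and why the "add 0.5" shortcut is only safe for non-negative input.

```r
# Every value the data can take, as an integer count of thousandths
k <- 0:2000
x <- k / 1000                                # the doubles R would work with

trunc_ints <- as.integer(1000 * x)           # plain conversion: truncates toward zero
round_ints <- as.integer(round(1000 * x))    # round to nearest, then convert
half_ints  <- as.integer(1000 * x + 0.5)     # add one half, then truncate
                                             # (assumes non-negative input)

# Grid values that the plain conversion gets wrong
bad <- k[trunc_ints != k]

# Both corrected methods should reproduce the grid exactly
identical(round_ints, k)
identical(half_ints, k)

# 1003 should be among the failures, matching as.integer(1000*1.003) above
1003L %in% bad
```

Looking at `bad` on your own machine is a quick way to see which values are affected before deciding between round() and the add-a-constant shortcut.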