After saving a file like so...

con <- gzcon(file("file.gz", "wb"))
writeBin(vector, con, size=2)
close(con)
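
A small caveat before the read-back: writeBin() writes in the platform's native byte order unless told otherwise, which matters once other programs get involved. A hedged variant of the save step that pins the byte order down (the endian="little" choice is my assumption, not something the file above necessarily used) would be:

con <- gzcon(file("file.gz", "wb"))
# pin the byte order so non-R readers know what to expect;
# with size=2 and an unsigned read-back, values must fit in 0..65535
writeBin(vector, con, size=2, endian="little")
close(con)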

I can read it back into R like so...

con <- gzcon(file("file.gz", "rb"))
vector <- readBin(con, integer(), 48000000, size=2, signed=FALSE)
close(con)

...and I'm wondering what other programs might be able to read in these data. It seems very straightforward: when I store 5436 integers for each of 7696 subjects, at two bytes per integer that ought to be 5436*7696*2 = 83670912 bytes, and it is exactly that:

$ zcat file.gz | wc -c
83670912

So if I just convert every pair of bytes to an integer, I guess that will do it. I stored the data this way because it was compact, but it should also work well when other software needs to read them. For me that other software would probably be Octave. I'd be interested to hear whether anyone here has read these files into Octave, a C program, or anything else. If I don't get a good answer here, I'll try the Octave list and report the best answers back to this list.
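
In case it helps anyone writing a reader in another language, here is a sketch in R of what I believe the byte-level layout is: gunzip the file, then combine consecutive byte pairs as unsigned integers. The little-endian assumption below matches what writeBin() produces on ordinary x86 hardware, but it is an assumption, so check it against a few known values:

con <- gzcon(file("file.gz", "rb"))
bytes <- readBin(con, "raw", n=5436*7696*2)   # all of the decompressed bytes
close(con)
lo <- as.integer(bytes[c(TRUE, FALSE)])       # first byte of each pair
hi <- as.integer(bytes[c(FALSE, TRUE)])       # second byte of each pair
v2 <- lo + 256L*hi                            # should match readBin(..., size=2, signed=FALSE)

Any language that can gunzip the file and do that arithmetic on consecutive byte pairs should recover the same values.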


The rest of this is some related info for readers of this list. You don't need to read below to answer my question above. Thanks.


In case anyone is interested, I did some comparisons of loading speed and file size for a number of ways of storing my data. The data all consist of positive numbers between 0 and 2, with three digits to the right of the decimal, so I can save them as double-precision floating point, or multiply by 1000 and store them as integers (a sketch of that conversion appears after the table below). The test here was for a matrix of 5000 x 7845 = 39,225,000 values. These are the file sizes:

   202.1 MB  tab-delimited text file, original, uncompressed
    29.9 MB  tab-delimited text file, original, gzip compressed
   187.7 MB  tab-delimited text file, integers, uncompressed
    24.6 MB  tab-delimited text file, integers, gzip compressed
    38.9 MB  R save() original numeric values (doubles)
    27.0 MB  R save() integers
    19.7 MB  R writeBin() 16-bit integer gzipped
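
For reference, the 16-bit gzipped file in the last row can be produced from the double matrix roughly like this; the matrix name D and the file name are placeholders, and the transpose is there only because I read the file back with byrow=TRUE:

con <- gzcon(file("D000_test.gz", "wb"))
# round to get exact integers; values 0..2000 fit comfortably in two bytes
writeBin(as.integer(round(t(D)*1000)), con, size=2)
close(con)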

So, for file size (important in my case), the gzipped writeBin() method storing 16-bit integers was the winner. Impressively, storing the data that way and dividing by 1000 on the fly to recover the original numbers was still faster than loading an Rdata file of the matrix:

The integer text file:

system.time( D <- matrix( scan( file = "D/D000", what=integer(0) ), ncol=7845, 
byrow=TRUE ) )
Read 39225000 items
    user  system elapsed
  10.626   0.344  10.971


The R save() original numeric values (doubles):

system.time( load("D000_test.Rdata") )
    user  system elapsed
   5.579   0.119   5.698


The R save() integers:

system.time( load("D000_test.Rdata") )
    user  system elapsed
   4.863   0.050   4.913


The writeBin() 16-bit integer gzipped file:

con <- gzcon(file("D000_test.gz", "rb"))
system.time( D <- matrix( readBin( con, integer(), 7845*5000, size=2, 
signed=FALSE ), ncol=7845, byrow=TRUE ) )
    user  system elapsed
   3.769   0.138   3.906
close(con)


The writeBin() 16-bit integer gzipped file, converted to numeric by dividing by 1000 on the fly:

con <- gzcon(file("D000_test.gz", "rb"))
system.time( D <- matrix( readBin( con, integer(), 7845*5000, size=2, 
signed=FALSE ), ncol=7845, byrow=TRUE )/1000 )
    user  system elapsed
   4.159   0.237   4.397
close(con)


Best,

Mike

--
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota
http://scholar.google.com/citations?user=EV_phq4AAAAJ
