On May 6, 2016, at 6:03 PM, Brandon Hurr <bhiv...@gmail.com> wrote:

> Simon,
>
> Absolutely was about RDS, but R is all about choices and the
> underlying issue was time to read in data which fread and feather are
> quite fast at. I assume when you say efficient you are referring to
> disk space?
>

No, parsing data is always slower than reading native formats. The fastest
is really readBin (and similar direct I/O approaches), followed by feather
and RDS (the only reason RDS is not the fastest is that there is an extra
in-memory copy) -- unless you have a slow disk, of course.

> I put together a script to look at this further with and without
> compression*. If speed is a priority over disk space then Feather and
> data.table (CSV) are good options**. CSV is portable to any system and
> feather can be used by python/Julia. RDS/RDA save a lot of space, but
> are slower to write and read due to compression.
>

That's why I said uncompressed RDS [compress=FALSE] - you compress only if
you want to save space, not for speed :). FWIW, according to our benchmarks
iotools is the fastest for reading CSV if you want to get into that arena,
but that's a whole other story - my point was that the question was NOT
about CSV or anything parsed - nor about writing - which is why this is
getting really OT.

Cheers,
Simon

> I hope that's helpful to those thinking about their priorities for
> file IO in R.
>
> Brandon
>
> * http://rpubs.com/bhive01/fileioinr
> ** writing a CSV with data.table is freaky fast if you can get OpenMP
> working on your machine
> https://github.com/Rdatatable/data.table/issues/1692 Reading that same
> CSV is comparable to RDS.
>
>
> On Fri, May 6, 2016 at 6:07 AM, Simon Urbanek
> <simon.urba...@r-project.org> wrote:
>> Brandon,
>> note that the post was about RDS which is more efficient than all the
>> options you list (in particular when not compressed). General advice is to
>> avoid strings. Numeric vectors are several orders of magnitude faster than
>> strings to load/save.
>> Cheers,
>> Simon
>>
>>
>>> On May 5, 2016, at 6:49 PM, Brandon Hurr <bhiv...@gmail.com> wrote:
>>>
>>> You might be interested in the speed wars that are happening in the
>>> file reading/writing space currently.
>>>
>>> Matt Dowle/Arun Srinivasan's data.table and Hadley Wickham/Wes
>>> McKinney's Feather have made huge speed advances in reading/writing
>>> large datasets from disks (mostly csv).
>>>
>>> Data Table fread()/fwrite():
>>> https://github.com/Rdatatable/data.table
>>> https://stackoverflow.com/questions/35763574/fastest-way-to-read-in-100-000-dat-gz-files
>>> http://blog.h2o.ai/2016/04/fast-csv-writing-for-r/
>>>
>>>
>>> Feather read_feather()/write_feather()
>>> https://github.com/wesm/feather
>>>
>>> I don't often have big datasets (10s of MBs) so I don't see the
>>> benefits of these much, but you might.
>>>
>>> HTH,
>>> B
>>>
>>> On Thu, May 5, 2016 at 3:16 PM, Charles DiMaggio
>>> <charles.dimag...@gmail.com> wrote:
>>>> Been a while, but wanted to close the page on a previous post
>>>> describing R hanging on readRDS() and load() for largish (say 500MB
>>>> or larger) files. Tried again with recent release (3.3.0). Am able
>>>> to read in large files under El Cap. While the file is reading in,
>>>> I get a disconcerting spinning pinwheel of death and a check under
>>>> Force Quit reports R is not responding. But if I wait it out, it
>>>> eventually reads in. Odd. But I can live with it.
>>>>
>>>> Cheers
>>>>
>>>> Charles
>>>>
>>>> Charles DiMaggio, PhD, MPH
>>>> Professor of Surgery and Population Health
>>>> Director of Injury Research
>>>> Department of Surgery
>>>> New York University School of Medicine
>>>> 462 First Avenue, NBV 15
>>>> New York, NY 10016-9196
>>>> charles.dimag...@nyumc.org
>>>> Office: 212.263.3202
>>>> Mobile: 516.308.6426
>>>>
>>>> _______________________________________________
>>>> R-SIG-Mac mailing list
>>>> R-SIG-Mac@r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mac
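
For anyone who wants to check the ordering Simon describes on their own
machine, here is a minimal sketch in base R only. It is not the script from
the rpubs link; the data size, column names, and file names are all
hypothetical, and the timings will depend on disk speed and data types
(numeric columns are used here because strings load much more slowly).

```r
# Sketch: read the same numeric data back from compressed RDS,
# uncompressed RDS, CSV, and a raw binary dump, and time each read.
set.seed(1)
n  <- 2e5                                   # hypothetical size; increase to taste
df <- data.frame(a = rnorm(n), b = runif(n), c = seq_len(n))
x  <- rnorm(n)                              # plain numeric vector for readBin

f_rds_gz  <- tempfile(fileext = ".rds")
f_rds_raw <- tempfile(fileext = ".rds")
f_csv     <- tempfile(fileext = ".csv")
f_bin     <- tempfile(fileext = ".bin")

saveRDS(df, f_rds_gz)                       # default: gzip-compressed
saveRDS(df, f_rds_raw, compress = FALSE)    # uncompressed, as suggested above
write.csv(df, f_csv, row.names = FALSE)     # text format, must be parsed on read
writeBin(x, f_bin)                          # raw doubles, no parsing at all

timings <- c(
  rds_compressed   = system.time(d1 <- readRDS(f_rds_gz))[["elapsed"]],
  rds_uncompressed = system.time(d2 <- readRDS(f_rds_raw))[["elapsed"]],
  csv_parsed       = system.time(d3 <- read.csv(f_csv))[["elapsed"]],
  readBin_vector   = system.time(
    y <- readBin(f_bin, what = "double", n = length(x))
  )[["elapsed"]]
)

sizes <- file.size(c(f_rds_gz, f_rds_raw, f_csv, f_bin))
names(sizes) <- names(timings)

print(timings)                              # seconds; varies by machine
print(sizes)                                # bytes on disk

unlink(c(f_rds_gz, f_rds_raw, f_csv, f_bin))
```

Note that the readBin row times a single vector rather than the whole data
frame, so it is only indicative of raw I/O cost, not a like-for-like
comparison with the other three.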