On May 6, 2016, at 6:03 PM, Brandon Hurr <bhiv...@gmail.com> wrote:

> Simon,
> 
> Absolutely was about RDS, but R is all about choices and the
> underlying issue was time to read in data which fread and feather are
> quite fast at. I assume when you say efficient you are referring to
> disk space?
> 

No, parsing data is always slower than reading a native format. The fastest is 
readBin (and similar direct I/O approaches), followed by feather and RDS (the 
only reason RDS is not the fastest is that it makes an extra in-memory copy) -- 
unless you have a slow disk, of course.

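A minimal sketch of that comparison, not taken from the original message: it writes one numeric vector both as raw doubles (writeBin) and as uncompressed RDS, then times reading each back. The vector size and temp-file names are illustrative assumptions, and timings will vary by machine and disk.

```r
# Illustrative data: one million doubles (size chosen arbitrarily).
x <- runif(1e6)

bin_path <- tempfile(fileext = ".bin")
rds_path <- tempfile(fileext = ".rds")

writeBin(x, bin_path)                    # raw doubles, no header or metadata
saveRDS(x, rds_path, compress = FALSE)   # R serialization, uncompressed

# readBin must be told the type and element count, because the file
# itself stores neither.
t_bin <- system.time(y_bin <- readBin(bin_path, what = "double", n = length(x)))
t_rds <- system.time(y_rds <- readRDS(rds_path))

# Both routes should return exactly the original vector.
stopifnot(identical(x, y_bin), identical(x, y_rds))

print(c(readBin = t_bin[["elapsed"]], readRDS = t_rds[["elapsed"]]))
```

The trade-off is that the raw file carries no type, length, or attribute information, so the reader has to know the layout in advance; RDS stores all of that for you.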

> I put together a script to look at this further with and without
> compression*. If speed is a priority over disk space then Feather and
> data.table (CSV) are good options**. CSV is portable to any system and
> feather can be used by python/Julia. RDS/RDA save a lot of space,
> but are slower to write and read due to compression.
> 

That's why I said uncompressed RDS [compress=FALSE] - you compress only if you 
want to save space, not to gain speed :).

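As a rough illustration of the space-versus-speed trade-off (a sketch under assumed data, not a benchmark from this thread), the same data frame can be saved with and without compression and the file sizes and read times compared. The data frame contents and sizes here are hypothetical.

```r
# Hypothetical data frame; real results depend heavily on the data.
df <- data.frame(id = seq_len(1e5), value = rnorm(1e5))

raw_path <- tempfile(fileext = ".rds")
gz_path  <- tempfile(fileext = ".rds")

saveRDS(df, raw_path, compress = FALSE)  # larger file, no decompression on read
saveRDS(df, gz_path)                     # default compress = TRUE (gzip)

sizes <- file.size(c(raw_path, gz_path))
names(sizes) <- c("uncompressed", "gzip")
print(sizes)

t_raw <- system.time(a <- readRDS(raw_path))
t_gz  <- system.time(b <- readRDS(gz_path))

# The two files hold the same object; only the on-disk encoding differs.
stopifnot(identical(a, b))

print(c(uncompressed = t_raw[["elapsed"]], gzip = t_gz[["elapsed"]]))
```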
FWIW, according to our benchmarks iotools is the fastest for reading CSV if you 
want to get into that arena, but that's a whole other story - my point was that 
the question was NOT about CSV or anything parsed, nor about writing - 
which is why this is getting really OT.

Cheers,
Simon



> I hope that's helpful to those thinking about their priorities for
> file IO in R.
> 
> Brandon
> 
> * http://rpubs.com/bhive01/fileioinr
> **  writing a CSV with data.table is freaky fast if you can get OpenMP
> working on your machine
> https://github.com/Rdatatable/data.table/issues/1692 Reading that same
> CSV is comparable to RDS.
> 
> 
> On Fri, May 6, 2016 at 6:07 AM, Simon Urbanek
> <simon.urba...@r-project.org> wrote:
>> Brandon,
>> note that the post was about RDS which is more efficient than all the 
>> options you list (in particular when not compressed). General advice is to 
>> avoid strings. Numeric vectors are several orders of magnitude faster than 
>> strings to load/save.
>> Cheers,
>> Simon
>> 
>> 
>>> On May 5, 2016, at 6:49 PM, Brandon Hurr <bhiv...@gmail.com> wrote:
>>> 
>>> You might be interested in the speed wars that are happening in the
>>> file reading/writing space currently.
>>> 
>>> Matt Dowle/Arun Srinivasan's data.table and Hadley Wickham/Wes
>>> McKinney's Feather have made huge speed advances in reading/writing
>>> large datasets from disks (mostly csv).
>>> 
>>> Data Table fread()/fwrite():
>>> https://github.com/Rdatatable/data.table
>>> https://stackoverflow.com/questions/35763574/fastest-way-to-read-in-100-000-dat-gz-files
>>> http://blog.h2o.ai/2016/04/fast-csv-writing-for-r/
>>> 
>>> 
>>> Feather read_feather()/write_feather()
>>> https://github.com/wesm/feather
>>> 
>>> I don't often have big datasets (10s of MBs) so I don't see the
>>> benefits of these much, but you might.
>>> 
>>> HTH,
>>> B
>>> 
>>> On Thu, May 5, 2016 at 3:16 PM, Charles DiMaggio
>>> <charles.dimag...@gmail.com> wrote:
>>>> Been a while, but wanted to close the page on a previous post describing R 
>>>> hanging on readRDS() and load() for largish (say 500MB or larger) files. 
>>>> Tried again with recent release (3.3.0).  Am able to read in large files 
>>>> under El Cap.  While the file is reading in, I get a disconcerting 
>>>> spinning pinwheel of death and a check under Force Quit reports R is not 
>>>> responding.  But if I wait it out, it eventually reads in.  Odd.  But I 
>>>> can live with it.
>>>> 
>>>> Cheers
>>>> 
>>>> Charles
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Charles DiMaggio, PhD, MPH
>>>> Professor of Surgery and Population Health
>>>> Director of Injury Research
>>>> Department of Surgery
>>>> New York University School of Medicine
>>>> 462 First Avenue, NBV 15
>>>> New York, NY 10016-9196
>>>> charles.dimag...@nyumc.org
>>>> Office: 212.263.3202
>>>> Mobile: 516.308.6426
>>>> 
>>>> _______________________________________________
>>>> R-SIG-Mac mailing list
>>>> R-SIG-Mac@r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mac
>>> 
>> 
> 
