Glad the post elicited some discussion. Haven't played with feather. I've
used data.table and it is indeed appreciably faster than base approaches for
getting big CSVs into R. I also find dplyr (with, say, MonetDB) to be a
solution for out-of-memory approaches to large data sets. But for native R
files, I've found RDS to be fastest.
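
For anyone who wants to check the compressed vs. uncompressed RDS point on
their own machine, here is a minimal sketch. The data frame, its column names,
and the temp files are made up for illustration; actual timings will depend on
disk speed and on the data types involved.

```r
# Sketch: compare read times for compressed vs. uncompressed RDS files.
# The data frame below is synthetic and purely illustrative.
set.seed(1)
n <- 5e5
df <- data.frame(x = rnorm(n), y = runif(n), id = seq_len(n))

f_gz  <- tempfile(fileext = ".rds")
f_raw <- tempfile(fileext = ".rds")

saveRDS(df, f_gz)                     # default: gzip-compressed
saveRDS(df, f_raw, compress = FALSE)  # larger on disk, nothing to decompress

# Time a single read of each file (repeat several times for stable numbers).
t_gz  <- system.time(d1 <- readRDS(f_gz))
t_raw <- system.time(d2 <- readRDS(f_raw))

# Record file sizes in bytes before cleaning up.
sizes <- file.size(c(f_gz, f_raw))

cat("compressed:   ", t_gz[["elapsed"]],  "s,", sizes[1], "bytes\n")
cat("uncompressed: ", t_raw[["elapsed"]], "s,", sizes[2], "bytes\n")

unlink(c(f_gz, f_raw))
```

A single run is noisy, so it is worth repeating the reads a few times before
drawing conclusions.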



On May 6, 2016, at 9:01 PM, Simon Urbanek wrote:

> On May 6, 2016, at 6:03 PM, Brandon Hurr wrote:
>> Simon,
>> Absolutely was about RDS, but R is all about choices and the
>> underlying issue was time to read in data which fread and feather are
>> quite fast at. I assume when you say efficient you are referring to
>> disk space?
> No, parsing data is always slower than native formats. The fastest is
> readBin (and similar direct I/O approaches), followed by feather and RDS (the
> only reason RDS is not the fastest is that there is an extra copy in-memory)
> -- unless you have a slow disk, of course.
>> I put together a script to look at this further with and without
>> compression*. If speed is a priority over disk space then Feather and
>> data.table (CSV) are good options**. CSV is portable to any system and
>> feather can be used by Python/Julia. RDS/RDA save a lot of space,
>> but are slower to write and read due to compression.
> That's why I said uncompressed RDS [compress=FALSE] - you compress only if 
> you want to save space, not speed :).
> FWIW, according to our benchmarks iotools is the fastest for reading CSV if
> you want to get into that arena, but that's a whole other story - my point
> was that the question was NOT about CSV or anything parsed - nor was it
> about writing - which is why this is getting really OT.
> Cheers,
> Simon
>> I hope that's helpful to those thinking about their priorities for
>> file IO in R.
>> Brandon
>> *
>> **  Writing a CSV with data.table is freaky fast if you can get OpenMP
>> working on your machine. Reading that same CSV is comparable to RDS.
>> On Fri, May 6, 2016 at 6:07 AM, Simon Urbanek wrote:
>>> Brandon,
>>> note that the post was about RDS which is more efficient than all the 
>>> options you list (in particular when not compressed). General advice is to 
>>> avoid strings. Numeric vectors are several orders of magnitude faster than 
>>> strings to load/save.
>>> Cheers,
>>> Simon
>>>> On May 5, 2016, at 6:49 PM, Brandon Hurr wrote:
>>>> You might be interested in the speed wars that are happening in the
>>>> file reading/writing space currently.
>>>> Matt Dowle/Arun Srinivasan's data.table and Hadley Wickham/Wes
>>>> McKinney's Feather have made huge speed advances in reading/writing
>>>> large datasets from disks (mostly csv).
>>>> Data Table fread()/fwrite():
>>>> Feather read_feather()/write_feather()
>>>> I don't often have big datasets (10s of MBs) so I don't see the
>>>> benefits of these much, but you might.
>>>> HTH,
>>>> B
>>>> On Thu, May 5, 2016 at 3:16 PM, Charles DiMaggio wrote:
>>>>> Been a while, but wanted to close the page on a previous post describing 
>>>>> R hanging on readRDS() and load() for largish (say 500MB or larger) 
>>>>> files. Tried again with recent release (3.3.0).  Am able to read in large 
>>>>> files under El Cap.  While the file is reading in, I get a disconcerting 
>>>>> spinning pinwheel of death and a check under Force Quit reports R is not 
>>>>> responding.  But if I wait it out, it eventually reads in.  Odd.  But I 
>>>>> can live with it.
>>>>> Cheers
>>>>> Charles
>>>>> Charles DiMaggio, PhD, MPH
>>>>> Professor of Surgery and Population Health
>>>>> Director of Injury Research
>>>>> Department of Surgery
>>>>> New York University School of Medicine
>>>>> 462 First Avenue, NBV 15
>>>>> New York, NY 10016-9196
>>>>> Office: 212.263.3202
>>>>> Mobile: 516.308.6426
>>>>>      [[alternative HTML version deleted]]
>>>>> _______________________________________________
>>>>> R-SIG-Mac mailing list