Re: [Rd] read.csv reads more rows than indicated by wc -l

Matthew Dowle Thu, 20 Dec 2012 15:48:30 -0800


Ben,

Somewhere on my wish/TO DO list is for someone to rewrite read.table for
better robustness *and* efficiency ...


Wish granted. New in data.table 1.8.7 :

=====
New function fread(), a fast and friendly file reader.
*  header, skip, nrows, sep and colClasses are all auto-detected.
*  integers > 2^31 are detected and read natively as bit64::integer64.
*  accepts filenames, URLs and "A,B\n1,2\n3,4" directly
*  new implementation entirely in C
*  with a 50MB .csv, 1 million rows x 6 columns :

   read.csv("test.csv")                                          # 30-60 sec
   read.table("test.csv",<all known tricks and known nrows>)     # 10 sec
   fread("test.csv")                                             # 3 sec

* airline data: 658MB csv (7 million rows x 29 columns)

   read.table("2008.csv",<all known tricks and known nrows>)     # 360 sec
   fread("2008.csv")                                             # 50 sec

See ?fread. Many thanks to Chris Neff and Garrett See for ideas, discussions
and beta testing.
=====

The help page ?fread is fairly well developed :
https://r-forge.r-project.org/scm/viewvc.php/pkg/man/fread.Rd?view=markup&root=datatable

Comments, feedback and bug reports very welcome.

Matthew

http://datatable.r-forge.r-project.org/

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
