Ben,

> Somewhere on my wish/TO DO list is for someone to rewrite read.table for
> better robustness *and* efficiency ...

Wish granted. New in data.table 1.8.7 :

=====
New function fread(), a fast and friendly file reader.
*  header, skip, nrows, sep and colClasses are all auto detected.
*  integers>2^31 are detected and read natively as bit64::integer64.
*  accepts filenames, URLs and "A,B\n1,2\n3,4" directly
*  new implementation entirely in C
*  with a 50MB .csv, 1 million rows x 6 columns :
    read.csv("test.csv")                                        # 30-60 sec
    read.table("test.csv",<all known tricks and known nrows>)   # 10 sec
    fread("test.csv")                                           #  3 sec
*  airline data: 658MB csv (7 million rows x 29 columns) :
    read.table("2008.csv",<all known tricks and known nrows>)   # 360 sec
    fread("2008.csv")                                           #  50 sec
See ?fread. Many thanks to Chris Neff and Garrett See for ideas, discussions
and beta testing.
=====
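
For anyone who wants to try it quickly, here is a minimal sketch of the calls described in the NEWS item above. It assumes data.table (>= 1.8.7) is installed; the temporary file and its column names (id, x, g) are made up for illustration, and the timings on such a tiny file say nothing about the benchmarks quoted above.

```r
# Sketch only: assumes data.table (>= 1.8.7) is installed.
library(data.table)

# Inline text: input containing a newline is parsed as data, not as a file name.
dt <- fread("A,B\n1,2\n3,4")
print(dt)

# Round trip through a temporary file (hypothetical columns id, x, g).
# No header, sep or colClasses are passed; fread detects them.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, x = c(0.5, 1.5, 2.5), g = c("a", "b", "c")),
          tmp, row.names = FALSE)
dt2 <- fread(tmp)
str(dt2)

# Rough timing comparison on the same file; only meaningful on large inputs.
print(system.time(read.csv(tmp)))
print(system.time(fread(tmp)))
unlink(tmp)
```

To reproduce something closer to the numbers above, point the same two system.time() calls at a large csv of your own.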

The help page ?fread is fairly well developed :
https://r-forge.r-project.org/scm/viewvc.php/pkg/man/fread.Rd?view=markup&root=datatable

Comments, feedback and bug reports very welcome.

Matthew

http://datatable.r-forge.r-project.org/

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
