I am trying to read a tab-delimited 1.25 GB file of 4,115,119 records, each with 52 fields.
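For reference, one way to double-check those counts independently of read.delim is something along these lines ("bigfile.txt" just stands in for the real file name):

  nf <- count.fields("bigfile.txt", sep="\t", quote="", comment.char="")
  length(nf)   # total number of lines (records, plus a header line if present)
  table(nf)    # ideally a single entry: 52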
I am using R 2.11.0 on a 64-bit Windows 7 machine with 8 GB of memory.

I have tried the following two statements, with the same results:

  d <- read.delim(filename, as.is=TRUE)
  d <- read.delim(filename, as.is=TRUE, nrows=4200000)

I have also tried starting R with this parameter, but it changed nothing:

  --max-mem-size=6GB

Everything appeared to work fine until I studied frequency counts of the fields and realized data were missing:

  > dim(d)
  [1] 3388444      52

R read 3,388,444 records and missed 726,754 records. There were no error messages or exceptions. I plotted a chart using the data and only later discovered that not all the data were represented in it. R didn't just read the first 3,388,444 records and quit. Here is what I believe happened (based on frequency counts of the first field, taken from the data.frame in R and, independently, from another source):

* R read the first 1,866,296 records and then skipped 419,340 records.
* Next, R read 1,325,552 records and skipped 307,414 records.
* R read the last 196,596 records without any problems.

Questions:

* Is there some memory-related parameter I should adjust that might explain the behavior described above?
* Shouldn't read.delim catch this failure instead of silently dropping data?

Thanks for any help with this.

Earl F Glynn
Overland Park, KS
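P.S. A row-count check along these lines would at least have turned the silent loss into a visible warning; the quote="" part is only a guess that stray quote characters might be confusing read.delim, not something I have confirmed:

  expected <- 4115119L                               # record count known from the source data
  d <- read.delim(filename, as.is=TRUE, quote="")    # quote="" is a guess at the cause, not a confirmed fix
  if (nrow(d) != expected)
    warning(sprintf("read %d of %d expected records", nrow(d), expected))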