I am trying to read a tab-delimited 1.25 GB file of 4,115,119 records, each with 52 fields.
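For reference, one way to double-check those counts independently of read.delim is something along these lines ("bigfile.txt" just stands in for the real file name):

  nf <- count.fields("bigfile.txt", sep="\t", quote="", comment.char="")
  length(nf)   # total number of lines (records, plus a header line if present)
  table(nf)    # ideally a single entry: 52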
I am using R 2.11.0 on a 64-bit Windows 7 machine with 8 GB of memory.

I have tried the following two statements, with the same results:

  d <- read.delim(filename, as.is=TRUE)
  d <- read.delim(filename, as.is=TRUE, nrows=4200000)

I have also tried starting R with this parameter, but it changed nothing:

  --max-mem-size=6GB

Everything appeared to work fine until I studied frequency counts of the fields and realized data were missing:

  > dim(d)
  [1] 3388444      52

R read 3,388,444 records and missed 726,754 records. There were no error messages or exceptions. I plotted a chart using the data and only later discovered that not all the data were represented in it. R didn't just read the first 3,388,444 records and quit. Here is what I believe happened (based on frequency counts of the first field, taken from the data.frame in R and, independently, from another source):

* R read the first 1,866,296 records and then skipped 419,340 records.
* Next, R read 1,325,552 records and skipped 307,414 records.
* R read the last 196,596 records without any problems.

Questions:

* Is there some memory-related parameter I should adjust that might explain the behavior described above?
* Shouldn't read.delim catch this failure instead of silently dropping data?

Thanks for any help with this.

Earl F Glynn
Overland Park, KS
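P.S. A row-count check along these lines would at least have turned the silent loss into a visible warning; the quote="" part is only a guess that stray quote characters might be confusing read.delim, not something I have confirmed:

  expected <- 4115119L                               # record count known from the source data
  d <- read.delim(filename, as.is=TRUE, quote="")    # quote="" is a guess at the cause, not a confirmed fix
  if (nrow(d) != expected)
    warning(sprintf("read %d of %d expected records", nrow(d), expected))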