Frank McCown wrote:
> I have been trying to read in a large data set using read.table, but
> I've only been able to grab the first 50,871 rows of the total 122,269 rows.
>
> > f <-
> read.table("http://www.cs.odu.edu/~fmccown/R/Tchange_rates_crawled.dat",
> header=TRUE, nrows=123000, comment.char="", sep="\t")
> > length(f$change_rate)
> [1] 50871
>
> From searching the email archives, I believe this is due to size limits
> of a data frame. So...

I think you believe wrongly...

> 1) Why doesn't read.table give a proper warning when it doesn't place
> every read item into a data frame?

That isn't the problem; it is a somewhat obscure interaction between
quote= and sep= that is doing you in. Presumably some field contains an
apostrophe or double quote, which opens a quoted string that then
swallows everything up to the next matching quote character. Remove the
sep="\t" and/or add quote="" and your life should be easier.

> 2) Why isn't there a parameter to read.table that allows the user to
> specify which columns s/he is interested in? This functionality would
> allow extraneous columns to be ignored which would improve memory usage.

There is! Check out colClasses:

> cc <- rep("NULL",5)
> cc[4:5] <- NA
> f <- read.table("http://www.cs.odu.edu/~fmccown/R/Tchange_rates_crawled.dat", header=TRUE, sep="\t", quote="", colClasses=cc)
> str(f)
'data.frame':   122271 obs. of  2 variables:
 $ recovered  : Factor w/ 5 levels "changed","identical",..: 5 3 3 3 2 2 2 2 1 2 ...
 $ change_rate: num  1 0 0 1 0 0 0 0 0 0 ...

> I've already made a work-around by loading the table into mysql and
> doing a select on the 2 columns I need. I just wonder why the above 2
> points aren't implemented. Maybe they are and I'm totally missing it.
>
> Thanks,
> Frank

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])                  FAX: (+45) 35327907
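
P.S. For the quote=/sep= interaction mentioned above, here is a minimal sketch on made-up inline data. The ids and names are purely illustrative and are not taken from Frank's file, and the text= argument assumes a reasonably recent version of R. It reads the same tab-separated text twice, once with the default quote set and once with quote="", and compares the row counts.

```r
# Made-up, tab-separated data: 8 data rows, two names contain an apostrophe.
txt <- paste(
  "id\tname",
  "1\tAdams",
  "2\tBaker",
  "3\tClark",
  "4\tDavis",
  "5\tEvans",
  "6\tO'Brien",
  "7\tSmith",
  "8\tD'Arcy",
  sep = "\n"
)

# Default quote set is "\"'", so an apostrophe is treated as a quote
# character and the text up to the next apostrophe ends up in one field.
bad <- read.table(text = txt, header = TRUE, sep = "\t")

# quote = "" switches quote processing off: one input line, one row.
good <- read.table(text = txt, header = TRUE, sep = "\t", quote = "")

nrow(good)               # 8, one row per data line
nrow(bad) < nrow(good)   # rows go missing without quote = ""
```

Note that the apostrophes in the sketch come in a matched pair, so nothing is left unterminated at the end of the input; that is consistent with rows disappearing without any complaint.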