I've spent some time trying to wrap my head around reading in large csv files with the ff package. I think I know how to do it, but I'm running into some problems. I've tried to recreate the issues as best I can with a smaller example, and maybe someone can help explain them.
The following code creates a csv file with an integer column, a character column and a logical column.
-------------------------------------------------
library(ff)

# Create fake data: an integer-valued column, a column of letters and a logical column
size = 2000
fake.data = data.frame("Integer"   = round(100000 * runif(size)),
                       "Character" = sample(LETTERS, size, replace = TRUE),
                       "Logical"   = sample(c(TRUE, FALSE), size, replace = TRUE))

# Write to csv without row names
write.csv(fake.data, "data.csv", row.names = FALSE)
-------------------------------------------------
Now, to read it in as an 'ffdf' object, I can do the following:
-------------------------------------------------
data = read.csv.ffdf(x = NULL, file = "data.csv", nrows = 1001,
                     first.rows = 500, next.rows = 1005, sep = ",")
-------------------------------------------------
That works. But with my actual large data set, read.csv.ffdf disagrees with me about the classes it imports. (I was also experimenting with first.rows/next.rows, but that's a question for another time.) So I tried to load the data while specifying the column types (exactly the same command, except with colClasses added):
-------------------------------------------------
> data = read.csv.ffdf(x = NULL, file = "data.csv", nrows = 1001, first.rows = 500,
+   next.rows = 1005, sep = ",", colClasses = c("integer", "integer", "logical"))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  scan() expected 'an integer', got '"J"'
> data = read.csv.ffdf(x = NULL, file = "data.csv", nrows = 1001, first.rows = 500,
+   next.rows = 1005, sep = ",", colClasses = c("integer", "character", "logical"))
Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered,  :
  vmode 'character' not implemented
> data = read.csv.ffdf(x = NULL, file = "data.csv", nrows = 1001, first.rows = 500,
+   next.rows = 1005, sep = ",", colClasses = rep("character", 3))
Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered,  :
  vmode 'character' not implemented
> data = read.csv.ffdf(x = NULL, file = "data.csv", nrows = 1001, first.rows = 500,
+   next.rows = 1005, sep = ",", colClasses = rep("raw", 3))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  scan() expected 'a raw', got '8601'
-------------------------------------------------
I just can't find a combination of classes that will get this file read in. I really don't understand why 'character' won't work for all of the columns. Any thoughts as to why? I appreciate the help and time.
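For comparison, base R's read.csv can report which class it infers for each column of the same kind of file, with ff not involved at all. This is a minimal sketch: it regenerates the example data in a temporary file so it runs on its own, it sets stringsAsFactors = TRUE explicitly because that default changed in R 4.0.0, and the names check.file, peek and inferred are just illustrative variables.

```r
# Regenerate the example data in a temporary file (same recipe as above)
size <- 2000
fake.data <- data.frame(Integer   = round(100000 * runif(size)),
                        Character = sample(LETTERS, size, replace = TRUE),
                        Logical   = sample(c(TRUE, FALSE), size, replace = TRUE))
check.file <- tempfile(fileext = ".csv")
write.csv(fake.data, check.file, row.names = FALSE)

# Read the first 100 rows with base R and list the class guessed for each column.
# stringsAsFactors = TRUE is set explicitly so the letter column comes back as a
# factor regardless of the installed R version's default.
peek <- read.csv(check.file, nrows = 100, stringsAsFactors = TRUE)
inferred <- sapply(peek, class)
print(inferred)
```

With these settings the letter column is reported as a factor rather than as character, the TRUE/FALSE column as logical, and the number column as integer (or numeric, if a value happens to be written out in scientific notation such as 1e+05).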