Hi,

Jason Thibodeau wrote:
I am attempting to perform some simple data manipulation on a large data
set. I have a snippet of the whole data set, and my small snippet is 2GB in
CSV.

Is there a way I can read my csv, select a few columns, and write it to an
output file in real time? This is what I do right now to a small test file:

data <- read.csv('data.csv', header = FALSE)

data_filter <- data[c(1,3,4)]

write.table(data_filter, file = "filter_data.csv", sep = ",", row.names =
FALSE, col.names = FALSE)

In this case, I think R is not the best tool for the job. I would suggest using an implementation of the awk language (e.g. gawk) instead. I just tried the following on WinXP, piping a zipped file (87MB zipped, 1.2GB unzipped) into gawk:
unzip -p myzipfile.zip | gawk '{print $1, $3, $4}' > myfiltereddata.txt
and it took about 90 seconds.

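Please note that you might need to specify your delimiters, i.e. the field separator (FS) and the output field separator (OFS). Set them in a BEGIN block so that they already apply to the first line (if FS is assigned in a regular action, the first record has already been split with the default separator) =>
gawk 'BEGIN {FS=","; OFS=","} {print $1, $3, $4}' data.csv > filter_data.csv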
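For what it is worth, here is a minimal sketch of the comma-delimited case on a tiny made-up file, so the behaviour can be checked before running it on the 2GB data. The file names sample.csv and sample_filtered.csv and the sample values are only assumptions for illustration, and I use plain awk here on the assumption that any POSIX awk (gawk included) treats FS and OFS the same way for this simple case. Note that splitting on "," does not handle quoted fields that themselves contain commas.

```shell
# Create a small sample CSV (hypothetical data, for illustration only)
printf '%s\n' 'a1,b1,c1,d1' 'a2,b2,c2,d2' 'a3,b3,c3,d3' > sample.csv

# Set the input and output separators before the first record is read,
# then print columns 1, 3 and 4 one line at a time
awk 'BEGIN { FS = ","; OFS = "," } { print $1, $3, $4 }' sample.csv > sample_filtered.csv

# Show the filtered result
cat sample_filtered.csv
```

Because awk processes one line at a time, memory use should stay small regardless of the input size, which is why this approach suits files that do not fit comfortably into R.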

I hope this helps (despite not encouraging the usage of R),
Roland

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
