Just a note of thanks for all the help I have received. I haven't gotten a
chance to implement any of your suggestions because I'm still trying to catalog
all of them! Thank you so much!
Just to recap (for my own benefit and to create a summary for others):
Bruce Bernzweig suggested using the R.huge package.
Ben Bolker pointed out that my original message wasn't clear and asked what I
wanted to do with the data. At this point, just getting a dataset loaded would be
wonderful, so I'm trying to trim variables (and, if possible, observations as
well). He also provided an example of "vectorizing."
Ted Harding suggested that I use AWK to process the data and provided the
necessary code. He also tested his code on older hardware running GNU-Linux (or
Unix?) and showed that AWK can process the data even when the computer has very
little memory and processing power. Jim Holtman had similar success when he
used Cygwin's UNIX utilities on a machine running MS Windows. They both used
the following code:
gawk 'BEGIN{FS=","}{print $(1) "," $(1000) "," $(1275) "," $(5678)}' < tempxx.txt > newdata.csv
Fortunately, there is a version of GAWK for MS Windows. ... Not that I like MS
Windows. It's just that I'm forced to use that 19th century operating system on
the job. (After using Debian at home and happily running RKWard for my
dissertation, returning to Windows World is downright depressing).
Roland Rau suggested that I use a database with RSQLite and pointed out that
RODBC can work with MS Access. He also pointed me to a sub-chapter in Venables
and Ripley's _S Programming_ and to "The Whole-Object View" pages in John
Chambers's _Programming with Data_.
Greg Snow recommended biglm for regression analysis with data that is too
large to fit into memory.
Last, but not least, Peter Dalgaard pointed out that there are options within
R itself. He suggested using the colClasses= argument when reading data (with
read.table() and friends) and the what= argument when scanning data (with
scan()), so that you don't load more columns than necessary. He also provided
the following script:
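# Read the SIPP 2004 data dictionary as plain text lines
dict <- readLines("ftp://www.sipp.census.gov/pub/sipp/2004/l04puw1d.txt")
# Keep only the lines that begin with "D " (the variable descriptions)
D.lines <- grep("^D ", dict)
# Parse those lines into a data frame, then close the text connection
vdict <- read.table(con <- textConnection(dict[D.lines])); close(con)
head(vdict)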
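Peter's colClasses=/what= suggestion can be sketched in base R as follows. This is a minimal sketch under my own assumptions: the helper names read_selected() and scan_selected() are hypothetical, the six-column demo file stands in for the real data, and every kept column is assumed to be numeric with no header row. A "NULL" entry in colClasses tells read.csv() to skip that column, and a NULL component in scan()'s what= list skips that field, so unwanted columns are never stored.

```r
# Hypothetical helper: read only the columns listed in `keep` from a
# headerless comma-separated file that has `n_cols` columns in total.
read_selected <- function(file, keep, n_cols) {
  cls <- rep("NULL", n_cols)   # "NULL" = skip this column entirely
  cls[keep] <- "numeric"       # assumption: the kept columns are numeric
  read.csv(file, header = FALSE, colClasses = cls)
}

# Hypothetical helper doing the same with scan(): a NULL component in
# what= makes scan() skip that field (a NULL placeholder is returned).
scan_selected <- function(file, keep, n_cols) {
  what <- vector("list", n_cols)   # all NULL, i.e. skip every field...
  what[keep] <- list(numeric(0))   # ...except the kept ones, read as numbers
  res <- scan(file, what = what, sep = ",", quiet = TRUE)
  res[keep]                        # drop the NULL placeholders
}

# Small demo file standing in for the real data (6 columns, not thousands)
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,2,3,4,5,6", "7,8,9,10,11,12"), tmp)

df <- read_selected(tmp, keep = c(1, 4), n_cols = 6)
print(df)
sc <- scan_selected(tmp, keep = c(1, 4), n_cols = 6)
str(sc)
```

For the real file, something like read_selected("tempxx.txt", keep = c(1, 1000, 1275, 5678), n_cols = N) should mirror the gawk command above, where N is the file's actual number of columns (not known here).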
I'll try these solutions and report back on my success.
Thanks again!
- Eric
______________________________________________
R-help mailing list
https://stat.ethz.ch/mailman/listinfo/r-help