If you've got access to Unix tools (e.g. on Linux or under Cygwin), consider the "cut" command. It's great for column selection.
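One way to combine the "cut" suggestion with R is to run cut through a pipe() connection, so R only ever sees the selected columns. This is a minimal sketch that assumes cut is on the PATH. The helper name read_cut_columns is made up for illustration. By default cut splits on TAB, which matches the tab-delimited file in the question quoted below, and it passes the header line through unchanged.

```r
# Hypothetical helper (not from this thread): let the Unix "cut" command pick
# the columns and have R read only its output through a pipe() connection.
# Assumes cut is on the PATH (Linux, or Cygwin on Windows).
read_cut_columns <- function(path, fields) {
  # fields: 1-based column numbers, e.g. c(10, 110).
  # cut splits on TAB by default and keeps the header line as-is.
  cmd <- paste("cut -f", paste(fields, collapse = ","), shQuote(path))
  # read.delim opens the pipe, reads the tab-separated output with
  # header = TRUE, and closes the connection when done.
  read.delim(pipe(cmd))
}
```

For the question's columns 10 and 110 the call would be read_cut_columns("inputfile.txt", c(10, 110)). Note that cut always emits fields in file order, regardless of the order they are listed in.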
Thomas Lumley <[EMAIL PROTECTED]> writes:

> On Mon, 9 Aug 2004, F Duan wrote:
>
>> Dear R people,
>>
>> I have a very big tab-delim txt file with header and I only want to import
>> several columns into R. I checked the options for “read.table” and only
>> found “nrows” which lets you specify the maximum number of rows to read in.
>> Although I can use some text editors (e.g., wordpad) to edit the txt file first
>> before running R, I feel it’s not very convenient. The reason for me to do this
>> is that if I import the whole file into R, it will eat up too much of my
>> system’s memory. Even after I remove it later, I still can’t release the memory.
>>
>
> You can't avoid reading the whole file, but you can avoid having it in
> memory.
>
> I'll assume you know how many lines are in the file, call it N. (this
> isn't necessary but it is tidier) and that you are interested in columns
> 10 and 110, both numeric
>
> If you do something like
>
> inputfile<-file("inputfile.txt",open="r")
> result<-data.frame(col10=numeric(N), col110=numeric(N))
> chunksize<-1000
> nchunks<- ceiling(N/1000)
>
> for(i in 1:nchunks){
> chunk<-read.table(inputfile,nrows=chunksize)
> result[ (i-1)*chunksize+ (1:chunksize),]<-chunk[,c(10,110)]
> }
>
> close(inputfile)
>
> you can choose the chunk size so that the memory use is not too bad.
>
> There are also more efficient ways that make you do more of the work (eg
> read in lines of text with readLines and use regular expressions to
> extract the columns you need)
>
> -thomas
>
> ______________________________________________
> [EMAIL PROTECTED] mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>

-- 
Anthony Rossini                     Research Associate Professor
[EMAIL PROTECTED]                   http://www.analytics.washington.edu/
Biomedical and Health Informatics   University of Washington
Biostatistics, SCHARP/HVTN          Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC  (M/W): 206-667-7025 FAX=206-667-4812 | use Email
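The chunked loop in Thomas Lumley's reply is offered as "something like", and a few details need care before it runs on the file described in the question:

- The first read.table call would treat the header line as data.
- The default whitespace separator mis-splits tab-delimited rows that have empty fields.
- ceiling(N/1000) ignores chunksize.
- The row indexing assumes every chunk is full, and the last one usually is not.

Below is a minimal sketch of the same idea. It assumes the file has exactly one header line and TAB-separated fields; read_columns_chunked is a hypothetical name. It reads raw lines with readLines and parses each chunk with read.table(text = ...), so it does not need N in advance.

```r
# Hypothetical helper (not from the original post): read only the columns
# `cols` (1-based indices, e.g. c(10, 110)) from a large tab-delimited file,
# parsing `chunksize` lines at a time so the full table is never in memory.
# Assumes exactly one header line and TAB-separated fields.
read_columns_chunked <- function(path, cols, chunksize = 1000) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Consume the header line first so it is not parsed as data.
  header <- strsplit(readLines(con, n = 1), "\t", fixed = TRUE)[[1]]

  pieces <- list()
  repeat {
    lines <- readLines(con, n = chunksize)
    if (length(lines) == 0) break  # end of file: nothing left to read
    # A short final chunk simply yields fewer rows; no index arithmetic.
    # sep/quote/comment.char follow read.delim's conventions for TAB data.
    chunk <- read.table(text = lines, sep = "\t", header = FALSE,
                        quote = "\"", comment.char = "")
    pieces[[length(pieces) + 1]] <- chunk[, cols, drop = FALSE]
  }
  if (length(pieces) == 0) stop("no data lines after the header")

  # Bind the kept columns once at the end and restore the header names.
  result <- do.call(rbind, pieces)
  names(result) <- header[cols]
  rownames(result) <- NULL
  result
}
```

For the question's file the call would be read_columns_chunked("inputfile.txt", c(10, 110), chunksize = 1000). Collecting chunks in a list and binding them once at the end avoids having to know N. If N is known, preallocating the result as in the quoted code keeps peak memory somewhat lower.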