Feng,

I had the same question as you (how to read a subset of data) and the same reaction as Wensui when I discovered that read.table could not do it. Even if my computer's memory were up to it, I am troubled by the idea of reading in 1.8 GB of data (in my case) to get just 4,000 numbers, for instance, particularly if I'm then going to iterate through the entire dataset in 4,000-number chunks.

I ended up defining a NetCDF format to hold my data using the RNetCDF package, since that package's var.get.nc() function can read subsets of a NetCDF variable. Furthermore, NetCDF files allow data to be matrices and even higher-order arrays, from which you can retrieve any chunk by passing var.get.nc() 'start' and 'count' arguments, each a vector whose length equals the number of array dimensions. Once a NetCDF format is defined, all else is painless.

One limitation is that the RNetCDF package only supports version 3 of the NetCDF library, a version that puts a 2 GB limit on a variable's size. Version 4 removes this limitation; I'm hopeful that some day an R package will provide an interface to the NetCDF version 4 library.

John Thaden
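
For anyone who wants to try the 'start' and 'count' approach described above, here is a minimal sketch. It assumes the RNetCDF package is installed; the temporary file, the dimension names "row" and "col", and the variable name "signal" are made up for illustration and are not from my actual data.

```r
library(RNetCDF)

# Hypothetical file, dimension and variable names, for illustration only.
nc_path <- tempfile(fileext = ".nc")

# Write a small two-dimensional variable (6 rows x 4 columns).
nc <- create.nc(nc_path)
dim.def.nc(nc, "row", 6)
dim.def.nc(nc, "col", 4)
var.def.nc(nc, "signal", "NC_DOUBLE", c("row", "col"))
full <- matrix(as.numeric(1:24), nrow = 6, ncol = 4)
var.put.nc(nc, "signal", full)
close.nc(nc)

# Reopen the file and read only rows 2-4 of columns 3-4.
# 'start' and 'count' each have one entry per array dimension.
nc <- open.nc(nc_path)
chunk <- var.get.nc(nc, "signal", start = c(2, 3), count = c(3, 2))

# Walk through the whole variable two rows at a time,
# holding only one block in memory per iteration.
block_sums <- numeric(0)
for (first_row in seq(1, 6, by = 2)) {
  block <- var.get.nc(nc, "signal", start = c(first_row, 1), count = c(2, 4))
  block_sums <- c(block_sums, sum(block))
}
close.nc(nc)
```

Here chunk should match full[2:4, 3:4], and the loop is a small-scale stand-in for stepping through a large variable in fixed-size pieces.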
Message: 22
Date: Sun, 11 Mar 2007 21:33:04 -0500
From: "jim holtman" <[EMAIL PROTECTED]>
Subject: Re: [R] read.table for a subset of data
To: "Wensui Liu" <[EMAIL PROTECTED]>
Cc: r-help <r-help@stat.math.ethz.ch>
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain

If you know which 10 rows to read, then you can 'skip' to them, but the
system still has to read one line at a time. I have a 200,000 line csv
file of numerics that takes me 4 seconds to read in with 'read.csv' using
'colClasses', so I would guess your 100K line file would take half of that.
Is 2 seconds of time a waste of resources?

On 3/11/07, Wensui Liu <[EMAIL PROTECTED]> wrote:
>
> Jim,
>
> Glad to see your reply.
>
> Referring to your email, what if I just want to read 10 rows from a csv
> table with 100000 rows? Do you think it is a waste of resources to read
> the whole table in?
> Any thoughts?
>
> wensui
>
> On 3/11/07, jim holtman <[EMAIL PROTECTED]> wrote:
> > Why can't you read in the whole data set and then create the subsets?
> > This is easily done with 'split'. If the data is too large, then
> > consider a database.
> >
> > On 3/11/07, gnv shqp <[EMAIL PROTECTED]> wrote:
> > >
> > > Hi R-experts,
> > >
> > > I have data from four conditions of an experiment. I tried to create
> > > four subsets of the data with read.table, for example,
> > > read.table("Experiment.csv",subset=(condition=="1"))
> > > . I found a similar post in the archive, but the answer to that post
> > > was no. Any new ideas about reading subsets of data with read.table?
> > >
> > > Thanks!
> > >
> > > Feng
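
The two suggestions in the quoted thread (read everything and 'split', or 'skip' to known rows) might look roughly like this. It is only a sketch: the sample file, its column names and the row positions are invented for illustration, and the real Experiment.csv may be laid out differently.

```r
# Hypothetical stand-in for Experiment.csv: four conditions, 250 rows each.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(condition = rep(1:4, each = 250),
                     value = (1:1000) / 10),
          csv_path, row.names = FALSE)

# Approach 1: read the whole file once, then split it by condition.
# Supplying colClasses spares R from guessing the column types.
all_rows <- read.csv(csv_path, colClasses = c("integer", "numeric"))
by_condition <- split(all_rows, all_rows$condition)

# Approach 2: read only data rows 501-510.
# 'skip' passes over the header line plus the 500 rows before them;
# R still scans those lines, but does not parse or keep them.
ten_rows <- read.csv(csv_path,
                     header = FALSE,
                     skip = 1 + 500,
                     nrows = 10,
                     col.names = names(all_rows),
                     colClasses = c("integer", "numeric"))
```

Approach 2 only helps when the row positions are known in advance; selecting rows by a condition such as condition == 1 still means reading the file (or moving the data to a database or NetCDF file).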