Dennis and all,
Thank you for the help as I try to get this method for importing and batch processing files organized. I currently have this set-up to import data from two files in my working directory. "Var1" specifies data from file 1 and file 2. filenames=list.files() library(plyr) import.list=adply(filenames, 1, read.csv) import.list Var1 Time O2_conc Chla_conc 1 1 0 273.5 300 2 1 10 268.1 240 3 1 20 262.8 210 4 1 30 257.4 175 5 1 40 252.0 155 6 1 50 246.6 138 7 1 60 241.3 129 8 1 70 235.9 121 9 1 80 230.5 117 10 2 0 270.0 300 11 2 10 269.0 240 12 2 20 257.0 210 13 2 30 259.0 175 14 2 40 230.0 155 15 2 50 220.0 138 16 2 60 221.0 129 17 2 70 450.0 121 18 2 80 250.0 117 For my calculation I am only interested in column 3, so I used split() to pull out this column based upon the value of Var1. split(import.list[,3],import.list$Var1) $`1` [1] 273.5 268.1 262.8 257.4 252.0 246.6 241.3 235.9 230.5 $`2` [1] 270 269 257 259 230 220 221 450 250 Finally I want to find the minimum value in column 3 (for each file) and the three previous data points, and then calculate the mean of these four values. > a=which.min(data$"1") > b=(which.min(data$"1")-3) > c=c(a:b) > d=mean(data$"1"[c]) > d [1] 238.575 So far this works, but this last set of calculations (the min, mean) are not very automated and are probably a bit cumbersome. Rather than having to replace a=which.min(data$1) with a=which.min(data$2) to do this calculation on the second file I would like to write a loop or use some R package with looping abilities to cycle through this (I have more than 2 files, this is just a test set). Im imagining a psuedocode something like this For (i in 1:20) { a[i]=which.min(data$[i]) b[i]=which.min(data$[i])-3 c[i]=c(a[i]:b[i]) etc ...though this particular code is most certainly too cumbersome. Any thoughts on how to make a loop so that I can cycle through the files and run these calculations? 
Thank you,
Nate

On Mon, Nov 15, 2010 at 9:12 PM, Dennis Murphy <djmu...@gmail.com> wrote:
> Hi:
>
> See inline.
>
> On Mon, Nov 15, 2010 at 4:26 PM, Nate Miller <natemille...@gmail.com> wrote:
>
>> Hi All!
>>
>> I have some experience with R, but less experience writing scripts using
>> R, and have run into a challenge that I hope someone can help me with.
>>
>> I have multiple .csv files of data, each with the same 3 columns of
>> data, but potentially of varying lengths (some data files are from short
>> measurements, others from longer ones). One file for example might look
>> like this...
>>
>> Time, O2_conc, Chla_conc
>> 0, 270, 300
>> 10, 260, 280
>> 20, 245, 268
>> 30, 233, 238
>> 40, 222, 212
>> 50, 215, 201
>> 60, 208, 193
>> 70, 206, 191
>> 80, 207, 189
>> 90, 206, 186
>> 100, 206, 183
>> 110, 207, 178
>> 120, 205, 174
>> 130, 240, 171
>> 140, 270, 155
>>
>> I am looking for an efficient means of batch (or sequentially)
>> processing these files so that I can
>> 1. import each data file
>>
>> 2. find the minimum value recorded in column 2 and the previous 5 data
>> points
>>
>
> Don't know what you mean by the 'previous 5 data points'... are you
> referring to a rolling minimum?
>
>>
>> 3. and average these values to get a mean minimum value.
>>
>
> If the surmise above is correct, you should get 11 rolling mins for a
> vector of length 15. Here's an example using the rollapply() function from
> the zoo package:
>
> library(zoo)
> x <- rpois(15, 10)
> x
>  [1] 17 12 17  9  8 10  7 11 15 11 11 15  5  9 12
>
> rollapply(zoo(x), 5, FUN = min)
>  3  4  5  6  7  8  9 10 11 12 13
>  8  8  7  7  7  7  7 11  5  5  5
>
> mean(rollapply(zoo(x), 5, FUN = min))
> [1] 7
>
>> Currently I have imported the data files using the following
>>
>> filenames=list.files()
>>
>> library(plyr)
>>
>> import.list=adply(filenames, 1, read.csv)
>>
>
> This seems to be a reasonable approach. Does the result keep a column for
> the file names?
>>
>> and I know how to write code to calculate the minimum value and the 5
>> preceding values in a single column, in a single file. I think the
>> problem I am running into is scaling this code up so that I can import
>> multiple files and calculate the mean minimum value for the 2nd column
>> in each of them.
>>
>
> As long as you have an indicator for each file, this should be pretty
> straightforward. Write a function that produces the summaries for one data
> frame and then do something like
>
> ddply(slurpedFiles, .(dsIndicator), myfunction)
>
> to map the function over all of them.
>
> The data.table package is an alternative, where you should be able to do
> similar things using the data set names as a key. data.table 'thinks' more
> like SQL, but it can be very efficient.
>
> You should be able to do what you're asking for with either package.
>
> HTH,
> Dennis
>
>>
>> Can anyone offer some advice on how to batch process a whole bunch of
>> files? I need to load them in, but then analyze them too.
>>
>> Thank you so much,
>>
>> Nate
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
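[Dennis's ddply() suggestion, sketched concretely against the stacked data frame from earlier in the thread. `Var1` stands in for his placeholder `dsIndicator` (it is the indicator column adply() produced in Nate's output), and `one.file.summary` is a made-up function name; the min-plus-three-prior-points mean stands in for whatever per-file summary is needed. A toy data frame is built inline so the sketch is self-contained.]

```r
library(plyr)

# toy stand-in for import.list <- adply(list.files(), 1, read.csv):
# Var1 marks which file each row came from
import.list <- data.frame(
  Var1 = rep(c(1, 2), each = 9),
  O2_conc = c(273.5, 268.1, 262.8, 257.4, 252.0, 246.6, 241.3, 235.9, 230.5,
              270, 269, 257, 259, 230, 220, 221, 450, 250)
)

# summary for ONE file's worth of rows: mean of the minimum O2_conc
# value and the 3 points preceding it
one.file.summary <- function(d) {
  a <- which.min(d$O2_conc)
  b <- max(a - 3, 1)
  data.frame(mean.min = mean(d$O2_conc[b:a]))
}

# map the one-file function over every file via the indicator column
ddply(import.list, .(Var1), one.file.summary)
```

The result is one row per file, so adding more .csv files to the directory requires no changes to the analysis code.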