Dennis and all,

Thank you for the help as I try to organize this method for importing and
batch-processing files. I currently have it set up to import data from two
files in my working directory; "Var1" indicates whether a row came from
file 1 or file 2.



filenames <- list.files()
library(plyr)
import.list <- adply(filenames, 1, read.csv)
import.list



   Var1 Time O2_conc Chla_conc
1     1    0   273.5       300
2     1   10   268.1       240
3     1   20   262.8       210
4     1   30   257.4       175
5     1   40   252.0       155
6     1   50   246.6       138
7     1   60   241.3       129
8     1   70   235.9       121
9     1   80   230.5       117
10    2    0   270.0       300
11    2   10   269.0       240
12    2   20   257.0       210
13    2   30   259.0       175
14    2   40   230.0       155
15    2   50   220.0       138
16    2   60   221.0       129
17    2   70   450.0       121
18    2   80   250.0       117



For my calculation I am only interested in column 3, so I used split() to
pull out this column based upon the value of Var1.





data <- split(import.list[, 3], import.list$Var1)
data

$`1`
[1] 273.5 268.1 262.8 257.4 252.0 246.6 241.3 235.9 230.5

$`2`
[1] 270 269 257 259 230 220 221 450 250



Finally I want to find the minimum value in column 3 (for each file) and the
three previous data points, and then calculate the mean of these four
values.



a <- which.min(data$"1")
b <- a - 3
d <- mean(data$"1"[b:a])
d
[1] 238.575



So far this works, but this last set of calculations (the min, then the
mean) is not very automated and is probably a bit cumbersome.

Rather than replacing a <- which.min(data$"1") with a <- which.min(data$"2")
by hand to run the calculation on the second file, I would like to write a
loop, or use an R package with looping facilities, to cycle through the
files (I have more than 2 files; this is just a test set).



I'm imagining pseudocode something like this:



for (i in 1:20) {
  a[i] <- which.min(data[[i]])
  b[i] <- a[i] - 3
  idx <- b[i]:a[i]
  # etc...
}

 ...though this particular code is most certainly too cumbersome.
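A less cumbersome sketch, using the split list and sapply() instead of an
explicit loop. The toy data frame below just mirrors the import.list
printout above; in practice it would come from adply(filenames, 1, read.csv):

```r
# Toy stand-in for import.list, mirroring the printout above.
import.list <- data.frame(
  Var1 = rep(1:2, each = 9),
  Time = rep(seq(0, 80, by = 10), 2),
  O2_conc = c(273.5, 268.1, 262.8, 257.4, 252.0, 246.6, 241.3, 235.9, 230.5,
              270, 269, 257, 259, 230, 220, 221, 450, 250)
)

# Split the O2 column by file, then for each file average the minimum
# and its three preceding points; max(1, ...) guards against a minimum
# that falls within the first three rows.
min_means <- sapply(split(import.list$O2_conc, import.list$Var1),
                    function(x) {
                      i <- which.min(x)
                      mean(x[max(1, i - 3):i])
                    })
min_means
```

This returns one named value per file (238.575 for file 1, 241.5 for file
2). The same per-file grouping could also be done with plyr's
ddply(import.list, .(Var1), ...), which returns a data frame with one row
per file instead of a named vector.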



Any thoughts on how to make a loop so that I can cycle through the files and
run these calculations?



Thank you,

Nate


On Mon, Nov 15, 2010 at 9:12 PM, Dennis Murphy <djmu...@gmail.com> wrote:

> Hi:
>
> See inline.
>
> On Mon, Nov 15, 2010 at 4:26 PM, Nate Miller <natemille...@gmail.com>wrote:
>
>> Hi All!
>>
>> I have some experience with R, but less experience writing scripts using
>> R and have run into a challenge that I hope someone can help me with.
>>
>>  I have multiple .csv files of data each with the same 3 columns of
>> data, but potentially of varying lengths (some data files are from short
>> measurements, others from longer ones). One file for example might look
>> like this...
>>
>> Time, O2_conc, Chla_conc
>> 0, 270, 300
>> 10, 260, 280
>> 20, 245, 268
>> 30, 233, 238
>> 40, 222, 212
>> 50, 215, 201
>> 60, 208, 193
>> 70, 206, 191
>> 80, 207, 189
>> 90, 206, 186
>> 100, 206, 183
>> 110, 207, 178
>> 120, 205, 174
>> 130, 240, 171
>> 140, 270, 155
>>
>> I am looking for an efficient means of batch (or sequentially)
>> processing these files so that I can
>> 1. import each data file
>>
>> 2. find the minimum value recorded in column 2 and the previous 5 data
>> points
>>
>
> Don't know what you mean by the 'previous 5 data points' ...are you
> referring to a rolling minimum?
>
>>
>> 3. and average these values to get a mean minimum value.
>>
>
> If the surmise above is correct, you should get 11 rolling mins for a
> vector of length 15. Here's an example using the rollapply() function from
> the zoo package:
>
> > library(zoo)
> > x <- rpois(15, 10)
> > x
>  [1] 17 12 17  9  8 10  7 11 15 11 11 15  5  9 12
> > rollapply(zoo(x), 5, FUN = min)
>  3  4  5  6  7  8  9 10 11 12 13
>  8  8  7  7  7  7  7 11  5  5  5
> > mean(rollapply(zoo(x), 5, FUN = min))
> [1] 7
>
>
> Currently I have imported the data files using the following
>>
>> filenames=list.files()
>>
>> library(plyr)
>>
>> import.list=adply(filenames, 1, read.csv)
>>
>
> This seems to be a reasonable approach. Does the result keep a column for
> the file names?
>
>>
>> and I know how to write code to calculate the minimum value and the 5
>> preceding values in a single column, in a single file. I think the
>> problem I am running into is scaling this code up so that I can import
>> multiple files and calculate the mean minimum value for the second
>> column in each of them.
>>
>
> As long as you have an indicator for each file, this should be pretty
> straightforward. Write a function that produces the summaries for one data
> frame and then do something like
>
> ddply(slurpedFiles, .(dsIndicator), myfunction)
>
> to map the function to all of them.
>
> The data.table package is an alternative, where you should be able to do
> similar things using the data set names as a key. data.table 'thinks' more
> like an SQL, but it can be very efficient.
>
> You should be able to do what you're asking for with either package.
>
> HTH,
> Dennis
>
>>
>> Can anyone offer some advice on how to batch processes a whole bunch of
>> files? I need to load them in, but then analyze them too.
>>
>> Thank you so much,
>>
>> Nate
>>
>>
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>

