Re: [R] : Quantile and rowMean from multiple files in a folder

2014-04-17 Thread Zilefac Elvis
Hi AK,
Thanks very much for the updated code.
My simulated results are even more consistent with observations after applying the 
updated version of the code.

Cheers,
Atem.



On Wednesday, April 16, 2014 11:31 PM, Zilefac Elvis zilefacel...@yahoo.com 
wrote:
Hi AK,
Thanks very much.
Atem.




On Wednesday, April 16, 2014 9:32 PM, arun smartpink...@yahoo.com wrote:
Hi,
Use this code after `lst2`.
lapply(seq_along(lst2), function(i) {
    lstN <- lapply(lst2[[i]], function(x) {
        # NA-filled frame with one column per name in names1
        datN <- as.data.frame(matrix(NA, nrow = 101, ncol = length(names1),
            dimnames = list(NULL, names1)))
        x1 <- x[, -1]
        qt <- numcolwise(function(y) quantile(y, seq(0, 1, by = 0.01),
            na.rm = TRUE))(x1)
        # fill only the columns that are present in this simulation
        datN[, match(names(x1), names(datN))] <- qt
        datN
    })
    arr1 <- array(unlist(lstN), dim = c(dim(lstN[[1]]), length(lstN)),
        dimnames = list(NULL, names1))
    res <- rowMeans(arr1, dims = 2, na.rm = TRUE)
    colnames(res) <- gsub(" ", "_", colnames(res))
    res1 <- data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"), res,
        stringsAsFactors = FALSE)
    write.csv(res1, paste0(paste(getwd(), "final", paste(names(lst1)[[i]],
        "Quantile", sep = "_"), sep = "/"), ".csv"), row.names = FALSE,
        quote = FALSE)
})

ReadOut1 <- lapply(list.files(recursive = TRUE)[grep("Quantile",
    list.files(recursive = TRUE))],
    function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE))
sapply(ReadOut1, function(x) dim(x))

lstNew <- simplify2array(ReadOut1)
nrow(lstNew)
#[1] 258
dir.create("Indices")
lapply(2:nrow(lstNew), function(i) {
    dat1 <- data.frame(Percentiles = lstNew[1], do.call(cbind,
        lstNew[i, ]), stringsAsFactors = FALSE)
    colnames(dat1) <- c("Percentiles", paste(names(lst2),
        rep(rownames(lstNew)[i], length(lst2)), sep = "_"))
    write.csv(dat1, paste0(paste(getwd(), "Indices", gsub(" ", "_",
        rownames(lstNew)[i]), sep = "/"), ".csv"), row.names = FALSE,
        quote = FALSE)
})

## Output2:
ReadOut2 <- lapply(list.files(recursive = TRUE)[grep("Indices",
    list.files(recursive = TRUE))],
    function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE))
names(ReadOut2) <- gsub(".*\\/(.*)\\.csv", "\\1", list.files(recursive =
    TRUE)[grep("Indices", list.files(recursive = TRUE))])
ReadOut2$pint_DJF[1:3,1:3]
#  Percentiles G100_pint_DJF G101_pint_DJF
#1          0%      0.982001      1.020892
#2          1%      1.005563      1.039288
#3          2%      1.029126      1.057685
any(is.na(ReadOut2$pint_DJF))
#[1] FALSE
A.K.
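
For readers without the attached data, the padding-and-averaging step above can be tried on toy input. This is only a sketch under stated assumptions: `all_names`, `pad_to`, `sim1` and `sim2` are names invented here, the probability grid is shortened to three points, and base `sapply` stands in for `plyr::numcolwise`.

```r
# Toy stand-ins for two simulations of one site; sim2 lacks column "b".
all_names <- c("a", "b", "c")
sim1 <- data.frame(a = 1:5, b = 6:10, c = 11:15)
sim2 <- data.frame(a = 2:6, c = 12:16)

probs <- seq(0, 1, by = 0.5)  # coarse grid to keep the example small

# Compute quantiles per column, then place them by name into an NA-filled
# matrix that has one column for every name in all_names.
pad_to <- function(x, all_names, probs) {
  qt <- sapply(x, quantile, probs = probs, na.rm = TRUE)
  out <- matrix(NA_real_, nrow = length(probs), ncol = length(all_names),
                dimnames = list(NULL, all_names))
  out[, match(colnames(qt), all_names)] <- qt
  out
}

lst <- lapply(list(sim1, sim2), pad_to, all_names = all_names, probs = probs)

# Stack into a 3-d array (quantile x column x simulation) and average over
# simulations; na.rm = TRUE skips the simulation that lacks a column.
arr <- array(unlist(lst), dim = c(dim(lst[[1]]), length(lst)),
             dimnames = list(NULL, all_names, NULL))
res <- rowMeans(arr, dims = 2, na.rm = TRUE)
res
```

Because the columns are placed by name rather than by position, a simulation with a missing column contributes NA for that column only, and the average is taken over the simulations that do have it.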








On Wednesday, April 16, 2014 12:34 PM, Zilefac Elvis zilefacel...@yahoo.com 
wrote:
Hi AK,
I tried the updated Quantilecode.txt. It works well but when I open the files 
in Indices, I find some columns filled with NAs. This should not be the case 
given that I am working with simulations and there are no missing values in the 
process. The ##not correct section yielded no NAs. Check for example, 
pint_..._DJF in Indices.

Let me be sure we are on the same page. I removed the ##not correct section of 
the code and ran the code from beginning to end: Q1 and then Q2. My results are 
found in the Indices folder.

Thanks,
Atem.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] : Quantile and rowMean from multiple files in a folder

2014-04-16 Thread Zilefac Elvis
Hi AK,
Thanks very much.
Atem.





Re: [R] : Quantile and rowMean from multiple files in a folder

2014-04-15 Thread Zilefac Elvis
Hi AK,
All codes for simulation files work great.
I will try the code for observations and let you know.
Thanks very much.
Atem.







On Tuesday, April 15, 2014 12:01 AM, arun smartpink...@yahoo.com wrote:
Yes,
my new solution ignores such cases.







On Monday, April 14, 2014 11:58 PM, Zilefac Elvis zilefacel...@yahoo.com 
wrote:
Hi AK,
Please ignore any such site.
I will check it and include in the analysis.
Thanks,
Atem.



On Monday, April 14, 2014 9:34 PM, arun smartpink...@yahoo.com wrote:



Hi,

I looked at your Observed.zip. One of the files in it has no data:
GG83_Sim.csv.ind.csv
The contents of the file are just:

Year    
Year    
trend    
p     
 

A.K.
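
One way to spot such header-only files before running the rest of the code is sketched below. It rests on an assumption that is not from the thread: a file counts as empty when its first header line has no comma-separated field besides Year. The temporary folder and the file names are made up for the example.

```r
# Write two small files to a temporary folder: one with data, one with
# only the two header lines (file names here are made up).
tmp <- tempfile("obs")
dir.create(tmp)
writeLines(c("Year,pav,pav", "Year,DJF,MAM", "1971,1.2,3.4", "1972,1.5,3.1"),
           file.path(tmp, "G100_Sim.csv"))
writeLines(c("Year", "Year"), file.path(tmp, "GG83_Sim.csv"))

files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# A file counts as empty here when its first header line has no field
# besides "Year", i.e. no comma-separated data columns.
has_data <- sapply(files, function(f) {
  header <- readLines(f, n = 1)
  length(strsplit(header, ",")[[1]]) > 1
})
basename(files[!has_data])
```

Files flagged this way can be set aside before the list of data frames is built.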


On Monday, April 14, 2014 10:41 PM, Zilefac Elvis zilefacel...@yahoo.com 
wrote:
Hi AK,
Q1) Please try to correct the error using the larger data set (Sample.zip). The 
issue is that when the code is written for and restricted to a smaller data 
set, I find it difficult to generalize it to larger data sets.

Q2) In the Quantilecode2.txt you just sent, the following section was not 
applied to the Observed.zip file. I ran the code up to section Q1 in 
Quantilecode2.txt using a larger data set and received the same error: "Error in 
2:nrow(lstNew) : argument of length 0". I have attached a larger data set too, 
so that you can generalize the code to suit it. Please do not 
forget to include the code below in the final code of Q2.


Once you fix these two, I should be able to fix the rest following these 
examples.

Thanks AK. Sorry for overloading you with much work.
Atem.

#==
dir.create("Indices")
names1 <- lapply(ReadOut1, function(x) names(x))[[1]]
lstNew <- simplify2array(ReadOut1)
lapply(2:nrow(lstNew), function(i) {
    dat1 <- data.frame(lstNew[1], do.call(cbind, lstNew[i, ]),
        stringsAsFactors = FALSE)
    colnames(dat1) <- c(rownames(lstNew)[1], paste(names(lst1),
        rep(rownames(lstNew)[i], length(lst1)), sep = "_"))
    write.csv(dat1, paste0(paste(getwd(), "Indices", rownames(lstNew)[i],
        sep = "/"), ".csv"), row.names = FALSE, quote = FALSE)
})
## Output2:
ReadOut2 <- lapply(list.files(recursive = TRUE)[grep("Indices",
    list.files(recursive = TRUE))], function(x) read.csv(x, header = TRUE,
    stringsAsFactors = FALSE))
length(ReadOut2)
# [1] 257
head(ReadOut2[[1]], 2)

#==




On Monday, April 14, 2014 8:07 PM, arun smartpink...@yahoo.com wrote:

Hi,

Please send your emails in plain text.  If you had looked at the dimensions of 
`lst2`:
sapply(lst2,function(x) sapply(x,ncol))[1:6,]
     G100 G101 G102 G103 G104 G105 G106 G107 G108 G109 G110 G111 G112 G113 G114
[1,]  258  258  258  258  258  257  258  258  258  258  258  258  258  258  247
[2,]  258  258  258  258  258  258  258  258  258  258  258  258  258  258  258
[3,]  258  258  258  258  258  258  258  258  258  258  258  258  258  258  257
[4,]  258  258  258  258  258  257  258  258  258  258  258  258  258  258  258
[5,]  258  258  258  258  258  258  258  258  258  258  258  258  258  258  258
[6,]  258  258  258  258  258  258  258  258  258  258  258  258  258  258  258
     G115 G116 G117 G118 G119 G120 GG10 GG11 GG12 GG13 GG14 GG15 GG16 GG17 GG18
[1,]  258  247  256  256  258  258  258  258  258  258  258  258  258  257  258
[2,]  258  250  257  258  258  256  258  258  258  258  258  258  258  258  258
[3,]  258  247  256  258  258  256  258  258  258  258  258  258  258  258  256
[4,]  258  258  258  257  258  258  258  258  258  258  258  258  258  257  258
[5,]  258  257  258  258  258  256  258  258  258  258  258  258  258  258  258
[6,]  258  257  249  257  258  258  258  258  258  258  258  258  258  258  258
     GG19 GG20 GG21 GG22 GG23 GG24 GG25 GG26 GG27 GG28
[1,]  258  258  258  258  258  258  258  258  258  258
[2,]  258  258  258  258  258  258  258  258  258  258
[3,]  258  258  257  258  256  257  258  258  258  258
[4,]  258  257  258  258  258  257  258  258  258  258
[5,]  258  258  257  258  257  258  258  258  258  258
[6,]  258  258  258  258  257  258  258  258  258  258 


# The dimensions are not consistent across the Simulations within each Site.
My code assumed that all the datasets had the same number of columns, rows, 
etc.
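
A compact way to flag the affected Sites, instead of reading the full table of column counts, is sketched here on toy data. The list `sims` and its contents are invented for the example; in the thread the corresponding object is `lst2`.

```r
# Toy nested list: site -> simulations (hypothetical data).
sims <- list(
  G100 = list(data.frame(Year = 1:3, a = 1:3, b = 4:6),
              data.frame(Year = 1:3, a = 1:3, b = 4:6)),
  G105 = list(data.frame(Year = 1:3, a = 1:3, b = 4:6),
              data.frame(Year = 1:3, a = 1:3))      # one column short
)

# For each site, TRUE when its simulations do not all share one column count.
inconsistent <- sapply(sims, function(site) {
  length(unique(sapply(site, ncol))) > 1
})
names(inconsistent)[inconsistent]
```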






On Monday, April 14, 2014 6:26 PM, Zilefac Elvis zilefacel...@yahoo.com wrote:

Hi AK,
I have another request for help.
Attached is a larger file (~27MB) for sample.zip. All files are the same as 
before, except that I am using more sites to do the same thing that you did 
with sample.zip.

When generalizing Quantilecode.R to many sites, I receive an error when I run:

dir.create("Indices")
names1 <- lapply(ReadOut1, function(x) names(x))[[1]]
lstNew <- simplify2array(ReadOut1)

lapply(2:nrow(lstNew), function(i) {
  dat1 <- data.frame(lstNew[1], do.call(cbind, lstNew[i, ]), stringsAsFactors = 

Re: [R] : Quantile and rowMean from multiple files in a folder

2014-04-15 Thread arun


Hi Atem,
Maybe this works.
### Q1: working directory: Observed
### Only one file per Site. Assuming this is the case for the full dataset,
### then I guess there is no need to average.
dir.create("final")
lst1 <- split(list.files(pattern = ".csv"), gsub("\\_.*", "",
    list.files(pattern = ".csv")))

lst2 <- lapply(lst1, function(x1) lapply(x1, function(x2) {
    lines1 <- readLines(x2)
    header1 <- lines1[1:2]
    dat1 <- read.table(text = lines1, header = FALSE, sep = ",",
        stringsAsFactors = FALSE, skip = 2)
    # combine the two header rows into one set of column names
    colnames(dat1) <- Reduce(paste, strsplit(header1, ","))
    # drop the last two rows
    dat1[-c(nrow(dat1), nrow(dat1) - 1), ]
}))

# keep only the Sites whose data came back as a table (ncol() is an integer)
lst3 <- lst2[sapply(seq_along(lst2), function(i) {
    lstN <- sapply(lst2[[i]], function(x) is.integer(ncol(x)))
})]
length(lst2)
#[1] 120
length(lst3)
#[1] 119

library(plyr)
library(stringr)

lst4 <- setNames(lapply(seq_along(lst3), function(i) {
    lapply(lst3[[i]], function(x) {
        # NOTE: names(lst1)[i] assumes lst1 and lst3 list the same Sites in
        # the same order; after filtering, names(lst3)[i] is the label that
        # belongs to lst3[[i]]
        names(x)[-1] <- paste(names(x)[-1], names(lst1)[i], sep = "_")
        names(x) <- str_trim(names(x))
        x
    })[[1]]
}), names(lst3))
df1 <- join_all(lst4, by = "Year")
dim(df1)
# [1] 9 27311

dimCol <- sapply(split(names(df1)[-1], gsub(".*\\_", "", names(df1)[-1])),
    function(x) {
    df2 <- df1[, x]
    df3 <- data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"),
        numcolwise(function(y) quantile(y,
        seq(0, 1, by = 0.01), na.rm = TRUE))(df2), stringsAsFactors = FALSE)
    ncol(df3)
})

lst5 <- split(names(df1)[-1], gsub(".*\\_", "", names(df1)[-1]))

lapply(seq_along(lst5), function(i) {
    df2 <- df1[, lst5[[i]]]
    df3 <- data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"),
        numcolwise(function(y) quantile(y,
        seq(0, 1, by = 0.01), na.rm = TRUE))(df2), stringsAsFactors = FALSE)
    write.csv(df3, paste0(paste(getwd(), "final", paste(names(lst4)[[i]],
        "Quantile", sep = "_"), sep = "/"), ".csv"), row.names = FALSE,
        quote = FALSE)
})

ReadOut1 <- lapply(list.files(recursive = TRUE)[grep("Quantile",
    list.files(recursive = TRUE))],
    function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE))
dir.create("Indices")
sapply(ReadOut1, dim)[,1:3]  ##different dimensions
#     [,1] [,2] [,3]
#[1,]  101  101  101
#[2,]  157  258  258

names1 <- unlist(lapply(ReadOut1, function(x) names(x)[-1]))
names2 <- gsub("\\_.*", "", names1)
names3 <- unique(gsub("[.]", " ", names2))

length(names3)
#[1] 264
#lstNew <- simplify2array(ReadOut1)  ###results you got
# nrow(lstNew)
#NULL

# pad every file to the full set of names3 columns so the dimensions agree
ReadOut2 <- lapply(seq_along(ReadOut1), function(i) {
    df2 <- ReadOut1[[i]]
    df3 <- as.data.frame(matrix(NA, nrow = 101, ncol = length(names3),
        dimnames = list(NULL, names3)))
    names(df2) <- gsub("[.]", " ", gsub("\\_.*", "", names(df2)))
    df2 <- df2[, -1]
    df3[, match(names(df2), names(df3))] <- df2
    df3
})

lstNew <- simplify2array(ReadOut2)
nrow(lstNew)
#[1] 264

lapply(1:nrow(lstNew), function(i) {
    dat1 <- data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"),
        do.call(cbind, lstNew[i, ]), stringsAsFactors = FALSE)
    colnames(dat1) <- c("Percentiles", paste(names(lst3),
        rep(rownames(lstNew)[i], length(lst3)), sep = "_"))
    write.csv(dat1, paste0(paste(getwd(), "Indices", gsub(" ", "_",
        rownames(lstNew)[i]), sep = "/"), ".csv"), row.names = FALSE,
        quote = FALSE)
})


## Output2:
ReadOut2 <- lapply(list.files(recursive = TRUE)[grep("Indices",
    list.files(recursive = TRUE))],
    function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE))
length(ReadOut2)
#[1] 264

ReadOut2[[1]][1:3,1:3]
#  Percentiles G100_pav.ANN G101_pav.ANN
#1          0%     0.766900      0.96240
#2          1%     0.796132      0.96572
#3          2%     0.825364      0.96904


Attached is the file.

A.K.
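
The reason `nrow(lstNew)` came back NULL can be reproduced with three tiny data frames. This is a sketch with invented objects (`d1`, `d2`, `d3`); it relies only on base R's `simplify2array`, which returns its input list unchanged when the elements have different lengths.

```r
# Two data frames with the same number of columns, and a third with fewer.
d1 <- data.frame(p = 1:2, a = 3:4, b = 5:6)
d2 <- data.frame(p = 1:2, a = 7:8, b = 9:10)
d3 <- data.frame(p = 1:2, a = 11:12)

same  <- simplify2array(list(d1, d2))      # list-matrix: one row per column
mixed <- simplify2array(list(d1, d2, d3))  # lengths differ: stays a list

dim(same)    # rows are the columns p, a, b; one column per data frame
nrow(mixed)  # NULL, so 2:nrow(mixed) fails with "argument of length 0"
```

Padding every data frame to a common set of columns first, as in the code above, gives all elements the same length and restores the matrix form.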



On Tuesday, April 15, 2014 4:00 AM, Zilefac Elvis zilefacel...@yahoo.com 
wrote:
Hi AK,
I tried all the code for the observations. Everything works great except this 
section (probably due to different dimensions).
I took the Observed.zip file, deleted the station that had no data, and applied 
the code. However, this section did not work: lstNew is NULL, so nothing is 
actually written to Indices.

I will check ReadOut1 when I get up from sleep.

Thanks,
Atem.

dir.create("Indices")
names1 <- lapply(ReadOut1, function(x) names(x))[[1]]
lstNew <- simplify2array(ReadOut1)
nrow(lstNew)
#[1] NULL
lapply(2:nrow(lstNew), function(i) {
    dat1 <- data.frame(lstNew[1], do.call(cbind, lstNew[i, ]),
        stringsAsFactors = FALSE)
    colnames(dat1) <- c(rownames(lstNew)[1], paste(names(lst1),
        rep(rownames(lstNew)[i], length(lst1)), sep = "_"))
    write.csv(dat1, paste0(paste(getwd(), "Indices", rownames(lstNew)[i],
        sep = "/"), ".csv"), row.names = FALSE, quote = FALSE)
})
===


On Tuesday, April 15, 2014 12:45 AM, arun smartpink...@yahoo.com wrote:
Hi Atem,

No problem.  Hope it works for Observation files too.  Remember that before you 
run the same code for sample in Observation, check the dimensions of the files 
(as I did previously).  If the dimensions differ, make them the same 
dimensions 

Re: [R] : Quantile and rowMean from multiple files in a folder

2014-04-15 Thread Zilefac Elvis
Hi AK,
Thanks very much. It worked great.
Many thanks.
Atem.



Re: [R] : Quantile and rowMean from multiple files in a folder

2014-04-14 Thread arun


Hi,
It is because of different dimensions of Simulation data  within each Site.
Try:
dir.create("final")
lst1 <- split(list.files(pattern = ".csv"), gsub("\\_.*", "",
    list.files(pattern = ".csv")))
sapply(lst1,length)
#G100 G101 G102 G103 G104 G105 G106 G107 G108 G109 G110 G111 G112 G113 G114 
G115 
# 100  100  100  100  100  100  100  100  100  100  100  100  100  100  100  
100 
#G116 G117 G118 G119 G120 GG10 GG11 GG12 GG13 GG14 GG15 GG16 GG17 GG18 GG19 
GG20 
# 100  100  100  100  100  100  100  100  100  100  100  100  100  100  100  
100 
#GG21 GG22 GG23 GG24 GG25 GG26 GG27 GG28 
# 100  100  100  100  100  100  100  100 

lst2 <- lapply(lst1, function(x1) lapply(x1, function(x2) {
    lines1 <- readLines(x2)
    header1 <- lines1[1:2]
    dat1 <- read.table(text = lines1, header = FALSE, sep = ",",
        stringsAsFactors = FALSE, skip = 2)
    # combine the two header rows into one set of column names
    colnames(dat1) <- Reduce(paste, strsplit(header1, ","))
    # drop the last two rows
    dat1[-c(nrow(dat1), nrow(dat1) - 1), ]
}))

##dimensions differ within each Site
sapply(lst2,function(x) sapply(x,ncol))[1:6,5:8]
#     G104 G105 G106 G107
#[1,]  258  257  258  258
#[2,]  258  258  258  258
#[3,]  258  258  258  258
#[4,]  258  257  258  258
#[5,]  258  258  258  258
#[6,]  258  258  258  258

##number of rows are consistent
sapply(lst2,function(x) any(sapply(x,nrow)!=9))
# G100  G101  G102  G103  G104  G105  G106  G107  G108  G109  G110  G111  G112 
#FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 
# G113  G114  G115  G116  G117  G118  G119  G120  GG10  GG11  GG12  GG13  GG14 
#FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 
# GG15  GG16  GG17  GG18  GG19  GG20  GG21  GG22  GG23  GG24  GG25  GG26  GG27 
#FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 
# GG28 
#FALSE 
names1 <- unique(unlist(lapply(lst2, function(x) unlist(lapply(x, function(y)
    names(y)[-1])))))
length(names1)
#[1] 257


# lstYear - lapply(lst2,function(x) lapply(x, function(y)
# y[,1,drop=FALSE])[[1]])

library(plyr)

lapply(seq_along(lst2), function(i) {
    lstN <- lapply(lst2[[i]], function(x) {
        datN <- as.data.frame(matrix(NA, nrow = 9, ncol = length(names1),
            dimnames = list(NULL, names1)))
        # NOTE: positional assignment; assumes x[, -1] has one column for
        # every name in names1
        datN[, names1] <- x[, -1]
        datN
    })
    lstQ1 <- lapply(lstN, function(x) numcolwise(function(y) quantile(y,
        seq(0, 1, by = 0.01), na.rm = TRUE))(x))
    arr1 <- array(unlist(lstQ1), dim = c(dim(lstQ1[[1]]), length(lstQ1)),
        dimnames = list(NULL, lapply(lstQ1, names)[[1]]))
    res <- rowMeans(arr1, dims = 2, na.rm = TRUE)
    colnames(res) <- gsub(" ", "_", colnames(res))
    res1 <- data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"), res,
        stringsAsFactors = FALSE)
    write.csv(res1, paste0(paste(getwd(), "final", paste(names(lst1)[[i]],
        "Quantile", sep = "_"), sep = "/"), ".csv"), row.names = FALSE,
        quote = FALSE)
})



## output files
list.files(recursive = TRUE)[grep("Quantile", list.files(recursive = TRUE))]
#[1] final/G100_Quantile.csv final/G101_Quantile.csv
#[3] final/G102_Quantile.csv final/G103_Quantile.csv
#[5] final/G104_Quantile.csv final/G105_Quantile.csv
#[7] final/G106_Quantile.csv final/G107_Quantile.csv
#[9] final/G108_Quantile.csv final/G109_Quantile.csv
#[11] final/G110_Quantile.csv final/G111_Quantile.csv
#[13] final/G112_Quantile.csv final/G113_Quantile.csv
#[15] final/G114_Quantile.csv final/G115_Quantile.csv
#[17] final/G116_Quantile.csv final/G117_Quantile.csv
#[19] final/G118_Quantile.csv final/G119_Quantile.csv
#[21] final/G120_Quantile.csv final/GG10_Quantile.csv
#[23] final/GG11_Quantile.csv final/GG12_Quantile.csv
#[25] final/GG13_Quantile.csv final/GG14_Quantile.csv
#[27] final/GG15_Quantile.csv final/GG16_Quantile.csv
#[29] final/GG17_Quantile.csv final/GG18_Quantile.csv
#[31] final/GG19_Quantile.csv final/GG20_Quantile.csv
#[33] final/GG21_Quantile.csv final/GG22_Quantile.csv
#[35] final/GG23_Quantile.csv final/GG24_Quantile.csv
#[37] final/GG25_Quantile.csv final/GG26_Quantile.csv
#[39] final/GG27_Quantile.csv final/GG28_Quantile.csv


ReadOut1 <- lapply(list.files(recursive = TRUE)[grep("Quantile",
    list.files(recursive = TRUE))],
    function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE))
sapply(ReadOut1,function(x) dim(x))
#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
#[1,]  101  101  101  101  101  101  101  101  101   101   101   101   101   101
#[2,]  258  258  258  258  258  258  258  258  258   258   258   258   258   258
#     [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26]
#[1,]   101   101   101   101   101   101   101   101   101   101   101   101
#[2,]   258   258   258   258   258   258   258   258   258   258   258   258
#     [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38]
#[1,]   101   101   101   101   101   101   101   101   101   101   101   101
#[2,]   258   258   258   258   258   258   258   258   258   258   258   258
#     [,39] [,40]
#[1,]   101   101
#[2,]   258   258

ReadOut1[[1]][1:3,1:3]
#  Percentiles  txav_DJF txav_MAM
#1          0% -12.56619 6.795429
#2          1% -12.45888 

Re: [R] : Quantile and rowMean from multiple files in a folder

2014-04-14 Thread arun




Hi,
Q1 solution already sent.

Regarding Q2, one of the files in the new Observed folder doesn't have any  
data (just the Year column alone).

That may be the reason for the problem.

### Q1: working directory: Observed
### Only one file per Site. Assuming this is the case for the full dataset,
### then I guess there is no need to average.
dir.create("final")
lst1 <- split(list.files(pattern = ".csv"), gsub("\\_.*", "",
    list.files(pattern = ".csv")))

lst2 <- lapply(lst1, function(x1) lapply(x1, function(x2) {
    lines1 <- readLines(x2)
    header1 <- lines1[1:2]
    dat1 <- read.table(text = lines1, header = FALSE, sep = ",",
        stringsAsFactors = FALSE, skip = 2)
    # combine the two header rows into one set of column names
    colnames(dat1) <- Reduce(paste, strsplit(header1, ","))
    # drop the last two rows
    dat1[-c(nrow(dat1), nrow(dat1) - 1), ]
}))

# keep only the Sites whose data came back as a table (ncol() is an integer)
lst3 <- lst2[sapply(seq_along(lst2), function(i) {
    lstN <- sapply(lst2[[i]], function(x) is.integer(ncol(x)))
})]


#difference in column number
sapply(seq_along(lst3), function(i) {
sapply(lst3[[i]], function(x) ncol(x))
})
# 
#[1] 157 258 258  98 157 258 256 258 250 258 258 147 157 250 250 256 249 240
# [19] 181 188 256 146 117 258 153 256 255 246 255 256 258 257 145 258 258 255
# [37] 258 157 164 144 265 258 254 258 258 157 258 176 258 256 257 258 258 258
# [55] 248 258 156 258 157 157 258 258 258 258 258 148 258 258 258 258 257 258
# [73] 258 258 157 154 153 258 248 255 257 256 258 258 157 256 256 257 257 250
# [91] 257 139 155 256 256 257 257 256 258 258 257 258 258 258 258 157 157 157
#[109] 258 258 258 258 256 258 157 258 258 256 258

library(plyr)
library(stringr)

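lst4 <- setNames(lapply(seq_along(lst3), function(i) {
    lapply(lst3[[i]], function(x) {
        # NOTE: names(lst1)[i] assumes lst1 and lst3 list the same Sites in
        # the same order; after filtering, names(lst3)[i] is the label that
        # belongs to lst3[[i]]
        names(x)[-1] <- paste(names(x)[-1], names(lst1)[i], sep = "_")
        names(x) <- str_trim(names(x))
        x
    })[[1]]
}), names(lst3))

df1 <- join_all(lst4, by = "Year")
dim(df1)
# [1] 9 27311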

sapply(split(names(df1)[-1], gsub(".*\\_", "", names(df1)[-1])), function(x) {
    df2 <- df1[, x]
    df3 <- data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"),
        numcolwise(function(y) quantile(y,
        seq(0, 1, by = 0.01), na.rm = TRUE))(df2), stringsAsFactors = FALSE)
    ncol(df3)
})
#G100 G101 G102 G103 G104 G105 G106 G107 G108 G109 G110 G111 G112 G113 G114 G115
# 157  258  258   98  157  258  256  258  250  258  258  147  157  250  250  256
#G116 G117 G118 G119 G120 GG10 GG11 GG12 GG13 GG14 GG15 GG16 GG17 GG18 GG19 GG20
# 249  240  181  188  256  146  117  258  153  256  255  246  255  256  258  257
#GG21 GG22 GG23 GG24 GG25 GG26 GG27 GG28 GG29 GG30 GG31 GG32 GG33 GG34 GG35 GG36
# 145  258  258  255  258  157  164  144  265  258  254  258  258  157  258  176
#GG37 GG38 GG39 GG40 GG41 GG42 GG43 GG44 GG45 GG46 GG47 GG48 GG49 GG50 GG51 GG52
# 258  256  257  258  258  258  248  258  156  258  157  157  258  258  258  258
#GG53 GG54 GG55 GG56 GG57 GG58 GG59 GG60 GG61 GG62 GG63 GG64 GG65 GG66 GG67 GG68
# 258  148  258  258  258  258  257  258  258  258  157  154  153  258  248  255
#GG69 GG70 GG71 GG72 GG73 GG74 GG75 GG76 GG77 GG78 GG79 GG80 GG81 GG82 GG83 GG84
# 257  256  258  258  157  256  256  257  257  250  257  139  155  256  256  257
#GG85 GG86 GG87 GG88 GG89 GG90 GG91 GG92 GG93 GG94 GG95 GG96 GG97 GG98 GG99 GGG1
# 257  256  258  258  257  258  258  258  258  157  157  157  258  258  258  258
#GGG2 GGG3 GGG4 GGG5 GGG6 GGG7 GGG8
# 256  258  157  258  258  256  258
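
The grouping step above (split the joined column names by their site suffix, then take percentiles column by column) can also be sketched in base R without plyr. This is only an illustration under assumptions: the data frame and its column names (pav_G100, txav_G100, pav_G101) are made up, and sapply() with quantile() stands in for numcolwise().

```r
# Hypothetical joined data: a Year column plus index columns suffixed by site code
df1 <- data.frame(Year = 1961:1965,
                  pav_G100 = c(1, 2, 3, 4, 5),
                  txav_G100 = c(10, 20, 30, 40, 50),
                  pav_G101 = c(2, 4, NA, 8, 10))

# Site code = everything after the last underscore in each column name
site <- sub(".*_", "", names(df1)[-1])
cols_by_site <- split(names(df1)[-1], site)

probs <- seq(0, 1, by = 0.01)
pct <- lapply(cols_by_site, function(cols) {
  # One column of 101 percentiles per index column of this site
  q <- sapply(df1[, cols, drop = FALSE], quantile, probs = probs, na.rm = TRUE)
  data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"), q,
             stringsAsFactors = FALSE)
})

sapply(pct, dim)
```

Each element of pct then has 101 rows and one column per index available at that site, plus the Percentiles column, which is why the column counts differ between sites.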



lst5 <- split(names(df1)[-1], gsub(".*\\_", "", names(df1)[-1]))

lapply(seq_along(lst5), function(i) {
df2 <- df1[, lst5[[i]]]
df3 <- data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"), 
numcolwise(function(y) quantile(y, 
seq(0, 1, by = 0.01), na.rm = TRUE))(df2), stringsAsFactors = FALSE)
df3[1:3, 1:3]
write.csv(df3, paste0(paste(getwd(), "final", paste(names(lst4)[[i]], 
"Quantile", 
sep = "_"), sep = "/"), ".csv"), row.names = FALSE, quote = FALSE)
})

ReadOut1 <- lapply(list.files(recursive = TRUE)[grep("Quantile", 
list.files(recursive = TRUE))], 
function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE))

sapply(ReadOut1, dim)[,1:3]
#     [,1] [,2] [,3]
#[1,]  101  101  101
#[2,]  157  258  258


lapply(ReadOut1, function(x) x[1:2, 1:3])[1:3]
#[[1]]
#  Percentiles pav.DJF_G100 pav.MAM_G100
#1          0%            0     0.640500
#2          1%            0     0.664604
#
#[[2]]
#  Percentiles txav.DJF_G101 txav.MAM_G101
#1          0%      -13.8756      4.742400
#2          1%      -13.8140      4.817184
#
#[[3]]
#  Percentiles txav.DJF_G102 txav.MAM_G102
#1          0%     -15.05000      4.520700
#2          1%     -14.96833      4.543828


### Q2: Observed data

dir.create("Indices")

names1 <- unlist(lapply(ReadOut1, function(x) names(x)[-1]))
names2 <- gsub("\\_.*", "", names1)
names3 <- unique(gsub("[.]", " ", names2))

res <- do.call(rbind, lapply(seq_along(lst5), function(i) {
df2 <- df1[, lst5[[i]]]
vec1 <- colMeans(df2, na.rm = TRUE)
vec2 <- rep(NA, length(names3))
names(vec2) <- paste(names3, names(lst5)[[i]], sep = "_")
vec2[names(vec2) %in% 

Re: [R] Quantile and rowMean from multiple files in a folder

2014-04-14 Thread zilefacel...@yahoo.com

   Hi AK,

   Thanks very much.

   I sent you another email with a larger Sample.zip file. The
   Quantilecode.R script, which you initially developed for the smaller
   sample.zip, did not complete the task when I used it on the larger data
   set. Please check the error message and rectify it.


   Thanks,

   Atem.

Re: [R] Quantile and rowMean from multiple files in a folder

2014-04-14 Thread arun
Hi Atem,

I guess this is what you wanted.

###Q1: 
###
###working directory: Observed
#Only one file per Site. Assuming this is the case for the full dataset,
#I guess there is no need to average.

dir.create("final")
lst1 <- split(list.files(pattern = ".csv"), gsub("\\_.*", "", 
list.files(pattern = ".csv")))

lst2 <- lapply(lst1, function(x1) lapply(x1, function(x2) {
lines1 <- readLines(x2)
header1 <- lines1[1:2]
dat1 <- read.table(text=lines1, header=FALSE, sep=",", stringsAsFactors=FALSE, skip=2)
colnames(dat1) <- Reduce(paste, strsplit(header1, ","))
dat1[-c(nrow(dat1), nrow(dat1)-1),]}))
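
For anyone following along, the reading step can be tried on a small made-up file. This is a sketch only: the file contents below are invented to mimic the layout the code assumes (two header lines, data rows, then the Trend and p rows), and trimws() is used here in place of str_trim().

```r
# Hypothetical file: two header lines, three data rows, then "Trend" and "p"
tmp <- tempfile(fileext = ".csv")
writeLines(c("Year,pav,txav",
             ",ANN,DJF",
             "1961,1.2,-13.5",
             "1962,1.4,-12.9",
             "1963,1.1,-14.2",
             "Trend,0.01,0.02",
             "p,0.30,0.04"), tmp)

lines1  <- readLines(tmp)
header1 <- strsplit(lines1[1:2], ",")

# Paste the two header rows together element-wise, then trim stray spaces
col_names <- trimws(Reduce(paste, header1))

dat1 <- read.table(text = lines1, header = FALSE, sep = ",",
                   stringsAsFactors = FALSE, skip = 2)
colnames(dat1) <- col_names

# Drop the last two rows (Trend and p), then make every column numeric
dat1 <- dat1[-c(nrow(dat1), nrow(dat1) - 1), ]
dat1[] <- lapply(dat1, as.numeric)
dat1
```

The first column is read as character because of the Trend and p labels, hence the conversion after those rows are dropped.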


#different number of rows
sapply(seq_along(lst2), function(i) {lstN <- lapply(lst2[[i]], function(x) 
x[,-1]); sapply(lstN, function(x) nrow(x))})
#[1] 9 9 9 8 2 9

#difference in number of columns
sapply(seq_along(lst2),function(i) {sapply(lst2[[i]],function(x) ncol(x))})
 #[1] 157 258 258  98 157 258

library(plyr)
library(stringr)

lst3 <- setNames(lapply(seq_along(lst2), function(i) 
{lapply(lst2[[i]], function(x) {names(x)[-1] <- paste(names(x)[-1], 
names(lst1)[i], sep="_"); names(x) <- str_trim(names(x)); x})[[1]]}), 
names(lst1)) 

df1 <- join_all(lst3, by="Year")
dim(df1)
#[1]    9 1181
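
Since the sites have different numbers of rows, the join on Year pads the shorter sites with NA. A rough base-R equivalent, with made-up data, is sketched below. Note the assumption: merge(..., all = TRUE) is a full outer join, whereas join_all() defaults to a left join, so the two can differ when the first data frame is missing years.

```r
# Hypothetical per-site data frames with different numbers of years
g100 <- data.frame(Year = 1961:1963, pav.ANN_G100 = c(1.2, 1.4, 1.1))
g103 <- data.frame(Year = 1961:1962, pav.ANN_G103 = c(2.0, 2.2))
lst  <- list(G100 = g100, G103 = g103)

# Merge all sites on Year, keeping every year that occurs in any site
df1 <- Reduce(function(a, b) merge(a, b, by = "Year", all = TRUE), lst)
df1
```

Years missing at a site come through as NA, which is why the later quantile() and colMeans() calls use na.rm = TRUE.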


sapply(split(names(df1)[-1], gsub(".*\\_", "", names(df1)[-1])), function(x) {
df2 <- df1[,x]
df3 <- data.frame(Percentiles=paste0(seq(0,100, by=1), "%"), 
numcolwise(function(y) 
quantile(y, seq(0,1,by=0.01), na.rm=TRUE))(df2), stringsAsFactors=FALSE)
ncol(df3) 
})
#G100 G101 G102 G103 G104 G105 
# 157  258  258   98  157  258 

lst4 <- split(names(df1)[-1], gsub(".*\\_", "", names(df1)[-1]))

lapply(seq_along(lst4), function(i) {
df2 <- df1[,lst4[[i]]]
df3 <- data.frame(Percentiles=paste0(seq(0,100, by=1), "%"), numcolwise(function(y) 
quantile(y, seq(0,1,by=0.01), na.rm=TRUE))(df2), stringsAsFactors=FALSE)
df3[1:3,1:3]
write.csv(df3, paste0(paste(getwd(), 
"final", paste(names(lst1)[[i]], "Quantile", sep="_"), sep="/"), ".csv"), row.names=FALSE, quote=FALSE)})

ReadOut1 <- 
lapply(list.files(recursive=TRUE)[grep("Quantile", list.files(recursive=TRUE))], function(x)
read.csv(x, header=TRUE, stringsAsFactors=FALSE)) 

sapply(ReadOut1,dim)
#     [,1] [,2] [,3] [,4] [,5] [,6]
#[1,]  101  101  101  101  101  101
#[2,]  157  258  258   98  157  258

lapply(ReadOut1,function(x) x[1:2,1:3])[1:3]
#[[1]]
#  Percentiles pav.DJF_G100 pav.MAM_G100
#1          0%            0     0.640500
#2          1%            0     0.664604
#
#[[2]]
#  Percentiles txav.DJF_G101 txav.MAM_G101
#1          0%      -13.8756      4.742400
#2          1%      -13.8140      4.817184
#
#[[3]]
#  Percentiles txav.DJF_G102 txav.MAM_G102
#1          0%     -15.05000      4.520700
#2          1%     -14.96833      4.543828
#
###Q2: 
###Observed data 

dir.create("Indices")
names1 <- unlist(lapply(ReadOut1, function(x) names(x)[-1])) 
names2 <- gsub("\\_.*", "", names1)
names3 <- unique(gsub("[.]", " ", names2)) 

res <- do.call(rbind, lapply(seq_along(lst4), function(i) {
df2 <- df1[,lst4[[i]]]
vec1 <- colMeans(df2, na.rm=TRUE)
vec2 <- rep(NA, length(names3))
names(vec2) <- paste(names3, names(lst4)[[i]], sep="_")
vec2[names(vec2) %in% names(vec1)] <- vec1
names(vec2) <- gsub("\\_.*", "", names(vec2))
vec2 }))


lapply(seq_len(ncol(res)), function(i) {
mat1 <- t(res[,i,drop=FALSE])
colnames(mat1) <- names(lst4)
write.csv(mat1, paste0(paste(getwd(), "Indices", gsub(" ", "_", rownames(mat1)), sep="/"), ".csv"), row.names=FALSE, quote=FALSE)})

##Output2:
ReadOut2 <- 
lapply(list.files(recursive=TRUE)[grep("Indices", list.files(recursive=TRUE))], function(x)
read.csv(x, header=TRUE, stringsAsFactors=FALSE)) 

length(ReadOut2) 
#[1] 257


list.files(recursive=TRUE)[grep("Indices", list.files(recursive=TRUE))][1]
#[1] "Indices/pav_ANN.csv"

res[,"pav ANN",drop=FALSE] 
#      pav ANN
#[1,] 1.298811
#[2,] 7.642922
#[3,] 6.740011
#[4,]       NA
#[5,] 1.296650
#[6,] 6.887622


ReadOut2[[1]]
#      G100     G101     G102 G103    G104     G105
#1 1.298811 7.642922 6.740011   NA 1.29665 6.887622
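
The NA for G103 above comes from the fill step: a site that lacks an index keeps the NA placeholder. The posted code fills with `vec2[names(vec2) %in% names(vec1)] <- vec1`, which relies on both vectors being in the same order. A small sketch of an alternative that assigns by name instead (the index names and values here are invented for illustration):

```r
# All index names seen across sites (hypothetical)
all_idx <- c("pav ANN", "pav DJF", "txav ANN")

# Column means available at one site; "pav DJF" is missing here
vec1 <- c("pav ANN" = 1.3, "txav ANN" = 7.6)

# Start with NA for every index, then fill by name rather than by position
vec2 <- setNames(rep(NA_real_, length(all_idx)), all_idx)
vec2[names(vec1)] <- vec1
vec2
```

This assumes every name in vec1 also occurs in all_idx; otherwise the assignment would append new elements.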

###Sample data 

###Working directory changed to sample 

dir.create("Indices_colMeans")

lst1 <- 
split(list.files(pattern=".csv"), gsub("\\_.*", "", list.files(pattern=".csv"))) 

lst2 <- lapply(lst1, function(x1) lapply(x1, function(x2) {
lines1 <- readLines(x2)
header1 <- lines1[1:2]
dat1 <- read.table(text=lines1, header=FALSE, sep=",", stringsAsFactors=FALSE, skip=2)
colnames(dat1) <- Reduce(paste, strsplit(header1, ","))
dat1[-c(nrow(dat1), nrow(dat1)-1),]}))
res1 <- do.call(rbind, lapply(seq_along(lst2), function(i) 
{rowMeans(do.call(cbind, lapply(lst2[[i]], function(x) 
colMeans(x[,-1], na.rm=TRUE))), na.rm=TRUE) })) 

lapply(seq_len(ncol(res1)), function(i) {
mat1 <- t(res1[,i,drop=FALSE])
colnames(mat1) <- names(lst2)
write.csv(mat1, paste0(paste(getwd(), "Indices_colMeans", gsub(" ", "_", rownames(mat1)), sep="/"), ".csv"), row.names=FALSE, quote=FALSE)})

##Output2 Sample
ReadOut2S <- 
lapply(list.files(recursive=TRUE)[grep("Indices", list.files(recursive=TRUE))], function(x)
read.csv(x, header=TRUE, stringsAsFactors=FALSE)) 

length(ReadOut2S)
#[1] 257
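
For the sample data, the averaging step (column means within each simulation, then a mean across the simulations of a site) can be checked on a toy example. The two data frames below are made up and stand in for the sims of one site:

```r
# Two hypothetical simulations for one site
sims <- list(
  data.frame(Year = 1961:1963, pav = c(1, 2, 3), txav = c(10, 20, 30)),
  data.frame(Year = 1961:1963, pav = c(3, 4, 5), txav = c(20, 30, NA))
)

# Column means per simulation (Year dropped), one column per simulation
per_sim <- sapply(sims, function(x) colMeans(x[, -1], na.rm = TRUE))

# Mean across simulations for each index
site_means <- rowMeans(per_sim, na.rm = TRUE)
site_means
```

Here pav averages 2 and 4, and txav averages 20 and 25, so each index ends up with a single value per site.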


Re: [R] Quantile and rowMean from multiple files in a folder

2014-04-13 Thread arun
Hi,

I am formatting the codes using library(formatR).  Hopefully, it will not be 
mangled in the email.
dir.create("final")
lst1 <- split(list.files(pattern = ".csv"), gsub("\\_.*", "", 
    list.files(pattern = ".csv")))

lst2 <- lapply(lst1, function(x1) lapply(x1, function(x2) {
    lines1 <- readLines(x2)
    header1 <- lines1[1:2]
    dat1 <- read.table(text = lines1, header = FALSE, sep = ",", stringsAsFactors = FALSE, 
        skip = 2)
    colnames(dat1) <- Reduce(paste, strsplit(header1, ","))
    dat1[-c(nrow(dat1), nrow(dat1) - 1), ]
}))

library(plyr) 

lapply(seq_along(lst2), function(i) {
    lstN <- lapply(lst2[[i]], function(x) x[, -1])
    lstQ1 <- lapply(lstN, function(x) numcolwise(function(y) quantile(y, seq(0, 1, 
        by = 0.01), na.rm = TRUE))(x))
    arr1 <- array(unlist(lstQ1), dim = c(dim(lstQ1[[1]]), length(lstQ1)), dimnames = list(NULL, 
        lapply(lstQ1, names)[[1]]))
    res <- rowMeans(arr1, dims = 2, na.rm = TRUE)
    colnames(res) <- gsub(" ", "_", colnames(res))
    res1 <- data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"), res, stringsAsFactors = FALSE)
    write.csv(res1, paste0(paste(getwd(), "final", paste(names(lst1)[[i]], "Quantile", 
        sep = "_"), sep = "/"), ".csv"), row.names = FALSE, quote = FALSE)
})

ReadOut1 <- lapply(list.files(recursive = TRUE)[grep("Quantile", list.files(recursive = TRUE))], 
    function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE))
sapply(ReadOut1, dim)
#     [,1] [,2]
#[1,]  101  101
#[2,]  258  258

lapply(ReadOut1,function(x) x[1:2,1:3])
#[[1]]
#  Percentiles  txav_DJF txav_MAM
#1          0% -12.68566  7.09702
#2          1% -12.59062  7.15338
#
#[[2]]
#  Percentiles  txav_DJF txav_MAM
#1          0% -12.75516 6.841840
#2          1% -12.68244 6.910664
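
To see what the array and rowMeans(dims = 2) step does, here is a self-contained sketch on simulated numbers. Everything in it is an assumption for illustration: five random "sims" with two index columns each, and sapply() with quantile() in place of numcolwise().

```r
set.seed(1)
# Five hypothetical simulations for one site, two index columns each
sims <- replicate(5, data.frame(txav_DJF = rnorm(9), txav_MAM = rnorm(9)),
                  simplify = FALSE)

probs <- seq(0, 1, by = 0.01)

# One 101 x 2 matrix of percentiles per simulation
q_list <- lapply(sims, function(x) sapply(x, quantile, probs = probs, na.rm = TRUE))

# Stack the matrices into a 101 x 2 x 5 array
arr <- array(unlist(q_list), dim = c(dim(q_list[[1]]), length(q_list)),
             dimnames = list(NULL, colnames(q_list[[1]]), NULL))

# dims = 2 keeps the first two dimensions and averages over the third (the sims)
res <- rowMeans(arr, dims = 2, na.rm = TRUE)

res1 <- data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"), res,
                   stringsAsFactors = FALSE)
head(res1, 2)
```

Each cell of res is the mean, across simulations, of the same percentile of the same index, which is what the request asked for.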


###Q2:

dir.create("Indices")
names1 <- lapply(ReadOut1, function(x) names(x))[[1]]
lstNew <- simplify2array(ReadOut1)
lapply(2:nrow(lstNew), function(i) {
    dat1 <- data.frame(lstNew[1], do.call(cbind, lstNew[i, ]), stringsAsFactors = FALSE)
    colnames(dat1) <- c(rownames(lstNew)[1], paste(names(lst1), rep(rownames(lstNew)[i], 
        length(lst1)), sep = "_"))
    write.csv(dat1, paste0(paste(getwd(), "Indices", rownames(lstNew)[i], sep = "/"), 
        ".csv"), row.names = FALSE, quote = FALSE)
})

## Output2:
ReadOut2 <- lapply(list.files(recursive = TRUE)[grep("Indices", list.files(recursive = TRUE))], 
    function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE))
length(ReadOut2)
# [1] 257 


head(ReadOut2[[1]], 2)
#  Percentiles G100_pav_ANN G101_pav_ANN
#1          0%     1.054380     1.032740
#2          1%     1.069457     1.045689
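
The Q2 reshaping (take the same index column from every site's file and put the sites side by side) can be sketched without simplify2array(). The list below is made up to stand in for the files read from "final", and pick_index() is a hypothetical helper, not part of the code above; it assumes every site has the requested column.

```r
# Hypothetical per-site percentile tables, as read back from "final"
final <- list(
  G100 = data.frame(Percentiles = c("0%", "1%"), pav_ANN = c(1.05, 1.07),
                    txav_DJF = c(-12.7, -12.6), stringsAsFactors = FALSE),
  G101 = data.frame(Percentiles = c("0%", "1%"), pav_ANN = c(1.03, 1.05),
                    txav_DJF = c(-12.8, -12.7), stringsAsFactors = FALSE)
)

# Hypothetical helper: one column per site for a single index
pick_index <- function(lst, index) {
  data.frame(Percentiles = lst[[1]]$Percentiles,
             sapply(lst, function(x) x[[index]]),
             stringsAsFactors = FALSE)
}

pav <- pick_index(final, "pav_ANN")
pav
```

The result has a Percentiles column followed by one column per site code, ready for a qqplot or boxplot.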


A.K.











On Sunday, April 13, 2014 2:46 AM, Zilefac Elvis zilefacel...@yahoo.com wrote:

Hi AK,
Q1) I need your help again. Using the previous data (attached) and the previous 
code below, instead of taking rowMeans, let's compute 
quantile(x,seq(0,1,by=0.01)). 

Delete the last 2 rows (Trend and p) in each file before doing 
quantile(x,seq(0,1,by=0.01)).

For example, assume that I want to calculate quantile(x,seq(0,1,by=0.01)) for 
each column of Site G100. I will do so for the 5 sims of site G100 and then 
take their average. This should be closer to the true value than 
calculating quantile(x,seq(0,1,by=0.01)) from just one sim. Please do the same 
for all the files.

So, when you do rowMeans, it should be the mean of quantile(x,seq(0,1,by=0.01)) 
calculated from all sims in that Site.

Output

The number of files in "final" remains the same (2 files). The "Year" 
column will be replaced by a column containing the names of 
quantile(x,seq(0,1,by=0.01)), such as 0%, 1%, 2%, 3%, 4%, 5%, 6%, ..., 98%, 
99%, 100%. You can give this column any name, such as "Percentiles".


Q2) From the folder "final", please go to each file identified by site name, 
take a column, say col1 of txav, from each file, and create a dataframe whose 
colnames are site codes (names of files in "final"). Create a folder called 
"Indices" and place this dataframe in it. The filename for the dataframe is 
"txav", say. So, in "Indices", you will have one file having 3 columns [, 
c("Percentiles", "G100", "G101")]. The idea is that I want to be able to pick any 
column from files in "final" and form a dataframe from which I will generate my 
qqplot or boxplot.

Thanks very much AK.
Atem
This should be the final step of this my drama, at least for now.
#==

dir.create("final")
lst1 <- 
split(list.files(pattern=".csv"), gsub("\\_.*", "", list.files(pattern=".csv"))) 
lst2 <- lapply(lst1, function(x1) lapply(x1, function(x2) {lines1 <- 
readLines(x2); header1 <- lines1[1:2]; dat1 <- 
read.table(text=lines1, header=FALSE, sep=",", stringsAsFactors=FALSE, skip=2); 
colnames(dat1) <- Reduce(paste, strsplit(header1, ",")); dat1}))

lstYear <- lapply(lst2, function(x) lapply(x, function(y) 
y[,1,drop=FALSE])[[1]]) 


lapply(seq_along(lst2), function(i) {lstN <- lapply(lst2[[i]], function(x) 
x[,-1]); arr1 <- 

Re: [R] Quantile and rowMean from multiple files in a folder

2014-04-13 Thread Zilefac Elvis
Hi AK,
I must admit that you did an excellent job.
Thanks very much.
My analysis is manageable now.
Regards,
Atem.