Re: [R] : Quantile and rowMean from multiple files in a folder
Hi AK,

Thanks very much for the updated code. My simulated results are even more consistent with observations after applying the updated version of the code.

Cheers,
Atem.

On Wednesday, April 16, 2014 11:31 PM, Zilefac Elvis zilefacel...@yahoo.com wrote:

Hi AK, Thanks very much. Atem.

On Wednesday, April 16, 2014 9:32 PM, arun smartpink...@yahoo.com wrote:

Hi, Use this code after `lst2`. [...]
Re: [R] : Quantile and rowMean from multiple files in a folder
Hi,

Use this code after `lst2`.

lapply(seq_along(lst2), function(i) {
  lstN <- lapply(lst2[[i]], function(x) {
    datN <- as.data.frame(matrix(NA, nrow = 101, ncol = length(names1), dimnames = list(NULL, names1)))
    x1 <- x[, -1]
    qt <- numcolwise(function(y) quantile(y, seq(0, 1, by = 0.01), na.rm = TRUE))(x1)
    datN[, match(names(x1), names(datN))] <- qt
    datN
  })
  arr1 <- array(unlist(lstN), dim = c(dim(lstN[[1]]), length(lstN)), dimnames = list(NULL, names1))
  res <- rowMeans(arr1, dims = 2, na.rm = TRUE)
  colnames(res) <- gsub(" ", "_", colnames(res))
  res1 <- data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"), res, stringsAsFactors = FALSE)
  write.csv(res1, paste0(paste(getwd(), "final", paste(names(lst1)[[i]], "Quantile", sep = "_"), sep = "/"), ".csv"),
            row.names = FALSE, quote = FALSE)
})

ReadOut1 <- lapply(list.files(recursive = TRUE)[grep("Quantile", list.files(recursive = TRUE))],
                   function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE))
sapply(ReadOut1, function(x) dim(x))

lstNew <- simplify2array(ReadOut1)
nrow(lstNew)
#[1] 258

dir.create("Indices")
lapply(2:nrow(lstNew), function(i) {
  dat1 <- data.frame(Percentiles = lstNew[1], do.call(cbind, lstNew[i, ]), stringsAsFactors = FALSE)
  colnames(dat1) <- c("Percentiles", paste(names(lst2), rep(rownames(lstNew)[i], length(lst2)), sep = "_"))
  write.csv(dat1, paste0(paste(getwd(), "Indices", gsub(" ", "_", rownames(lstNew)[i]), sep = "/"), ".csv"),
            row.names = FALSE, quote = FALSE)
})

## Output2:
ReadOut2 <- lapply(list.files(recursive = TRUE)[grep("Indices", list.files(recursive = TRUE))],
                   function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE))
names(ReadOut2) <- gsub(".*\\/(.*)\\.csv", "\\1", list.files(recursive = TRUE)[grep("Indices", list.files(recursive = TRUE))])
ReadOut2$pint_DJF[1:3, 1:3]
#  Percentiles G100_pint_DJF G101_pint_DJF
#1          0%      0.982001      1.020892
#2          1%      1.005563      1.039288
#3          2%      1.029126      1.057685

any(is.na(ReadOut2$pint_DJF))
[1] FALSE

A.K.

On Wednesday, April 16, 2014 12:34 PM, Zilefac Elvis zilefacel...@yahoo.com wrote:

Hi AK,

I tried the updated Quantilecode.txt. It works well, but when I open the files in Indices, I find some columns filled with NAs. This should not be the case, given that I am working with simulations and there are no missing values in the process. The ##not correct section yielded no NAs. Check, for example, pint_..._DJF in Indices.

Let me be sure we are on the same page. I removed the ##not correct section of the code and ran the code from beginning to end: Q1 and then Q2. My results are found in the Indices folder.

Thanks,
Atem.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
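A rough illustration of the padding-and-averaging idea used in the code above, not A.K.'s exact code: each simulation's quantile table is written into an NA-filled matrix that holds every expected column (located with `match()`), the matrices are stacked into a 3-D array, and `rowMeans(..., dims = 2)` averages across simulations while skipping the padded NAs. The toy data, `all_cols`, and `pad_quantiles` are made up for this sketch, and base `sapply` stands in for plyr's `numcolwise`.

```r
# Hypothetical toy data: two "simulations" for one site; the second lacks column "b".
sim1 <- data.frame(Year = 1:5, a = c(1, 2, 3, 4, 5), b = c(10, 20, 30, 40, 50))
sim2 <- data.frame(Year = 1:5, a = c(3, 4, 5, 6, 7))
all_cols <- c("a", "b")                 # union of data columns (assumed known)
probs <- seq(0, 1, by = 0.25)

# Compute per-column quantiles and place them into an NA-padded matrix.
pad_quantiles <- function(x) {
  x1 <- x[, -1, drop = FALSE]           # drop the Year column
  qt <- sapply(x1, quantile, probs = probs, na.rm = TRUE)
  out <- matrix(NA_real_, nrow = length(probs), ncol = length(all_cols),
                dimnames = list(NULL, all_cols))
  out[, match(colnames(qt), all_cols)] <- qt
  out
}

padded <- lapply(list(sim1, sim2), pad_quantiles)
arr <- array(unlist(padded), dim = c(dim(padded[[1]]), length(padded)),
             dimnames = list(NULL, all_cols, NULL))

# Average over the third dimension (simulations), ignoring the padded NAs.
res <- rowMeans(arr, dims = 2, na.rm = TRUE)
res
```

Matching by name rather than by position is what keeps a simulation with a missing column from shifting its values into the wrong column.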
Re: [R] : Quantile and rowMean from multiple files in a folder
Hi AK,

Thanks very much.

Atem.

On Wednesday, April 16, 2014 9:32 PM, arun smartpink...@yahoo.com wrote:

Hi, Use this code after `lst2`. [...]
Re: [R] : Quantile and rowMean from multiple files in a folder
Hi AK,

All codes for simulation files work great. I will try the code for observations and let you know. Thanks very much.

Atem.

On Tuesday, April 15, 2014 12:01 AM, arun smartpink...@yahoo.com wrote:

Yes, my new solution ignores such cases.

On Monday, April 14, 2014 11:58 PM, Zilefac Elvis zilefacel...@yahoo.com wrote:

Hi AK, Please ignore any such site. I will check it and include it in the analysis. Thanks, Atem.

On Monday, April 14, 2014 9:34 PM, arun smartpink...@yahoo.com wrote:

Hi, I looked at your Observed.zip. In it, one of the files is without any data: GG83_Sim.csv.ind.csv. The contents of the file are just: Year Year trend p

A.K.

On Monday, April 14, 2014 10:41 PM, Zilefac Elvis zilefacel...@yahoo.com wrote:

Hi AK,

Q1) Please try to correct the error using the larger data set (Sample.zip). The issue is that once you write the code and restrict it to smaller data sets, I find it difficult to generalize it to larger data sets.

Q2) In the Quantilecode2.txt you just sent, you forgot to do the following section using the Observed.zip file.

I tried to run the code up to section Q1 in Quantilecode2.txt using a larger data set and received the same error: Error in 2:nrow(lstNew) : argument of length 0. I have attached a larger data set too, so you can generalize the code to suit it. Please do not forget to include the code below in the final code of Q2. Once you fix these two, I should be able to fix the rest following these examples.

Thanks AK. Sorry for overloading you with so much work.

Atem.

#==
dir.create("Indices")
names1 <- lapply(ReadOut1, function(x) names(x))[[1]]
lstNew <- simplify2array(ReadOut1)
lapply(2:nrow(lstNew), function(i) {
  dat1 <- data.frame(lstNew[1], do.call(cbind, lstNew[i, ]), stringsAsFactors = FALSE)
  colnames(dat1) <- c(rownames(lstNew)[1], paste(names(lst1), rep(rownames(lstNew)[i], length(lst1)), sep = "_"))
  write.csv(dat1, paste0(paste(getwd(), "Indices", rownames(lstNew)[i], sep = "/"), ".csv"),
            row.names = FALSE, quote = FALSE)
})
## Output2:
ReadOut2 <- lapply(list.files(recursive = TRUE)[grep("Indices", list.files(recursive = TRUE))],
                   function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE))
length(ReadOut2)
# [1] 257
head(ReadOut2[[1]], 2)
#==

On Monday, April 14, 2014 8:07 PM, arun smartpink...@yahoo.com wrote:

HI,

Please send your emails in plain text. If you had looked at the dimensions of `lst2`:

sapply(lst2, function(x) sapply(x, ncol))[1:6, ]
     G100 G101 G102 G103 G104 G105 G106 G107 G108 G109 G110 G111 G112 G113 G114
[1,]  258  258  258  258  258  257  258  258  258  258  258  258  258  258  247
[2,]  258  258  258  258  258  258  258  258  258  258  258  258  258  258  258
[3,]  258  258  258  258  258  258  258  258  258  258  258  258  258  258  257
[4,]  258  258  258  258  258  257  258  258  258  258  258  258  258  258  258
[5,]  258  258  258  258  258  258  258  258  258  258  258  258  258  258  258
[6,]  258  258  258  258  258  258  258  258  258  258  258  258  258  258  258
     G115 G116 G117 G118 G119 G120 GG10 GG11 GG12 GG13 GG14 GG15 GG16 GG17 GG18
[1,]  258  247  256  256  258  258  258  258  258  258  258  258  258  257  258
[2,]  258  250  257  258  258  256  258  258  258  258  258  258  258  258  258
[3,]  258  247  256  258  258  256  258  258  258  258  258  258  258  258  256
[4,]  258  258  258  257  258  258  258  258  258  258  258  258  258  257  258
[5,]  258  257  258  258  258  256  258  258  258  258  258  258  258  258  258
[6,]  258  257  249  257  258  258  258  258  258  258  258  258  258  258  258
     GG19 GG20 GG21 GG22 GG23 GG24 GG25 GG26 GG27 GG28
[1,]  258  258  258  258  258  258  258  258  258  258
[2,]  258  258  258  258  258  258  258  258  258  258
[3,]  258  258  257  258  256  257  258  258  258  258
[4,]  258  257  258  258  258  257  258  258  258  258
[5,]  258  258  257  258  257  258  258  258  258  258
[6,]  258  258  258  258  257  258  258  258  258  258

#the dimensions are not consistent for the Simulations within each Site.

My code assumed that all the datasets had the same number of columns, rows, etc.

On Monday, April 14, 2014 6:26 PM, Zilefac Elvis zilefacel...@yahoo.com wrote:

Hi AK,

I have another request for help. Attached is a larger file (~27MB) for sample.zip. All files are the same as before, except that I am using more sites to do the same thing that you did with sample.zip. When generalizing Quantilecode.R to many sites, I receive an error when I run:

dir.create("Indices")
names1 <- lapply(ReadOut1, function(x) names(x))[[1]]
lstNew <- simplify2array(ReadOut1)
lapply(2:nrow(lstNew), function(i) {
  dat1 <- data.frame(lstNew[1], do.call(cbind, lstNew[i, ]), stringsAsFactors =
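The error discussed in this message can be reproduced on a small scale. The sketch below uses made-up two-row tables (`a` and `b` are hypothetical) to show why `simplify2array()` only yields something with rows when every data frame has the same number of columns, and why `2:nrow(lstNew)` then fails.

```r
# Hypothetical toy quantile tables; 'b' has one column fewer than 'a'.
a <- data.frame(Percentiles = c("0%", "100%"), x = c(1, 2), y = c(3, 4))
b <- data.frame(Percentiles = c("0%", "100%"), x = c(5, 6))

same <- simplify2array(list(a, a))      # equal column counts
dim(same)                               # list-matrix: one row per column, one column per file

mixed <- simplify2array(list(a, b))     # unequal column counts
is.list(mixed) && is.null(dim(mixed))   # input comes back as a plain list
nrow(mixed)                             # NULL

# 2:nrow(mixed) is 2:NULL, which is an error; capture the message instead of stopping.
err <- tryCatch(2:nrow(mixed), error = function(e) conditionMessage(e))
err

# A cheap guard before building the array: compare column counts up front.
ncols <- sapply(list(a, b), ncol)
consistent <- length(unique(ncols)) == 1
consistent
```

Checking `consistent` before calling `simplify2array()` turns a confusing downstream error into an explicit data check.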
Re: [R] : Quantile and rowMean from multiple files in a folder
Hi Atem,

Maybe this works.

### Q1: working directory: Observed
# Only one file per Site. Assuming this is the case for the full dataset, then I guess there is no need to average
dir.create("final")
lst1 <- split(list.files(pattern = ".csv"), gsub("\\_.*", "", list.files(pattern = ".csv")))
lst2 <- lapply(lst1, function(x1) lapply(x1, function(x2) {
  lines1 <- readLines(x2)
  header1 <- lines1[1:2]
  dat1 <- read.table(text = lines1, header = FALSE, sep = ",", stringsAsFactors = FALSE, skip = 2)
  colnames(dat1) <- Reduce(paste, strsplit(header1, ","))
  dat1[-c(nrow(dat1), nrow(dat1) - 1), ]
}))
lst3 <- lst2[sapply(seq_along(lst2), function(i) {
  lstN <- sapply(lst2[[i]], function(x) is.integer(ncol(x)))
})]
length(lst2)
#[1] 120
length(lst3)
#[1] 119

library(plyr)
library(stringr)
lst4 <- setNames(lapply(seq_along(lst3), function(i) {
  lapply(lst3[[i]], function(x) {
    names(x)[-1] <- paste(names(x)[-1], names(lst1)[i], sep = "_")
    names(x) <- str_trim(names(x))
    x
  })[[1]]
}), names(lst3))
df1 <- join_all(lst4, by = "Year")
dim(df1)
# [1] 9 27311

dimCol <- sapply(split(names(df1)[-1], gsub(".*\\_", "", names(df1)[-1])), function(x) {
  df2 <- df1[, x]
  df3 <- data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"),
                    numcolwise(function(y) quantile(y, seq(0, 1, by = 0.01), na.rm = TRUE))(df2),
                    stringsAsFactors = FALSE)
  ncol(df3)
})
lst5 <- split(names(df1)[-1], gsub(".*\\_", "", names(df1)[-1]))
lapply(seq_along(lst5), function(i) {
  df2 <- df1[, lst5[[i]]]
  df3 <- data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"),
                    numcolwise(function(y) quantile(y, seq(0, 1, by = 0.01), na.rm = TRUE))(df2),
                    stringsAsFactors = FALSE)
  write.csv(df3, paste0(paste(getwd(), "final", paste(names(lst4)[[i]], "Quantile", sep = "_"), sep = "/"), ".csv"),
            row.names = FALSE, quote = FALSE)
})
ReadOut1 <- lapply(list.files(recursive = TRUE)[grep("Quantile", list.files(recursive = TRUE))],
                   function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE))

dir.create("Indices")
sapply(ReadOut1, dim)[, 1:3] ##different dimensions
#     [,1] [,2] [,3]
#[1,]  101  101  101
#[2,]  157  258  258
names1 <- unlist(lapply(ReadOut1, function(x) names(x)[-1]))
names2 <- gsub("\\_.*", "", names1)
names3 <- unique(gsub("[.]", " ", names2))
length(names3)
#[1] 264
#lstNew <- simplify2array(ReadOut1) ###results you got
# nrow(lstNew)
#NULL
ReadOut2 <- lapply(seq_along(ReadOut1), function(i) {
  df2 <- ReadOut1[[i]]
  df3 <- as.data.frame(matrix(NA, nrow = 101, ncol = length(names3), dimnames = list(NULL, names3)))
  names(df2) <- gsub("[.]", " ", gsub("\\_.*", "", names(df2)))
  df2 <- df2[, -1]
  df3[, match(names(df2), names(df3))] <- df2
  df3
})
lstNew <- simplify2array(ReadOut2)
nrow(lstNew)
#[1] 264
lapply(1:nrow(lstNew), function(i) {
  dat1 <- data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"), do.call(cbind, lstNew[i, ]), stringsAsFactors = FALSE)
  colnames(dat1) <- c("Percentiles", paste(names(lst3), rep(rownames(lstNew)[i], length(lst3)), sep = "_"))
  write.csv(dat1, paste0(paste(getwd(), "Indices", gsub(" ", "_", rownames(lstNew)[i]), sep = "/"), ".csv"),
            row.names = FALSE, quote = FALSE)
})

## Output2:
ReadOut2 <- lapply(list.files(recursive = TRUE)[grep("Indices", list.files(recursive = TRUE))],
                   function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE))
length(ReadOut2)
#[1] 264
ReadOut2[[1]][1:3, 1:3]
#  Percentiles G100_pav.ANN G101_pav.ANN
#1          0%     0.766900      0.96240
#2          1%     0.796132      0.96572
#3          2%     0.825364      0.96904

Attached is the file.

A.K.

On Tuesday, April 15, 2014 4:00 AM, Zilefac Elvis zilefacel...@yahoo.com wrote:

Hi AK,

I tried all codes for observations. All others work great except this one (probably due to different dimensions). I took the Observed.zip file, deleted the station which had no data, and applied the code. However, this section of the code did not work. The problem is that lstNew is NULL, so nothing is actually written to Indices. I will check ReadOut1 when I get up from sleep.

Thanks,
Atem.

dir.create("Indices")
names1 <- lapply(ReadOut1, function(x) names(x))[[1]]
lstNew <- simplify2array(ReadOut1)
nrow(lstNew)
#[1] NULL
lapply(2:nrow(lstNew), function(i) {
  dat1 <- data.frame(lstNew[1], do.call(cbind, lstNew[i, ]), stringsAsFactors = FALSE)
  colnames(dat1) <- c(rownames(lstNew)[1], paste(names(lst1), rep(rownames(lstNew)[i], length(lst1)), sep = "_"))
  write.csv(dat1, paste0(paste(getwd(), "Indices", rownames(lstNew)[i], sep = "/"), ".csv"),
            row.names = FALSE, quote = FALSE)
})
===

On Tuesday, April 15, 2014 12:45 AM, arun smartpink...@yahoo.com wrote:

HI Atem,

No problem. Hope it works for Observation files too. Remember that before you run the same code for sample in Observation, check the dimensions of the files (as I did previously). If there is change of dimensions, make them the same dimensions
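The advice to "make them the same dimensions" can be sketched as follows, under the assumption that missing indices should simply be filled with NA. The tables in `tabs` and the name `all_names` are hypothetical; the point is that once every table is aligned to the union of column names, `simplify2array()` gives one row per index, which can then be split into one table per index.

```r
# Hypothetical per-site quantile tables with different column sets.
tabs <- list(
  G1 = data.frame(Percentiles = c("0%", "100%"), pav = c(1, 2), txav = c(3, 4)),
  G2 = data.frame(Percentiles = c("0%", "100%"), txav = c(5, 6))
)
all_names <- unique(unlist(lapply(tabs, function(x) names(x)[-1])))

# Align every table to the full set of index names; absent indices stay NA.
aligned <- lapply(tabs, function(x) {
  out <- as.data.frame(matrix(NA_real_, nrow = nrow(x), ncol = length(all_names),
                              dimnames = list(NULL, all_names)))
  x1 <- x[, -1, drop = FALSE]
  out[, match(names(x1), all_names)] <- x1
  out
})

arr <- simplify2array(aligned)   # works now: every frame has the same columns
nrow(arr)                        # one row per index

# One table per index, one column per site.
per_index <- lapply(seq_len(nrow(arr)), function(i) do.call(cbind, arr[i, ]))
names(per_index) <- rownames(arr)
per_index$pav
```

A column that is entirely NA in the result (here `G2` under `pav`) means the site never reported that index, which is different from a column that went missing through misalignment.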
Re: [R] : Quantile and rowMean from multiple files in a folder
Hi AK,

Thanks very much. It worked great. Many thanks.

Atem.

On Tuesday, April 15, 2014 9:20 AM, arun smartpink...@yahoo.com wrote:

Hi Atem, May be this works. [...]
Re: [R] : Quantile and rowMean from multiple files in a folder
Hi,

It is because of different dimensions of Simulation data within each Site. Try:

dir.create("final")
lst1 <- split(list.files(pattern = ".csv"), gsub("\\_.*", "", list.files(pattern = ".csv")))
sapply(lst1, length)
#G100 G101 G102 G103 G104 G105 G106 G107 G108 G109 G110 G111 G112 G113 G114 G115
# 100  100  100  100  100  100  100  100  100  100  100  100  100  100  100  100
#G116 G117 G118 G119 G120 GG10 GG11 GG12 GG13 GG14 GG15 GG16 GG17 GG18 GG19 GG20
# 100  100  100  100  100  100  100  100  100  100  100  100  100  100  100  100
#GG21 GG22 GG23 GG24 GG25 GG26 GG27 GG28
# 100  100  100  100  100  100  100  100
lst2 <- lapply(lst1, function(x1) lapply(x1, function(x2) {
  lines1 <- readLines(x2)
  header1 <- lines1[1:2]
  dat1 <- read.table(text = lines1, header = FALSE, sep = ",", stringsAsFactors = FALSE, skip = 2)
  colnames(dat1) <- Reduce(paste, strsplit(header1, ","))
  dat1[-c(nrow(dat1), nrow(dat1) - 1), ]
}))

##dimensions differ within each Site
sapply(lst2, function(x) sapply(x, ncol))[1:6, 5:8]
#     G104 G105 G106 G107
#[1,]  258  257  258  258
#[2,]  258  258  258  258
#[3,]  258  258  258  258
#[4,]  258  257  258  258
#[5,]  258  258  258  258
#[6,]  258  258  258  258

##number of rows are consistent
sapply(lst2, function(x) any(sapply(x, nrow) != 9))
# G100  G101  G102  G103  G104  G105  G106  G107  G108  G109  G110  G111  G112
#FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# G113  G114  G115  G116  G117  G118  G119  G120  GG10  GG11  GG12  GG13  GG14
#FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# GG15  GG16  GG17  GG18  GG19  GG20  GG21  GG22  GG23  GG24  GG25  GG26  GG27
#FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# GG28
#FALSE

names1 <- unique(unlist(lapply(lst2, function(x) unlist(lapply(x, function(y) names(y)[-1])))))
length(names1)
#[1] 257
# lstYear <- lapply(lst2, function(x) lapply(x, function(y) y[, 1, drop = FALSE])[[1]])

library(plyr)
lapply(seq_along(lst2), function(i) {
  lstN <- lapply(lst2[[i]], function(x) {
    datN <- as.data.frame(matrix(NA, nrow = 9, ncol = length(names1), dimnames = list(NULL, names1)))
    datN[, names1] <- x[, -1]
    datN
  })
  lstQ1 <- lapply(lstN, function(x) numcolwise(function(y) quantile(y, seq(0, 1, by = 0.01), na.rm = TRUE))(x))
  arr1 <- array(unlist(lstQ1), dim = c(dim(lstQ1[[1]]), length(lstQ1)), dimnames = list(NULL, lapply(lstQ1, names)[[1]]))
  res <- rowMeans(arr1, dims = 2, na.rm = TRUE)
  colnames(res) <- gsub(" ", "_", colnames(res))
  res1 <- data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"), res, stringsAsFactors = FALSE)
  write.csv(res1, paste0(paste(getwd(), "final", paste(names(lst1)[[i]], "Quantile", sep = "_"), sep = "/"), ".csv"),
            row.names = FALSE, quote = FALSE)
})

## output files
list.files(recursive = TRUE)[grep("Quantile", list.files(recursive = TRUE))]
#[1] "final/G100_Quantile.csv" "final/G101_Quantile.csv"
#[3] "final/G102_Quantile.csv" "final/G103_Quantile.csv"
#[5] "final/G104_Quantile.csv" "final/G105_Quantile.csv"
#[7] "final/G106_Quantile.csv" "final/G107_Quantile.csv"
#[9] "final/G108_Quantile.csv" "final/G109_Quantile.csv"
#[11] "final/G110_Quantile.csv" "final/G111_Quantile.csv"
#[13] "final/G112_Quantile.csv" "final/G113_Quantile.csv"
#[15] "final/G114_Quantile.csv" "final/G115_Quantile.csv"
#[17] "final/G116_Quantile.csv" "final/G117_Quantile.csv"
#[19] "final/G118_Quantile.csv" "final/G119_Quantile.csv"
#[21] "final/G120_Quantile.csv" "final/GG10_Quantile.csv"
#[23] "final/GG11_Quantile.csv" "final/GG12_Quantile.csv"
#[25] "final/GG13_Quantile.csv" "final/GG14_Quantile.csv"
#[27] "final/GG15_Quantile.csv" "final/GG16_Quantile.csv"
#[29] "final/GG17_Quantile.csv" "final/GG18_Quantile.csv"
#[31] "final/GG19_Quantile.csv" "final/GG20_Quantile.csv"
#[33] "final/GG21_Quantile.csv" "final/GG22_Quantile.csv"
#[35] "final/GG23_Quantile.csv" "final/GG24_Quantile.csv"
#[37] "final/GG25_Quantile.csv" "final/GG26_Quantile.csv"
#[39] "final/GG27_Quantile.csv" "final/GG28_Quantile.csv"

ReadOut1 <- lapply(list.files(recursive = TRUE)[grep("Quantile", list.files(recursive = TRUE))],
                   function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE))
sapply(ReadOut1, function(x) dim(x))
#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
#[1,]  101  101  101  101  101  101  101  101  101   101   101   101   101   101
#[2,]  258  258  258  258  258  258  258  258  258   258   258   258   258   258
#     [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26]
#[1,]   101   101   101   101   101   101   101   101   101   101   101   101
#[2,]   258   258   258   258   258   258   258   258   258   258   258   258
#     [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38]
#[1,]   101   101   101   101   101   101   101   101   101   101   101   101
#[2,]   258   258   258   258   258   258   258   258   258   258   258   258
#     [,39] [,40]
#[1,]   101   101
#[2,]   258   258
ReadOut1[[1]][1:3, 1:3]
#  Percentiles  txav_DJF txav_MAM
#1          0% -12.56619 6.795429
#2          1% -12.45888
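The file-reading step in this message (two header lines pasted together, last two rows dropped) can be tried without any files on disk. The sketch below feeds a hypothetical six-line file to `read.table(text = ...)`; the layout, including the trailing "trend" and "p" rows, is an assumption based on the file contents quoted elsewhere in the thread.

```r
# Hypothetical file contents: two header lines, data rows, then two summary rows.
lines1 <- c("Year,pav,pav",
            "Year,DJF,MAM",
            "2001,1.5,2.5",
            "2002,1.7,2.9",
            "trend,0.2,0.4",
            "p,0.04,0.10")

# Paste the two header rows element-wise: "Year Year", "pav DJF", "pav MAM".
header1 <- strsplit(lines1[1:2], ",")
col_names <- Reduce(paste, header1)

dat <- read.table(text = lines1, header = FALSE, sep = ",",
                  stringsAsFactors = FALSE, skip = 2)
colnames(dat) <- col_names

# Drop the last two rows (assumed to be the trend and p-value summaries).
dat <- dat[-c(nrow(dat), nrow(dat) - 1), ]
dat
```

Because the summary rows put text in the first column, that column is read as character even after the rows are dropped; convert it with `as.integer()` if the years are needed as numbers.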
Re: [R] : Quantile and rowMean from multiple files in a folder
Hi,

Q1 solution already sent. Regarding Q2, one of the files in the new Observed folder doesn't have any data (just the Year column alone). That may be the reason for the problem.

### Q1: working directory: Observed
# Only one file per Site. Assuming this is the case for the full dataset, then I guess there is no need to average
dir.create("final")
lst1 <- split(list.files(pattern = ".csv"), gsub("\\_.*", "", list.files(pattern = ".csv")))
lst2 <- lapply(lst1, function(x1) lapply(x1, function(x2) {
  lines1 <- readLines(x2)
  header1 <- lines1[1:2]
  dat1 <- read.table(text = lines1, header = FALSE, sep = ",", stringsAsFactors = FALSE, skip = 2)
  colnames(dat1) <- Reduce(paste, strsplit(header1, ","))
  dat1[-c(nrow(dat1), nrow(dat1) - 1), ]
}))
lst3 <- lst2[sapply(seq_along(lst2), function(i) {
  lstN <- sapply(lst2[[i]], function(x) is.integer(ncol(x)))
})]

#difference in column number
sapply(seq_along(lst3), function(i) {
  sapply(lst3[[i]], function(x) ncol(x))
})
#
#  [1] 157 258 258  98 157 258 256 258 250 258 258 147 157 250 250 256 249 240
# [19] 181 188 256 146 117 258 153 256 255 246 255 256 258 257 145 258 258 255
# [37] 258 157 164 144 265 258 254 258 258 157 258 176 258 256 257 258 258 258
# [55] 248 258 156 258 157 157 258 258 258 258 258 148 258 258 258 258 257 258
# [73] 258 258 157 154 153 258 248 255 257 256 258 258 157 256 256 257 257 250
# [91] 257 139 155 256 256 257 257 256 258 258 257 258 258 258 258 157 157 157
#[109] 258 258 258 258 256 258 157 258 258 256 258

library(plyr)
library(stringr)
lst4 <- setNames(lapply(seq_along(lst3), function(i) {
  lapply(lst3[[i]], function(x) {
    names(x)[-1] <- paste(names(x)[-1], names(lst1)[i], sep = "_")
    names(x) <- str_trim(names(x))
    x
  })[[1]]
}), names(lst3))
df1 <- join_all(lst4, by = "Year")
dim(df1)
# [1] 9 27311

sapply(split(names(df1)[-1], gsub(".*\\_", "", names(df1)[-1])), function(x) {
  df2 <- df1[, x]
  df3 <- data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"),
                    numcolwise(function(y) quantile(y, seq(0, 1, by = 0.01), na.rm = TRUE))(df2),
                    stringsAsFactors = FALSE)
  ncol(df3)
})
#
#G100 G101 G102 G103 G104 G105 G106 G107 G108 G109 G110 G111 G112 G113 G114 G115
# 157  258  258   98  157  258  256  258  250  258  258  147  157  250  250  256
#G116 G117 G118 G119 G120 GG10 GG11 GG12 GG13 GG14 GG15 GG16 GG17 GG18 GG19 GG20
# 249  240  181  188  256  146  117  258  153  256  255  246  255  256  258  257
#GG21 GG22 GG23 GG24 GG25 GG26 GG27 GG28 GG29 GG30 GG31 GG32 GG33 GG34 GG35 GG36
# 145  258  258  255  258  157  164  144  265  258  254  258  258  157  258  176
#GG37 GG38 GG39 GG40 GG41 GG42 GG43 GG44 GG45 GG46 GG47 GG48 GG49 GG50 GG51 GG52
# 258  256  257  258  258  258  248  258  156  258  157  157  258  258  258  258
#GG53 GG54 GG55 GG56 GG57 GG58 GG59 GG60 GG61 GG62 GG63 GG64 GG65 GG66 GG67 GG68
# 258  148  258  258  258  258  257  258  258  258  157  154  153  258  248  255
#GG69 GG70 GG71 GG72 GG73 GG74 GG75 GG76 GG77 GG78 GG79 GG80 GG81 GG82 GG83 GG84
# 257  256  258  258  157  256  256  257  257  250  257  139  155  256  256  257
#GG85 GG86 GG87 GG88 GG89 GG90 GG91 GG92 GG93 GG94 GG95 GG96 GG97 GG98 GG99 GGG1
# 257  256  258  258  257  258  258  258  258  157  157  157  258  258  258  258
#GGG2 GGG3 GGG4 GGG5 GGG6 GGG7 GGG8
# 256  258  157  258  258  256  258

lst5 <- split(names(df1)[-1], gsub(".*\\_", "", names(df1)[-1]))
lapply(seq_along(lst5), function(i) {
  df2 <- df1[, lst5[[i]]]
  df3 <- data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"),
                    numcolwise(function(y) quantile(y, seq(0, 1, by = 0.01), na.rm = TRUE))(df2),
                    stringsAsFactors = FALSE)
  df3[1:3, 1:3]
  write.csv(df3, paste0(paste(getwd(), "final", paste(names(lst4)[[i]], "Quantile", sep = "_"), sep = "/"), ".csv"),
            row.names = FALSE, quote = FALSE)
})
ReadOut1 <- lapply(list.files(recursive = TRUE)[grep("Quantile", list.files(recursive = TRUE))],
                   function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE))
sapply(ReadOut1, dim)[, 1:3]
#     [,1] [,2] [,3]
#[1,]  101  101  101
#[2,]  157  258  258
lapply(ReadOut1, function(x) x[1:2, 1:3])[1:3]
#[[1]]
#  Percentiles pav.DJF_G100 pav.MAM_G100
#1          0%            0     0.640500
#2          1%            0     0.664604
#
#[[2]]
#  Percentiles txav.DJF_G101 txav.MAM_G101
#1          0%      -13.8756      4.742400
#2          1%      -13.8140      4.817184
#
#[[3]]
#  Percentiles txav.DJF_G102 txav.MAM_G102
#1          0%     -15.05000      4.520700
#2          1%     -14.96833      4.543828

### Q2: Observed data
dir.create("Indices")
names1 <- unlist(lapply(ReadOut1, function(x) names(x)[-1]))
names2 <- gsub("\\_.*", "", names1)
names3 <- unique(gsub("[.]", " ", names2))
res <- do.call(rbind, lapply(seq_along(lst5), function(i) {
  df2 <- df1[, lst5[[i]]]
  vec1 <- colMeans(df2, na.rm = TRUE)
  vec2 <- rep(NA, length(names3))
  names(vec2) <- paste(names3, names(lst5)[[i]], sep = "_")
  vec2[names(vec2) %in%
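The Q1 code above groups the columns of the joined table by the site suffix in their names and then computes quantiles per site. A minimal sketch of that grouping, with a made-up `df1` and only three probabilities, and with base `sapply` in place of plyr's `numcolwise`:

```r
# Hypothetical joined table: data columns are named <index>_<site>.
df1 <- data.frame(Year = 2001:2004,
                  pav_G100 = c(1, 2, 3, 4), txav_G100 = c(10, 20, 30, 40),
                  pav_G101 = c(5, 6, 7, 8))
data_cols <- names(df1)[-1]

# Strip everything up to the last underscore to get the site code, then group.
site <- gsub(".*_", "", data_cols)
by_site <- split(data_cols, site)

# Quantiles of every column belonging to a site, one table per site.
quant_by_site <- lapply(by_site, function(cols) {
  q <- sapply(df1[, cols, drop = FALSE], quantile,
              probs = c(0, 0.5, 1), na.rm = TRUE)
  data.frame(Percentiles = rownames(q), q, stringsAsFactors = FALSE)
})
sapply(quant_by_site, ncol)
```

The per-site column counts differ here for the same reason they do in the thread: sites do not all report the same set of indices, so the tables need aligning before they can be combined.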
Re: [R] Quantile and rowMean from multiple files in a folder
Hi AK,
Thanks very much. I sent you another email with a larger Sample.zip file. The Quantilecode.R script, which you initially developed for the smaller sample.zip, did not complete the task when I ran it on the larger data set. Please check and rectify the error.
Thanks,
Atem.

-- Original Message --
From : arun
To : R. Help; Cc : Zilefac Elvis;
Sent : 14-04-2014 18:57
Subject : Re: Quantile and rowMean from multiple files in a folder
Re: [R] Quantile and rowMean from multiple files in a folder
Hi Atem,

I guess this is what you wanted.

###Q1:
###working directory: Observed
#Only one file per Site. Assuming this is the case for the full dataset, then I guess there is no need to average
dir.create("final")
lst1 <- split(list.files(pattern = ".csv"), gsub("\\_.*", "", list.files(pattern = ".csv")))
lst2 <- lapply(lst1, function(x1) lapply(x1, function(x2) {
    lines1 <- readLines(x2)
    header1 <- lines1[1:2]
    dat1 <- read.table(text = lines1, header = FALSE, sep = ",", stringsAsFactors = FALSE, skip = 2)
    colnames(dat1) <- Reduce(paste, strsplit(header1, ","))
    dat1[-c(nrow(dat1), nrow(dat1) - 1), ]
}))
#different number of rows
sapply(seq_along(lst2), function(i) {lstN <- lapply(lst2[[i]], function(x) x[, -1]); sapply(lstN, function(x) nrow(x))})
#[1] 9 9 9 8 2 9
#difference in number of columns
sapply(seq_along(lst2), function(i) {sapply(lst2[[i]], function(x) ncol(x))})
#[1] 157 258 258  98 157 258
library(plyr)
library(stringr)
lst3 <- setNames(lapply(seq_along(lst2), function(i) {
    lapply(lst2[[i]], function(x) {
        names(x)[-1] <- paste(names(x)[-1], names(lst1)[i], sep = "_")
        names(x) <- str_trim(names(x))
        x
    })[[1]]
}), names(lst1))
df1 <- join_all(lst3, by = "Year")
dim(df1)
#[1]    9 1181
sapply(split(names(df1)[-1], gsub(".*\\_", "", names(df1)[-1])), function(x) {
    df2 <- df1[, x]
    df3 <- data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"), numcolwise(function(y) quantile(y, seq(0, 1, by = 0.01), na.rm = TRUE))(df2), stringsAsFactors = FALSE)
    ncol(df3)
})
#G100 G101 G102 G103 G104 G105
# 157  258  258   98  157  258
lst4 <- split(names(df1)[-1], gsub(".*\\_", "", names(df1)[-1]))
lapply(seq_along(lst4), function(i) {
    df2 <- df1[, lst4[[i]]]
    df3 <- data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"), numcolwise(function(y) quantile(y, seq(0, 1, by = 0.01), na.rm = TRUE))(df2), stringsAsFactors = FALSE)
    df3[1:3, 1:3]
    write.csv(df3, paste0(paste(getwd(), "final", paste(names(lst1)[[i]], "Quantile", sep = "_"), sep = "/"), ".csv"), row.names = FALSE, quote = FALSE)
})
ReadOut1 <- lapply(list.files(recursive = TRUE)[grep("Quantile", list.files(recursive = TRUE))], function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE))
sapply(ReadOut1, dim)
#     [,1] [,2] [,3] [,4] [,5] [,6]
#[1,]  101  101  101  101  101  101
#[2,]  157  258  258   98  157  258
lapply(ReadOut1, function(x) x[1:2, 1:3])[1:3]
#[[1]]
#  Percentiles pav.DJF_G100 pav.MAM_G100
#1          0%            0     0.640500
#2          1%            0     0.664604
#
#[[2]]
#  Percentiles txav.DJF_G101 txav.MAM_G101
#1          0%      -13.8756      4.742400
#2          1%      -13.8140      4.817184
#
#[[3]]
#  Percentiles txav.DJF_G102 txav.MAM_G102
#1          0%     -15.05000      4.520700
#2          1%     -14.96833      4.543828
#
###Q2:
###Observed data
dir.create("Indices")
names1 <- unlist(lapply(ReadOut1, function(x) names(x)[-1]))
names2 <- gsub("\\_.*", "", names1)
names3 <- unique(gsub("[.]", " ", names2))
res <- do.call(rbind, lapply(seq_along(lst4), function(i) {
    df2 <- df1[, lst4[[i]]]
    vec1 <- colMeans(df2, na.rm = TRUE)
    vec2 <- rep(NA, length(names3))
    names(vec2) <- paste(names3, names(lst4)[[i]], sep = "_")
    vec2[names(vec2) %in% names(vec1)] <- vec1
    names(vec2) <- gsub("\\_.*", "", names(vec2))
    vec2
}))
lapply(seq_len(ncol(res)), function(i) {
    mat1 <- t(res[, i, drop = FALSE])
    colnames(mat1) <- names(lst4)
    write.csv(mat1, paste0(paste(getwd(), "Indices", gsub(" ", "_", rownames(mat1)), sep = "/"), ".csv"), row.names = FALSE, quote = FALSE)
})
##Output2:
ReadOut2 <- lapply(list.files(recursive = TRUE)[grep("Indices", list.files(recursive = TRUE))], function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE))
length(ReadOut2)
#[1] 257
list.files(recursive = TRUE)[grep("Indices", list.files(recursive = TRUE))][1]
#[1] "Indices/pav_ANN.csv"
res[, "pav ANN", drop = FALSE]
#      pav ANN
#[1,] 1.298811
#[2,] 7.642922
#[3,] 6.740011
#[4,]       NA
#[5,] 1.296650
#[6,] 6.887622
ReadOut2[[1]]
#      G100     G101     G102 G103    G104     G105
#1 1.298811 7.642922 6.740011   NA 1.29665 6.887622

###Sample data
###Working directory changed to sample
dir.create("Indices_colMeans")
lst1 <- split(list.files(pattern = ".csv"), gsub("\\_.*", "", list.files(pattern = ".csv")))
lst2 <- lapply(lst1, function(x1) lapply(x1, function(x2) {
    lines1 <- readLines(x2)
    header1 <- lines1[1:2]
    dat1 <- read.table(text = lines1, header = FALSE, sep = ",", stringsAsFactors = FALSE, skip = 2)
    colnames(dat1) <- Reduce(paste, strsplit(header1, ","))
    dat1[-c(nrow(dat1), nrow(dat1) - 1), ]
}))
res1 <- do.call(rbind, lapply(seq_along(lst2), function(i) {
    rowMeans(do.call(cbind, lapply(lst2[[i]], function(x) colMeans(x[, -1], na.rm = TRUE))), na.rm = TRUE)
}))
lapply(seq_len(ncol(res1)), function(i) {
    mat1 <- t(res1[, i, drop = FALSE])
    colnames(mat1) <- names(lst2)
    write.csv(mat1, paste0(paste(getwd(), "Indices_colMeans", gsub(" ", "_", rownames(mat1)), sep = "/"), ".csv"), row.names = FALSE, quote = FALSE)
})
##Output2 Sample
ReadOut2S <- lapply(list.files(recursive = TRUE)[grep("Indices", list.files(recursive = TRUE))], function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE))
length(ReadOut2S)
#[1] 257
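The file-reading step in the message above (two header rows pasted into one set of column names, then the last two rows dropped) may be easier to follow in isolation. Below is a minimal sketch on a made-up file. The file contents, the helper name `read_site_file`, and the numeric conversion at the end are assumptions for illustration, not part of the posted code.

```r
# Hypothetical file: two header rows, three data rows, then "Trend" and "p" rows.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Year,pav,pav",
  ",DJF,MAM",
  "1961,1.2,3.4",
  "1962,1.5,3.1",
  "1963,1.1,3.8",
  "Trend,0.01,0.02",
  "p,0.40,0.10"
), tmp)

# Hypothetical helper wrapping the steps used in the thread.
read_site_file <- function(path) {
  lines1 <- readLines(path)
  # Split both header rows on commas and paste them element-wise,
  # e.g. "pav" + "DJF" -> "pav DJF"; trimws() removes the trailing
  # space left where the second header row is empty ("Year ").
  header1 <- strsplit(lines1[1:2], ",")
  dat <- read.table(text = lines1, header = FALSE, sep = ",",
                    stringsAsFactors = FALSE, skip = 2)
  colnames(dat) <- trimws(Reduce(paste, header1))
  # Drop the last two rows (Trend and p).
  dat <- dat[seq_len(nrow(dat) - 2), , drop = FALSE]
  # Assumption: every remaining column should be numeric.
  dat[] <- lapply(dat, as.numeric)
  dat
}

obs <- read_site_file(tmp)
str(obs)
```

Note that `strsplit()` drops a trailing empty field, so a second header row that ends in a comma would yield one name too few; the sketch does not handle that case.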
Re: [R] Quantile and rowMean from multiple files in a folder
Hi,

I am formatting the code using library(formatR). Hopefully, it will not be mangled in the email.

dir.create("final")
lst1 <- split(list.files(pattern = ".csv"), gsub("\\_.*", "", list.files(pattern = ".csv")))
lst2 <- lapply(lst1, function(x1) lapply(x1, function(x2) {
    lines1 <- readLines(x2)
    header1 <- lines1[1:2]
    dat1 <- read.table(text = lines1, header = FALSE, sep = ",", stringsAsFactors = FALSE, skip = 2)
    colnames(dat1) <- Reduce(paste, strsplit(header1, ","))
    dat1[-c(nrow(dat1), nrow(dat1) - 1), ]
}))
library(plyr)
lapply(seq_along(lst2), function(i) {
    lstN <- lapply(lst2[[i]], function(x) x[, -1])
    lstQ1 <- lapply(lstN, function(x) numcolwise(function(y) quantile(y, seq(0, 1, by = 0.01), na.rm = TRUE))(x))
    arr1 <- array(unlist(lstQ1), dim = c(dim(lstQ1[[1]]), length(lstQ1)), dimnames = list(NULL, lapply(lstQ1, names)[[1]]))
    res <- rowMeans(arr1, dims = 2, na.rm = TRUE)
    colnames(res) <- gsub(" ", "_", colnames(res))
    res1 <- data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"), res, stringsAsFactors = FALSE)
    write.csv(res1, paste0(paste(getwd(), "final", paste(names(lst1)[[i]], "Quantile", sep = "_"), sep = "/"), ".csv"), row.names = FALSE, quote = FALSE)
})
ReadOut1 <- lapply(list.files(recursive = TRUE)[grep("Quantile", list.files(recursive = TRUE))], function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE))
sapply(ReadOut1, dim)
#     [,1] [,2]
#[1,]  101  101
#[2,]  258  258
lapply(ReadOut1, function(x) x[1:2, 1:3])
#[[1]]
#  Percentiles  txav_DJF txav_MAM
#1          0% -12.68566  7.09702
#2          1% -12.59062  7.15338
#
#[[2]]
#  Percentiles  txav_DJF txav_MAM
#1          0% -12.75516 6.841840
#2          1% -12.68244 6.910664

###Q2:
dir.create("Indices")
names1 <- lapply(ReadOut1, function(x) names(x))[[1]]
lstNew <- simplify2array(ReadOut1)
lapply(2:nrow(lstNew), function(i) {
    dat1 <- data.frame(lstNew[1], do.call(cbind, lstNew[i, ]), stringsAsFactors = FALSE)
    colnames(dat1) <- c(rownames(lstNew)[1], paste(names(lst1), rep(rownames(lstNew)[i], length(lst1)), sep = "_"))
    write.csv(dat1, paste0(paste(getwd(), "Indices", rownames(lstNew)[i], sep = "/"), ".csv"), row.names = FALSE, quote = FALSE)
})
## Output2:
ReadOut2 <- lapply(list.files(recursive = TRUE)[grep("Indices", list.files(recursive = TRUE))], function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE))
length(ReadOut2)
# [1] 257
head(ReadOut2[[1]], 2)
#  Percentiles G100_pav_ANN G101_pav_ANN
#1          0%     1.054380     1.032740
#2          1%     1.069457     1.045689

A.K.

On Sunday, April 13, 2014 2:46 AM, Zilefac Elvis zilefacel...@yahoo.com wrote:

Hi AK,
Q1) I need your help again. Using the previous data (attached) and the previous code below, instead of taking rowMeans, let's do quantile(x,seq(0,1,by=0.01)). Delete the last 2 rows (Trend and p) in each file before doing quantile(x,seq(0,1,by=0.01)). For example, assume that I want to calculate quantile(x,seq(0,1,by=0.01)) for each column of Site G100. I will do so for the 5 sims of site G100 and then take their average. This will be closer to the true value than calculating quantile(x,seq(0,1,by=0.01)) from just one sim. Please do the same for all the files. So, when you do rowMeans, it should be the mean of quantile(x,seq(0,1,by=0.01)) calculated from all sims in that Site.

Output
The number of files in final remains the same (2 files). The Year column will be replaced by a column containing the names of quantile(x,seq(0,1,by=0.01)), such as 0%, 1%, 2%, 3%, 4%, 5%, 6%, ..., 98%, 99%, 100%. You can give this column any name, such as Percentiles.

Q2) From the folder final, please go to each file identified by site name, take a column, say col1 of txav, from each file, and create a dataframe whose colnames are site codes (names of files in final). Create a folder called Indices and place this dataframe in it. The filename for the dataframe is txav, say. So, in Indices, you will have one file having 3 columns [, c("Percentiles", "G100", "G101")]. The idea is that I want to be able to pick any column from files in final and form a dataframe from which I will generate my qqplot or boxplot.
Thanks very much AK.
Atem

This should be the final step of this drama of mine, at least for now.
#==
dir.create("final")
lst1 <- split(list.files(pattern = ".csv"), gsub("\\_.*", "", list.files(pattern = ".csv")))
lst2 <- lapply(lst1, function(x1) lapply(x1, function(x2) {
    lines1 <- readLines(x2)
    header1 <- lines1[1:2]
    dat1 <- read.table(text = lines1, header = FALSE, sep = ",", stringsAsFactors = FALSE, skip = 2)
    colnames(dat1) <- Reduce(paste, strsplit(header1, ","))
    dat1
}))
lstYear <- lapply(lst2, function(x) lapply(x, function(y) y[, 1, drop = FALSE])[[1]])
lapply(seq_along(lst2), function(i) {lstN <- lapply(lst2[[i]], function(x) x[, -1]); arr1 <-
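To illustrate the Q1 request in this message (compute the percentiles separately for each simulation of a site, then average them across simulations), here is a minimal base-R sketch. The data are simulated stand-ins and the object names (`sims`, `q_list`, `q_mean`) are hypothetical; it uses `sapply()` and `simplify2array()` rather than the plyr `numcolwise()` call from the thread, so treat it as one possible way to do the same thing, not as the posted solution.

```r
set.seed(1)  # simulated stand-in data, not the poster's files

# Five hypothetical simulations for one site, two index columns each.
sims <- replicate(5,
                  data.frame(txav_DJF = rnorm(30, mean = -10, sd = 2),
                             txav_MAM = rnorm(30, mean = 7, sd = 2)),
                  simplify = FALSE)
probs <- seq(0, 1, by = 0.01)

# One 101 x 2 matrix of percentiles per simulation.
q_list <- lapply(sims, function(d) sapply(d, quantile, probs = probs, na.rm = TRUE))

# Stack into a 101 x 2 x 5 array and average over the simulation dimension.
arr <- simplify2array(q_list)
q_mean <- rowMeans(arr, dims = 2, na.rm = TRUE)

# Replace the Year column by a Percentiles label column, as requested.
res <- data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"), q_mean,
                  stringsAsFactors = FALSE, row.names = NULL)
head(res, 3)
```

With `dims = 2`, `rowMeans()` averages over every dimension after the second, so each cell of `q_mean` is the mean of that percentile across the simulations. This only works when all simulations share the same columns in the same order.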
Re: [R] Quantile and rowMean from multiple files in a folder
Hi AK,
I must admit that you did an excellent job. Thanks very much. My analysis is manageable now.
Regards,
Atem.

On Sunday, April 13, 2014 8:54 AM, arun smartpink...@yahoo.com wrote: