Re: [R] Melt and Rbind/Rbindlist
Hello Mr. Holtman,

Thank you very much for your reply and suggestion. This is what each year's data looks like:

    tmp1 <- structure(list(FIPS = c(1001L, 1003L, 1005L),
        X2026.01.01.1 = c(285.5533142, 285.5533142, 286.2481079),
        X2026.01.01.2 = c(283.4977112, 283.4977112, 285.0860291),
        X2026.01.01.3 = c(281.9733887, 281.9733887, 284.1548767),
        X2026.01.01.4 = c(280.0234985, 280.0234985, 282.6075745),
        X2026.01.01.5 = c(278.7125854, 278.7125854, 281.2553711),
        X2026.01.01.6 = c(278.5204773, 278.5204773, 280.6148071)),
        .Names = c("FIPS", "X2026.01.01.1", "X2026.01.01.2", "X2026.01.01.3",
                   "X2026.01.01.4", "X2026.01.01.5", "X2026.01.01.6"),
        class = "data.frame", row.names = c(NA, -3L))

The data is in 3-hour blocks for every day by US FIPS code from 2026-2045; each year's data is in a different CSV file. My goal is to compute the max, min, and mean by week and month. I used the following code to assign week numbers to the observations:

    nweek <- function(x, format = "%Y-%m-%d", origin) {
        if (missing(origin)) {
            as.integer(format(strptime(x, format = format), "%W"))
        } else {
            x <- as.Date(x, format = format)
            o <- as.Date(origin, format = format)
            w <- as.integer(format(strptime(x, format = format), "%w"))
            2 + as.integer(x - o - w) %/% 7
        }
    }

Then the following:

    for (i in filelist) { nweek(tmp2$date) }
    for (i in filelist) { nweek(dates, origin = "2026-01-01") }
    for (i in filelist) { wkn <- nweek(tmp2$date) }

Is this efficient?

Thank you so much again. I really appreciate it.

Sincerely,
Shouro

On Sun, Feb 1, 2015 at 1:22 AM, jim holtman jholt...@gmail.com wrote:
> It would have been nice if you had at least supplied a subset (~10 lines)
> from a couple of files so we could see what the data looks like and test
> out any solution.
> [...]

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
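On the week-number question in the message above: the three loops never use the loop variable `i`, so each pass repeats the same call on the same data. A minimal sketch of an alternative, assuming the data are already in long format with a proper `Date` column, is to derive year, month and week once with the vectorised `format()` and then aggregate. The data frame `d`, its values, and the helper name `summarise` are hypothetical and only serve the illustration; `"%W"` counts weeks with Monday as the first day, which may or may not match the week definition wanted.

```r
# Hypothetical long-format data: one row per FIPS and date (made-up values)
d <- data.frame(
  FIPS = rep(c(1001L, 1003L), each = 4),
  date = rep(as.Date(c("2026-01-01", "2026-01-04", "2026-01-05", "2026-01-11")),
             times = 2),
  temp = c(280, 282, 284, 286, 270, 272, 274, 276)
)

# format() is vectorised over the whole column, so no loop is needed
d$year  <- as.integer(format(d$date, "%Y"))
d$month <- as.integer(format(d$date, "%m"))
d$week  <- as.integer(format(d$date, "%W"))  # week of year, Monday first

# Return max, min and mean together so each grouping needs a single pass
summarise <- function(x) c(max = max(x), min = min(x), mean = mean(x))

by_week  <- aggregate(temp ~ FIPS + year + week,  data = d, FUN = summarise)
by_month <- aggregate(temp ~ FIPS + year + month, data = d, FUN = summarise)
```

In both results `temp` is a matrix column, so for example `by_week$temp[, "mean"]` extracts the weekly means. The same grouping idea carries over to a data.table once the yearly files have been combined.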
Re: [R] Melt and Rbind/Rbindlist
It would have been nice if you had at least supplied a subset (~10 lines) from a couple of files so we could see what the data looks like and test out any solution.

Since you are using 'data.table', you should probably also use 'fread' for reading in the data. Here is a possible approach of reading the data into a list and then creating a single, large data.table:

    myDTs <- lapply(filelist, function(.file) {
        tmp1 <- fread(.file, sep = ",")
        tmp2 <- melt(tmp1, id = "FIPS")
        tmp2$year <- as.numeric(substr(tmp2$variable, 2, 5))
        tmp2$month <- as.numeric(substr(tmp2$variable, 7, 8))
        tmp2$day <- as.numeric(substr(tmp2$variable, 10, 11))
        tmp2  # return value
    })

    bigDT <- rbindlist(myDTs)  # rbind all the data.tables together

    # then you should be able to do:
    mean.temp <- bigDT[, list(temp.mean = lapply(.SD, mean)),
                       by = c("FIPS", "year", "month"), .SDcols = c("temp")]

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sat, Jan 31, 2015 at 5:57 PM, Shouro Dasgupta sho...@gmail.com wrote:
> I have climate data for 20 years for US counties (FIPS) in csv format,
> each file represents one year of data.
> [...]
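The read, melt and stack approach suggested above can be tried on the sample rows posted in the thread. This is only a sketch under stated assumptions: it uses an in-memory table instead of `fread()` on real files, passes `value.name = "temp"` so that the measurement column is actually called `temp` (by default `melt()` names it `value`), and stacks the same sample twice purely to show `rbindlist()`. The names `long` and `stats` are hypothetical. Note also that `fread()` does not prepend `X` to column names unless asked to, so if the CSV headers start with a digit the `substr()` offsets below would need shifting by one.

```r
library(data.table)

# Sample rows from the dput() output in the thread (first two blocks only)
tmp1 <- data.table(
  FIPS = c(1001L, 1003L, 1005L),
  X2026.01.01.1 = c(285.5533142, 285.5533142, 286.2481079),
  X2026.01.01.2 = c(283.4977112, 283.4977112, 285.0860291)
)

# Wide to long; value.name = "temp" names the measurement column
long <- melt(tmp1, id.vars = "FIPS", variable.name = "variable", value.name = "temp")
long[, variable := as.character(variable)]

# Column names look like "X2026.01.01.1": year at 2-5, month at 7-8, day at 10-11
long[, `:=`(
  year  = as.integer(substr(variable, 2, 5)),
  month = as.integer(substr(variable, 7, 8)),
  day   = as.integer(substr(variable, 10, 11))
)]

# With real data each file would yield one such table; here the sample is
# simply stacked twice to illustrate rbindlist()
bigDT <- rbindlist(list(long, long))

# Mean, min and max per county, year and month in one grouped call
stats <- bigDT[, .(temp.mean = mean(temp), temp.min = min(temp), temp.max = max(temp)),
               by = .(FIPS, year, month)]
```

Calling `mean(temp)` directly yields an ordinary numeric column, whereas `lapply(.SD, mean)` wrapped in `list()` would produce a list column.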
[R] Melt and Rbind/Rbindlist
I have climate data for 20 years for US counties (FIPS) in csv format; each file represents one year of data. I have extracted the data and reshaped the yearly data files using melt():

    for (i in filelist) {
        tmp1 <- as.data.table(read.csv(i, header = T, sep = ","))
        tmp2 <- melt(tmp1, id = "FIPS")
        tmp2$year <- as.numeric(substr(tmp2$variable, 2, 5))
        tmp2$month <- as.numeric(substr(tmp2$variable, 7, 8))
        tmp2$day <- as.numeric(substr(tmp2$variable, 10, 11))
    }

Should I rbind in the loop here, as I have the memory? So, the file (i) tmp2 looks like this:

    FIPS  temp      year  month  date
    1001  276.7936  2045  1      1/1/2045
    1003  276.7936  2045  1      1/1/2045
    1005  279.6452  2045  1      1/1/2045
    1007  276.7936  2045  1      1/1/2045
    1009  272.3748  2045  1      1/1/2045
    1011  279.6452  2045  1      1/1/2045

My goal is to calculate the mean by FIPS code by month/week; however, when I use the following code, I get a NULL value:

    mean.temp <- for (i in filelist) {
        tmp2[, list(temp.mean = lapply(.SD, mean)),
             by = c("FIPS", "year", "month"), .SDcols = c("temp")]
    }

This works fine for individual years but not with for (i in filelist). What am I doing wrong? Can I include an rbind/rbindlist in the loop to make a big data.frame?

Any suggestions will be highly appreciated. Thank you.

Sincerely,
Shouro
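A likely reason for the NULL in the question above is that a `for` loop in R always evaluates to NULL, whatever its body computes, so assigning the loop itself to `mean.temp` cannot capture any result. The sketch below illustrates this with base R only; the list `yearly` and its values are hypothetical stand-ins for the per-year tables, not the real data.

```r
# Hypothetical stand-ins for two yearly tables (made-up values)
yearly <- list(
  data.frame(FIPS = c(1001L, 1001L, 1003L, 1003L), year = 2026, month = 1,
             temp = c(280, 282, 284, 286)),
  data.frame(FIPS = c(1001L, 1001L, 1003L, 1003L), year = 2027, month = 1,
             temp = c(270, 272, 274, 276))
)

# A for loop returns NULL regardless of what happens inside it
res_for <- for (d in yearly) mean(d$temp)
is.null(res_for)

# lapply() returns one result per element, which can then be combined
res_list <- lapply(yearly, function(d) {
  aggregate(temp ~ FIPS + year + month, data = d, FUN = mean)
})
combined <- do.call(rbind, res_list)
```

Here `combined` holds one row per FIPS, year and month. With data.table objects, `rbindlist(res_list)` would play the role of `do.call(rbind, res_list)`.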