Re: [R] Melt and Rbind/Rbindlist

2015-02-01 Thread Shouro Dasgupta
Hello Mr. Holtman,

Thank you very much for your reply and suggestion. This is what each year's
data looks like:

tmp1 <- structure(list(FIPS = c(1001L, 1003L, 1005L),
   X2026.01.01.1 = c(285.5533142, 285.5533142, 286.2481079),
   X2026.01.01.2 = c(283.4977112, 283.4977112, 285.0860291),
   X2026.01.01.3 = c(281.9733887, 281.9733887, 284.1548767),
   X2026.01.01.4 = c(280.0234985, 280.0234985, 282.6075745),
   X2026.01.01.5 = c(278.7125854, 278.7125854, 281.2553711),
   X2026.01.01.6 = c(278.5204773, 278.5204773, 280.6148071)),
   .Names = c("FIPS", "X2026.01.01.1", "X2026.01.01.2", "X2026.01.01.3",
   "X2026.01.01.4", "X2026.01.01.5", "X2026.01.01.6"),
   class = "data.frame", row.names = c(NA, -3L))
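
For reference, here is a minimal base-R sketch of how the date and the block index could be pulled out of column names like the ones above. It assumes the pattern "X" + YYYY.MM.DD + "." + block number, which is what the sample shows; the names cols, obs_date and obs_block are made up for illustration.

```r
# Column names following the pattern seen in the sample (assumed, not verified
# against the full files): "X" + YYYY.MM.DD + "." + block number within the day.
cols <- c("X2026.01.01.1", "X2026.01.01.2", "X2026.01.02.1")

# Characters 2-11 hold the date; the digits after the last dot are the block.
obs_date  <- as.Date(substr(cols, 2, 11), format = "%Y.%m.%d")
obs_block <- as.integer(sub(".*\\.", "", cols))

parsed <- data.frame(variable = cols, date = obs_date, block = obs_block)
print(parsed)
```

Parsing the names into a real Date once means year, month and week can all be derived from that single column afterwards.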


The data are in 3-hour blocks for every day, by US FIPS code, from 2026 to
2045; each year's data is in a different CSV file. My goal is to compute the
max, min, and mean by week and month. I used the following code to assign
week numbers to the observations:

nweek <- function(x, format="%Y-%m-%d", origin){
  if(missing(origin)){
    # week of the year, Monday as the first day of the week
    as.integer(format(strptime(x, format=format), "%W"))
  }else{
    x <- as.Date(x, format=format)
    o <- as.Date(origin, format=format)
    # x is already a Date here, so take the weekday (0 = Sunday) from it directly
    w <- as.integer(format(x, "%w"))
    2 + as.integer(x - o - w) %/% 7
  }
}
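
To make the arithmetic in the two branches concrete, here is a small worked sketch on four dates. It repeats the two formulas inline rather than calling nweek(), and the sample dates are chosen only for illustration (2026-01-01 falls on a Thursday).

```r
x <- as.Date(c("2026-01-01", "2026-01-03", "2026-01-04", "2026-01-11"))
o <- as.Date("2026-01-01")               # origin used in the thread

# Branch without origin: week of the year, weeks starting on Monday.
week_of_year <- as.integer(format(x, "%W"))

# Branch with origin: shift each date back to its Sunday, count whole weeks
# from the origin, then offset so the partial first week is week 1.
w <- as.integer(format(x, "%w"))         # weekday, 0 = Sunday
week_from_origin <- 2 + as.integer(x - o - w) %/% 7

print(data.frame(date = x, week_of_year, week_from_origin))
```

The two numbering schemes differ: "%W" counts the days before the first Monday as week 0, while the origin-based formula starts at 1 and rolls over on Sundays.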


Then the following:

for (i in filelist) {
  nweek(tmp2$date)
}
for (i in filelist) {
  nweek(dates, origin="2026-01-01")
}
for (i in filelist) {
  wkn <- nweek(tmp2$date)
}
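
Note that none of the three loops uses i, and the first two discard their results, so each pass repeats the same work on whatever tmp2 currently holds. Since format() is vectorized, one call per table should be enough. A minimal sketch, assuming the per-year tables are held in a list (yearly and add_week are hypothetical names, and the values are made up):

```r
# Two tiny stand-ins for the per-year tables.
yearly <- list(
  data.frame(FIPS = 1001L, date = c("2026-01-01", "2026-01-05"),
             temp = c(285.6, 283.5)),
  data.frame(FIPS = 1001L, date = c("2027-01-04", "2027-01-11"),
             temp = c(281.9, 280.0))
)

# One vectorized call per table; the week number is stored as a column
# instead of being computed and thrown away.
add_week <- function(d) {
  d$week <- as.integer(format(as.Date(d$date), "%W"))
  d
}
yearly <- lapply(yearly, add_week)

weeks <- unlist(lapply(yearly, `[[`, "week"))
print(weeks)
```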


Is this efficient? Thank you so much again. I really appreciate it.

Sincerely,

Shouro

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Melt and Rbind/Rbindlist

2015-01-31 Thread jim holtman
It would have been nice if you had at least supplied a subset (~10 lines)
from a couple of files so we could see what the data looks like and test
out any solution. Since you are using 'data.table', you should probably
also use 'fread' for reading in the data.  Here is a possible approach of
reading the data into a list and then creating a single, large data.table:

---
library(data.table)

myDTs <- lapply(filelist, function(.file) {
  tmp1 <- fread(.file, sep=",")
  # name the value column "temp" so the aggregation below can find it
  tmp2 <- melt(tmp1, id="FIPS", value.name="temp")
  tmp2$year <- as.numeric(substr(tmp2$variable,2,5))
  tmp2$month <- as.numeric(substr(tmp2$variable,7,8))
  tmp2$day <- as.numeric(substr(tmp2$variable,10,11))
  tmp2  # return value
})

bigDT <- rbindlist(myDTs)  # rbind all the data.tables together

# then you should be able to do:

mean.temp <- bigDT[, list(temp.mean=lapply(.SD, mean)),
   by=c("FIPS","year","month"), .SDcols=c("temp")]
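
As a runnable illustration of the same approach, here is a sketch that uses two small in-memory tables in place of the files, so it does not depend on fread() or on the real CSV headers. The data values and the names wide, long and stats are made up; the column-name pattern is assumed to match the sample posted in the thread. It computes the mean, min and max directly with mean(temp) and so on, rather than through lapply(.SD, ...).

```r
library(data.table)

# Made-up wide tables standing in for two yearly files.
wide <- list(
  data.table(FIPS = c(1001L, 1003L),
             X2026.01.01.1 = c(285, 286), X2026.01.01.2 = c(283, 284)),
  data.table(FIPS = c(1001L, 1003L),
             X2027.01.01.1 = c(280, 281), X2027.01.01.2 = c(278, 279))
)

# Reshape each table to long form and parse year/month from the column name.
myDTs <- lapply(wide, function(dt) {
  long <- melt(dt, id.vars = "FIPS", variable.name = "variable",
               value.name = "temp")
  long[, variable := as.character(variable)]
  long[, year  := as.integer(substr(variable, 2, 5))]
  long[, month := as.integer(substr(variable, 7, 8))]
  long
})

# Bind once, after the loop, then aggregate by group.
bigDT <- rbindlist(myDTs)
stats <- bigDT[, .(temp.mean = mean(temp),
                   temp.min  = min(temp),
                   temp.max  = max(temp)),
               by = .(FIPS, year, month)]
print(stats)
```

Binding once with rbindlist() after the lapply() avoids growing a table inside the loop, which is usually the slower option.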




Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



[R] Melt and Rbind/Rbindlist

2015-01-31 Thread Shouro Dasgupta
I have climate data for 20 years for US counties (FIPS) in CSV format; each
file represents one year of data. I have extracted the data and reshaped
the yearly data files using melt():

for (i in filelist) {
   tmp1 <- as.data.table(read.csv(i, header=TRUE, sep=","))
   tmp2 <- melt(tmp1, id="FIPS")
   tmp2$year <- as.numeric(substr(tmp2$variable,2,5))
   tmp2$month <- as.numeric(substr(tmp2$variable,7,8))
   tmp2$day <- as.numeric(substr(tmp2$variable,10,11))
}


Should I *rbind* inside the loop here, given that I have enough memory?
For a single file (i), tmp2 looks like this:

 FIPS     temp year month     date
 1001 276.7936 2045     1 1/1/2045
 1003 276.7936 2045     1 1/1/2045
 1005 279.6452 2045     1 1/1/2045
 1007 276.7936 2045     1 1/1/2045
 1009 272.3748 2045     1 1/1/2045
 1011 279.6452 2045     1 1/1/2045


My goal is to calculate the mean by FIPS code by month/week; however, when I
use the following code, I get a NULL value.

mean.temp <- for (i in filelist) {tmp2[, list(temp.mean=lapply(.SD, mean)),
 by=c("FIPS","year","month"), .SDcols=c("temp")]}
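
A likely explanation for the NULL: in R, a for loop itself always evaluates to NULL, whatever its body computes, so assigning the loop to a name can only ever store NULL. The toy sketch below (unrelated to the climate data, with made-up names res_for and res_list) shows the difference between assigning a loop and collecting results with lapply().

```r
# Assigning a for loop: the body runs, but the loop's own value is NULL.
res_for <- for (i in 1:3) i^2
print(is.null(res_for))

# lapply() returns one element per iteration, so the results are kept.
res_list <- lapply(1:3, function(i) i^2)
print(unlist(res_list))
```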


This works fine for individual years but not with *for (i in filelist)*. What
am I doing wrong? Can I include an rbind/rbindlist in the loop to make one big
data.frame? Any suggestions will be highly appreciated. Thank you.

Sincerely,

Shouro
