Re: [R] Data transformation problem

Jeff Newmiller Wed, 11 Nov 2020 23:04:23 -0800

I am not a data.table aficionado, but here is how I would do it with dplyr/tidyr:


library(dplyr)
library(tidyr)


do_per_REL <- function( DF ) {
  rng <- range( DF$REF1 ) # watch out for missing months?
  DF <- (   data.frame( REF1 = seq( rng[ 1 ], rng[ 2 ], by = "month" ) )
        %>% left_join( DF, by = "REF1" )
        %>% arrange( REF1 )
        )
  with( DF
      , data.frame( REF2 = REF1[ -1 ]
                  , VAL2 = 100 * diff( VAL1 ) / VAL1[ -length( VAL1 ) ]
                  )
      )
}
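The heart of do_per_REL() is the VAL2 line. diff() gives each month-on-month change, and dividing by VAL1[ -length( VAL1 ) ] divides each change by the earlier of its two months. A small illustration, using values hand-copied from df1 in the quoted message below (the names v, w, growth_v and growth_w are only for illustration):

```r
# Month-to-month growth, as in the VAL2 line of do_per_REL():
# diff(v)[i] is v[i + 1] - v[i], divided by the earlier value v[i]
v <- c(14567, 14000, 11500)   # REF 2017-01 to 2017-03, release 2020-08-01
growth_v <- 100 * diff(v) / v[-length(v)]
print(round(growth_v, 1))     # -3.9 -17.9

# A missing level gives NA for any growth rate that uses it
w <- c(NA, 12999, 18765)      # same months, release 2020-06-01
growth_w <- 100 * diff(w) / w[-length(w)]
print(round(growth_w, 1))     # NA 44.4
```

Because the months are first completed with seq( ..., by = "month" ) and a left join, a month missing from the data shows up as an NA level and so as NA growth rates, rather than silently producing a two-month change.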

df2a <- (   df1
        %>% mutate( REF1 = as.Date( REF1 )
                  , REL1 = as.Date( REL1 )
                  )
        %>% nest( data = -REL1 )
        %>% rename( REL2 = REL1 )
        %>% rowwise()
        %>% mutate( data = list( do_per_REL( data ) ) )
        %>% ungroup()
        %>% unnest( cols = "data" )
        %>% select( REF2, REL2, VAL2 )
        %>% arrange( REF2, desc( REL2 ), VAL2 )
        )
df2a
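Since Philip says he is working in data.table, here is a hypothetical sketch of the same calculation there; it is not part of the dplyr/tidyr answer above. It assumes each REF1/REL1 pair occurs at most once, and unlike do_per_REL() it does not fill in missing reference months: a month whose predecessor is absent from the data simply gets no row, rather than an NA row. The helper column m and the names dt, prev, res and df2b are introduced only for illustration.

```r
library(data.table)

# Example data from the original question (same values as df1)
df1 <- data.frame(
  REF1 = rep(c("2017-01-01", "2017-02-01", "2017-03-01"), each = 5),
  REL1 = rep(c("2020-09-01", "2020-08-01", "2020-07-01",
               "2020-06-01", "2019-05-01"), 3),
  VAL1 = c(17974, 14567, 13425, NA, 12900, 17974, 14000, 14000, 12999,
           13245, 17197, 11500, 19900, 18765, 13467),
  stringsAsFactors = FALSE)

dt <- as.data.table(df1)
dt[, `:=`(REF1 = as.IDate(REF1), REL1 = as.IDate(REL1))]

# Integer month counter, so "previous month" is simply m - 1
dt[, m := year(REF1) * 12L + month(REF1)]

# Move each level forward one month so it lines up with the following month
prev <- dt[, .(REL1, m = m + 1L, VAL0 = VAL1)]

# Inner self-join: keep only rows whose previous month exists in the same release
res <- prev[dt, on = .(REL1, m), nomatch = NULL]

df2b <- res[, .(REF2 = REF1, REL2 = REL1,
                VAL2 = 100 * (VAL1 - VAL0) / VAL0)]
setorder(df2b, REF2, -REL2)
df2b
```

The self-join on an integer month counter avoids calling an R function once per release, which may scale better to millions of REF-REL rows, though that has not been benchmarked here. On the example data it reproduces df2 except for REF 2017-02-01 / REL 2019-05-01, where 100 * (13245 - 12900) / 12900 is about 2.7 rather than the 2.3 listed in df2.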

On Wed, 11 Nov 2020, p...@philipsmith.ca wrote:

I am stuck on a data transformation problem. I have a data frame, df1 in my example, with some original "levels" data. The data pertain to some variable, such as GDP, in various reference periods, REF, as estimated and released in various release periods, REL. The release periods follow the reference periods by two months or more, sometimes by several years. I want to build a second data frame, called df2 in my example, with the month-to-month growth rates for each reference period as they stood in each release period, revealing the revisions to those growth rates in subsequent periods.
REF1 <- c("2017-01-01","2017-01-01","2017-01-01","2017-01-01","2017-01-01",
 "2017-02-01","2017-02-01","2017-02-01","2017-02-01","2017-02-01",
 "2017-03-01","2017-03-01","2017-03-01","2017-03-01","2017-03-01")
REL1 <- c("2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
 "2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
 "2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01")
VAL1 <- c(17974,14567,13425,NA,12900,17974,14000,14000,12999,13245,17197,11500,
 19900,18765,13467)
df1 <- data.frame(REF1,REL1,VAL1)
REF2 <- c("2017-02-01","2017-02-01","2017-02-01","2017-02-01","2017-02-01",
 "2017-03-01","2017-03-01","2017-03-01","2017-03-01","2017-03-01")
REL2 <- c("2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
 "2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01")
VAL2 <- c(0.0,-3.9,4.3,NA,2.3,-4.3,-17.9,42.1,44.4,1.7)
df2 <- data.frame(REF2,REL2,VAL2)
In my example I have provided some sample data pertaining to three reference months, 2017-01-01 through 2017-03-01, and five release periods, "2020-09-01", "2020-08-01", "2020-07-01", "2020-06-01" and "2019-05-01". In my actual problem I have millions of REF-REL combinations, so my data frame is quite large. I am using data.table for faster processing, though I am more familiar with the tidyverse. I am providing df2 as the target data frame for my example, so you can see what I am trying to achieve.
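Judging from df2, the growth rate wanted for reference month t in release r is the percentage change from the previous reference month within the same release (the quantity the do_per_REL() function above computes). Writing VAL1(t, r) for the level of month t in release r:

```latex
\mathrm{VAL2}(t, r) = 100 \times \frac{\mathrm{VAL1}(t, r) - \mathrm{VAL1}(t-1, r)}{\mathrm{VAL1}(t-1, r)}

% Worked example: REF 2017-02-01 in release 2020-08-01
\mathrm{VAL2}(\text{2017-02}, \text{2020-08}) = 100 \times \frac{14000 - 14567}{14567} \approx -3.89
```

This rounds to -3.9, the second entry of VAL2 in df2. If either level is NA, as for REF 2017-01-01 in release 2020-06-01, the growth rate is NA.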
I have not been able to find an efficient way to do these calculations. I have tried "for" loops with "if" statements, without success so far, and anyway this approach would be too slow, I fear. Suggestions as to how I might proceed would be much appreciated.
Philip

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
