I have written two functions which do useful things with panel data a.k.a. longitudinal data, where one unit of observation (a firm or a person or an animal) is observed on a uniform time grid: - The first function makes lagged values of variables of your choice. - The second function makes growth rates w.r.t. q observations ago, for variables of your choice.
These strike me as bread-and-butter tasks in dealing with panel data. I couldn't find these functions in the standard R libraries. They are presented in this email for two reasons. First, it'll be great if R gurus can look at the code and propose improvements. Second, it'll be great if some package-owner can adopt these orphans :-) and make them available to the R community. The two functions follow: library(Hmisc) # Am using Lag() in this. # Task: For a supplied list of variables (the list `lagvars'), # make new columns in a dataset denoting lagged values. # You must supply `unitvar' which identifies the unit that's # repeatedly observed. # You must supply the name of the time variable `timevar' # and you must tell a list of the lags that interest you (`lags') # Example: # paneldata.lags(A, "person", "year", c("v1","v2"), lags=1:4) paneldata.lags <- function(X, unitvar, timevar, lagvars, lags=1) { stopifnot(length(lagvars)>=1) X <- X[order(X[,timevar]),] # just in case it's not sorted. innertask <- function(Y, lagvars, lags) { E <- labels <- NULL for (v in lagvars) { for (i in lags) { E <- cbind(E, Lag(Y[,v], i)) } labels <- c(labels, paste(v, ".l", lags, sep="")) } colnames(E) <- labels cbind(Y, E) } do.call("rbind", by(X, X[,unitvar], innertask, lagvars, lags)) } # Task: For a supplied list of variables (the list `gvars'), # make new columns in a dataset denoting growth rates. # You must supply `unitvar' which identifies the unit that's # repeatedly observed. # You must supply the name of the time variable `timevar' # and you must tell the time periods Q (vector is ok) over which # the growth rates are computed. paneldata.growthrates <- function(X, unitvar, timevar, gvars, Q=1) { stopifnot(length(gvars)>=1) X <- X[order(X[,timevar]),] makegrowths <- function(x, q) { new <- rep(NA, length(x)) for (t in (1+q):length(x)) { new[t] <- 100*((x[t]/x[t-q])-1) } return(new) } innertask <- function(Y, gvars, Q) { E <- labels <- NULL for (v in gvars) { for (q in Q) { E <- cbind(E, makegrowths(Y[,v], q)) } labels <- c(labels, paste(v, ".g", Q, sep="")) } colnames(E) <- labels cbind(Y, E) } do.call("rbind", by(X, X[,unitvar], innertask, gvars, Q)) } Here's a demo of using them: # A simple panel dataset A <- data.frame(year=rep(1980:1982,4), person=factor(sort(rep(1:4,3))), v1=round(rnorm(12),digits=2), v2=round(rnorm(12),digits=2)) # Demo of creating lags for both variables v1 and v2 -- paneldata.lags(A, "person", "year", c("v1","v2"), lags=1:2) # Demo of creating growth rates for v2 w.r.t. 1 & 2 years ago -- paneldata.growthrates(A, "person", "year", "v2", Q=1:2) Finally, I have a question. In a previous posting on this subject, Gabor showed me that my code: # Blast this function for all the values that A$person takes -- new <- NULL for (f in levels(A$person)) { new <- rbind(new, make.additional.variables(subset(A, A$person==f), nlags=2, Q=1)) } A <- new; rm(new) can be replaced by one do.call() (as used above). It's awesome, but I don't understand it! :-) Could someone please explain how and why this works? I know by() and I see that when I do by(A,A$x), it gives me a list containing as many entries as are levels of A$x. I couldn't think of a way to force all this into one data frame; the best I could do was to do for (f in levels (A$person)) {} as shown here. The two functions above are using do.call() as Gabor used them, and they're awesome, but I don't understand why they work! The man page ?do.call was a bit too cryptic and I couldn't comprehend it. Where can I learn this stuff? -- Ajay Shah Consultant [EMAIL PROTECTED] Department of Economic Affairs http://www.mayin.org/ajayshah Ministry of Finance, New Delhi ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html