Hi,

I am trying to calculate the growth rate (of sales, say, though it has to be computed for many variables) in a panel data set. The problem is that I have missing data for many firms in many years. To illustrate, I have created this short data frame (the original df is much bigger):

df1 <- data.frame(co_code1 = rep(c(1100, 1200, 1300), each = 7),
                  fyear1   = rep(1990:1996, 3),
                  sales1   = rep(seq(1000, 1600, by = 100), 3))

# this gives me
   co_code1 fyear1 sales1
1      1100   1990   1000
2      1100   1991   1100
3      1100   1992   1200
4      1100   1993   1300
5      1100   1994   1400
6      1100   1995   1500
7      1100   1996   1600
8      1200   1990   1000
9      1200   1991   1100
10     1200   1992   1200
11     1200   1993   1300
12     1200   1994   1400
13     1200   1995   1500
14     1200   1996   1600
15     1300   1990   1000
16     1300   1991   1100
17     1300   1992   1200
18     1300   1993   1300
19     1300   1994   1400
20     1300   1995   1500
21     1300   1996   1600

# I am now removing a couple of rows
df1 <- df1[-c(5, 8), ]

# the result is
   co_code1 fyear1 sales1
1      1100   1990   1000
2      1100   1991   1100
3      1100   1992   1200
4      1100   1993   1300
6      1100   1995   1500
7      1100   1996   1600
9      1200   1991   1100
10     1200   1992   1200
11     1200   1993   1300
12     1200   1994   1400
13     1200   1995   1500
14     1200   1996   1600
15     1300   1990   1000
16     1300   1991   1100
17     1300   1992   1200
18     1300   1993   1300
19     1300   1994   1400
20     1300   1995   1500
21     1300   1996   1600

# so 1994 for co_code1 1100 and 1990 for co_code1 1200 have been removed.

If I try

library(plyr)  # ddply comes from the plyr package
d <- ddply(df1, "co_code1", transform,
           growth = c(NA, exp(diff(log(sales1))) - 1) * 100)

# this apparently gives the wrong result for the year 1995 of co_code1 1100
# (as shown below), because the growth rate is computed between consecutive
# rows as if they were consecutive years.

   co_code1 fyear1 sales1    growth
1      1100   1990   1000        NA
2      1100   1991   1100 10.000000
3      1100   1992   1200  9.090909
4      1100   1993   1300  8.333333
5      1100   1995   1500 15.384615
6      1100   1996   1600  6.666667
7      1200   1991   1100        NA
8      1200   1992   1200  9.090909
9      1200   1993   1300  8.333333
10     1200   1994   1400  7.692308
11     1200   1995   1500  7.142857
12     1200   1996   1600  6.666667
13     1300   1990   1000        NA
14     1300   1991   1100 10.000000
15     1300   1992   1200  9.090909
16     1300   1993   1300  8.333333
17     1300   1994   1400  7.692308
18     1300   1995   1500  7.142857
19     1300   1996   1600  6.666667

# I thought of applying the formula only where fyear1 increases by exactly 1
# within a co_code1, like this:

d <- ddply(df1, "co_code1", transform,
           if (diff(fyear1) == 1) {
             growth = (exp(diff(log(df1$sales1))) - 1) * 100
           } else {
             growth = NA
           })

But this doesn't work. I get the following warning (repeated a few times):

In if (diff(fyear1) == 1) { :
  the condition has length > 1 and only the first element will be used

I have searched for a solution but couldn't find one. I hope that some kind soul will guide me here.

Regards,

Brijesh K Mishra
Indian Institute of Management, Indore
India

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
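
The idea described in the post (compute growth only where fyear1 advances by exactly one year within a co_code1) can be written in vectorised form. The warning arises because `if` expects a single TRUE/FALSE, while `diff(fyear1)` returns one value per pair of rows; `ifelse` tests each element instead. What follows is only a minimal sketch in base R, not necessarily the best approach for a large panel. The helper name `growth_if_consecutive` is made up for illustration, and the sketch assumes one row per firm and year.

```r
# Rebuild the example data from the post (rows 5 and 8 removed).
df1 <- data.frame(co_code1 = rep(c(1100, 1200, 1300), each = 7),
                  fyear1   = rep(1990:1996, 3),
                  sales1   = rep(seq(1000, 1600, by = 100), 3))
df1 <- df1[-c(5, 8), ]

# Hypothetical helper (not from the post): growth in percent, set to NA
# unless the previous row of the same firm is exactly one year earlier.
growth_if_consecutive <- function(year, value) {
  n <- length(value)
  if (n < 2) return(rep(NA_real_, n))
  g   <- c(NA, (value[-1] / value[-n] - 1) * 100)  # same as exp(diff(log(x))) - 1
  gap <- c(NA, diff(year))                         # years since the previous row
  ifelse(!is.na(gap) & gap == 1, g, NA_real_)
}

# Sort by firm and year, then apply the helper firm by firm.
df1 <- df1[order(df1$co_code1, df1$fyear1), ]
pieces <- lapply(split(df1, df1$co_code1), function(grp) {
  grp$growth <- growth_if_consecutive(grp$fyear1, grp$sales1)
  grp
})
d <- do.call(rbind, pieces)
rownames(d) <- NULL
```

With this sketch, growth for co_code1 1100 in 1995 comes out as NA rather than 15.38, since 1994 is missing, while the 1996 value is still computed against 1995. The same helper could be passed to ddply as `growth = growth_if_consecutive(fyear1, sales1)`, and reused for other variables by swapping the second argument.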