Hi R-helpers,
My real data is an unbalanced panel (with gaps in years) of thousands of
firms, observed by year and industry, with financial information (variables
X, Y and Z, for example). The number of firms per year and industry varies,
and so does the number of years per industry.
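A quick way to see how unbalanced such a panel is: base R's table() counts the firms in each industry-year cell, and tapply() counts the distinct years per industry. The small panel below is hypothetical and assumes one row per firm-year, so counting rows is the same as counting firms.

```r
# Small hypothetical panel: one row per firm-year
panel <- data.frame(
  firm     = c(1, 1, 2, 2, 2, 3),
  year     = c(2000, 2001, 2000, 2001, 2002, 2002),
  industry = c(20, 20, 20, 20, 20, 30)
)
# Number of firms in each industry-year cell (0 = no observations)
firms_per_cell <- table(industry = panel$industry, year = panel$year)
# Number of distinct years observed in each industry
years_per_industry <- tapply(panel$year, panel$industry,
                             function(y) length(unique(y)))
firms_per_cell
years_per_industry
```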
# reproducible example
set.seed(123)  # fix the random draws so the example is reproducible
# industry 20: firms 1-10, each observed in 2000-2004
data1 <- data.frame(firm = rep(1:10, each = 5), year = rep(2000:2004, 10),
                    industry = 20,
                    X = rnorm(50), Y = rnorm(50), Z = rnorm(50))
# industry 30: firms 11-15, each observed in 2001-2003
data2 <- data.frame(firm = rep(11:15, each = 3), year = rep(2001:2003, 5),
                    industry = 30,
                    X = rnorm(15), Y = rnorm(15), Z = rnorm(15))
# industry 40: firms 16-20, each observed in 2001-2004
data3 <- data.frame(firm = rep(16:20, each = 4), year = rep(2001:2004, 5),
                    industry = 40,
                    X = rnorm(20), Y = rnorm(20), Z = rnorm(20))
# stack the three industries and sort by industry and year
final2 <- rbind(data1, data2, data3)
final3 <- final2[order(final2$industry, final2$year), ]
final3
I need to estimate the linear model Y = b0 + b1*X + b2*Z separately for
each industry and year, to obtain the estimates of b0, b1 and b2 for every
industry-year combination (for example, b0 for industry 20 and year 2000,
for industry 20 and year 2001, and so on). Then I need to calculate the
fitted values and the residuals for each firm, so I need to store b0, b1
and b2 in a way that would let me do something like
newdata1 <- transform(final3, Yhat = b0 + b1 * X + b2 * Z)
newdata2 <- transform(newdata1, residual = Y - Yhat)
(where b0, b1 and b2 would be columns of final3 holding each row's
industry-year estimates), or some other way to keep the fitted values and
the residuals in a data frame together with the firm and year columns.
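One possible base-R sketch of the coefficient-table approach, assuming the column names firm, year, industry, X, Y and Z from the example and that every industry-year cell has more than three firms. It rebuilds a simulated panel of the same shape (make_panel is a hypothetical helper, not part of any package), splits it into industry-year cells with split(), fits lm(Y ~ X + Z) in each cell with lapply(), stacks the coefficients into one data frame, and merge()s them back by industry and year so the transform() step works row by row. Yhat stands in for Y', which is not a syntactic R name.

```r
# Simulated data shaped like final3 in the example above (values are random)
set.seed(1)
# Hypothetical helper: every firm in `firms` observed in every year of `years`
make_panel <- function(firms, years, industry) {
  g <- expand.grid(year = years, firm = firms)
  n <- nrow(g)
  data.frame(firm = g$firm, year = g$year, industry = industry,
             X = rnorm(n), Y = rnorm(n), Z = rnorm(n))
}
final3 <- rbind(make_panel(1:10, 2000:2004, 20),
                make_panel(11:15, 2001:2003, 30),
                make_panel(16:20, 2001:2004, 40))

# 1. One data frame per industry-year cell (empty combinations dropped)
cells <- split(final3, list(final3$industry, final3$year), drop = TRUE)

# 2. Fit Y ~ X + Z in each cell and keep b0, b1, b2 with the cell keys
coefs <- do.call(rbind, lapply(cells, function(d) {
  b <- coef(lm(Y ~ X + Z, data = d))
  data.frame(industry = d$industry[1], year = d$year[1],
             b0 = b[["(Intercept)"]], b1 = b[["X"]], b2 = b[["Z"]])
}))

# 3. Attach each cell's coefficients to all of its firm-year rows
withcoef <- merge(final3, coefs, by = c("industry", "year"))

# 4. Fitted values and residuals, as in the transform() calls above
newdata1 <- transform(withcoef, Yhat = b0 + b1 * X + b2 * Z)
newdata2 <- transform(newdata1, residual = Y - Yhat)
head(newdata2[, c("firm", "year", "industry", "Yhat", "residual")])
```

coefs holds one row of b0, b1 and b2 per industry-year. merge() returns the rows sorted by industry and year; reorder with newdata2[order(newdata2$firm, newdata2$year), ] if firm order matters.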
Until now I have been doing this the hard way, and because I need to do it
several times, I would like your help to find an easier way.
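Since the same steps have to be repeated several times, one option is to wrap them in a function. The sketch below is a hypothetical helper (fit_by_cell is an illustrative name) that skips the explicit coefficient table and takes the fitted values and residuals straight from each lm() fit with fitted() and residuals(). It assumes the grouping columns are called industry and year, as in the example, and uses na.action = na.exclude so rows with missing values get NA instead of shifting the results.

```r
# Hypothetical helper: fits `formula` separately within each combination of
# the `by` columns and appends fitted values and residuals to `data`.
fit_by_cell <- function(data, formula, by = c("industry", "year")) {
  # Row indices for each non-empty cell
  cells <- split(seq_len(nrow(data)), data[by], drop = TRUE)
  fit <- rep(NA_real_, nrow(data))
  res <- rep(NA_real_, nrow(data))
  for (idx in cells) {
    # na.exclude pads fitted()/residuals() with NA for rows lm() dropped
    m <- lm(formula, data = data[idx, ], na.action = na.exclude)
    fit[idx] <- fitted(m)
    res[idx] <- residuals(m)
  }
  data$fitted <- fit
  data$residual <- res
  data
}

# Usage on a small simulated panel: 2 industries x 2 years, 5 firms per cell
set.seed(42)
panel <- data.frame(firm = rep(1:10, each = 2), year = rep(2000:2001, 10),
                    industry = rep(c(20, 30), each = 10),
                    X = rnorm(20), Y = rnorm(20), Z = rnorm(20))
out <- fit_by_cell(panel, Y ~ X + Z)
head(out)
```

Rows keep their original order, so firm and year stay aligned with the new columns. Cells with three or fewer firms fit Y ~ X + Z exactly, giving zero residuals (and, with fewer than three, some NA coefficients), so small industry-year cells may need to be flagged or dropped first.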
Thank you,
Cecília Carmo
Universidade de Aveiro
Portugal