[R] pgmm (Blundell-Bond) sample needed
Dear Ivo, dear list,

(see: Message: 70 Date: Thu, 26 Mar 2009 21:39:19 + From: ivo...@gmail.com Subject: [R] pgmm (Blundell-Bond) sample needed)

I think I have finally figured out how to replicate your supersimple GMM example with pgmm() so as to get the very same results as Stata. Having no other regressors in the formula initially drove me crazy. This was a case where simpler models are trickier than more complicated ones!

For the benefit of other GMM people on this list, here's a brief résumé of our rather long private mail exchange of the last few days, which also answers some other pgmm()-related posts that have appeared on this list lately. Sorry for the overlong posting, but it might be worth the space.

I will refer to the very good Stata tutorial by David Roodman that Ivo himself pointed me to, which gives a nice (and free) theoretical intro as well. Everybody else can find it here: http://repec.org/nasug2006/howtodoxtabond2.cgdev.pdf. As far as textbooks are concerned, Arellano's panel data book (Oxford) is the theoretical reference I would suggest.

There have been two separate issues:
- syntax (how to get the right model)
- small sample behaviour (minimal time dimension to get estimates)

I'll start with the latter, then provide a quick Rosetta stone of pgmm() and Stata commands producing the same results. The established benchmarks for dynamic panels' GMM are the DPD routines written by Arellano et al. for Gauss and later Ox, but Stata has been shown to give the same results, and it is the established general reference for panel data. Lastly I will add the usual examples found in the literature, although they are very close relatives of 'example(pgmm)', so as to show the correspondence between the models.

1) Small samples and N-asymptotics

GMM needs big N, small T. Otherwise you end up having more instruments than observations and you get a singular matrix error (which, as Ivo correctly found out, happens in the computation of the optimal weights' matrix).
While this error will probably be replaced with a more descriptive message, it still shows the heart of the matter. Yet Stata gives you estimates in this case as well: as I suspected, this is because it uses a generalized inverse (see Roodman's tutorial, 2.6). This looks theoretically OK. Whether it is meaningful in applied practice is an issue I will discuss with the package maintainer. IMHO it is not, except perhaps for illustrative purposes, and it might well encourage bad habits (see the discussion about (not) fitting the Grunfeld model by GMM on this list, some weeks ago).

2) Fitting the simple models

Simplest possible model: AR(1) with individual effects

x(i,t) = a*x(i,t-1) + b(i) + c

This is what Ivo asked for in the first place. As the usual example is on data from the Arellano and Bond paper, available in package 'plm' as data(EmplUK), I'll use log(emp) from this dataset as 'x', for ease of reproducibility. The same data are available in Stata via 'use http://www.stata-press.com/data/r7/abdata.dta;'. The Stata dataset is identical except for the variable names and the fact that in Stata you have to generate logs beforehand (ugh!). I'm also adding the 'nomata' option to avoid complications, but this will be unnecessary on most systems (not on mine...).

The system-GMM estimator (with robust SEs) in Stata is

xtabond2 n nL1, gmm(L.(n)) nomata robust

whose R equivalent is:

sysmod <- pgmm(dynformula(log(emp) ~ 1, list(1)), data=EmplUK,
               gmm.inst=~log(emp), lag.gmm=c(2,99),
               effect="individual", model="onestep", transformation="ld")
summary(sysmod, robust=TRUE)

(note that although 'summary(sysmod)' does not report a constant, it's actually there; this is an issue to be checked).
The difference-GMM estimator is

xtabond2 n nL1, gmm(L.(n)) noleveleq nomata robust

in Stata; in R:

diffmod <- pgmm(dynformula(log(emp) ~ 1, list(1)), data=EmplUK,
                gmm.inst=~log(emp), lag.gmm=c(2,99),
                effect="individual", model="onestep", transformation="d")
summary(diffmod, robust=TRUE)

The particular model Ivo asked for, using only lags 2-4 as instruments, is 'xtabond2 x lx, gmm(L.(x),lag(1 3)) robust' in Stata and only requires setting 'lag.gmm=c(2,4)' in the 'sysmod' above (notice the difference in the lags specification!). Note also that, unlike Ivo, I am using robust covariances.

3) Fitting the standard examples from the literature

'example(pgmm)' is a somewhat simplified version of the standard Arellano-Bond example. For better comparability, here I am replicating the results from the abest.do Stata script from http://ideas.repec.org/c/boc/bocode/s435901.html (i.e., the results of the Arellano and Bond paper done via xtabond2). The same output is also to be found in Roodman's tutorial, 3.3.

Here's how to replicate the output of abest.do (the preceding lines in the file must be executed as well, for the data transformations):

* Replicate difference GMM runs in Arellano and Bond 1991, Table 4
* Column (a1)
xtabond2 n L(0/1).(l.n w) l(0/2).(k ys) yr198?c cons, gmm
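A note for readers on a recent plm release, where (as far as I know) dynformula() has been deprecated in favour of multi-part formulas. The one-step system and difference models from section 2 could be sketched as below. The syntax mirrors the examples on the pgmm help page; the object names sysmod_new and diffmod_new are made up for this illustration, and it has not been checked here that the estimates coincide with the dynformula versions or with Stata.

```r
# Sketch only, assuming a plm version that accepts multi-part formulas:
# left of "|" is the model, right of "|" are the GMM instruments
# (lags 2 to 99 of log(emp), the counterpart of lag.gmm=c(2,99)).
if (requireNamespace("plm", quietly = TRUE)) {
  library(plm)
  data("EmplUK", package = "plm")

  # system GMM: levels and differences, one-step
  sysmod_new <- pgmm(log(emp) ~ lag(log(emp), 1) | lag(log(emp), 2:99),
                     data = EmplUK, effect = "individual",
                     model = "onestep", transformation = "ld")

  # difference GMM: differences only, one-step
  diffmod_new <- pgmm(log(emp) ~ lag(log(emp), 1) | lag(log(emp), 2:99),
                      data = EmplUK, effect = "individual",
                      model = "onestep", transformation = "d")

  print(summary(sysmod_new, robust = TRUE))
  print(summary(diffmod_new, robust = TRUE))
}
```

Restricting the instruments to lags 2-4, as in Ivo's target model, would presumably amount to writing lag(log(emp), 2:4) on the right of the bar.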
[R] pgmm (Blundell-Bond) sample needed
Dear Ivo,

please find below some answers to your pgmm-related questions.

## Was:
Message: 70
Date: Thu, 26 Mar 2009 21:39:19 +
From: ivo...@gmail.com
Subject: [R] pgmm (Blundell-Bond) sample needed
To: r-help r-h...@stat.math.ethz.ch
Message-ID: 0016361e8962dfdfd704660c7...@google.com
Content-Type: text/plain

> Dear R Experts---
>
> Sorry for all the questions yesterday and today. I am trying to use Yves Croissant's pgmm function in the plm package with Blundell-Bond moments. I have read the Blundell-Bond paper, and want to run the simplest model first, d[i,t] = a*d[i,t-1] + fixed[i] + u[i,t] . no third conditioning variables yet. the full set of moment conditions recommended for system-GMM, which is (T-1)*(T-2)/2+(T-3), in which the u's interact with all possible lagged y's and delta y's.
>
> I believe that pgmm operates by demanding that firm (i) and year (t) be the first two columns in the data set.

Almost correct: this is the easiest way. Otherwise you can supply data organized as you like, but then you have to specify which columns make up the index. See vignette("plm"), § 4.

> library(plm)
> NF=20; NT=10
> d= data.frame( firm= rep(1:NF, each=NT), year= rep( 1:NT, NF), x=rnorm(NF*NT) );
> # the following fails, because dynformula magic is required; learned this the hard way
> # v=pgmm( x ~ lag(x), data=d, gmm.inst=~x, lag.gmm=c(2,99), transformation="ld" )

The reason for 'dynformula magic' is that lags in panel data are only well defined in conjunction with the group and time indices; therefore in 'plm' lags (and first differences) are best supplied through a 'dynformula' interface inside a model. Otherwise you get the standard time-series lag, which is incorrect here.

> formula= dynformula( x ~ 1, list(1)); # this creates x ~ lag(x)
> v=pgmm( formula, data=d, gmm.inst=~x, lag.gmm=c(2,99), transformation="ld" )
> Error in solve.default(suml(Vi)) : system is computationally singular: reciprocal condition number = 8.20734e-20
>
> obviously, I am confused.
You should not be, as you yourself state that

> the full set of moment conditions recommended for system-GMM [...] is (T-1)*(T-2)/2+(T-3).

If T=10 then you have the equivalent of 9*8/2+7 = 43 regressors (instruments). That's why N=20 is way too small. The original Arellano and Bond example in UKEmpl (which is actually called 'EmplUK'!) has N=140, T=9. I already pointed this out in another r-help post, not many days ago (March 9th, 17:59).

May I suggest you take a further look at Arellano's panel data book? This would probably clarify how the instruments are constructed (by the way, that's also what I am currently reading in my spare time). See also Greene, Econometric Analysis, § 18.5 and the Z matrix in particular. (Yves Croissant has put this down nicely in the package vignette as well.)

> when I execute the same command on the included UKEmpl data set, it works. however, my inputs would seem perfectly reasonable. I would hope that the procedure could produce a lag(x) coefficient estimate of around 0, and then call it a day.

would be nice; but your troubles aren't over yet :^)

> could someone please tell me how to instruct pgmm to just estimate this simplest of all BB models?

OK, you found out by yourself. Just for the benefit of other list readers, I reproduce the lines you sent us by private email (comments are mine):

lagformula= dynformula(x ~ 1, list(1)) # reproduces x~lag(x, 1) in standard OLS parlance
v=pgmm(lagformula, data=d, gmm.inst=~x, lag.gmm=c(1,99), transformation="ld" )
# transformation="ld" means the GMM-system estimator,
# where you use both levels and differences as instruments.

> [My ultimate goal is to replicate what another author has run via xtabond2 d ld, gmm(L.(d), lag(1 3)) robust in Stata; if you know the magic of moving this statement into pgmm syntax, I would be even more grateful. Right now, I am so stuck on square 1 that I do not know how to move towards figuring out where I ultimately need to go.]

GMM is a tricky subject that I still haven't mastered.
I'll try to figure out what both Stata and plm do with the instruments and let you know. Anyway, the 'plm' equivalent of Stata's 'robust' option, which uses the Windmeijer correction if I'm not mistaken, is to specify a robust covariance via vcovHC().

Now to your second message:

# Was:
Message: 82
Date: Thu, 26 Mar 2009 21:45:49 -0400
From: ivo welch ivo...@gmail.com
Subject: Re: [R] pgmm (blundell-bond) help needed
To: r-help r-h...@stat.math.ethz.ch
Message-ID: 50d1c22d0903261845m7d8b321fq97faab26542a...@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1

> I have been playing with more examples, and I now know that with larger NF's my example code actually produces a result, instead of a singular matrix error. interestingly, stata's xtabond2 command seems ok with these sorts of data sets. either R has more stringent requirements, or stata is too casual
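The instrument arithmetic from the reply above (43 moment conditions for T=10, against N=20 firms) can be illustrated with a short base-R sketch. It is a deliberately simplified stand-in, not plm's internal code: the weighting matrix is mimicked by a sum of outer products of one random vector per individual, which is the structure that caps its rank at N. The helper names n_moments, weight_matrix and pinv are made up for this example, and the Moore-Penrose pseudo-inverse is only one possible generalized inverse; whether it matches what xtabond2 does internally is not established here.

```r
set.seed(1)

# Moment count as quoted in the thread for system GMM with n_periods periods
n_moments <- function(n_periods) {
  (n_periods - 1) * (n_periods - 2) / 2 + (n_periods - 3)
}
L <- n_moments(10)  # 9*8/2 + 7

# Simplified stand-in for an "optimal" weighting matrix:
# sum over individuals of g_i g_i', with g_i a vector of length n_instr
weight_matrix <- function(n_indiv, n_instr) {
  G <- matrix(rnorm(n_indiv * n_instr), nrow = n_indiv)
  crossprod(G)  # n_instr x n_instr, rank at most n_indiv
}

# Numerical rank from the singular values
num_rank <- function(A, tol = 1e-8) {
  d <- svd(A)$d
  sum(d > tol * d[1])
}

W_small <- weight_matrix(20, L)   # N = 20, as in the simulated data
W_large <- weight_matrix(140, L)  # N = 140, as in EmplUK

rank_small <- num_rank(W_small)   # cannot exceed 20, so W_small is singular
rank_large <- num_rank(W_large)   # full rank is possible here

# solve() will usually refuse the rank-deficient matrix
small_fails <- inherits(try(solve(W_small), silent = TRUE), "try-error")

# Moore-Penrose pseudo-inverse via the SVD, dropping near-zero singular values
pinv <- function(A, tol = 1e-8) {
  s <- svd(A)
  keep <- s$d > tol * s$d[1]
  s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}
W_pinv <- pinv(W_small)
```

A generalized inverse always returns something, which is consistent with Stata producing estimates where solve() stops; whether those estimates are worth having with more instruments than individuals is exactly the doubt raised in the thread.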
[R] pgmm (Blundell-Bond) sample needed
Dear R Experts---

Sorry for all the questions yesterday and today. I am trying to use Yves Croissant's pgmm function in the plm package with Blundell-Bond moments. I have read the Blundell-Bond paper, and want to run the simplest model first, d[i,t] = a*d[i,t-1] + fixed[i] + u[i,t] . no third conditioning variables yet. the full set of moment conditions recommended for system-GMM, which is (T-1)*(T-2)/2+(T-3), in which the u's interact with all possible lagged y's and delta y's.

I believe that pgmm operates by demanding that firm (i) and year (t) be the first two columns in the data set.

library(plm)
NF=20; NT=10
d= data.frame( firm= rep(1:NF, each=NT), year= rep( 1:NT, NF), x=rnorm(NF*NT) );
# the following fails, because dynformula magic is required; learned this the hard way
# v=pgmm( x ~ lag(x), data=d, gmm.inst=~x, lag.gmm=c(2,99), transformation="ld" )
formula= dynformula( x ~ 1, list(1)); # this creates x ~ lag(x)
v=pgmm( formula, data=d, gmm.inst=~x, lag.gmm=c(2,99), transformation="ld" )

Error in solve.default(suml(Vi)) : system is computationally singular: reciprocal condition number = 8.20734e-20

obviously, I am confused. when I execute the same command on the included UKEmpl data set, it works. however, my inputs would seem perfectly reasonable. I would hope that the procedure could produce a lag(x) coefficient estimate of around 0, and then call it a day.

could someone please tell me how to instruct pgmm to just estimate this simplest of all BB models?

[My ultimate goal is to replicate what another author has run via xtabond2 d ld, gmm(L.(d), lag(1 3)) robust in Stata; if you know the magic of moving this statement into pgmm syntax, I would be even more grateful. Right now, I am so stuck on square 1 that I do not know how to move towards figuring out where I ultimately need to go.]
regards,

/iaw
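The 'dynformula magic' discussed in this thread exists because a panel lag has to respect firm boundaries. A tiny base-R sketch (toy data and variable names invented for the illustration, rows assumed sorted by firm and then year) shows what goes wrong with a plain shift of the stacked column:

```r
# Toy panel: 2 firms, 3 years each, sorted by firm then year
d <- data.frame(firm = rep(1:2, each = 3),
                year = rep(1:3, times = 2),
                x    = c(1, 2, 3, 10, 20, 30))

# Naive lag: shift the whole stacked column down by one row
naive_lag <- c(NA, head(d$x, -1))

# Panel lag: shift within each firm, so the first year of a firm is NA
panel_lag <- ave(d$x, d$firm, FUN = function(v) c(NA, head(v, -1)))

print(cbind(d, naive_lag, panel_lag))
```

In row 4 (firm 2, year 1) the naive version reports firm 1's last value as the "lag", while the panel version correctly has a missing value there.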