[R] pgmm (Blundell-Bond) sample needed

2009-03-30 Thread Millo Giovanni
Dear Ivo, dear list,

(see: Message: 70
Date: Thu, 26 Mar 2009 21:39:19 +
From: ivo...@gmail.com
Subject: [R] pgmm (Blundell-Bond) sample needed)

I think I finally figured out how to replicate your supersimple GMM
example with pgmm() so as to get the very same results as Stata.
Having no other regressors in the formula initially drove me crazy. This
was a case where simpler models are trickier than more complicated ones!

For the benefit of other GMM people on this list, here's a brief résumé
of our rather long private mail exchange of the last few days, which also
answers some other pgmm()-related posts that have appeared on this list
lately. Sorry for the overlong posting, but it might be worth the space.

I will refer to the very good Stata tutorial by David Roodman that Ivo
himself pointed me to, which also gives a nice (and free) theoretical
intro. Other readers can find it
here: http://repec.org/nasug2006/howtodoxtabond2.cgdev.pdf. As far as
textbooks are concerned, Arellano's panel data book (Oxford) is the
theoretical reference I would suggest.

There have been two separate issues: 
- syntax (how to get the right model)
- small sample behaviour (minimal time dimension to get estimates)

I'll start with the latter, then provide a quick Rosetta stone of
pgmm() and Stata commands producing the same results. The established
benchmarks for GMM estimation of dynamic panels are the DPD routines
written by Arellano et al. for Gauss and later Ox, but Stata has been
shown to give the same results, and it is the established general
reference for panel data. Lastly I will add the usual examples found in
the literature, although they are very close relatives of
'example(pgmm)', so as to show the correspondence between the models.

1) Small samples and N-asymptotics:
GMM needs big N, small T. Otherwise you end up having more instruments
than observations and you get a singular matrix error (which, as Ivo
correctly found out, happens in the computation of the optimal
weighting matrix). While this will probably be replaced with a more
descriptive error message, it still gets to the heart of the matter.
Yet Stata gives you estimates in this case as well: as I suspected, this
is because it uses a generalized inverse (see Roodman's tutorial,
2.6). This looks theoretically OK. Whether it is meaningful in
applied practice is an issue I will discuss with the package
maintainer. IMHO it is not, except maybe for illustrative purposes, and
it might well encourage bad habits (see the discussion about (not)
fitting the Grunfeld model by GMM on this list, some weeks ago).
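
To make the singular-matrix point concrete, here is a minimal sketch in base R plus MASS. It is only an illustration, not what pgmm() or xtabond2 do internally: the toy instrument matrix Z and the names W and G are invented for this note. With more instruments (columns) than observations (rows), the cross-product matrix cannot have full rank, so solve() fails while a Moore-Penrose generalized inverse still exists.

```r
library(MASS)  # recommended package shipped with R; provides ginv()

# Toy instrument matrix: 2 observations (rows), 3 instruments (columns).
Z <- rbind(c(1, 0, 1),
           c(0, 1, 1))

# 3x3 cross-product matrix; its rank is at most 2, so it is singular.
W <- crossprod(Z)

# solve() refuses to invert it; keep the error message instead of stopping.
plain <- tryCatch(solve(W), error = function(e) conditionMessage(e))
print(plain)

# The Moore-Penrose generalized inverse is always defined.
G <- ginv(W)

# Defining property of a generalized inverse: W %*% G %*% W equals W.
print(all.equal(W %*% G %*% W, W))
```

The generalized inverse returns numbers where solve() gives up, which is consistent with Stata reporting estimates in this situation; whether those estimates are meaningful is a separate question.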

2) fitting the simple models
Simplest possible model: AR(1) with individual effects
  x(i,t)= a*(x(i,t-1)) + bi + c

This is what Ivo asked for in the first place. As the usual example is
on data from the Arellano and Bond paper, available in package 'plm' as

 library(plm)
 data(EmplUK)

I'll use log(emp) from this dataset as 'x', for ease of reproducibility.
The same data are available in Stata via 'use
http://www.stata-press.com/data/r7/abdata.dta;'. The Stata dataset is
identical but for the variable names and the fact that in Stata you
have to generate logs beforehand (ugh!). I'm also adding the
'nomata' option to avoid complications, but this will be unnecessary on
most systems (not on mine...).

The system-GMM estimator (with robust SEs) in Stata is 'xtabond2 n
nL1, gmm(L.(n)) nomata robust', whose R equivalent is:

 sysmod <- pgmm( dynformula( log(emp) ~ 1, list(1)), data=EmplUK,
   gmm.inst=~log(emp), lag.gmm=c(2,99),
   effect="individual", model="onestep", transformation="ld" )
 summary(sysmod, robust=TRUE)

(note that although 'summary(sysmod)' does not report a constant, it's
actually there; this is an issue to be checked).

The difference-GMM estimator is 'xtabond2 n nL1, gmm(L.(n)) noleveleq
nomata robust', in R:

 diffmod <- pgmm( dynformula( log(emp) ~ 1, list(1)), data=EmplUK,
   gmm.inst=~log(emp), lag.gmm=c(2,99),
   effect="individual", model="onestep", transformation="d" )
 summary(diffmod, robust=TRUE)

The particular model Ivo asked for, using only lags 2-4 as
instruments, is 'xtabond2 x lx, gmm(L.(x),lag(1 3)) robust' in Stata
and only requires setting 'lag.gmm=c(2,4)' in the 'sysmod' above
(notice the difference in the lags specification!).
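
The difference in lag specification comes down to an offset of one. In the Stata command the lag() range is counted on the already-lagged variable L.(x), whereas lag.gmm counts lags of x itself. A small sketch of that bookkeeping follows; the helper name stata_to_pgmm_lags is made up for this note and exists in neither plm nor Stata.

```r
# Hypothetical helper, not part of plm or Stata.
# Stata: gmm(L.(x), lag(a b)) uses lags a..b of L.x, i.e. lags a+1..b+1 of x.
# pgmm:  lag.gmm=c(from, to) counts lags of x directly.
stata_to_pgmm_lags <- function(stata_lag, var_lag = 1) {
  stata_lag + var_lag
}

# Range to pass as lag.gmm for gmm(L.(x), lag(1 3)).
pgmm_lags <- stata_to_pgmm_lags(c(1, 3))
print(pgmm_lags)
```

This gives the range 2 to 4, matching the 'lag.gmm=c(2,4)' setting described above.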

Note also that, unlike Ivo, I am using robust covariances.

3) fitting the standard examples from the literature.

'example(pgmm)' is a somewhat simplified version of the standard
Arellano-Bond example. For better comparability, here I am replicating
the results from the abest.do Stata script from
http://ideas.repec.org/c/boc/bocode/s435901.html (i.e., the results of
the Arellano and Bond paper done via xtabond2). The same output is also to
be found in Roodman's tutorial, 3.3. 

Here's how to replicate the output of abest.do
(you must also execute the preceding lines of the file, which carry out
the data transformations):
 
* Replicate difference GMM runs in Arellano and Bond 1991, Table 4
* Column (a1)
xtabond2 n L(0/1).(l.n w) l(0/2).(k ys) yr198?c cons, gmm

[R] pgmm (Blundell-Bond) sample needed

2009-03-27 Thread Millo Giovanni
Dear Ivo,
please find below some answers to your pgmm-related questions.

##

Was: Message: 70
Date: Thu, 26 Mar 2009 21:39:19 +
From: ivo...@gmail.com
Subject: [R] pgmm (Blundell-Bond) sample needed
To: r-help r-h...@stat.math.ethz.ch
Message-ID: 0016361e8962dfdfd704660c7...@google.com
Content-Type: text/plain

Dear R Experts---

Sorry for all the questions yesterday and today. I am trying to use Yves 
Croissant's pgmm function in the plm package with Blundell-Bond moments. I  
have read the Blundell-Bond paper, and want to run the simplest model  
first, d[i,t] = a*d[i,t-1] + fixed[i] + u[i,t] . no third conditioning 
variables yet. the full set of moment conditions recommended for  
system-GMM, which is (T-1)*(T-2)/2+(T-3), in which the u's interact with 
all possible lagged y's and delta y's.

I believe that pgmm operates by demanding that firm (i) and year (t) be  
the first two columns in the data set.

 Almost correct: this is the easiest way. Otherwise you can supply data
organized as you like, but then you have to specify which columns make up
the index. See vignette("plm"), § 4.

library(plm)
NF=20; NT=10
d= data.frame( firm= rep(1:NF, each=NT), year= rep( 1:NT, NF),  
x=rnorm(NF*NT) );

# the following fails, because dynformula magic is required; learned this
# the hard way
# v=pgmm( x ~ lag(x), data=d, gmm.inst=~x, lag.gmm=c(2,99),
#   transformation="ld" )

 The reason for 'dynformula magic' is that lags in panel data are only well
defined in conjunction with the group and time indices; therefore in 'plm' lags
(and first differences) are best supplied through a 'dynformula' interface
inside a model. Otherwise you get the standard time-series lag, which is
incorrect here.
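
To see why an index-unaware lag goes wrong on stacked panel data, here is a hand-rolled sketch in base R. It only illustrates the idea and is not how plm computes lags internally; the toy data frame and the column names naive_lag and panel_lag are invented for this note.

```r
# Toy panel: 2 firms observed over 3 years, stacked firm by firm.
d <- data.frame(firm = rep(1:2, each = 3),
                year = rep(1:3, times = 2),
                x    = c(10, 11, 12, 20, 21, 22))

# Naive lag: shift the whole column down by one row, ignoring the index.
# Firm 2's first year wrongly receives firm 1's last value.
d$naive_lag <- c(NA, head(d$x, -1))

# Panel-aware lag: shift within each firm, so each firm's first year is NA.
d$panel_lag <- ave(d$x, d$firm, FUN = function(v) c(NA, head(v, -1)))

print(d)
```

In row 4 (firm 2, year 1) the naive lag carries over a value from firm 1, while the panel-aware lag is missing, as it should be.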

formula= dynformula( x ~ 1, list(1)); # this creates x ~ lag(x)
v=pgmm( formula, data=d, gmm.inst=~x, lag.gmm=c(2,99), transformation="ld" )

Error in solve.default(suml(Vi)) :
system is computationally singular: reciprocal condition number =  
8.20734e-20

obviously, I am confused.

 You should not be, as you yourself state that the full set of moment
conditions recommended for system-GMM [...] is (T-1)*(T-2)/2+(T-3). If T=10
then you have the equivalent of 9*8/2+7 = 43 regressors (instruments). That's
why N=20 is far too small. The original Arellano and Bond example in UKEmpl
(which is actually called 'EmplUK'!) has N=140, T=9. I already pointed this out
in another r-help post, not many days ago (March 9th, 17:59).
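
The count can be checked with a few lines of R. This sketch simply evaluates the formula as quoted in this thread, (T-1)*(T-2)/2+(T-3); the function name n_moments is made up for this note, and the result is not read off from what pgmm() actually constructs.

```r
# Moment-condition count as quoted in the thread for system GMM.
# n_periods is the time dimension T.
n_moments <- function(n_periods) {
  (n_periods - 1) * (n_periods - 2) / 2 + (n_periods - 3)
}

# Simulated panel from the question: T = 10, N = 20.
m_sim <- n_moments(10)

# EmplUK as described above: T = 9, N = 140.
m_empl <- n_moments(9)

print(c(simulated = m_sim, EmplUK = m_empl))
```

With T=10 the formula gives 43, more than twice the 20 firms in the simulated panel, whereas with T=9 it gives 34 against 140 firms.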

 May I suggest you take a further look at Arellano's panel data book? This
would probably clarify how the instruments are constructed (by the way, that's
also what I am currently reading in my spare time). See also Greene,
Econometric Analysis, § 18.5 and the Z matrix in particular. (Yves Croissant
has put this down nicely in the package vignette as well.)

 when I execute the same command on the included  
UKEmpl data set, it works. however, my inputs would seem perfectly  
reasonable. I would hope that the procedure could produce a lag(x)  
coefficient estimate of around 0, and then call it a day.

 would be nice; but your troubles aren't over yet :^)

could someone please tell me how to instruct pgmm to just estimate this  
simplest of all BB models?

 OK, you found out by yourself. Just for the benefit of other list readers, 
I reproduce the lines you sent us by private email (comments are mine):
 lagformula= dynformula(x ~ 1, list(1)) 
 # reproduces x~lag(x, 1) in standard OLS parlance
 v=pgmm(lagformula, data=d, gmm.inst=~x, lag.gmm=c(1,99), transformation="ld" )
 # means the GMM-system estimator
 # where you use both levels and differences as instruments.

[My ultimate goal is to replicate what another author has run via xtabond2  
d ld, gmm(L.(d), lag(1 3)) robust in Stata; if you know the magic of  
moving this statement into pgmm syntax, I would be even more grateful.  
Right now, I am so stuck on square 1 that I do not know how to move towards  
figuring out where I ultimately need to go.]

 GMM is a tricky subject that I have not yet mastered. I'll try to figure out
what both Stata and plm do with the instruments and let you know.
 Anyway, the 'plm' equivalent of Stata's 'robust' option, which uses the
Windmeijer correction if I'm not mistaken, is to specify a robust covariance
via vcovHC().

 Now to your second message:

#

Was: Message: 82
Date: Thu, 26 Mar 2009 21:45:49 -0400
From: ivo welch ivo...@gmail.com
Subject: Re: [R] pgmm (blundell-bond) help needed
To: r-help r-h...@stat.math.ethz.ch
Message-ID:
50d1c22d0903261845m7d8b321fq97faab26542a...@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1

I have been playing with more examples, and I now know that with
larger NF's my example code actually produces a result, instead of a
singular matrix error.  interestingly, stata's xtabond2 command seems
ok with these sorts of data sets.  either R has more stringent
requirements, or stata is too casual

[R] pgmm (Blundell-Bond) sample needed

2009-03-26 Thread ivowel
Dear R Experts---

Sorry for all the questions yesterday and today. I am trying to use Yves  
Croissant's pgmm function in the plm package with Blundell-Bond moments. I  
have read the Blundell-Bond paper, and want to run the simplest model  
first, d[i,t] = a*d[i,t-1] + fixed[i] + u[i,t] . no third conditioning  
variables yet. the full set of moment conditions recommended for  
system-GMM, which is (T-1)*(T-2)/2+(T-3), in which the u's interact with  
all possible lagged y's and delta y's.

I believe that pgmm operates by demanding that firm (i) and year (t) be  
the first two columns in the data set.

library(plm)
NF=20; NT=10
d= data.frame( firm= rep(1:NF, each=NT), year= rep( 1:NT, NF),  
x=rnorm(NF*NT) );

# the following fails, because dynformula magic is required; learned this
# the hard way
# v=pgmm( x ~ lag(x), data=d, gmm.inst=~x, lag.gmm=c(2,99),
#   transformation="ld" )

formula= dynformula( x ~ 1, list(1)); # this creates x ~ lag(x)
v=pgmm( formula, data=d, gmm.inst=~x, lag.gmm=c(2,99), transformation="ld" )

Error in solve.default(suml(Vi)) :
system is computationally singular: reciprocal condition number =  
8.20734e-20

obviously, I am confused. when I execute the same command on the included  
UKEmpl data set, it works. however, my inputs would seem perfectly  
reasonable. I would hope that the procedure could produce a lag(x)  
coefficient estimate of around 0, and then call it a day.

could someone please tell me how to instruct pgmm to just estimate this  
simplest of all BB models?


[My ultimate goal is to replicate what another author has run via xtabond2  
d ld, gmm(L.(d), lag(1 3)) robust in Stata; if you know the magic of  
moving this statement into pgmm syntax, I would be even more grateful.  
Right now, I am so stuck on square 1 that I do not know how to move towards  
figuring out where I ultimately need to go.]

regards,

/iaw
