Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-22 Thread Mike Marchywka





 From: marchy...@hotmail.com
 To: rvarad...@jhmi.edu; pda...@gmail.com; alex.ols...@gmail.com
 Date: Sat, 21 May 2011 20:40:44 -0400
 CC: r-help@r-project.org
 Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 
 1982 Econometrica R vs Stata












 
  From: rvarad...@jhmi.edu
  To: marchy...@hotmail.com; pda...@gmail.com; alex.ols...@gmail.com
  CC: r-help@r-project.org
  Date: Sat, 21 May 2011 17:26:29 -0400
  Subject: RE: [R] maximum likelihood convergence reproducing Anderson 
  Blundell 1982 Econometrica R vs Stata
 
  Hi,
 
  I don't think the final verdict has been spoken. Peter's posts have hinted 
  at ill-conditioning as the crux of the problem. So, I decided to try a 
  couple of more things: (1) standardizing the covariates, (2) exact 
  gradient, and (3) both (1) and (2).

 Cool, I thought everyone lost interest. I'll get back to this then.
 Before launching into this, I was curious however if the STATA
 solution ( or whatever other proudct was used) was thought
 to reprsent a good solution or the actual global optimum.

 IIRC, your earlier post along these lines,

  # exact solution as the starting value
  start - c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))

 was what got me started.


 I guess my point was that the problem would not obviously
 have a nice surface to optimize and IIRC the title of the paper
 suggested the system was a bit difficult ( I won't pay for med papers,
 not going to pay for econ LOL). The function to be minimized, and this
 is stated as fact but only for sake of eliciting criticism,
 is the determinant of 2 unrelated residuals vectors. And, from memory,

  e1 - y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
  e2 - y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3



 this gives something like E=|e1|^2*|e2|^2*(1-Cos^2(d))  with
 d being angle between residual vectors ( in space with dimension of number
 of data points). Or, E=F*Sin^2(d) and depending on data is would seem
 possible to move in such a way that F increases but just more slowly than
 Sin^2(d). Any solution for which they are colinear would seem to be optimal.

 I guess if my starting point is not too far off I may see if
 I can find some diagnostics to determine is the data set creates
 a condition as I have outlined and optimizer.

 It might, for example, be interesting to see what happens to |e1||e2|
 and Sin(d) at various points.
 ( T1 is just theta components 1-4 and T2 is 5-8, I guess presuming x0=1 
 LOL),
 E1=Y1-T1*X , E2=Y2-T2*x
 E1.E2=Y1.Y2-Y1.(T2*X)-(T1*X).Y2+(T1*X).(T2*X)

 I guess I keep working on the above and see if it points
 to anything pathological or that doesn't play nice with
 optimizer.


I was too lazy to continue the above on paper but empirically this seems to
be part of symptoms. For example, 

e1m=sum(e1*e1);
e2m=sum(e2*e2);
e12=sum(e1*e2);
phit=e12/sqrt(e1m*e2m);
dete=det(sigma);
xxx=paste(e1m*e2m=,e1m*e2m, aphi=,phit,  dete=,dete,sep=);
print(xxx);

on this call, 

print(doing BGFS from zero);
p - optim(0*c(1:8), lnl, method=BFGS, hessian=TRUE, y1=data$y1, y2=data$y2, x
1=data$x1, x2=data$x2, x3=data$x3)


shows that the the product of e1 and e2 explodes with the error vectors become 
parallel,
( since I got an answer I like, I haven't bothered to check for typos
or coding errors LOL note in particular things like sqrt() etc )

[1] e1m*e2m=41.1669479546532 aphi=0.960776458479452 dete=3.16609220303528
[1] e1m*e2m=41.0289397617336 aphi=0.960734505863182 dete=3.15878562836847
[1] e1m*e2m=41.3051898313859 aphi=0.960818247708638 dete=3.17340730403131
[1] e1m*e2m=39.7867410499191 aphi=0.960397122114274 dete=3.08893784983331
[1] e1m*e2m=42.5708634641545 aphi=0.961141202950479 dete=3.24422282329344
[1] e1m*e2m=39.952815544519 aphi=0.960383767149008 dete=3.10285630456706
[1] e1m*e2m=42.3994556719457 aphi=0.961155360764465 dete=3.23000632577225
[1] e1m*e2m=40.6077431942131 aphi=0.960502159746792 dete=3.14448500444145
[1] e1m*e2m=41.7300849780045 aphi=0.96104579811445 dete=3.18780183350788
[1] e1m*e2m=40.8461025929321 aphi=0.960565688832973 dete=3.15795749888956
[1] e1m*e2m=41.4890962339491 aphi=0.960985035909235 dete=3.17423784875441
[1] e1m*e2m=38.0026479775113 aphi=0.958424486959574 dete=3.09427071121483
[1] e1m*e2m=44.4634366247632 aphi=0.962889636822005 dete=3.23887444893151
[1] e1m*e2m=38.3679152097842 aphi=0.958864175571152 dete=3.09166714763783
[1] e1m*e2m=44.0684332841423 aphi=0.962518076859826 dete=3.24162775623887
[1] e1m*e2m=39.8519719942113 aphi=0.960141707858905 dete=3.11355091583515
[1] e1m*e2m=42.5038484736989 aphi=0.961384528172175 dete=3.21923251471042
[1] e1m*e2m=21375239704009490432 aphi=0.77697068197 dete=953450394271031
[1] e1m*e2m=34177731168061536 aphi=0.77905000934 dete=1510297191323.93
[1] e1m*e2m=54503403593545 aphi=0.78923036089 dete=2297508328.68597
[1] e1m*e2m=85767789487.156 aphi=0.83466344755 dete

Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-21 Thread Ravi Varadhan
Hi,

I don't think the final verdict has been spoken.  Peter's posts have hinted at 
ill-conditioning as the crux of the problem.  So, I decided to try a couple of  
more things: (1) standardizing the covariates, (2) exact gradient, and (3) both 
(1) and (2).

I compute the exact gradient using a complex-step derivative approach.  This 
works just like the standard first-order, forward differencing.  The only (but, 
essential) difference is that an imaginary increment, i*dx, is used.  This, 
incredibly, gives exact gradients (up to machine precision). 

Here are the code and the results of my experiments:

data - read.table(h:/computations/optimx_example.dat, header=TRUE, sep=,)
attach(data)
require(optimx)

## model 18
lnl - function(theta,y1, y2, x1, x2, x3) {
  n - length(y1)
  beta - theta[1:8]
  e1 - y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
  e2 - y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
  e - cbind(e1, e2)
  sigma - t(e)%*%e
  det.sigma - sigma[1,1] * sigma[2,2] - sigma[1,2] * sigma[2,1]
  logl - -1*n/2*(2*(1+log(2*pi)) + log(det.sigma)) 
  return(-logl)
}

csd - function(fn, x, ...) {
# Complex step derivatives; yields exact derivatives
h - .Machine$double.eps 
n - length(x)
h0 - g - rep(0, n)
for (i in 1:n) {
h0[i] - h * 1i
g[i] - Im(fn(x+h0, ...))/h 
h0[i]  - 0
}
g
}

gr.csd - function(theta,y1, y2, x1, x2, x3) {
csd(lnl, theta, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3)
}

# exact solution as the starting value
start - c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
p1 - optimx(start, lnl, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, 
control=list(all.methods=TRUE, maxit=1500))

# numerical gradient
p2 - optimx(rep(0,8), lnl, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, 
control=list(all.methods=TRUE, maxit=1500))

# exact gradient
p2g - optimx(rep(0,8), lnl, gr.csd, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, 
control=list(all.methods=TRUE, maxit=1500))

# comparing p2 and p2g, we see the dramatic improvement in BFGS when exact 
gradient is used, we also see a major difference for L-BFGS-B
# Exact gradient did not affect the gradient methods, CG and spg, much.  
However, convergence of Rcgmin improved when exact gradient was used

x1s - scale(x1)
x2s - scale(x2)
x3s - scale(x3)

  
p3 - optimx(rep(0,8),lnl, y1=y1, y2=y2, x1=x1s, x2=x2s, x3=x3s, 
control=list(all.methods=TRUE, maxit=1500))

# both scaling and exact gradient
p3g - optimx(rep(0,8),lnl, gr.csd, y1=y1, y2=y2, x1=x1s, x2=x2s, x3=x3s, 
control=list(all.methods=TRUE, maxit=1500))

# Comparing p3 and p3g, use of exact gradient improved spg dramatically.  
However, it made CG worse! 

# Of course, derivative-free methods newuoa, and Nelder-Mead are improved by 
scaling, but not by the availability of exact gradients.  I don't know what is 
wrong with bobyqa in this example.

In short, even with scaling and exact gradients, this optimization problem is 
recalcitrant.

Best,
Ravi.


From: Mike Marchywka [marchy...@hotmail.com]
Sent: Thursday, May 12, 2011 8:30 AM
To: Ravi Varadhan; pda...@gmail.com; alex.ols...@gmail.com
Cc: r-help@r-project.org
Subject: RE: [R] maximum likelihood convergence reproducing Anderson Blundell 
1982 Econometrica R vs Stata

So what was the final verdict on this discussion? I kind of
lost track if anyone has a minute to summarize and critique my summary below.


Apparently there were two issues, the comparison between R and Stata
was one issue and the optimum solution another. As I understand it,
there was some question about R numerical gradient calculation. This would
suggest some features of the function may be of interest to consider.

The function to be optimized appears to be, as OP stated,
some function of residuals of two ( unrelated ) fits.
The residual vectors e1 and e2 are dotted in various combinations
creating a matrix whose determinant is (e1.e1)(e2.e2)-(e1.e2)^2 which
is the result to be minimized by choice of theta. Theta it seems is
an 8 component vector, 4 components determine e1 and the other 4 e2.
Presumably a unique solution would require that e1 and e2, both n-component 
vectors,
 point in different directions or else both could become aribtarily large
while keeping the error signal at zero. For fixed magnitudes, colinearity
would reduce the Error.  The intent would appear to be to
keep the residuals distributed similarly in the two ( unrelated) fits.
 I guess my question is,
 did anyone determine that there is a unique solution? or
am I totally wrong here ( I haven't used these myself to any
extent and just try to run some simple teaching examples, asking
for my own clarification as much as anything).

Thanks.











 From: rvarad...@jhmi.edu
 To: pda...@gmail.com; alex.ols...@gmail.com
 Date: Sat, 7 May 2011 11:51:56 -0400
 CC: r-help@r-project.org
 Subject: Re: [R] maximum likelihood convergence reproducing Anderson

Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-21 Thread Mike Marchywka












 From: rvarad...@jhmi.edu
 To: marchy...@hotmail.com; pda...@gmail.com; alex.ols...@gmail.com
 CC: r-help@r-project.org
 Date: Sat, 21 May 2011 17:26:29 -0400
 Subject: RE: [R] maximum likelihood convergence reproducing Anderson Blundell 
 1982 Econometrica R vs Stata

 Hi,

 I don't think the final verdict has been spoken. Peter's posts have hinted at 
 ill-conditioning as the crux of the problem. So, I decided to try a couple of 
 more things: (1) standardizing the covariates, (2) exact gradient, and (3) 
 both (1) and (2).

Cool, I thought everyone lost interest. I'll get back to this then.
Before launching into this, I was curious however if the STATA
solution ( or whatever other proudct was used) was thought
to reprsent a good solution or the actual global optimum. 

IIRC, your earlier post along these lines, 

 # exact solution as the starting value
 start - c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))

was what got me started. 


I guess my point was that the problem would not obviously
have a nice surface to optimize and IIRC the title of the paper
suggested the system was a bit difficult ( I won't pay for med papers,
not going to pay for econ LOL). The function to be minimized, and this
is stated as fact but only for sake of eliciting criticism, 
is the determinant of 2 unrelated residuals vectors. And, from memory,

 e1 - y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
 e2 - y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3



this gives something like E=|e1|^2*|e2|^2*(1-Cos^2(d))  with
d being angle between residual vectors ( in space with dimension of number
of data points). Or, E=F*Sin^2(d) and depending on data is would seem
possible to move in such a way that F increases but just more slowly than
Sin^2(d). Any solution for which they are colinear would seem to be optimal.

I guess if my starting point is not too far off I may see if
I can find some diagnostics to determine is the data set creates
a condition as I have outlined and optimizer.

It might, for example, be interesting to see what happens to |e1||e2|
and Sin(d) at various points.
( T1 is just theta components 1-4 and T2 is 5-8, I guess presuming x0=1 LOL), 
E1=Y1-T1*X , E2=Y2-T2*x
E1.E2=Y1.Y2-Y1.(T2*X)-(T1*X).Y2+(T1*X).(T2*X)

I guess I keep working on the above and see if it points
to anything pathological or that doesn't play nice with
optimizer. 





 I compute the exact gradient using a complex-step derivative approach. This 
 works just like the standard first-order, forward differencing. The only 
 (but, essential) difference is that an imaginary increment, i*dx, is used. 
 This, incredibly, gives exact gradients (up to machine precision).

 Here are the code and the results of my experiments:

 data - read.table(h:/computations/optimx_example.dat, header=TRUE, sep=,)
 attach(data)
 require(optimx)

 ## model 18
 lnl - function(theta,y1, y2, x1, x2, x3) {
 n - length(y1)
 beta - theta[1:8]
 e1 - y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
 e2 - y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
 e - cbind(e1, e2)
 sigma - t(e)%*%e
 det.sigma - sigma[1,1] * sigma[2,2] - sigma[1,2] * sigma[2,1]
 logl - -1*n/2*(2*(1+log(2*pi)) + log(det.sigma))
 return(-logl)
 }

 csd - function(fn, x, ...) {
 # Complex step derivatives; yields exact derivatives
 h - .Machine$double.eps
 n - length(x)
 h0 - g - rep(0, n)
 for (i in 1:n) {
 h0[i] - h * 1i
 g[i] - Im(fn(x+h0, ...))/h
 h0[i] - 0
 }
 g
 }

 gr.csd - function(theta,y1, y2, x1, x2, x3) {
 csd(lnl, theta, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3)
 }

 # exact solution as the starting value
 start - c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
 p1 - optimx(start, lnl, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, 
 control=list(all.methods=TRUE, maxit=1500))

 # numerical gradient
 p2 - optimx(rep(0,8), lnl, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, 
 control=list(all.methods=TRUE, maxit=1500))

 # exact gradient
 p2g - optimx(rep(0,8), lnl, gr.csd, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, 
 control=list(all.methods=TRUE, maxit=1500))

 # comparing p2 and p2g, we see the dramatic improvement in BFGS when exact 
 gradient is used, we also see a major difference for L-BFGS-B
 # Exact gradient did not affect the gradient methods, CG and spg, much. 
 However, convergence of Rcgmin improved when exact gradient was used

 x1s - scale(x1)
 x2s - scale(x2)
 x3s - scale(x3)

 p3 - optimx(rep(0,8),lnl, y1=y1, y2=y2, x1=x1s, x2=x2s, x3=x3s, 
 control=list(all.methods=TRUE, maxit=1500))

 # both scaling and exact gradient
 p3g - optimx(rep(0,8),lnl, gr.csd, y1=y1, y2=y2, x1=x1s, x2=x2s, x3=x3s, 
 control=list(all.methods=TRUE, maxit=1500))

 # Comparing p3 and p3g, use of exact gradient improved spg 
 dramatically[[elided Hotmail spam]]

 # Of course, derivative-free methods newuoa, and Nelder-Mead are improved by 
 scaling, but not by the availability of exact gradients. I don't know what is 
 wrong with bobyqa in this example

Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-12 Thread Mike Marchywka




So what was the final verdict on this discussion? I kind of 
lost track if anyone has a minute to summarize and critique my summary below.


Apparently there were two issues, the comparison between R and Stata
was one issue and the optimum solution another. As I understand it,
there was some question about R numerical gradient calculation. This would
suggest some features of the function may be of interest to consider. 

The function to be optimized appears to be, as OP stated, 
some function of residuals of two ( unrelated ) fits. 
The residual vectors e1 and e2 are dotted in various combinations
creating a matrix whose determinant is (e1.e1)(e2.e2)-(e1.e2)^2 which
is the result to be minimized by choice of theta. Theta it seems is
an 8 component vector, 4 components determine e1 and the other 4 e2.
Presumably a unique solution would require that e1 and e2, both n-component 
vectors,
 point in different directions or else both could become aribtarily large
while keeping the error signal at zero. For fixed magnitudes, colinearity
would reduce the Error.  The intent would appear to be to 
keep the residuals distributed similarly in the two ( unrelated) fits. 
 I guess my question is,
 did anyone determine that there is a unique solution? or
am I totally wrong here ( I haven't used these myself to any
extent and just try to run some simple teaching examples, asking
for my own clarification as much as anything).

Thanks.











 From: rvarad...@jhmi.edu
 To: pda...@gmail.com; alex.ols...@gmail.com
 Date: Sat, 7 May 2011 11:51:56 -0400
 CC: r-help@r-project.org
 Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 
 1982 Econometrica R vs Stata

 There is something strange in this problem. I think the log-likelihood is 
 incorrect. See the results below from optimx. You can get much larger 
 log-likelihood values than for the exact solution that Peter provided.

 ## model 18
 lnl - function(theta,y1, y2, x1, x2, x3) {
 n - length(y1)
 beta - theta[1:8]
 e1 - y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
 e2 - y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
 e - cbind(e1, e2)
 sigma - t(e)%*%e
 logl - -1*n/2*(2*(1+log(2*pi)) + log(det(sigma))) # it looks like there is 
 something wrong here
 return(-logl)
 }

 data - read.table(e:/computing/optimx_example.dat, header=TRUE, sep=,)

 attach(data)

 require(optimx)

 start - c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))

 # the warnings can be safely ignored in the optimx calls
 p1 - optimx(start, lnl, hessian=TRUE, y1=y1, y2=y2,
 + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))

 p2 - optimx(rep(0,8), lnl, hessian=TRUE, y1=y1, y2=y2,
 + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))

 p3 - optimx(rep(0.5,8), lnl, hessian=TRUE, y1=y1, y2=y2,
 + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))

 Ravi.
 
 From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf 
 Of peter dalgaard [pda...@gmail.com]
 Sent: Saturday, May 07, 2011 4:46 AM
 To: Alex Olssen
 Cc: r-help@r-project.org
 Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 
 1982 Econometrica R vs Stata

 On May 6, 2011, at 14:29 , Alex Olssen wrote:

  Dear R-help,
 
  I am trying to reproduce some results presented in a paper by Anderson
  and Blundell in 1982 in Econometrica using R.
  The estimation I want to reproduce concerns maximum likelihood
  estimation of a singular equation system.
  I can estimate the static model successfully in Stata but for the
  dynamic models I have difficulty getting convergence.
  My R program which uses the same likelihood function as in Stata has
  convergence properties even for the static case.
 
  I have copied my R program and the data below. I realise the code
  could be made more elegant - but it is short enough.
 
  Any ideas would be highly appreciated.

 Better starting values would help. In this case, almost too good values are 
 available:

 start - c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))

 which appears to be the _exact_ solution.

 Apart from that, it seems that the conjugate gradient methods have 
 difficulties with this likelihood, for some less than obvious reason. 
 Increasing the maxit gets you closer but still not satisfactory.

 I would suggest trying out the experimental optimx package. Apparently, some 
 of the algorithms in there are much better at handling this likelihood, 
 notably nlm and nlminb.




 
  ## model 18
  lnl - function(theta,y1, y2, x1, x2, x3) {
  n - length(y1)
  beta - theta[1:8]
  e1 - y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
  e2 - y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
  e - cbind(e1, e2)
  sigma - t(e)%*%e
  logl - -1*n/2*(2*(1+log(2*pi)) + log(det(sigma)))
  return(-logl)
  }
  p - optim(0*c(1:8), lnl, method=BFGS, hessian=TRUE, y1=y1, y2=y2,
  x1=x1, x2=x2

Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-10 Thread Berend Hasselman

Alex Olssen wrote:
 
 Wow that is really interesting,
 
 Sorry I was asleep when you emailed these.
 
 And yes, of course, I had been trying to implement model 18, not 18s,
 that was a typo, sorry.
 
 I will have a look at the code you posted.
 
 Thanks,
 
 Alex
 

I have run nlm with a zero starting vector on model 18.


 start.zero - rep(0,8)
 
 q.zero - nlm(p=start.zero, f=lnl, hessian=TRUE, gradtol=1e-15,
+  y1=y1, y2=y2, x1=x1, x2=x2, x3=x3)
 
 q.zero
$minimum
[1] -76.73878

$estimate
[1] -0.68124025  0.28686058 -0.15394903 -0.06173779  1.30304998 -0.25446946
[7]  0.14531856  0.05349810

$gradient
[1]  0.0001356761  0.0014407493  0.0013240913  0.0006367333 -0.0005172628
[6] -0.0051911383 -0.0044429086 -0.0019081432

$hessian
  [,1] [,2] [,3]  [,4]  [,5] [,6] [,7]
[1,]  197993.1  1982843  1746582  807261.9  135628.5  1355944  1194854
[2,] 1982842.8 19766730 17430209 8084282.7 1357883.8 13506353 11915748
[3,] 1746581.8 17430209 15376070 7138194.2 1196134.9 11911149 10512490
[4,]  807261.9  8084283  7138194 3323204.7  552932.9  5526861  4882154
[5,]  135628.5  1357884  1196135  552932.9  271571.7  2713953  2391658
[6,] 1355943.5 13506353 11911149 5526860.9 2713953.0 26956003 23785984
[7,] 1194854.2 11915748 10512490 4882154.2 2391658.1 23785984 20996480
[8,]  552860.7  5534816  4887292 2275676.3 1106862.5 11057031  9768028
   [,8]
[1,]   552860.7
[2,]  5534816.0
[3,]  4887292.4
[4,]  2275676.3
[5,]  1106862.5
[6,] 11057030.7
[7,]  9768027.6
[8,]  4554570.6

$code
[1] 2

$iterations
[1] 68

 
 q.zero$estimate
[1] -0.68124025  0.28686058 -0.15394903 -0.06173779  1.30304998 -0.25446946
[7]  0.14531856  0.05349810

Quite different from what you get if one starts with AB's model 18 results
of Table II.
I have also used Gretl's FIML procedure to estimate the model (I don't know
what starting values Gretl uses).
The results are very similar to those obtained with nlm as above.

Berend

--
View this message in context: 
http://r.789695.n4.nabble.com/maximum-likelihood-convergence-reproducing-Anderson-Blundell-1982-Econometrica-R-vs-Stata-tp3502516p3512807.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-09 Thread peter dalgaard

On May 9, 2011, at 06:07 , Alex Olssen wrote:

 Thank you all for your input.
 
 Unfortunately my problem is not yet resolved.  Before I respond to
 individual comments I make a clarification:
 
 In Stata, using the same likelihood function as above, I can reproduce
 EXACTLY (to 3 decimal places or more, which is exactly considering I
 am using different software) the results from model 8 of the paper.
 
 I take this as an indication that I am using the same likelihood
 function as the authors, and that it does indeed work.
 The reason I am trying to estimate the model in R is because while
 Stata reproduces model 8 perfectly it has convergence
 difficulties for some of the other models.
 
 Peter Dalgaard,
 
 Better starting values would help. In this case, almost too good
 values are available:
 
 start - c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
 
 which appears to be the _exact_ solution.
 
 Thanks for the suggestion.  Using these starting values produces the
 exact estimate that Dave Fournier emailed me.
 If these are the exact solution then why did the author publish
 different answers which are completely reproducible in
 Stata and Tsp?


Ahem! You might get us interested in your problem, but not to the level that we 
are going to install Stata and Tsp and actually dig out and study the 
scientific paper you are talking about. Please cite the results and explain the 
differences.

Are we maximizing over the same parameter space? You say that the estimates 
from the paper gives a log-likelihood of 54.04, but the exact solution clocked 
in at 76.74, which in my book is rather larger.

Confused   

-p


 
 Ravi,
 
 Thanks for introducing optimx to me, I am new to R.  I completely
 agree that you can get higher log-likelihood values
 than what those obtained with optim and the starting values suggested
 by Peter.  In fact, in Stata, when I reproduce
 the results of model 8 to more than 3 dp I get a log-likelihood of 54.039139.
 
 Furthermore if I estimate model 8 without symmetry imposed on the
 system I reproduce the Likelihood Ratio reported
 in the paper to 3 decimal places as well, suggesting that the
 log-likelihoods I am reporting differ from those in the paper
 only due to a constant.
 
 Thanks for your comments,
 
 I am still highly interested in knowing why the results of the
 optimisation in R are so different to those in Stata?
 
 I might try making my convergence requirements more stringent.
 
 Kind regards,
 
 Alex

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-09 Thread Alex Olssen
Peter said

Ahem! You might get us interested in your problem, but not to the
level that we are going to install Stata and Tsp and actually dig out
and study the scientific paper you are talking about. Please cite the
results and explain the differences.

Apologies Peter, will do,

The results which I can emulate in Stata but not (yet) in R are reported below.
They come from Econometrica Vol. 50, No. 6 (Nov., 1982), pp. 1569

TABLE II - model 18s

 coef std err
p10 -0.19 0.078
p11 0.2200.019
p12 -0.148   0.021
p13 -0.072
p20 0.8930.072
p21 -0.148
p22 0.0500.035
p23 0.098

The results which I produced in Stata are reported below.
I spent the last hour rewriting the code to reproduce this - since I
am now at home and not at work :(
My results are identical to those published.  The estimates are for
a 3 equation symmetrical singular system.
I have not bothered to report symmetrical results and have backed out
an extra estimate using adding up constraints.
I have also backed out all standard errors using the delta method.

. ereturn display
--
|  Coef.   Std. Err.  zP|z| [95% Conf. Interval]
-+
a|
 a1 |  -.0188115   .0767759-0.25   0.806-.1692895.1316664
 a2 |   .8926598   .070406812.68   0.000 .75466511.030655
 a3 |   .1261517   .0590193 2.14   0.033  .010476.2418275
-+
g|
g11 |   .2199442   .018407511.95   0.000  .183866.2560223
g12 |  -.1476856   .0211982-6.97   0.000-.1892334   -.1061378
g13 |  -.0722586   .0145154-4.98   0.000-.1007082   -.0438089
g22 |   .0496865   .0348052 1.43   0.153-.0185305.1179034
g23 |   .0979991   .0174397 5.62   0.000 .0638179.1321803
g33 |  -.0257405   .0113869-2.26   0.024-.0480584   -.0034226
--

In R I cannot get results like this - I think it is probably to do
with my inability at using the optimisers well.
Any pointers would be appreciated.

Peter said Are we maximizing over the same parameter space? You say
that the estimates from the paper gives a log-likelihood of 54.04, but
the exact solution clocked in at 76.74, which in my book is rather
larger.

I meant +54.04  -76.74.  It is quite common to get positive
log-likelihoods in these system estimation.

Kind regards,

Alex

On 9 May 2011 19:04, peter dalgaard pda...@gmail.com wrote:

 On May 9, 2011, at 06:07 , Alex Olssen wrote:

 Thank you all for your input.

 Unfortunately my problem is not yet resolved.  Before I respond to
 individual comments I make a clarification:

 In Stata, using the same likelihood function as above, I can reproduce
 EXACTLY (to 3 decimal places or more, which is exactly considering I
 am using different software) the results from model 8 of the paper.

 I take this as an indication that I am using the same likelihood
 function as the authors, and that it does indeed work.
 The reason I am trying to estimate the model in R is because while
 Stata reproduces model 8 perfectly it has convergence
 difficulties for some of the other models.

 Peter Dalgaard,

 Better starting values would help. In this case, almost too good
 values are available:

 start - c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))

 which appears to be the _exact_ solution.

 Thanks for the suggestion.  Using these starting values produces the
 exact estimate that Dave Fournier emailed me.
 If these are the exact solution then why did the author publish
 different answers which are completely reproducible in
 Stata and Tsp?


 Ahem! You might get us interested in your problem, but not to the level that 
 we are going to install Stata and Tsp and actually dig out and study the 
 scientific paper you are talking about. Please cite the results and explain 
 the differences.

 Are we maximizing over the same parameter space? You say that the estimates 
 from the paper gives a log-likelihood of 54.04, but the exact solution 
 clocked in at 76.74, which in my book is rather larger.

 Confused

 -p



 Ravi,

 Thanks for introducing optimx to me, I am new to R.  I completely
 agree that you can get higher log-likelihood values
 than what those obtained with optim and the starting values suggested
 by Peter.  In fact, in Stata, when I reproduce
 the results of model 8 to more than 3 dp I get a log-likelihood of 54.039139.

 Furthermore if I estimate model 8 without symmetry imposed on the
 system I reproduce the Likelihood Ratio reported
 in the paper to 3 decimal places as well, suggesting that the
 log-likelihoods I am reporting differ from those in 

Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-09 Thread Alex Olssen
Hi Mike,

Mike said
is this it, page 1559?

That is the front page yes, page 15*6*9 has the table, of which the
model labelled 18s is the one I replicated.

did you actually cut/paste code anywhere and is your first
coefficient -.19 or -.019?
Presumably typos would be one possible problem.

-0.19 is not a typo, it is pi10 in the paper, and a1 in my Stata
estimation - as far as I can tell cutting and pasting is not the
problem.

generally it helps if we could at least see the equations to check your
code against typos ( note page number ?) in lnl that may fix part of the
mystery.  Is full text available
on author's site, doesn't come up on citeseer AFAICT,

Unfortunately I do not know of any place to get the full version of
this paper that doesn't require access to a database such as JSTOR.
The fact that this likelihood function reproduces the published
results in Stata makes me confident that it is correct - I also have
read a lot on systems estimation and it is a pretty standard
likelihood function.

I guess one question would be  what is beta in lnl supposed to be -
it isn't used anywhere but I will also mentioned I'm not that familiar
with the R code ( I'm trying to work through this to learn R and the
optimizers).

maybe some words would help, is sigma supposed to be 2x2 or 8x8 and what are
e1 and e2 supposed to be?

The code above is certainly not as elegant as it could be - in this
case the beta is rather superfluous.  It is just a vector to hold
parameters - but since I called all the parameters theta anyway there
is no need for it.  e1 and e2 are the residuals from the first and
second equations of the system.  Sigma is a 2x2 matrix which is the
outer product of the two vectors of residuals.

Kind regards,

Alex



On 9 May 2011 23:12, Mike Marchywka marchy...@hotmail.com wrote:






 
 Date: Mon, 9 May 2011 22:06:38 +1200
 From: alex.ols...@gmail.com
 To: pda...@gmail.com
 CC: r-help@r-project.org; da...@otter-rsch.com
 Subject: Re: [R] maximum likelihood convergence reproducing Anderson 
 Blundell 1982 Econometrica R vs Stata

 Peter said

 Ahem! You might get us interested in your problem, but not to the
 level that we are going to install Stata and Tsp and actually dig out
 and study the scientific paper you are talking about. Please cite the
 results and explain the differences.

 Apologies Peter, will do,

 The results which I can emulate in Stata but not (yet) in R are reported 
 below.

 did you actually cut/paste code anywhere and is your first coefficient -.19 
 or -.019?
 Presumably typos would be one possible problem.

 They come from Econometrica Vol. 50, No. 6 (Nov., 1982), pp. 1569


 is this it, page 1559?

 http://www.jstor.org/pss/1913396

 generally it helps if we could at least see the equations to check your
 code against typos ( note page number ?) in lnl that may fix part of the
 mystery.  Is full text available
 on author's site, doesn't come up on citeseer AFAICT,


 http://citeseerx.ist.psu.edu/search?q=blundell+1982sort=ascdate

 I guess one question would be  what is beta in lnl supposed to be -
 it isn't used anywhere but I will also mentioned I'm not that familiar
 with the R code ( I'm trying to work through this to learn R and the 
 optimizers).

 maybe some words would help, is sigma supposed to be 2x2 or 8x8 and what are
 e1 and e2 supposed to be?





 TABLE II - model 18s

 coef std err
 p10 -0.19 0.078
 p11 0.220 0.019
 p12 -0.148 0.021
 p13 -0.072
 p20 0.893 0.072
 p21 -0.148
 p22 0.050 0.035
 p23 0.098

 The results which I produced in Stata are reported below.
 I spent the last hour rewriting the code to reproduce this - since I
 am now at home and not at work :(
 My results are identical to those published. The estimates are for
 a 3 equation symmetrical singular system.
 I have not bothered to report symmetrical results and have backed out
 an extra estimate using adding up constraints.
 I have also backed out all standard errors using the delta method.

 . ereturn display
 --
 | Coef. Std. Err. z P|z| [95% Conf. Interval]
 -+
 a |
 a1 | -.0188115 .0767759 -0.25 0.806 -.1692895 .1316664
 a2 | .8926598 .0704068 12.68 0.000 .7546651 1.030655
 a3 | .1261517 .0590193 2.14 0.033 .010476 .2418275
 -+
 g |
 g11 | .2199442 .0184075 11.95 0.000 .183866 .2560223
 g12 | -.1476856 .0211982 -6.97 0.000 -.1892334 -.1061378
 g13 | -.0722586 .0145154 -4.98 0.000 -.1007082 -.0438089
 g22 | .0496865 .0348052 1.43 0.153 -.0185305 .1179034
 g23 | .0979991 .0174397 5.62 0.000 .0638179 .1321803
 g33 | -.0257405 .0113869 -2.26 0.024 -.0480584 -.0034226
 --

 In R I cannot get results like this - I think it is probably to do
 with my

Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-09 Thread Mike Marchywka







 Date: Mon, 9 May 2011 22:06:38 +1200
 From: alex.ols...@gmail.com
 To: pda...@gmail.com
 CC: r-help@r-project.org; da...@otter-rsch.com
 Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 
 1982 Econometrica R vs Stata

 Peter said

 Ahem! You might get us interested in your problem, but not to the
 level that we are going to install Stata and Tsp and actually dig out
 and study the scientific paper you are talking about. Please cite the
 results and explain the differences.

 Apologies Peter, will do,

 The results which I can emulate in Stata but not (yet) in R are reported 
 below.

did you actually cut/paste code anywhere and is your first coefficient -.19 or 
-.019?
Presumably typos would be one possible problem. 

 They come from Econometrica Vol. 50, No. 6 (Nov., 1982), pp. 1569


is this it, page 1559?

http://www.jstor.org/pss/1913396

generally it helps if we could at least see the equations to check your
code against typos ( note page number ?) in lnl that may fix part of the
mystery.  Is full text available
on author's site, doesn't come up on citeseer AFAICT,


http://citeseerx.ist.psu.edu/search?q=blundell+1982sort=ascdate

I guess one question would be  what is beta in lnl supposed to be -
it isn't used anywhere but I will also mentioned I'm not that familiar
with the R code ( I'm trying to work through this to learn R and the 
optimizers). 

maybe some words would help, is sigma supposed to be 2x2 or 8x8 and what are
e1 and e2 supposed to be?





 TABLE II - model 18s

 coef std err
 p10 -0.19 0.078
 p11 0.220 0.019
 p12 -0.148 0.021
 p13 -0.072
 p20 0.893 0.072
 p21 -0.148
 p22 0.050 0.035
 p23 0.098

 The results which I produced in Stata are reported below.
 I spent the last hour rewriting the code to reproduce this - since I
 am now at home and not at work :(
 My results are identical to those published. The estimates are for
 a 3 equation symmetrical singular system.
 I have not bothered to report symmetrical results and have backed out
 an extra estimate using adding up constraints.
 I have also backed out all standard errors using the delta method.

 . ereturn display
 --
 | Coef. Std. Err. z P|z| [95% Conf. Interval]
 -+
 a |
 a1 | -.0188115 .0767759 -0.25 0.806 -.1692895 .1316664
 a2 | .8926598 .0704068 12.68 0.000 .7546651 1.030655
 a3 | .1261517 .0590193 2.14 0.033 .010476 .2418275
 -+
 g |
 g11 | .2199442 .0184075 11.95 0.000 .183866 .2560223
 g12 | -.1476856 .0211982 -6.97 0.000 -.1892334 -.1061378
 g13 | -.0722586 .0145154 -4.98 0.000 -.1007082 -.0438089
 g22 | .0496865 .0348052 1.43 0.153 -.0185305 .1179034
 g23 | .0979991 .0174397 5.62 0.000 .0638179 .1321803
 g33 | -.0257405 .0113869 -2.26 0.024 -.0480584 -.0034226
 --

 In R I cannot get results like this - I think it is probably to do
 with my inability at using the optimisers well.
 Any pointers would be appreciated.

 Peter said Are we maximizing over the same parameter space? You say
 that the estimates from the paper gives a log-likelihood of 54.04, but
 the exact solution clocked in at 76.74, which in my book is rather
 larger.

 I meant +54.04  -76.74. It is quite common to get positive
 log-likelihoods in these system estimation.

 Kind regards,

 Alex

 On 9 May 2011 19:04, peter dalgaard  wrote:
 
  On May 9, 2011, at 06:07 , Alex Olssen wrote:
 
  Thank you all for your input.
 
  Unfortunately my problem is not yet resolved.  Before I respond to
  individual comments I make a clarification:
 
  In Stata, using the same likelihood function as above, I can reproduce
  EXACTLY (to 3 decimal places or more, which is exactly considering I
  am using different software) the results from model 8 of the paper.
 
  I take this as an indication that I am using the same likelihood
  function as the authors, and that it does indeed work.
  The reason I am trying to estimate the model in R is because while
  Stata reproduces model 8 perfectly it has convergence
  difficulties for some of the other models.
 
  Peter Dalgaard,
 
  Better starting values would help. In this case, almost too good
  values are available:
 
  start - c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
 
  which appears to be the _exact_ solution.
 
  Thanks for the suggestion.  Using these starting values produces the
  exact estimate that Dave Fournier emailed me.
  If these are the exact solution then why did the author publish
  different answers which are completely reproducible in
  Stata and Tsp?
 
 
  Ahem! You might get us interested in your problem, but not to the level 
  that we are going to install Stata and Tsp and actually dig out and study 
  the scientific

Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-09 Thread peter dalgaard

On May 9, 2011, at 13:40 , Alex Olssen wrote:

 Hi Mike,
 
 Mike said
 is this it, page 1559?
 
 That is the front page yes, page 15*6*9 has the table, of which the
 model labelled 18s is the one I replicated.
 

However, the R code you posted will at best replicate model 18. For 18s, you 
need to impose symmetry conditions.

The fit of model 18 does seem to give similar results to those in Table II, but 
not exactly. 

Even more puzzling, if you start from the published coefficients, nlm seems to 
be getting stuck at a non-concave point of the negative likelihood.


 nlm(p=ab, f=lnl, hessian=TRUE, y1=y1, y2=y2,
+ x1=x1, x2=x2, x3=x3, gradtol=1e-15)
$minimum
[1] -76.60681

$estimate
[1] -0.64701504  0.27284992 -0.13713084 -0.07205753  1.28200342 -0.24596766  
0.13502787  0.06001181

$gradient
[1]   2.782425  12.648629  -1.830244 -29.296795  -2.577734  -4.954042   
1.495517  11.016190

$hessian
  [,1] [,2] [,3]  [,4]  [,5] [,6] [,7] [,8]
[1,]  197435.9  1977197  1741724  805132.7  135821.9  1358013  1196714   553746
[2,] 1977197.4 19712076 17383474 8064540.2 1359856.0 13527720 11935305  5544903
[3,] 1741724.0 17383474 15336008 7121047.7 1197868.5 11929667 10529439  4896008
[4,]  805132.7  8064540  7121048 3315377.8  553733.1  5534923  4889523  2279404
[5,]  135821.9  1359856  1197868  553733.1  271511.7  2713733  2391332  1106572
[6,] 1358012.7 13527720 11929667 5534923.0 2713732.9 26955636 23784039 11054174
[7,] 1196714.3 11935305 10529439 4889522.6 2391331.9 23784039 20993413  9765022
[8,]  553746.0  5544903  4896008 2279404.1 1106571.9 11054174  9765022  4552693

$code
[1] 2

$iterations
[1] 4


-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-09 Thread peter dalgaard
Hmm, I tried replacing the x's in the model with their principal component 
scores, and suddenly everything converges as a greased lightning:

 Z - princomp(cbind(x1,x2,x3))$scores
 Z - as.data.frame(princomp(cbind(x1,x2,x3))$scores)
 names(Z)- letters[1:3]
 optimx(rep(0,8),lnl, hessian=TRUE, y1=y1, y2=y2,
+ x1=Z$a, x2=Z$b, x3=Z$c, method=BFGS)

par
1 0.59107682, 0.01043568, -0.20639153, 0.25902746, 0.24675162, -0.01426477, 
0.18045177, -0.23657327
fvalues method fns grs itns conv KKT1 KKT2 xtimes
1 -76.73878   BFGS 157  37 NULL0 TRUE TRUE  0.055
Warning messages:
1: In max(logpar) : no non-missing arguments to max; returning -Inf
2: In min(logpar) : no non-missing arguments to min; returning Inf

The likelihood appears to be spot on, but, obviously, the parameter estimates 
need to be back-transformed to the original x1,x2,x3 system. I'll leave that as 
the proverbial exercise for the reader...

However, the whole thing leads me to a suspicion that maybe our numerical 
gradients are misbehaving in cases with high local collinearity?

-pd

On May 9, 2011, at 13:40 , Alex Olssen wrote:

 Hi Mike,
 
 Mike said
 is this it, page 1559?
 
 That is the front page yes, page 15*6*9 has the table, of which the
 model labelled 18s is the one I replicated.
 
 did you actually cut/paste code anywhere and is your first
 coefficient -.19 or -.019?
 Presumably typos would be one possible problem.
 
 -0.19 is not a typo, it is pi10 in the paper, and a1 in my Stata
 estimation - as far as I can tell cutting and pasting is not the
 problem.
 
 generally it helps if we could at least see the equations to check your
 code against typos ( note page number ?) in lnl that may fix part of the
 mystery.  Is full text available
 on author's site, doesn't come up on citeseer AFAICT,
 
 Unfortunately I do not know of any place to get the full version of
 this paper that doesn't require access to a database such as JSTOR.
 The fact that this likelihood function reproduces the published
 results in Stata makes me confident that it is correct - I also have
 read a lot on systems estimation and it is a pretty standard
 likelihood function.
 
 I guess one question would be  what is beta in lnl supposed to be -
 it isn't used anywhere but I will also mentioned I'm not that familiar
 with the R code ( I'm trying to work through this to learn R and the
 optimizers).
 
 maybe some words would help, is sigma supposed to be 2x2 or 8x8 and what are
 e1 and e2 supposed to be?
 
 The code above is certainly not as elegant as it could be - in this
 case the beta is rather superfluous.  It is just a vector to hold
 parameters - but since I called all the parameters theta anyway there
 is no need for it.  e1 and e2 are the residuals from the first and
 second equations of the system.  Sigma is a 2x2 matrix which is the
 outer product of the two vectors of residuals.
 
 Kind regards,
 
 Alex
 
 
 
 On 9 May 2011 23:12, Mike Marchywka marchy...@hotmail.com wrote:
 
 
 
 
 
 
 
 Date: Mon, 9 May 2011 22:06:38 +1200
 From: alex.ols...@gmail.com
 To: pda...@gmail.com
 CC: r-help@r-project.org; da...@otter-rsch.com
 Subject: Re: [R] maximum likelihood convergence reproducing Anderson 
 Blundell 1982 Econometrica R vs Stata
 
 Peter said
 
 Ahem! You might get us interested in your problem, but not to the
 level that we are going to install Stata and Tsp and actually dig out
 and study the scientific paper you are talking about. Please cite the
 results and explain the differences.
 
 Apologies Peter, will do,
 
 The results which I can emulate in Stata but not (yet) in R are reported 
 below.
 
 did you actually cut/paste code anywhere and is your first coefficient -.19 
 or -.019?
 Presumably typos would be one possible problem.
 
 They come from Econometrica Vol. 50, No. 6 (Nov., 1982), pp. 1569
 
 
 is this it, page 1559?
 
 http://www.jstor.org/pss/1913396
 
 generally it helps if we could at least see the equations to check your
 code against typos ( note page number ?) in lnl that may fix part of the
 mystery.  Is full text available
 on author's site, doesn't come up on citeseer AFAICT,
 
 
 http://citeseerx.ist.psu.edu/search?q=blundell+1982sort=ascdate
 
 I guess one question would be  what is beta in lnl supposed to be -
 it isn't used anywhere but I will also mentioned I'm not that familiar
 with the R code ( I'm trying to work through this to learn R and the 
 optimizers).
 
 maybe some words would help, is sigma supposed to be 2x2 or 8x8 and what are
 e1 and e2 supposed to be?
 
 
 
 
 
 TABLE II - model 18s
 
 coef std err
 p10 -0.19 0.078
 p11 0.220 0.019
 p12 -0.148 0.021
 p13 -0.072
 p20 0.893 0.072
 p21 -0.148
 p22 0.050 0.035
 p23 0.098
 
 The results which I produced in Stata are reported below.
 I spent the last hour rewriting the code to reproduce this - since I
 am now at home

Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-09 Thread Alex Olssen
Wow that is really interesting,

Sorry I was asleep when you emailed these.

And yes, of course, I had been trying to implement model 18, not 18s,
that was a typo, sorry.

I will have a look at the code you posted.

Thanks,

Alex

On 10 May 2011 02:18, peter dalgaard pda...@gmail.com wrote:
 Hmm, I tried replacing the x's in the model with their principal component 
 scores, and suddenly everything converges as a greased lightning:

 Z - princomp(cbind(x1,x2,x3))$scores
 Z - as.data.frame(princomp(cbind(x1,x2,x3))$scores)
 names(Z)- letters[1:3]
 optimx(rep(0,8),lnl, hessian=TRUE, y1=y1, y2=y2,
 + x1=Z$a, x2=Z$b, x3=Z$c, method=BFGS)
                                                                               
                  par
 1 0.59107682, 0.01043568, -0.20639153, 0.25902746, 0.24675162, -0.01426477, 
 0.18045177, -0.23657327
    fvalues method fns grs itns conv KKT1 KKT2 xtimes
 1 -76.73878   BFGS 157  37 NULL    0 TRUE TRUE  0.055
 Warning messages:
 1: In max(logpar) : no non-missing arguments to max; returning -Inf
 2: In min(logpar) : no non-missing arguments to min; returning Inf

 The likelihood appears to be spot on, but, obviously, the parameter estimates 
 need to be back-transformed to the original x1,x2,x3 system. I'll leave that 
 as the proverbial exercise for the reader...

 However, the whole thing leads me to a suspicion that maybe our numerical 
 gradients are misbehaving in cases with high local collinearity?

 -pd

 On May 9, 2011, at 13:40 , Alex Olssen wrote:

 Hi Mike,

 Mike said
 is this it, page 1559?

 That is the front page yes, page 15*6*9 has the table, of which the
 model labelled 18s is the one I replicated.

 did you actually cut/paste code anywhere and is your first
 coefficient -.19 or -.019?
 Presumably typos would be one possible problem.

 -0.19 is not a typo, it is pi10 in the paper, and a1 in my Stata
 estimation - as far as I can tell cutting and pasting is not the
 problem.

 generally it helps if we could at least see the equations to check your
 code against typos ( note page number ?) in lnl that may fix part of the
 mystery.  Is full text available
 on author's site, doesn't come up on citeseer AFAICT,

 Unfortunately I do not know of any place to get the full version of
 this paper that doesn't require access to a database such as JSTOR.
 The fact that this likelihood function reproduces the published
 results in Stata makes me confident that it is correct - I also have
 read a lot on systems estimation and it is a pretty standard
 likelihood function.

 I guess one question would be  what is beta in lnl supposed to be -
 it isn't used anywhere but I will also mentioned I'm not that familiar
 with the R code ( I'm trying to work through this to learn R and the
 optimizers).

 maybe some words would help, is sigma supposed to be 2x2 or 8x8 and what are
 e1 and e2 supposed to be?

 The code above is certainly not as elegant as it could be - in this
 case the beta is rather superfluous.  It is just a vector to hold
 parameters - but since I called all the parameters theta anyway there
 is no need for it.  e1 and e2 are the residuals from the first and
 second equations of the system.  Sigma is a 2x2 matrix which is the
 outer product of the two vectors of residuals.

 Kind regards,

 Alex



 On 9 May 2011 23:12, Mike Marchywka marchy...@hotmail.com wrote:






 
 Date: Mon, 9 May 2011 22:06:38 +1200
 From: alex.ols...@gmail.com
 To: pda...@gmail.com
 CC: r-help@r-project.org; da...@otter-rsch.com
 Subject: Re: [R] maximum likelihood convergence reproducing Anderson 
 Blundell 1982 Econometrica R vs Stata

 Peter said

 Ahem! You might get us interested in your problem, but not to the
 level that we are going to install Stata and Tsp and actually dig out
 and study the scientific paper you are talking about. Please cite the
 results and explain the differences.

 Apologies Peter, will do,

 The results which I can emulate in Stata but not (yet) in R are reported 
 below.

 did you actually cut/paste code anywhere and is your first coefficient -.19 
 or -.019?
 Presumably typos would be one possible problem.

 They come from Econometrica Vol. 50, No. 6 (Nov., 1982), pp. 1569


 is this it, page 1559?

 http://www.jstor.org/pss/1913396

 generally it helps if we could at least see the equations to check your
 code against typos ( note page number ?) in lnl that may fix part of the
 mystery.  Is full text available
 on author's site, doesn't come up on citeseer AFAICT,


 http://citeseerx.ist.psu.edu/search?q=blundell+1982sort=ascdate

 I guess one question would be  what is beta in lnl supposed to be -
 it isn't used anywhere but I will also mentioned I'm not that familiar
 with the R code ( I'm trying to work through this to learn R and the 
 optimizers).

 maybe some words would help, is sigma supposed to be 2x2 or 8x8 and what are
 e1 and e2 supposed to be?





 TABLE II - model 18s

 coef std err
 p10 -0.19

Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-08 Thread Alex Olssen
Thank you all for your input.

Unfortunately my problem is not yet resolved.  Before I respond to
individual comments I make a clarification:

In Stata, using the same likelihood function as above, I can reproduce
EXACTLY (to 3 decimal places or more, which is exactly considering I
am using different software) the results from model 8 of the paper.

I take this as an indication that I am using the same likelihood
function as the authors, and that it does indeed work.
The reason I am trying to estimate the model in R is because while
Stata reproduces model 8 perfectly it has convergence
difficulties for some of the other models.

Peter Dalgaard,

Better starting values would help. In this case, almost too good
values are available:

start - c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))

which appears to be the _exact_ solution.

Thanks for the suggestion.  Using these starting values produces the
exact estimate that Dave Fournier emailed me.
If these are the exact solution then why did the author publish
different answers which are completely reproducible in
Stata and Tsp?

Ravi,

Thanks for introducing optimx to me, I am new to R.  I completely
agree that you can get higher log-likelihood values
than what those obtained with optim and the starting values suggested
by Peter.  In fact, in Stata, when I reproduce
the results of model 8 to more than 3 dp I get a log-likelihood of 54.039139.

Furthermore if I estimate model 8 without symmetry imposed on the
system I reproduce the Likelihood Ratio reported
in the paper to 3 decimal places as well, suggesting that the
log-likelihoods I am reporting differ from those in the paper
only due to a constant.

Thanks for your comments,

I am still highly interested in knowing why the results of the
optimisation in R are so different to those in Stata?

I might try making my convergence requirements more stringent.

Kind regards,

Alex

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-07 Thread peter dalgaard

On May 6, 2011, at 14:29 , Alex Olssen wrote:

 Dear R-help,
 
 I am trying to reproduce some results presented in a paper by Anderson
 and Blundell in 1982 in Econometrica using R.
 The estimation I want to reproduce concerns maximum likelihood
 estimation of a singular equation system.
 I can estimate the static model successfully in Stata but for the
 dynamic models I have difficulty getting convergence.
 My R program which uses the same likelihood function as in Stata has
 convergence properties even for the static case.
 
 I have copied my R program and the data below.  I realise the code
 could be made more elegant - but it is short enough.
 
 Any ideas would be highly appreciated.

Better starting values would help. In this case, almost too good values are 
available:

start - c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))

which appears to be the _exact_ solution. 

Apart from that, it seems that the conjugate gradient methods have difficulties 
with this likelihood, for some less than obvious reason. Increasing the maxit 
gets you closer but still not satisfactory. 

I would suggest trying out the experimental optimx package. Apparently, some of 
the algorithms in there are much better at handling this likelihood, notably 
nlm and nlminb.




 
 ## model 18
 lnl - function(theta,y1, y2, x1, x2, x3) {
  n - length(y1)
  beta - theta[1:8]
  e1 - y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
  e2 - y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
  e - cbind(e1, e2)
  sigma - t(e)%*%e
  logl - -1*n/2*(2*(1+log(2*pi)) + log(det(sigma)))
  return(-logl)
 }
 p - optim(0*c(1:8), lnl, method=BFGS, hessian=TRUE, y1=y1, y2=y2,
 x1=x1, x2=x2, x3=x3)
 
 year,y1,y2,x1,x2,x3
 1929,0.554779,0.266051,9.87415,8.60371,3.75673
 1930,0.516336,0.297473,9.68621,8.50492,3.80692
 1931,0.508201,0.324199,9.4701,8.27596,3.80437
 1932,0.500482,0.33958,9.24692,7.99221,3.76251
 1933,0.501695,0.276974,9.35356,7.98968,3.69071
 1934,0.591426,0.287008,9.42084,8.0362,3.63564
 1935,0.565047,0.244096,9.53972,8.15803,3.59285
 1936,0.605954,0.239187,9.6914,8.32009,3.56678
 1937,0.620161,0.218232,9.76817,8.42001,3.57381
 1938,0.592091,0.243161,9.51295,8.19771,3.6024
 1939,0.613115,0.217042,9.68047,8.30987,3.58147
 1940,0.632455,0.215269,9.78417,8.49624,3.57744
 1941,0.663139,0.184409,10.0606,8.69868,3.6095
 1942,0.698179,0.164348,10.2892,8.84523,3.4
 1943,0.70459,0.146865,10.4731,8.93024,3.65388
 1944,0.694067,0.161722,10.4465,8.96044,3.62434
 1945,0.674668,0.197231,10.279,8.82522,3.61489
 1946,0.635916,0.204232,10.1536,8.77547,3.67562
 1947,0.642855,0.187224,10.2053,8.77481,3.82632
 1948,0.641063,0.186566,10.2227,8.83821,3.96038
 1949,0.646317,0.203646,10.1127,8.82364,4.0447
 1950,0.645476,0.187497,10.2067,8.84161,4.08128
 1951,0.63803,0.197361,10.2773,8.9401,4.10951
 1952,0.634626,0.209992,10.283,9.01603,4.1693
 1953,0.631144,0.219287,10.3217,9.06317,4.21727
 1954,0.593088,0.235335,10.2101,9.05664,4.2567
 1955,0.60736,0.227035,10.272,9.07566,4.29193
 1956,0.607204,0.246631,10.2743,9.12407,4.32252
 1957,0.586994,0.256784,10.2396,9.1588,4.37792
 1958,0.548281,0.271022,10.1248,9.14025,4.42641
 1959,0.553401,0.261815,10.2012,9.1598,4.4346
 1960,0.552105,0.275137,10.1846,9.19297,4.43173
 1961,0.544133,0.280783,10.1479,9.19533,4.44407
 1962,0.55382,0.281286,10.197,9.21544,4.45074
 1963,0.549951,0.28303,10.2036,9.22841,4.46403
 1964,0.547204,0.291287,10.2271,9.23954,4.48447
 1965,0.55511,0.281313,10.2882,9.26531,4.52057
 1966,0.558182,0.280151,10.353,9.31675,4.58156
 1967,0.545735,0.294385,10.3351,9.35382,4.65983
 1968,0.538964,0.294593,10.3525,9.38361,4.71804
 1969,0.542764,0.299927,10.3676,9.40725,4.76329
 1970,0.534595,0.315319,10.2968,9.39139,4.81136
 1971,0.545591,0.315828,10.2592,9.34121,4.84082
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-07 Thread Ravi Varadhan
There is something strange in this problem.  I think the log-likelihood is 
incorrect.  See the results below from optimx.  You can get much larger 
log-likelihood values than for the exact solution that Peter provided.

## model 18
lnl - function(theta,y1, y2, x1, x2, x3) {
  n - length(y1)
  beta - theta[1:8]
  e1 - y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
  e2 - y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
  e - cbind(e1, e2)
  sigma - t(e)%*%e
  logl - -1*n/2*(2*(1+log(2*pi)) + log(det(sigma)))  # it looks like there is 
something wrong here
  return(-logl)
}

data - read.table(e:/computing/optimx_example.dat, header=TRUE, sep=,)

attach(data)

require(optimx)

start - c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))

# the warnings can be safely ignored in the optimx calls
p1 - optimx(start, lnl, hessian=TRUE, y1=y1, y2=y2,
+ x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))

p2 - optimx(rep(0,8), lnl, hessian=TRUE, y1=y1, y2=y2,
+ x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))

p3 - optimx(rep(0.5,8), lnl, hessian=TRUE, y1=y1, y2=y2,
+ x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))

Ravi.

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of 
peter dalgaard [pda...@gmail.com]
Sent: Saturday, May 07, 2011 4:46 AM
To: Alex Olssen
Cc: r-help@r-project.org
Subject: Re: [R] maximum likelihood convergence reproducing Anderson
Blundell 1982 Econometrica R vs Stata

On May 6, 2011, at 14:29 , Alex Olssen wrote:

 Dear R-help,

 I am trying to reproduce some results presented in a paper by Anderson
 and Blundell in 1982 in Econometrica using R.
 The estimation I want to reproduce concerns maximum likelihood
 estimation of a singular equation system.
 I can estimate the static model successfully in Stata but for the
 dynamic models I have difficulty getting convergence.
 My R program which uses the same likelihood function as in Stata has
 convergence properties even for the static case.

 I have copied my R program and the data below.  I realise the code
 could be made more elegant - but it is short enough.

 Any ideas would be highly appreciated.

Better starting values would help. In this case, almost too good values are 
available:

start - c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))

which appears to be the _exact_ solution.

Apart from that, it seems that the conjugate gradient methods have difficulties 
with this likelihood, for some less than obvious reason. Increasing the maxit 
gets you closer but still not satisfactory.

I would suggest trying out the experimental optimx package. Apparently, some of 
the algorithms in there are much better at handling this likelihood, notably 
nlm and nlminb.





 ## model 18
 lnl - function(theta,y1, y2, x1, x2, x3) {
  n - length(y1)
  beta - theta[1:8]
  e1 - y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
  e2 - y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
  e - cbind(e1, e2)
  sigma - t(e)%*%e
  logl - -1*n/2*(2*(1+log(2*pi)) + log(det(sigma)))
  return(-logl)
 }
 p - optim(0*c(1:8), lnl, method=BFGS, hessian=TRUE, y1=y1, y2=y2,
 x1=x1, x2=x2, x3=x3)

 year,y1,y2,x1,x2,x3
 1929,0.554779,0.266051,9.87415,8.60371,3.75673
 1930,0.516336,0.297473,9.68621,8.50492,3.80692
 1931,0.508201,0.324199,9.4701,8.27596,3.80437
 1932,0.500482,0.33958,9.24692,7.99221,3.76251
 1933,0.501695,0.276974,9.35356,7.98968,3.69071
 1934,0.591426,0.287008,9.42084,8.0362,3.63564
 1935,0.565047,0.244096,9.53972,8.15803,3.59285
 1936,0.605954,0.239187,9.6914,8.32009,3.56678
 1937,0.620161,0.218232,9.76817,8.42001,3.57381
 1938,0.592091,0.243161,9.51295,8.19771,3.6024
 1939,0.613115,0.217042,9.68047,8.30987,3.58147
 1940,0.632455,0.215269,9.78417,8.49624,3.57744
 1941,0.663139,0.184409,10.0606,8.69868,3.6095
 1942,0.698179,0.164348,10.2892,8.84523,3.4
 1943,0.70459,0.146865,10.4731,8.93024,3.65388
 1944,0.694067,0.161722,10.4465,8.96044,3.62434
 1945,0.674668,0.197231,10.279,8.82522,3.61489
 1946,0.635916,0.204232,10.1536,8.77547,3.67562
 1947,0.642855,0.187224,10.2053,8.77481,3.82632
 1948,0.641063,0.186566,10.2227,8.83821,3.96038
 1949,0.646317,0.203646,10.1127,8.82364,4.0447
 1950,0.645476,0.187497,10.2067,8.84161,4.08128
 1951,0.63803,0.197361,10.2773,8.9401,4.10951
 1952,0.634626,0.209992,10.283,9.01603,4.1693
 1953,0.631144,0.219287,10.3217,9.06317,4.21727
 1954,0.593088,0.235335,10.2101,9.05664,4.2567
 1955,0.60736,0.227035,10.272,9.07566,4.29193
 1956,0.607204,0.246631,10.2743,9.12407,4.32252
 1957,0.586994,0.256784,10.2396,9.1588,4.37792
 1958,0.548281,0.271022,10.1248,9.14025,4.42641
 1959,0.553401,0.261815,10.2012,9.1598,4.4346
 1960,0.552105,0.275137,10.1846,9.19297,4.43173
 1961,0.544133,0.280783,10.1479,9.19533,4.44407
 1962,0.55382,0.281286,10.197,9.21544,4.45074
 1963,0.549951,0.28303,10.2036,9.22841,4.46403
 1964,0.547204,0.291287,10.2271,9.23954,4.48447
 1965,0.55511,0.281313,10.2882,9.26531,4.52057

Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-07 Thread peter dalgaard

On May 7, 2011, at 17:51 , Ravi Varadhan wrote:

 There is something strange in this problem.  I think the log-likelihood is 
 incorrect.  See the results below from optimx.  You can get much larger 
 log-likelihood values than for the exact solution that Peter provided.
 
 ## model 18
 lnl - function(theta,y1, y2, x1, x2, x3) {
  n - length(y1)
  beta - theta[1:8]
  e1 - y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
  e2 - y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
  e - cbind(e1, e2)
  sigma - t(e)%*%e
  logl - -1*n/2*(2*(1+log(2*pi)) + log(det(sigma)))  # it looks like there is 
 something wrong here
  return(-logl)
 }
 
 data - read.table(e:/computing/optimx_example.dat, header=TRUE, sep=,)
 
 attach(data)
 
 require(optimx)
 
 start - c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
 
 # the warnings can be safely ignored in the optimx calls
 p1 - optimx(start, lnl, hessian=TRUE, y1=y1, y2=y2,
 + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
 
 p2 - optimx(rep(0,8), lnl, hessian=TRUE, y1=y1, y2=y2,
 + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
 
 p3 - optimx(rep(0.5,8), lnl, hessian=TRUE, y1=y1, y2=y2,
 + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
 

Hm? I get stuff like below (for p3). Some of the entries have a considerably 
larger NEGATIVE log likelihood, but that's hardly a problem with the likelihood 
per se.  

 fvalues  method   fns  grs itns conv  KKT1  KKT2 xtimes
12 8.988466e+307  newuoaNA   NA   NA NANA  0.002
11 8.988466e+307  bobyqaNA   NA   NA NANA  0.001
3   23.66768 Nelder-Mead  1501   NA NULL1 FALSE FALSE   0.18
7  -51.76068 spg  1925   NA 15011 FALSE FALSE  2.322
4  -55.78708L-BFGS-B  2093 2093 NULL0 FALSE FALSE  4.176
2  -70.57023  CG  5360 1501 NULL1 FALSE  TRUE  3.465
1  -70.66286BFGS 21481 1500 NULL1 FALSE  TRUE  5.383
8  -76.73765  ucminf  1500 1500 NULL0 FALSE  TRUE  0.067
9  -76.73871  Rcgmin  2434  867 NULL0 FALSE  TRUE  1.514
10 -76.73877  Rvmmin   231   45 NULL0 FALSE  TRUE  0.101
6  -76.73878  nlminb   130  581   670  TRUE  TRUE  0.085
5  -76.73878 nlmNA   NA   460  TRUE  TRUE  0.058

I must admit that I didn't check the likelihood in detail, but minimizing the 
determinant of the residual SSD matrix is generally what you end up doing in 
this sort of models. (Of course, you can safely lose the constants, and you can 
also think up more stable ways of computing the log-determinant, but it's only 
2x2 for cryin' out loud.)


-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-06 Thread Alex Olssen
Dear R-help,

I am trying to reproduce some results presented in a paper by Anderson
and Blundell in 1982 in Econometrica using R.
The estimation I want to reproduce concerns maximum likelihood
estimation of a singular equation system.
I can estimate the static model successfully in Stata but for the
dynamic models I have difficulty getting convergence.
My R program which uses the same likelihood function as in Stata has
convergence properties even for the static case.

I have copied my R program and the data below.  I realise the code
could be made more elegant - but it is short enough.

Any ideas would be highly appreciated.

## model 18
lnl - function(theta,y1, y2, x1, x2, x3) {
  n - length(y1)
  beta - theta[1:8]
  e1 - y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
  e2 - y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
  e - cbind(e1, e2)
  sigma - t(e)%*%e
  logl - -1*n/2*(2*(1+log(2*pi)) + log(det(sigma)))
  return(-logl)
}
p - optim(0*c(1:8), lnl, method=BFGS, hessian=TRUE, y1=y1, y2=y2,
x1=x1, x2=x2, x3=x3)

year,y1,y2,x1,x2,x3
1929,0.554779,0.266051,9.87415,8.60371,3.75673
1930,0.516336,0.297473,9.68621,8.50492,3.80692
1931,0.508201,0.324199,9.4701,8.27596,3.80437
1932,0.500482,0.33958,9.24692,7.99221,3.76251
1933,0.501695,0.276974,9.35356,7.98968,3.69071
1934,0.591426,0.287008,9.42084,8.0362,3.63564
1935,0.565047,0.244096,9.53972,8.15803,3.59285
1936,0.605954,0.239187,9.6914,8.32009,3.56678
1937,0.620161,0.218232,9.76817,8.42001,3.57381
1938,0.592091,0.243161,9.51295,8.19771,3.6024
1939,0.613115,0.217042,9.68047,8.30987,3.58147
1940,0.632455,0.215269,9.78417,8.49624,3.57744
1941,0.663139,0.184409,10.0606,8.69868,3.6095
1942,0.698179,0.164348,10.2892,8.84523,3.4
1943,0.70459,0.146865,10.4731,8.93024,3.65388
1944,0.694067,0.161722,10.4465,8.96044,3.62434
1945,0.674668,0.197231,10.279,8.82522,3.61489
1946,0.635916,0.204232,10.1536,8.77547,3.67562
1947,0.642855,0.187224,10.2053,8.77481,3.82632
1948,0.641063,0.186566,10.2227,8.83821,3.96038
1949,0.646317,0.203646,10.1127,8.82364,4.0447
1950,0.645476,0.187497,10.2067,8.84161,4.08128
1951,0.63803,0.197361,10.2773,8.9401,4.10951
1952,0.634626,0.209992,10.283,9.01603,4.1693
1953,0.631144,0.219287,10.3217,9.06317,4.21727
1954,0.593088,0.235335,10.2101,9.05664,4.2567
1955,0.60736,0.227035,10.272,9.07566,4.29193
1956,0.607204,0.246631,10.2743,9.12407,4.32252
1957,0.586994,0.256784,10.2396,9.1588,4.37792
1958,0.548281,0.271022,10.1248,9.14025,4.42641
1959,0.553401,0.261815,10.2012,9.1598,4.4346
1960,0.552105,0.275137,10.1846,9.19297,4.43173
1961,0.544133,0.280783,10.1479,9.19533,4.44407
1962,0.55382,0.281286,10.197,9.21544,4.45074
1963,0.549951,0.28303,10.2036,9.22841,4.46403
1964,0.547204,0.291287,10.2271,9.23954,4.48447
1965,0.55511,0.281313,10.2882,9.26531,4.52057
1966,0.558182,0.280151,10.353,9.31675,4.58156
1967,0.545735,0.294385,10.3351,9.35382,4.65983
1968,0.538964,0.294593,10.3525,9.38361,4.71804
1969,0.542764,0.299927,10.3676,9.40725,4.76329
1970,0.534595,0.315319,10.2968,9.39139,4.81136
1971,0.545591,0.315828,10.2592,9.34121,4.84082

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.