Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-22 Thread Mike Marchywka





> From: marchy...@hotmail.com
> To: rvarad...@jhmi.edu; pda...@gmail.com; alex.ols...@gmail.com
> Date: Sat, 21 May 2011 20:40:44 -0400
> CC: r-help@r-project.org
> Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 
> 1982 Econometrica R vs Stata
>
>
>
>
>
>
>
>
>
>
>
>
> 
> > From: rvarad...@jhmi.edu
> > To: marchy...@hotmail.com; pda...@gmail.com; alex.ols...@gmail.com
> > CC: r-help@r-project.org
> > Date: Sat, 21 May 2011 17:26:29 -0400
> > Subject: RE: [R] maximum likelihood convergence reproducing Anderson 
> > Blundell 1982 Econometrica R vs Stata
> >
> > Hi,
> >
> > I don't think the final verdict has been spoken. Peter's posts have hinted 
> > at ill-conditioning as the crux of the problem. So, I decided to try a 
> > couple of more things: (1) standardizing the covariates, (2) exact 
> > gradient, and (3) both (1) and (2).
>
> Cool, I thought everyone lost interest. I'll get back to this then.
> Before launching into this, I was curious however if the STATA
> solution ( or whatever other proudct was used) was thought
> to reprsent a good solution or the actual global optimum.
>
> IIRC, your earlier post along these lines,
>
> > # exact solution as the starting value
> > start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
>
> was what got me started.
>
>
> I guess my point was that the problem would not obviously
> have a nice surface to optimize and IIRC the title of the paper
> suggested the system was a bit difficult ( I won't pay for med papers,
> not going to pay for econ LOL). The function to be minimized, and this
> is stated as fact but only for sake of eliciting criticism,
> is the determinant of 2 unrelated residuals vectors. And, from memory,
>
> > e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
> > e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
>
>
>
> this gives something like E=|e1|^2*|e2|^2*(1-Cos^2(d))  with
> d being angle between residual vectors ( in space with dimension of number
> of data points). Or, E=F*Sin^2(d) and depending on data is would seem
> possible to move in such a way that F increases but just more slowly than
> Sin^2(d). Any solution for which they are colinear would seem to be optimal.
>
> I guess if my starting point is not too far off I may see if
> I can find some diagnostics to determine is the data set creates
> a condition as I have outlined and optimizer.
>
> It might, for example, be interesting to see what happens to |e1||e2|
> and Sin(d) at various points.
> ( T1 is just theta components 1-4 and T2 is 5-8, I guess presuming "x0"=1 
> LOL),
> E1=Y1-T1*X , E2=Y2-T2*x
> E1.E2=Y1.Y2-Y1.(T2*X)-(T1*X).Y2+(T1*X).(T2*X)
>
> I guess I keep working on the above and see if it points
> to anything pathological or that doesn't play nice with
> optimizer.


I was too lazy to continue the above on paper but empirically this seems to
be part of symptoms. For example, 

e1m=sum(e1*e1);
e2m=sum(e2*e2);
e12=sum(e1*e2);
phit=e12/sqrt(e1m*e2m);
dete=det(sigma);
xxx=paste("e1m*e2m=",e1m*e2m," aphi=",phit, " dete=",dete,sep="");
print(xxx);

on this call, 

print("doing BGFS from zero");
p <- optim(0*c(1:8), lnl, method="BFGS", hessian=TRUE, y1=data$y1, y2=data$y2, x
1=data$x1, x2=data$x2, x3=data$x3)


shows that the the product of e1 and e2 explodes with the error vectors become 
parallel,
( since I got an answer I like, I haven't bothered to check for typos
or coding errors LOL note in particular things like sqrt() etc )

[1] "e1m*e2m=41.1669479546532 aphi=0.960776458479452 dete=3.16609220303528"
[1] "e1m*e2m=41.0289397617336 aphi=0.960734505863182 dete=3.15878562836847"
[1] "e1m*e2m=41.3051898313859 aphi=0.960818247708638 dete=3.17340730403131"
[1] "e1m*e2m=39.7867410499191 aphi=0.960397122114274 dete=3.08893784983331"
[1] "e1m*e2m=42.5708634641545 aphi=0.961141202950479 dete=3.24422282329344"
[1] "e1m*e2m=39.952815544519 aphi=0.960383767149008 dete=3.10285630456706"
[1] "e1m*e2m=42.3994556719457 aphi=0.961155360764465 dete=3.23000632577225"
[1] "e1m*e2m=40.6077431942131 aphi=0.960502159746792 dete=3.14448500444145"
[1] "e1m*e2m=41.7300849780045 aphi=0.96104579811445 dete=3.18780183350788"
[1] "e1m*e2m=40.8461025929321 aphi=0.960565688832973 dete=3.15795749888956"
[1] "e1m*e2m=41.4890962339491 aphi=0.960985035909235 dete=3.17423784875441"
[1] "e1m*e2m=38.0026479775113 aphi=0.958424486959574 dete=3.09427071121483"

Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-21 Thread Mike Marchywka












> From: rvarad...@jhmi.edu
> To: marchy...@hotmail.com; pda...@gmail.com; alex.ols...@gmail.com
> CC: r-help@r-project.org
> Date: Sat, 21 May 2011 17:26:29 -0400
> Subject: RE: [R] maximum likelihood convergence reproducing Anderson Blundell 
> 1982 Econometrica R vs Stata
>
> Hi,
>
> I don't think the final verdict has been spoken. Peter's posts have hinted at 
> ill-conditioning as the crux of the problem. So, I decided to try a couple of 
> more things: (1) standardizing the covariates, (2) exact gradient, and (3) 
> both (1) and (2).

Cool, I thought everyone lost interest. I'll get back to this then.
Before launching into this, I was curious however if the STATA
solution ( or whatever other proudct was used) was thought
to reprsent a good solution or the actual global optimum. 

IIRC, your earlier post along these lines, 

> # exact solution as the starting value
> start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))

was what got me started. 


I guess my point was that the problem would not obviously
have a nice surface to optimize and IIRC the title of the paper
suggested the system was a bit difficult ( I won't pay for med papers,
not going to pay for econ LOL). The function to be minimized, and this
is stated as fact but only for sake of eliciting criticism, 
is the determinant of 2 unrelated residuals vectors. And, from memory,

> e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
> e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3



this gives something like E=|e1|^2*|e2|^2*(1-Cos^2(d))  with
d being angle between residual vectors ( in space with dimension of number
of data points). Or, E=F*Sin^2(d) and depending on data is would seem
possible to move in such a way that F increases but just more slowly than
Sin^2(d). Any solution for which they are colinear would seem to be optimal.

I guess if my starting point is not too far off I may see if
I can find some diagnostics to determine is the data set creates
a condition as I have outlined and optimizer.

It might, for example, be interesting to see what happens to |e1||e2|
and Sin(d) at various points.
( T1 is just theta components 1-4 and T2 is 5-8, I guess presuming "x0"=1 LOL), 
E1=Y1-T1*X , E2=Y2-T2*x
E1.E2=Y1.Y2-Y1.(T2*X)-(T1*X).Y2+(T1*X).(T2*X)

I guess I keep working on the above and see if it points
to anything pathological or that doesn't play nice with
optimizer. 




>
> I compute the "exact" gradient using a complex-step derivative approach. This 
> works just like the standard first-order, forward differencing. The only 
> (but, essential) difference is that an imaginary increment, i*dx, is used. 
> This, incredibly, gives exact gradients (up to machine precision).
>
> Here are the code and the results of my experiments:
>
> data <- read.table("h:/computations/optimx_example.dat", header=TRUE, sep=",")
> attach(data)
> require(optimx)
>
> ## model 18
> lnl <- function(theta,y1, y2, x1, x2, x3) {
> n <- length(y1)
> beta <- theta[1:8]
> e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
> e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
> e <- cbind(e1, e2)
> sigma <- t(e)%*%e
> det.sigma <- sigma[1,1] * sigma[2,2] - sigma[1,2] * sigma[2,1]
> logl <- -1*n/2*(2*(1+log(2*pi)) + log(det.sigma))
> return(-logl)
> }
>
> csd <- function(fn, x, ...) {
> # Complex step derivatives; yields exact derivatives
> h <- .Machine$double.eps
> n <- length(x)
> h0 <- g <- rep(0, n)
> for (i in 1:n) {
> h0[i] <- h * 1i
> g[i] <- Im(fn(x+h0, ...))/h
> h0[i] <- 0
> }
> g
> }
>
> gr.csd <- function(theta,y1, y2, x1, x2, x3) {
> csd(lnl, theta, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3)
> }
>
> # exact solution as the starting value
> start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
> p1 <- optimx(start, lnl, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, 
> control=list(all.methods=TRUE, maxit=1500))
>
> # numerical gradient
> p2 <- optimx(rep(0,8), lnl, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, 
> control=list(all.methods=TRUE, maxit=1500))
>
> # exact gradient
> p2g <- optimx(rep(0,8), lnl, gr.csd, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, 
> control=list(all.methods=TRUE, maxit=1500))
>
> # comparing p2 and p2g, we see the dramatic improvement in BFGS when exact 
> gradient is used, we also see a major difference for L-BFGS-B
> # Exact gradient did not affect the gradient methods, CG and spg, much. 
> However, convergence of Rcgmin improved when exact gradient was used
>
> x1s <- scale(x1)
> x2s <- scale(x2)
> x3s <- scale(x3)
>
> p3 <- optimx(rep(0,8),lnl, y1=y1, y2=y2, x

Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-21 Thread Ravi Varadhan
Hi,

I don't think the final verdict has been spoken.  Peter's posts have hinted at 
ill-conditioning as the crux of the problem.  So, I decided to try a couple of  
more things: (1) standardizing the covariates, (2) exact gradient, and (3) both 
(1) and (2).

I compute the "exact" gradient using a complex-step derivative approach.  This 
works just like the standard first-order, forward differencing.  The only (but, 
essential) difference is that an imaginary increment, i*dx, is used.  This, 
incredibly, gives exact gradients (up to machine precision). 

Here are the code and the results of my experiments:

data <- read.table("h:/computations/optimx_example.dat", header=TRUE, sep=",")
attach(data)
require(optimx)

## model 18
lnl <- function(theta,y1, y2, x1, x2, x3) {
  n <- length(y1)
  beta <- theta[1:8]
  e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
  e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
  e <- cbind(e1, e2)
  sigma <- t(e)%*%e
  det.sigma <- sigma[1,1] * sigma[2,2] - sigma[1,2] * sigma[2,1]
  logl <- -1*n/2*(2*(1+log(2*pi)) + log(det.sigma)) 
  return(-logl)
}

csd <- function(fn, x, ...) {
# Complex step derivatives; yields exact derivatives
h <- .Machine$double.eps 
n <- length(x)
h0 <- g <- rep(0, n)
for (i in 1:n) {
h0[i] <- h * 1i
g[i] <- Im(fn(x+h0, ...))/h 
h0[i]  <- 0
}
g
}

gr.csd <- function(theta,y1, y2, x1, x2, x3) {
csd(lnl, theta, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3)
}

# exact solution as the starting value
start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
p1 <- optimx(start, lnl, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, 
control=list(all.methods=TRUE, maxit=1500))

# numerical gradient
p2 <- optimx(rep(0,8), lnl, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, 
control=list(all.methods=TRUE, maxit=1500))

# exact gradient
p2g <- optimx(rep(0,8), lnl, gr.csd, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, 
control=list(all.methods=TRUE, maxit=1500))

# comparing p2 and p2g, we see the dramatic improvement in BFGS when exact 
gradient is used, we also see a major difference for L-BFGS-B
# Exact gradient did not affect the gradient methods, CG and spg, much.  
However, convergence of Rcgmin improved when exact gradient was used

x1s <- scale(x1)
x2s <- scale(x2)
x3s <- scale(x3)

  
p3 <- optimx(rep(0,8),lnl, y1=y1, y2=y2, x1=x1s, x2=x2s, x3=x3s, 
control=list(all.methods=TRUE, maxit=1500))

# both scaling and exact gradient
p3g <- optimx(rep(0,8),lnl, gr.csd, y1=y1, y2=y2, x1=x1s, x2=x2s, x3=x3s, 
control=list(all.methods=TRUE, maxit=1500))

# Comparing p3 and p3g, use of exact gradient improved spg dramatically.  
However, it made CG worse! 

# Of course, derivative-free methods newuoa, and Nelder-Mead are improved by 
scaling, but not by the availability of exact gradients.  I don't know what is 
wrong with bobyqa in this example.

In short, even with scaling and exact gradients, this optimization problem is 
recalcitrant.

Best,
Ravi.


From: Mike Marchywka [marchy...@hotmail.com]
Sent: Thursday, May 12, 2011 8:30 AM
To: Ravi Varadhan; pda...@gmail.com; alex.ols...@gmail.com
Cc: r-help@r-project.org
Subject: RE: [R] maximum likelihood convergence reproducing Anderson Blundell 
1982 Econometrica R vs Stata

So what was the final verdict on this discussion? I kind of
lost track if anyone has a minute to summarize and critique my summary below.


Apparently there were two issues, the comparison between R and Stata
was one issue and the "optimum" solution another. As I understand it,
there was some question about R numerical gradient calculation. This would
suggest some features of the function may be of interest to consider.

The function to be optimized appears to be, as OP stated,
some function of residuals of two ( unrelated ) fits.
The residual vectors e1 and e2 are dotted in various combinations
creating a matrix whose determinant is (e1.e1)(e2.e2)-(e1.e2)^2 which
is the result to be minimized by choice of theta. Theta it seems is
an 8 component vector, 4 components determine e1 and the other 4 e2.
Presumably a unique solution would require that e1 and e2, both n-component 
vectors,
 point in different directions or else both could become aribtarily large
while keeping the error signal at zero. For fixed magnitudes, colinearity
would reduce the "Error."  The intent would appear to be to
keep the residuals distributed similarly in the two ( unrelated) fits.
 I guess my question is,
" did anyone determine that there is a unique solution?" or
am I totally wrong here ( I haven't used these myself to any
extent and just try to run some simple teaching examples, asking
for my own clarification as much as anything).

Thanks.










----
&

Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-12 Thread Mike Marchywka




So what was the final verdict on this discussion? I kind of 
lost track if anyone has a minute to summarize and critique my summary below.


Apparently there were two issues, the comparison between R and Stata
was one issue and the "optimum" solution another. As I understand it,
there was some question about R numerical gradient calculation. This would
suggest some features of the function may be of interest to consider. 

The function to be optimized appears to be, as OP stated, 
some function of residuals of two ( unrelated ) fits. 
The residual vectors e1 and e2 are dotted in various combinations
creating a matrix whose determinant is (e1.e1)(e2.e2)-(e1.e2)^2 which
is the result to be minimized by choice of theta. Theta it seems is
an 8 component vector, 4 components determine e1 and the other 4 e2.
Presumably a unique solution would require that e1 and e2, both n-component 
vectors,
 point in different directions or else both could become aribtarily large
while keeping the error signal at zero. For fixed magnitudes, colinearity
would reduce the "Error."  The intent would appear to be to 
keep the residuals distributed similarly in the two ( unrelated) fits. 
 I guess my question is,
" did anyone determine that there is a unique solution?" or
am I totally wrong here ( I haven't used these myself to any
extent and just try to run some simple teaching examples, asking
for my own clarification as much as anything).

Thanks.











> From: rvarad...@jhmi.edu
> To: pda...@gmail.com; alex.ols...@gmail.com
> Date: Sat, 7 May 2011 11:51:56 -0400
> CC: r-help@r-project.org
> Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 
> 1982 Econometrica R vs Stata
>
> There is something strange in this problem. I think the log-likelihood is 
> incorrect. See the results below from "optimx". You can get much larger 
> log-likelihood values than for the exact solution that Peter provided.
>
> ## model 18
> lnl <- function(theta,y1, y2, x1, x2, x3) {
> n <- length(y1)
> beta <- theta[1:8]
> e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
> e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
> e <- cbind(e1, e2)
> sigma <- t(e)%*%e
> logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma))) # it looks like there is 
> something wrong here
> return(-logl)
> }
>
> data <- read.table("e:/computing/optimx_example.dat", header=TRUE, sep=",")
>
> attach(data)
>
> require(optimx)
>
> start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
>
> # the warnings can be safely ignored in the "optimx" calls
> p1 <- optimx(start, lnl, hessian=TRUE, y1=y1, y2=y2,
> + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
>
> p2 <- optimx(rep(0,8), lnl, hessian=TRUE, y1=y1, y2=y2,
> + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
>
> p3 <- optimx(rep(0.5,8), lnl, hessian=TRUE, y1=y1, y2=y2,
> + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
>
> Ravi.
> ____________
> From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf 
> Of peter dalgaard [pda...@gmail.com]
> Sent: Saturday, May 07, 2011 4:46 AM
> To: Alex Olssen
> Cc: r-help@r-project.org
> Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 
> 1982 Econometrica R vs Stata
>
> On May 6, 2011, at 14:29 , Alex Olssen wrote:
>
> > Dear R-help,
> >
> > I am trying to reproduce some results presented in a paper by Anderson
> > and Blundell in 1982 in Econometrica using R.
> > The estimation I want to reproduce concerns maximum likelihood
> > estimation of a singular equation system.
> > I can estimate the static model successfully in Stata but for the
> > dynamic models I have difficulty getting convergence.
> > My R program which uses the same likelihood function as in Stata has
> > convergence properties even for the static case.
> >
> > I have copied my R program and the data below. I realise the code
> > could be made more elegant - but it is short enough.
> >
> > Any ideas would be highly appreciated.
>
> Better starting values would help. In this case, almost too good values are 
> available:
>
> start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
>
> which appears to be the _exact_ solution.
>
> Apart from that, it seems that the conjugate gradient methods have 
> difficulties with this likelihood, for some less than obvious reason. 
> Increasing the maxit gets you closer but still not satisfactory.
>
> I would suggest trying out the experimental optimx package. Appar

Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-10 Thread Berend Hasselman

Alex Olssen wrote:
> 
> Wow that is really interesting,
> 
> Sorry I was asleep when you emailed these.
> 
> And yes, of course, I had been trying to implement model 18, not 18s,
> that was a typo, sorry.
> 
> I will have a look at the code you posted.
> 
> Thanks,
> 
> Alex
> 

I have run nlm with a zero starting vector on model 18.


> start.zero <- rep(0,8)
> 
> q.zero <- nlm(p=start.zero, f=lnl, hessian=TRUE, gradtol=1e-15,
+  y1=y1, y2=y2, x1=x1, x2=x2, x3=x3)
> 
> q.zero
$minimum
[1] -76.73878

$estimate
[1] -0.68124025  0.28686058 -0.15394903 -0.06173779  1.30304998 -0.25446946
[7]  0.14531856  0.05349810

$gradient
[1]  0.0001356761  0.0014407493  0.0013240913  0.0006367333 -0.0005172628
[6] -0.0051911383 -0.0044429086 -0.0019081432

$hessian
  [,1] [,2] [,3]  [,4]  [,5] [,6] [,7]
[1,]  197993.1  1982843  1746582  807261.9  135628.5  1355944  1194854
[2,] 1982842.8 19766730 17430209 8084282.7 1357883.8 13506353 11915748
[3,] 1746581.8 17430209 15376070 7138194.2 1196134.9 11911149 10512490
[4,]  807261.9  8084283  7138194 3323204.7  552932.9  5526861  4882154
[5,]  135628.5  1357884  1196135  552932.9  271571.7  2713953  2391658
[6,] 1355943.5 13506353 11911149 5526860.9 2713953.0 26956003 23785984
[7,] 1194854.2 11915748 10512490 4882154.2 2391658.1 23785984 20996480
[8,]  552860.7  5534816  4887292 2275676.3 1106862.5 11057031  9768028
   [,8]
[1,]   552860.7
[2,]  5534816.0
[3,]  4887292.4
[4,]  2275676.3
[5,]  1106862.5
[6,] 11057030.7
[7,]  9768027.6
[8,]  4554570.6

$code
[1] 2

$iterations
[1] 68

> 
> q.zero$estimate
[1] -0.68124025  0.28686058 -0.15394903 -0.06173779  1.30304998 -0.25446946
[7]  0.14531856  0.05349810

Quite different from what you get if one starts with AB's model 18 results
of Table II.
I have also used Gretl's FIML procedure to estimate the model (I don't know
what starting values Gretl uses).
The results are very similar to those obtained with nlm as above.

Berend

--
View this message in context: 
http://r.789695.n4.nabble.com/maximum-likelihood-convergence-reproducing-Anderson-Blundell-1982-Econometrica-R-vs-Stata-tp3502516p3512807.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-09 Thread Alex Olssen
Wow that is really interesting,

Sorry I was asleep when you emailed these.

And yes, of course, I had been trying to implement model 18, not 18s,
that was a typo, sorry.

I will have a look at the code you posted.

Thanks,

Alex

On 10 May 2011 02:18, peter dalgaard  wrote:
> Hmm, I tried replacing the x's in the model with their principal component 
> scores, and suddenly everything converges as a greased lightning:
>
>> Z <- princomp(cbind(x1,x2,x3))$scores
>> Z <- as.data.frame(princomp(cbind(x1,x2,x3))$scores)
>> names(Z)<- letters[1:3]
>> optimx(rep(0,8),lnl, hessian=TRUE, y1=y1, y2=y2,
> + x1=Z$a, x2=Z$b, x3=Z$c, method="BFGS")
>                                                                               
>                  par
> 1 0.59107682, 0.01043568, -0.20639153, 0.25902746, 0.24675162, -0.01426477, 
> 0.18045177, -0.23657327
>    fvalues method fns grs itns conv KKT1 KKT2 xtimes
> 1 -76.73878   BFGS 157  37 NULL    0 TRUE TRUE  0.055
> Warning messages:
> 1: In max(logpar) : no non-missing arguments to max; returning -Inf
> 2: In min(logpar) : no non-missing arguments to min; returning Inf
>
> The likelihood appears to be spot on, but, obviously, the parameter estimates 
> need to be back-transformed to the original x1,x2,x3 system. I'll leave that 
> as the proverbial "exercise for the reader"...
>
> However, the whole thing leads me to a suspicion that maybe our numerical 
> gradients are misbehaving in cases with high local collinearity?
>
> -pd
>
> On May 9, 2011, at 13:40 , Alex Olssen wrote:
>
>> Hi Mike,
>>
>> Mike said
>> "is this it, page 1559?"
>>
>> That is the front page yes, page 15*6*9 has the table, of which the
>> model labelled 18s is the one I replicated.
>>
>> "did you actually cut/paste code anywhere and is your first
>> coefficient -.19 or -.019?
>> Presumably typos would be one possible problem."
>>
>> -0.19 is not a typo, it is pi10 in the paper, and a1 in my Stata
>> estimation - as far as I can tell cutting and pasting is not the
>> problem.
>>
>> "generally it helps if we could at least see the equations to check your
>> code against typos ( note page number ?) in lnl that may fix part of the
>> mystery.  Is full text available
>> on author's site, doesn't come up on citeseer AFAICT,"
>>
>> Unfortunately I do not know of any place to get the full version of
>> this paper that doesn't require access to a database such as JSTOR.
>> The fact that this likelihood function reproduces the published
>> results in Stata makes me confident that it is correct - I also have
>> read a lot on systems estimation and it is a pretty standard
>> likelihood function.
>>
>> "I guess one question would be " what is beta" in lnl supposed to be -
>> it isn't used anywhere but I will also mentioned I'm not that familiar
>> with the R code ( I'm trying to work through this to learn R and the
>> optimizers).
>>
>> maybe some words would help, is sigma supposed to be 2x2 or 8x8 and what are
>> e1 and e2 supposed to be?"
>>
>> The code above is certainly not as elegant as it could be - in this
>> case the beta is rather superfluous.  It is just a vector to hold
>> parameters - but since I called all the parameters theta anyway there
>> is no need for it.  e1 and e2 are the residuals from the first and
>> second equations of the system.  Sigma is a 2x2 matrix which is the
>> outer product of the two vectors of residuals.
>>
>> Kind regards,
>>
>> Alex
>>
>>
>>
>> On 9 May 2011 23:12, Mike Marchywka  wrote:
>>>
>>>
>>>
>>>
>>>
>>>
>>> 
>>>> Date: Mon, 9 May 2011 22:06:38 +1200
>>>> From: alex.ols...@gmail.com
>>>> To: pda...@gmail.com
>>>> CC: r-help@r-project.org; da...@otter-rsch.com
>>>> Subject: Re: [R] maximum likelihood convergence reproducing Anderson 
>>>> Blundell 1982 Econometrica R vs Stata
>>>>
>>>> Peter said
>>>>
>>>> "Ahem! You might get us interested in your problem, but not to the
>>>> level that we are going to install Stata and Tsp and actually dig out
>>>> and study the scientific paper you are talking about. Please cite the
>>>> results and explain the differences."
>>>>
>>>> Apologies Peter, will do,
>>>>
>>>> The results which I can emulate in Stata b

Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-09 Thread peter dalgaard
Hmm, I tried replacing the x's in the model with their principal component 
scores, and suddenly everything converges as a greased lightning:

> Z <- princomp(cbind(x1,x2,x3))$scores
> Z <- as.data.frame(princomp(cbind(x1,x2,x3))$scores)
> names(Z)<- letters[1:3]
> optimx(rep(0,8),lnl, hessian=TRUE, y1=y1, y2=y2,
+ x1=Z$a, x2=Z$b, x3=Z$c, method="BFGS")

par
1 0.59107682, 0.01043568, -0.20639153, 0.25902746, 0.24675162, -0.01426477, 
0.18045177, -0.23657327
fvalues method fns grs itns conv KKT1 KKT2 xtimes
1 -76.73878   BFGS 157  37 NULL0 TRUE TRUE  0.055
Warning messages:
1: In max(logpar) : no non-missing arguments to max; returning -Inf
2: In min(logpar) : no non-missing arguments to min; returning Inf

The likelihood appears to be spot on, but, obviously, the parameter estimates 
need to be back-transformed to the original x1,x2,x3 system. I'll leave that as 
the proverbial "exercise for the reader"...

However, the whole thing leads me to a suspicion that maybe our numerical 
gradients are misbehaving in cases with high local collinearity?

-pd

On May 9, 2011, at 13:40 , Alex Olssen wrote:

> Hi Mike,
> 
> Mike said
> "is this it, page 1559?"
> 
> That is the front page yes, page 15*6*9 has the table, of which the
> model labelled 18s is the one I replicated.
> 
> "did you actually cut/paste code anywhere and is your first
> coefficient -.19 or -.019?
> Presumably typos would be one possible problem."
> 
> -0.19 is not a typo, it is pi10 in the paper, and a1 in my Stata
> estimation - as far as I can tell cutting and pasting is not the
> problem.
> 
> "generally it helps if we could at least see the equations to check your
> code against typos ( note page number ?) in lnl that may fix part of the
> mystery.  Is full text available
> on author's site, doesn't come up on citeseer AFAICT,"
> 
> Unfortunately I do not know of any place to get the full version of
> this paper that doesn't require access to a database such as JSTOR.
> The fact that this likelihood function reproduces the published
> results in Stata makes me confident that it is correct - I also have
> read a lot on systems estimation and it is a pretty standard
> likelihood function.
> 
> "I guess one question would be " what is beta" in lnl supposed to be -
> it isn't used anywhere but I will also mentioned I'm not that familiar
> with the R code ( I'm trying to work through this to learn R and the
> optimizers).
> 
> maybe some words would help, is sigma supposed to be 2x2 or 8x8 and what are
> e1 and e2 supposed to be?"
> 
> The code above is certainly not as elegant as it could be - in this
> case the beta is rather superfluous.  It is just a vector to hold
> parameters - but since I called all the parameters theta anyway there
> is no need for it.  e1 and e2 are the residuals from the first and
> second equations of the system.  Sigma is a 2x2 matrix which is the
> outer product of the two vectors of residuals.
> 
> Kind regards,
> 
> Alex
> 
> 
> 
> On 9 May 2011 23:12, Mike Marchywka  wrote:
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>>> Date: Mon, 9 May 2011 22:06:38 +1200
>>> From: alex.ols...@gmail.com
>>> To: pda...@gmail.com
>>> CC: r-help@r-project.org; da...@otter-rsch.com
>>> Subject: Re: [R] maximum likelihood convergence reproducing Anderson 
>>> Blundell 1982 Econometrica R vs Stata
>>> 
>>> Peter said
>>> 
>>> "Ahem! You might get us interested in your problem, but not to the
>>> level that we are going to install Stata and Tsp and actually dig out
>>> and study the scientific paper you are talking about. Please cite the
>>> results and explain the differences."
>>> 
>>> Apologies Peter, will do,
>>> 
>>> The results which I can emulate in Stata but not (yet) in R are reported 
>>> below.
>> 
>> did you actually cut/paste code anywhere and is your first coefficient -.19 
>> or -.019?
>> Presumably typos would be one possible problem.
>> 
>>> They come from Econometrica Vol. 50, No. 6 (Nov., 1982), pp. 1569
>> 
>> 
>> is this it, page 1559?
>> 
>> http://www.jstor.org/pss/1913396
>> 
>> generally it helps if we could at least see the equations to check your
>> code against typos ( note page number ?) in lnl that may fix part of the
>> mystery.  Is full text available
>> on author's site, doesn't come

Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-09 Thread peter dalgaard

On May 9, 2011, at 13:40 , Alex Olssen wrote:

> Hi Mike,
> 
> Mike said
> "is this it, page 1559?"
> 
> That is the front page yes, page 15*6*9 has the table, of which the
> model labelled 18s is the one I replicated.
> 

However, the R code you posted will at best replicate model 18. For 18s, you 
need to impose symmetry conditions.

The fit of model 18 does seem to give similar results to those in Table II, but 
not exactly. 

Even more puzzling, if you start from the published coefficients, nlm seems to 
be getting stuck at a non-concave point of the negative likelihood.


> nlm(p=ab, f=lnl, hessian=TRUE, y1=y1, y2=y2,
+ x1=x1, x2=x2, x3=x3, gradtol=1e-15)
$minimum
[1] -76.60681

$estimate
[1] -0.64701504  0.27284992 -0.13713084 -0.07205753  1.28200342 -0.24596766  
0.13502787  0.06001181

$gradient
[1]   2.782425  12.648629  -1.830244 -29.296795  -2.577734  -4.954042   
1.495517  11.016190

$hessian
  [,1] [,2] [,3]  [,4]  [,5] [,6] [,7] [,8]
[1,]  197435.9  1977197  1741724  805132.7  135821.9  1358013  1196714   553746
[2,] 1977197.4 19712076 17383474 8064540.2 1359856.0 13527720 11935305  5544903
[3,] 1741724.0 17383474 15336008 7121047.7 1197868.5 11929667 10529439  4896008
[4,]  805132.7  8064540  7121048 3315377.8  553733.1  5534923  4889523  2279404
[5,]  135821.9  1359856  1197868  553733.1  271511.7  2713733  2391332  1106572
[6,] 1358012.7 13527720 11929667 5534923.0 2713732.9 26955636 23784039 11054174
[7,] 1196714.3 11935305 10529439 4889522.6 2391331.9 23784039 20993413  9765022
[8,]  553746.0  5544903  4896008 2279404.1 1106571.9 11054174  9765022  4552693

$code
[1] 2

$iterations
[1] 4


-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-09 Thread Mike Marchywka







> Date: Mon, 9 May 2011 22:06:38 +1200
> From: alex.ols...@gmail.com
> To: pda...@gmail.com
> CC: r-help@r-project.org; da...@otter-rsch.com
> Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 
> 1982 Econometrica R vs Stata
>
> Peter said
>
> "Ahem! You might get us interested in your problem, but not to the
> level that we are going to install Stata and Tsp and actually dig out
> and study the scientific paper you are talking about. Please cite the
> results and explain the differences."
>
> Apologies Peter, will do,
>
> The results which I can emulate in Stata but not (yet) in R are reported 
> below.

did you actually cut/paste code anywhere and is your first coefficient -.19 or 
-.019?
Presumably typos would be one possible problem. 

> They come from Econometrica Vol. 50, No. 6 (Nov., 1982), pp. 1569


is this it, page 1559?

http://www.jstor.org/pss/1913396

generally it helps if we could at least see the equations to check your
code against typos ( note page number ?) in lnl that may fix part of the
mystery.  Is full text available
on author's site, doesn't come up on citeseer AFAICT,


http://citeseerx.ist.psu.edu/search?q=blundell+1982&sort=ascdate

I guess one question would be " what is beta" in lnl supposed to be -
it isn't used anywhere but I will also mentioned I'm not that familiar
with the R code ( I'm trying to work through this to learn R and the 
optimizers). 

maybe some words would help, is sigma supposed to be 2x2 or 8x8 and what are
e1 and e2 supposed to be?




>
> TABLE II - model 18s
>
> coef std err
> p10 -0.19 0.078
> p11 0.220 0.019
> p12 -0.148 0.021
> p13 -0.072
> p20 0.893 0.072
> p21 -0.148
> p22 0.050 0.035
> p23 0.098
>
> The results which I produced in Stata are reported below.
> I spent the last hour rewriting the code to reproduce this - since I
> am now at home and not at work :(
> My results are "identical" to those published. The estimates are for
> a 3 equation symmetrical singular system.
> I have not bothered to report symmetrical results and have backed out
> an extra estimate using adding up constraints.
> I have also backed out all standard errors using the delta method.
>
> . ereturn display
> --
> | Coef. Std. Err. z P>|z| [95% Conf. Interval]
> -+
> a |
> a1 | -.0188115 .0767759 -0.25 0.806 -.1692895 .1316664
> a2 | .8926598 .0704068 12.68 0.000 .7546651 1.030655
> a3 | .1261517 .0590193 2.14 0.033 .010476 .2418275
> -+
> g |
> g11 | .2199442 .0184075 11.95 0.000 .183866 .2560223
> g12 | -.1476856 .0211982 -6.97 0.000 -.1892334 -.1061378
> g13 | -.0722586 .0145154 -4.98 0.000 -.1007082 -.0438089
> g22 | .0496865 .0348052 1.43 0.153 -.0185305 .1179034
> g23 | .0979991 .0174397 5.62 0.000 .0638179 .1321803
> g33 | -.0257405 .0113869 -2.26 0.024 -.0480584 -.0034226
> --
>
> In R I cannot get results like this - I think it is probably to do
> with my inability at using the optimisers well.
> Any pointers would be appreciated.
>
> Peter said "Are we maximizing over the same parameter space? You say
> that the estimates from the paper gives a log-likelihood of 54.04, but
> the exact solution clocked in at 76.74, which in my book is rather
> larger."
>
> I meant +54.04 > -76.74. It is quite common to get positive
> log-likelihoods in these system estimation.
>
> Kind regards,
>
> Alex
>
> On 9 May 2011 19:04, peter dalgaard  wrote:
> >
> > On May 9, 2011, at 06:07 , Alex Olssen wrote:
> >
> >> Thank you all for your input.
> >>
> >> Unfortunately my problem is not yet resolved.  Before I respond to
> >> individual comments I make a clarification:
> >>
> >> In Stata, using the same likelihood function as above, I can reproduce
> >> EXACTLY (to 3 decimal places or more, which is exactly considering I
> >> am using different software) the results from model 8 of the paper.
> >>
> >> I take this as an indication that I am using the same likelihood
> >> function as the authors, and that it does indeed work.
> >> The reason I am trying to estimate the model in R is because while
> >> Stata reproduces model 8 perfectly it has convergence
> >> difficulties for some of the other models.
> >>
> >> Peter Dalgaard,
> >>
> >> &qu

Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-09 Thread Alex Olssen
Hi Mike,

Mike said
"is this it, page 1559?"

That is the front page yes, page 15*6*9 has the table, of which the
model labelled 18s is the one I replicated.

"did you actually cut/paste code anywhere and is your first
coefficient -.19 or -.019?
Presumably typos would be one possible problem."

-0.19 is not a typo, it is pi10 in the paper, and a1 in my Stata
estimation - as far as I can tell cutting and pasting is not the
problem.

"generally it helps if we could at least see the equations to check your
code against typos ( note page number ?) in lnl that may fix part of the
mystery.  Is full text available
on author's site, doesn't come up on citeseer AFAICT,"

Unfortunately I do not know of any place to get the full version of
this paper that doesn't require access to a database such as JSTOR.
The fact that this likelihood function reproduces the published
results in Stata makes me confident that it is correct - I also have
read a lot on systems estimation and it is a pretty standard
likelihood function.

"I guess one question would be " what is beta" in lnl supposed to be -
it isn't used anywhere but I will also mentioned I'm not that familiar
with the R code ( I'm trying to work through this to learn R and the
optimizers).

maybe some words would help, is sigma supposed to be 2x2 or 8x8 and what are
e1 and e2 supposed to be?"

The code above is certainly not as elegant as it could be - in this
case the beta is rather superfluous.  It is just a vector to hold
parameters - but since I called all the parameters theta anyway there
is no need for it.  e1 and e2 are the residuals from the first and
second equations of the system.  Sigma is a 2x2 matrix which is the
outer product of the two vectors of residuals.

Kind regards,

Alex



On 9 May 2011 23:12, Mike Marchywka  wrote:
>
>
>
>
>
>
> 
>> Date: Mon, 9 May 2011 22:06:38 +1200
>> From: alex.ols...@gmail.com
>> To: pda...@gmail.com
>> CC: r-help@r-project.org; da...@otter-rsch.com
>> Subject: Re: [R] maximum likelihood convergence reproducing Anderson 
>> Blundell 1982 Econometrica R vs Stata
>>
>> Peter said
>>
>> "Ahem! You might get us interested in your problem, but not to the
>> level that we are going to install Stata and Tsp and actually dig out
>> and study the scientific paper you are talking about. Please cite the
>> results and explain the differences."
>>
>> Apologies Peter, will do,
>>
>> The results which I can emulate in Stata but not (yet) in R are reported 
>> below.
>
> did you actually cut/paste code anywhere and is your first coefficient -.19 
> or -.019?
> Presumably typos would be one possible problem.
>
>> They come from Econometrica Vol. 50, No. 6 (Nov., 1982), pp. 1569
>
>
> is this it, page 1559?
>
> http://www.jstor.org/pss/1913396
>
> generally it helps if we could at least see the equations to check your
> code against typos ( note page number ?) in lnl that may fix part of the
> mystery.  Is full text available
> on author's site, doesn't come up on citeseer AFAICT,
>
>
> http://citeseerx.ist.psu.edu/search?q=blundell+1982&sort=ascdate
>
> I guess one question would be " what is beta" in lnl supposed to be -
> it isn't used anywhere but I will also mentioned I'm not that familiar
> with the R code ( I'm trying to work through this to learn R and the 
> optimizers).
>
> maybe some words would help, is sigma supposed to be 2x2 or 8x8 and what are
> e1 and e2 supposed to be?
>
>
>
>
>>
>> TABLE II - model 18s
>>
>> coef std err
>> p10 -0.19 0.078
>> p11 0.220 0.019
>> p12 -0.148 0.021
>> p13 -0.072
>> p20 0.893 0.072
>> p21 -0.148
>> p22 0.050 0.035
>> p23 0.098
>>
>> The results which I produced in Stata are reported below.
>> I spent the last hour rewriting the code to reproduce this - since I
>> am now at home and not at work :(
>> My results are "identical" to those published. The estimates are for
>> a 3 equation symmetrical singular system.
>> I have not bothered to report symmetrical results and have backed out
>> an extra estimate using adding up constraints.
>> I have also backed out all standard errors using the delta method.
>>
>> . ereturn display
>> --
>> | Coef. Std. Err. z P>|z| [95% Conf. Interval]
>> -+
>> a |
>> a1 | -.0188115 .0767759 -0.25 0.806 -.1692895 .1316664
>> a2 

Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-09 Thread Alex Olssen
Peter said

"Ahem! You might get us interested in your problem, but not to the
level that we are going to install Stata and Tsp and actually dig out
and study the scientific paper you are talking about. Please cite the
results and explain the differences."

Apologies Peter, will do,

The results which I can emulate in Stata but not (yet) in R are reported below.
They come from Econometrica Vol. 50, No. 6 (Nov., 1982), pp. 1569

TABLE II - model 18s

 coef std err
p10 -0.19 0.078
p11 0.2200.019
p12 -0.148   0.021
p13 -0.072
p20 0.8930.072
p21 -0.148
p22 0.0500.035
p23 0.098

The results which I produced in Stata are reported below.
I spent the last hour rewriting the code to reproduce this - since I
am now at home and not at work :(
My results are "identical" to those published.  The estimates are for
a 3 equation symmetrical singular system.
I have not bothered to report symmetrical results and have backed out
an extra estimate using adding up constraints.
I have also backed out all standard errors using the delta method.

. ereturn display
--
|  Coef.   Std. Err.  zP>|z| [95% Conf. Interval]
-+
a|
 a1 |  -.0188115   .0767759-0.25   0.806-.1692895.1316664
 a2 |   .8926598   .070406812.68   0.000 .75466511.030655
 a3 |   .1261517   .0590193 2.14   0.033  .010476.2418275
-+
g|
g11 |   .2199442   .018407511.95   0.000  .183866.2560223
g12 |  -.1476856   .0211982-6.97   0.000-.1892334   -.1061378
g13 |  -.0722586   .0145154-4.98   0.000-.1007082   -.0438089
g22 |   .0496865   .0348052 1.43   0.153-.0185305.1179034
g23 |   .0979991   .0174397 5.62   0.000 .0638179.1321803
g33 |  -.0257405   .0113869-2.26   0.024-.0480584   -.0034226
--

In R I cannot get results like this - I think it is probably to do
with my inability at using the optimisers well.
Any pointers would be appreciated.

Peter said "Are we maximizing over the same parameter space? You say
that the estimates from the paper gives a log-likelihood of 54.04, but
the exact solution clocked in at 76.74, which in my book is rather
larger."

I meant +54.04 > -76.74.  It is quite common to get positive
log-likelihoods in these system estimation.

Kind regards,

Alex

On 9 May 2011 19:04, peter dalgaard  wrote:
>
> On May 9, 2011, at 06:07 , Alex Olssen wrote:
>
>> Thank you all for your input.
>>
>> Unfortunately my problem is not yet resolved.  Before I respond to
>> individual comments I make a clarification:
>>
>> In Stata, using the same likelihood function as above, I can reproduce
>> EXACTLY (to 3 decimal places or more, which is exactly considering I
>> am using different software) the results from model 8 of the paper.
>>
>> I take this as an indication that I am using the same likelihood
>> function as the authors, and that it does indeed work.
>> The reason I am trying to estimate the model in R is because while
>> Stata reproduces model 8 perfectly it has convergence
>> difficulties for some of the other models.
>>
>> Peter Dalgaard,
>>
>> "Better starting values would help. In this case, almost too good
>> values are available:
>>
>> start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
>>
>> which appears to be the _exact_ solution."
>>
>> Thanks for the suggestion.  Using these starting values produces the
>> exact estimate that Dave Fournier emailed me.
>> If these are the exact solution then why did the author publish
>> different answers which are completely reproducible in
>> Stata and Tsp?
>
>
> Ahem! You might get us interested in your problem, but not to the level that 
> we are going to install Stata and Tsp and actually dig out and study the 
> scientific paper you are talking about. Please cite the results and explain 
> the differences.
>
> Are we maximizing over the same parameter space? You say that the estimates 
> from the paper gives a log-likelihood of 54.04, but the exact solution 
> clocked in at 76.74, which in my book is rather larger.
>
> Confused
>
> -p
>
>
>>
>> Ravi,
>>
>> Thanks for introducing optimx to me, I am new to R.  I completely
>> agree that you can get higher log-likelihood values
>> than what those obtained with optim and the starting values suggested
>> by Peter.  In fact, in Stata, when I reproduce
>> the results of model 8 to more than 3 dp I get a log-likelihood of 54.039139.
>>
>> Furthermore if I estimate model 8 without symmetry imposed on the
>> system I reproduce the Likelihood Ratio reported
>> in the paper to 3 deci

Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-09 Thread peter dalgaard

On May 9, 2011, at 06:07 , Alex Olssen wrote:

> Thank you all for your input.
> 
> Unfortunately my problem is not yet resolved.  Before I respond to
> individual comments I make a clarification:
> 
> In Stata, using the same likelihood function as above, I can reproduce
> EXACTLY (to 3 decimal places or more, which is exactly considering I
> am using different software) the results from model 8 of the paper.
> 
> I take this as an indication that I am using the same likelihood
> function as the authors, and that it does indeed work.
> The reason I am trying to estimate the model in R is because while
> Stata reproduces model 8 perfectly it has convergence
> difficulties for some of the other models.
> 
> Peter Dalgaard,
> 
> "Better starting values would help. In this case, almost too good
> values are available:
> 
> start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
> 
> which appears to be the _exact_ solution."
> 
> Thanks for the suggestion.  Using these starting values produces the
> exact estimate that Dave Fournier emailed me.
> If these are the exact solution then why did the author publish
> different answers which are completely reproducible in
> Stata and Tsp?


Ahem! You might get us interested in your problem, but not to the level that we 
are going to install Stata and Tsp and actually dig out and study the 
scientific paper you are talking about. Please cite the results and explain the 
differences.

Are we maximizing over the same parameter space? You say that the estimates 
from the paper gives a log-likelihood of 54.04, but the exact solution clocked 
in at 76.74, which in my book is rather larger.

Confused   

-p


> 
> Ravi,
> 
> Thanks for introducing optimx to me, I am new to R.  I completely
> agree that you can get higher log-likelihood values
> than what those obtained with optim and the starting values suggested
> by Peter.  In fact, in Stata, when I reproduce
> the results of model 8 to more than 3 dp I get a log-likelihood of 54.039139.
> 
> Furthermore if I estimate model 8 without symmetry imposed on the
> system I reproduce the Likelihood Ratio reported
> in the paper to 3 decimal places as well, suggesting that the
> log-likelihoods I am reporting differ from those in the paper
> only due to a constant.
> 
> Thanks for your comments,
> 
> I am still highly interested in knowing why the results of the
> optimisation in R are so different to those in Stata?
> 
> I might try making my convergence requirements more stringent.
> 
> Kind regards,
> 
> Alex

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-08 Thread Alex Olssen
Thank you all for your input.

Unfortunately my problem is not yet resolved.  Before I respond to
individual comments I make a clarification:

In Stata, using the same likelihood function as above, I can reproduce
EXACTLY (to 3 decimal places or more, which is exactly considering I
am using different software) the results from model 8 of the paper.

I take this as an indication that I am using the same likelihood
function as the authors, and that it does indeed work.
The reason I am trying to estimate the model in R is because while
Stata reproduces model 8 perfectly it has convergence
difficulties for some of the other models.

Peter Dalgaard,

"Better starting values would help. In this case, almost too good
values are available:

start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))

which appears to be the _exact_ solution."

Thanks for the suggestion.  Using these starting values produces the
exact estimate that Dave Fournier emailed me.
If these are the exact solution then why did the author publish
different answers which are completely reproducible in
Stata and Tsp?

Ravi,

Thanks for introducing optimx to me, I am new to R.  I completely
agree that you can get higher log-likelihood values
than what those obtained with optim and the starting values suggested
by Peter.  In fact, in Stata, when I reproduce
the results of model 8 to more than 3 dp I get a log-likelihood of 54.039139.

Furthermore if I estimate model 8 without symmetry imposed on the
system I reproduce the Likelihood Ratio reported
in the paper to 3 decimal places as well, suggesting that the
log-likelihoods I am reporting differ from those in the paper
only due to a constant.

Thanks for your comments,

I am still highly interested in knowing why the results of the
optimisation in R are so different to those in Stata?

I might try making my convergence requirements more stringent.

Kind regards,

Alex

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-07 Thread peter dalgaard

On May 7, 2011, at 17:51 , Ravi Varadhan wrote:

> There is something strange in this problem.  I think the log-likelihood is 
> incorrect.  See the results below from "optimx".  You can get much larger 
> log-likelihood values than for the exact solution that Peter provided.
> 
> ## model 18
> lnl <- function(theta,y1, y2, x1, x2, x3) {
>  n <- length(y1)
>  beta <- theta[1:8]
>  e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
>  e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
>  e <- cbind(e1, e2)
>  sigma <- t(e)%*%e
>  logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma)))  # it looks like there is 
> something wrong here
>  return(-logl)
> }
> 
> data <- read.table("e:/computing/optimx_example.dat", header=TRUE, sep=",")
> 
> attach(data)
> 
> require(optimx)
> 
> start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
> 
> # the warnings can be safely ignored in the "optimx" calls
> p1 <- optimx(start, lnl, hessian=TRUE, y1=y1, y2=y2,
> + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
> 
> p2 <- optimx(rep(0,8), lnl, hessian=TRUE, y1=y1, y2=y2,
> + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
> 
> p3 <- optimx(rep(0.5,8), lnl, hessian=TRUE, y1=y1, y2=y2,
> + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
> 

Hm? I get stuff like below (for p3). Some of the entries have a considerably 
larger NEGATIVE log likelihood, but that's hardly a problem with the likelihood 
per se.  

 fvalues  method   fns  grs itns conv  KKT1  KKT2 xtimes
12 8.988466e+307  newuoaNA   NA   NA NANA  0.002
11 8.988466e+307  bobyqaNA   NA   NA NANA  0.001
3   23.66768 Nelder-Mead  1501   NA NULL1 FALSE FALSE   0.18
7  -51.76068 spg  1925   NA 15011 FALSE FALSE  2.322
4  -55.78708L-BFGS-B  2093 2093 NULL0 FALSE FALSE  4.176
2  -70.57023  CG  5360 1501 NULL1 FALSE  TRUE  3.465
1  -70.66286BFGS 21481 1500 NULL1 FALSE  TRUE  5.383
8  -76.73765  ucminf  1500 1500 NULL0 FALSE  TRUE  0.067
9  -76.73871  Rcgmin  2434  867 NULL0 FALSE  TRUE  1.514
10 -76.73877  Rvmmin   231   45 NULL0 FALSE  TRUE  0.101
6  -76.73878  nlminb   130  581   670  TRUE  TRUE  0.085
5  -76.73878 nlmNA   NA   460  TRUE  TRUE  0.058

I must admit that I didn't check the likelihood in detail, but minimizing the 
determinant of the residual SSD matrix is generally what you end up doing in 
this sort of models. (Of course, you can safely lose the constants, and you can 
also think up more stable ways of computing the log-determinant, but it's only 
2x2 for cryin' out loud.)


-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-07 Thread Ravi Varadhan
There is something strange in this problem.  I think the log-likelihood is 
incorrect.  See the results below from "optimx".  You can get much larger 
log-likelihood values than for the exact solution that Peter provided.

## model 18
lnl <- function(theta,y1, y2, x1, x2, x3) {
  n <- length(y1)
  beta <- theta[1:8]
  e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
  e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
  e <- cbind(e1, e2)
  sigma <- t(e)%*%e
  logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma)))  # it looks like there is 
something wrong here
  return(-logl)
}

data <- read.table("e:/computing/optimx_example.dat", header=TRUE, sep=",")

attach(data)

require(optimx)

start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))

# the warnings can be safely ignored in the "optimx" calls
p1 <- optimx(start, lnl, hessian=TRUE, y1=y1, y2=y2,
+ x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))

p2 <- optimx(rep(0,8), lnl, hessian=TRUE, y1=y1, y2=y2,
+ x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))

p3 <- optimx(rep(0.5,8), lnl, hessian=TRUE, y1=y1, y2=y2,
+ x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))

Ravi.

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of 
peter dalgaard [pda...@gmail.com]
Sent: Saturday, May 07, 2011 4:46 AM
To: Alex Olssen
Cc: r-help@r-project.org
Subject: Re: [R] maximum likelihood convergence reproducing Anderson
Blundell 1982 Econometrica R vs Stata

On May 6, 2011, at 14:29 , Alex Olssen wrote:

> Dear R-help,
>
> I am trying to reproduce some results presented in a paper by Anderson
> and Blundell in 1982 in Econometrica using R.
> The estimation I want to reproduce concerns maximum likelihood
> estimation of a singular equation system.
> I can estimate the static model successfully in Stata but for the
> dynamic models I have difficulty getting convergence.
> My R program which uses the same likelihood function as in Stata has
> convergence properties even for the static case.
>
> I have copied my R program and the data below.  I realise the code
> could be made more elegant - but it is short enough.
>
> Any ideas would be highly appreciated.

Better starting values would help. In this case, almost too good values are 
available:

start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))

which appears to be the _exact_ solution.

Apart from that, it seems that the conjugate gradient methods have difficulties 
with this likelihood, for some less than obvious reason. Increasing the maxit 
gets you closer but still not satisfactory.

I would suggest trying out the experimental optimx package. Apparently, some of 
the algorithms in there are much better at handling this likelihood, notably 
"nlm" and "nlminb".




>
> ## model 18
> lnl <- function(theta,y1, y2, x1, x2, x3) {
>  n <- length(y1)
>  beta <- theta[1:8]
>  e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
>  e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
>  e <- cbind(e1, e2)
>  sigma <- t(e)%*%e
>  logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma)))
>  return(-logl)
> }
> p <- optim(0*c(1:8), lnl, method="BFGS", hessian=TRUE, y1=y1, y2=y2,
> x1=x1, x2=x2, x3=x3)
>
> "year","y1","y2","x1","x2","x3"
> 1929,0.554779,0.266051,9.87415,8.60371,3.75673
> 1930,0.516336,0.297473,9.68621,8.50492,3.80692
> 1931,0.508201,0.324199,9.4701,8.27596,3.80437
> 1932,0.500482,0.33958,9.24692,7.99221,3.76251
> 1933,0.501695,0.276974,9.35356,7.98968,3.69071
> 1934,0.591426,0.287008,9.42084,8.0362,3.63564
> 1935,0.565047,0.244096,9.53972,8.15803,3.59285
> 1936,0.605954,0.239187,9.6914,8.32009,3.56678
> 1937,0.620161,0.218232,9.76817,8.42001,3.57381
> 1938,0.592091,0.243161,9.51295,8.19771,3.6024
> 1939,0.613115,0.217042,9.68047,8.30987,3.58147
> 1940,0.632455,0.215269,9.78417,8.49624,3.57744
> 1941,0.663139,0.184409,10.0606,8.69868,3.6095
> 1942,0.698179,0.164348,10.2892,8.84523,3.4
> 1943,0.70459,0.146865,10.4731,8.93024,3.65388
> 1944,0.694067,0.161722,10.4465,8.96044,3.62434
> 1945,0.674668,0.197231,10.279,8.82522,3.61489
> 1946,0.635916,0.204232,10.1536,8.77547,3.67562
> 1947,0.642855,0.187224,10.2053,8.77481,3.82632
> 1948,0.641063,0.186566,10.2227,8.83821,3.96038
> 1949,0.646317,0.203646,10.1127,8.82364,4.0447
> 1950,0.645476,0.187497,10.2067,8.84161,4.08128
> 1951,0.63803,0.197361,10.2773,8.9401,4.10951
> 1952,0.634626,0.209992,10.283,9.01603,4.1693
> 1953,0.631144,0.219287,10.3217,9.06317,4.21727
> 1954,0.593088,0.235335,10.2101,9.05664,4.2567
> 1955,0.60

Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-07 Thread peter dalgaard

On May 6, 2011, at 14:29 , Alex Olssen wrote:

> Dear R-help,
> 
> I am trying to reproduce some results presented in a paper by Anderson
> and Blundell in 1982 in Econometrica using R.
> The estimation I want to reproduce concerns maximum likelihood
> estimation of a singular equation system.
> I can estimate the static model successfully in Stata but for the
> dynamic models I have difficulty getting convergence.
> My R program which uses the same likelihood function as in Stata has
> convergence properties even for the static case.
> 
> I have copied my R program and the data below.  I realise the code
> could be made more elegant - but it is short enough.
> 
> Any ideas would be highly appreciated.

Better starting values would help. In this case, almost too good values are 
available:

start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))

which appears to be the _exact_ solution. 

Apart from that, it seems that the conjugate gradient methods have difficulties 
with this likelihood, for some less than obvious reason. Increasing the maxit 
gets you closer but still not satisfactory. 

I would suggest trying out the experimental optimx package. Apparently, some of 
the algorithms in there are much better at handling this likelihood, notably 
"nlm" and "nlminb".




> 
> ## model 18
> lnl <- function(theta,y1, y2, x1, x2, x3) {
>  n <- length(y1)
>  beta <- theta[1:8]
>  e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
>  e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
>  e <- cbind(e1, e2)
>  sigma <- t(e)%*%e
>  logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma)))
>  return(-logl)
> }
> p <- optim(0*c(1:8), lnl, method="BFGS", hessian=TRUE, y1=y1, y2=y2,
> x1=x1, x2=x2, x3=x3)
> 
> "year","y1","y2","x1","x2","x3"
> 1929,0.554779,0.266051,9.87415,8.60371,3.75673
> 1930,0.516336,0.297473,9.68621,8.50492,3.80692
> 1931,0.508201,0.324199,9.4701,8.27596,3.80437
> 1932,0.500482,0.33958,9.24692,7.99221,3.76251
> 1933,0.501695,0.276974,9.35356,7.98968,3.69071
> 1934,0.591426,0.287008,9.42084,8.0362,3.63564
> 1935,0.565047,0.244096,9.53972,8.15803,3.59285
> 1936,0.605954,0.239187,9.6914,8.32009,3.56678
> 1937,0.620161,0.218232,9.76817,8.42001,3.57381
> 1938,0.592091,0.243161,9.51295,8.19771,3.6024
> 1939,0.613115,0.217042,9.68047,8.30987,3.58147
> 1940,0.632455,0.215269,9.78417,8.49624,3.57744
> 1941,0.663139,0.184409,10.0606,8.69868,3.6095
> 1942,0.698179,0.164348,10.2892,8.84523,3.4
> 1943,0.70459,0.146865,10.4731,8.93024,3.65388
> 1944,0.694067,0.161722,10.4465,8.96044,3.62434
> 1945,0.674668,0.197231,10.279,8.82522,3.61489
> 1946,0.635916,0.204232,10.1536,8.77547,3.67562
> 1947,0.642855,0.187224,10.2053,8.77481,3.82632
> 1948,0.641063,0.186566,10.2227,8.83821,3.96038
> 1949,0.646317,0.203646,10.1127,8.82364,4.0447
> 1950,0.645476,0.187497,10.2067,8.84161,4.08128
> 1951,0.63803,0.197361,10.2773,8.9401,4.10951
> 1952,0.634626,0.209992,10.283,9.01603,4.1693
> 1953,0.631144,0.219287,10.3217,9.06317,4.21727
> 1954,0.593088,0.235335,10.2101,9.05664,4.2567
> 1955,0.60736,0.227035,10.272,9.07566,4.29193
> 1956,0.607204,0.246631,10.2743,9.12407,4.32252
> 1957,0.586994,0.256784,10.2396,9.1588,4.37792
> 1958,0.548281,0.271022,10.1248,9.14025,4.42641
> 1959,0.553401,0.261815,10.2012,9.1598,4.4346
> 1960,0.552105,0.275137,10.1846,9.19297,4.43173
> 1961,0.544133,0.280783,10.1479,9.19533,4.44407
> 1962,0.55382,0.281286,10.197,9.21544,4.45074
> 1963,0.549951,0.28303,10.2036,9.22841,4.46403
> 1964,0.547204,0.291287,10.2271,9.23954,4.48447
> 1965,0.55511,0.281313,10.2882,9.26531,4.52057
> 1966,0.558182,0.280151,10.353,9.31675,4.58156
> 1967,0.545735,0.294385,10.3351,9.35382,4.65983
> 1968,0.538964,0.294593,10.3525,9.38361,4.71804
> 1969,0.542764,0.299927,10.3676,9.40725,4.76329
> 1970,0.534595,0.315319,10.2968,9.39139,4.81136
> 1971,0.545591,0.315828,10.2592,9.34121,4.84082
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-06 Thread Alex Olssen
Dear R-help,

I am trying to reproduce some results presented in a paper by Anderson
and Blundell in 1982 in Econometrica using R.
The estimation I want to reproduce concerns maximum likelihood
estimation of a singular equation system.
I can estimate the static model successfully in Stata but for the
dynamic models I have difficulty getting convergence.
My R program which uses the same likelihood function as in Stata has
convergence properties even for the static case.

I have copied my R program and the data below.  I realise the code
could be made more elegant - but it is short enough.

Any ideas would be highly appreciated.

## model 18
lnl <- function(theta,y1, y2, x1, x2, x3) {
  n <- length(y1)
  beta <- theta[1:8]
  e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
  e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
  e <- cbind(e1, e2)
  sigma <- t(e)%*%e
  logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma)))
  return(-logl)
}
p <- optim(0*c(1:8), lnl, method="BFGS", hessian=TRUE, y1=y1, y2=y2,
x1=x1, x2=x2, x3=x3)

"year","y1","y2","x1","x2","x3"
1929,0.554779,0.266051,9.87415,8.60371,3.75673
1930,0.516336,0.297473,9.68621,8.50492,3.80692
1931,0.508201,0.324199,9.4701,8.27596,3.80437
1932,0.500482,0.33958,9.24692,7.99221,3.76251
1933,0.501695,0.276974,9.35356,7.98968,3.69071
1934,0.591426,0.287008,9.42084,8.0362,3.63564
1935,0.565047,0.244096,9.53972,8.15803,3.59285
1936,0.605954,0.239187,9.6914,8.32009,3.56678
1937,0.620161,0.218232,9.76817,8.42001,3.57381
1938,0.592091,0.243161,9.51295,8.19771,3.6024
1939,0.613115,0.217042,9.68047,8.30987,3.58147
1940,0.632455,0.215269,9.78417,8.49624,3.57744
1941,0.663139,0.184409,10.0606,8.69868,3.6095
1942,0.698179,0.164348,10.2892,8.84523,3.4
1943,0.70459,0.146865,10.4731,8.93024,3.65388
1944,0.694067,0.161722,10.4465,8.96044,3.62434
1945,0.674668,0.197231,10.279,8.82522,3.61489
1946,0.635916,0.204232,10.1536,8.77547,3.67562
1947,0.642855,0.187224,10.2053,8.77481,3.82632
1948,0.641063,0.186566,10.2227,8.83821,3.96038
1949,0.646317,0.203646,10.1127,8.82364,4.0447
1950,0.645476,0.187497,10.2067,8.84161,4.08128
1951,0.63803,0.197361,10.2773,8.9401,4.10951
1952,0.634626,0.209992,10.283,9.01603,4.1693
1953,0.631144,0.219287,10.3217,9.06317,4.21727
1954,0.593088,0.235335,10.2101,9.05664,4.2567
1955,0.60736,0.227035,10.272,9.07566,4.29193
1956,0.607204,0.246631,10.2743,9.12407,4.32252
1957,0.586994,0.256784,10.2396,9.1588,4.37792
1958,0.548281,0.271022,10.1248,9.14025,4.42641
1959,0.553401,0.261815,10.2012,9.1598,4.4346
1960,0.552105,0.275137,10.1846,9.19297,4.43173
1961,0.544133,0.280783,10.1479,9.19533,4.44407
1962,0.55382,0.281286,10.197,9.21544,4.45074
1963,0.549951,0.28303,10.2036,9.22841,4.46403
1964,0.547204,0.291287,10.2271,9.23954,4.48447
1965,0.55511,0.281313,10.2882,9.26531,4.52057
1966,0.558182,0.280151,10.353,9.31675,4.58156
1967,0.545735,0.294385,10.3351,9.35382,4.65983
1968,0.538964,0.294593,10.3525,9.38361,4.71804
1969,0.542764,0.299927,10.3676,9.40725,4.76329
1970,0.534595,0.315319,10.2968,9.39139,4.81136
1971,0.545591,0.315828,10.2592,9.34121,4.84082

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.