Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
> From: marchy...@hotmail.com > To: rvarad...@jhmi.edu; pda...@gmail.com; alex.ols...@gmail.com > Date: Sat, 21 May 2011 20:40:44 -0400 > CC: r-help@r-project.org > Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell > 1982 Econometrica R vs Stata > > > > > > > > > > > > > > > From: rvarad...@jhmi.edu > > To: marchy...@hotmail.com; pda...@gmail.com; alex.ols...@gmail.com > > CC: r-help@r-project.org > > Date: Sat, 21 May 2011 17:26:29 -0400 > > Subject: RE: [R] maximum likelihood convergence reproducing Anderson > > Blundell 1982 Econometrica R vs Stata > > > > Hi, > > > > I don't think the final verdict has been spoken. Peter's posts have hinted > > at ill-conditioning as the crux of the problem. So, I decided to try a > > couple of more things: (1) standardizing the covariates, (2) exact > > gradient, and (3) both (1) and (2). > > Cool, I thought everyone lost interest. I'll get back to this then. > Before launching into this, I was curious however if the STATA > solution ( or whatever other proudct was used) was thought > to reprsent a good solution or the actual global optimum. > > IIRC, your earlier post along these lines, > > > # exact solution as the starting value > > start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3))) > > was what got me started. > > > I guess my point was that the problem would not obviously > have a nice surface to optimize and IIRC the title of the paper > suggested the system was a bit difficult ( I won't pay for med papers, > not going to pay for econ LOL). The function to be minimized, and this > is stated as fact but only for sake of eliciting criticism, > is the determinant of 2 unrelated residuals vectors. And, from memory, > > > e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3 > > e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3 > > > > this gives something like E=|e1|^2*|e2|^2*(1-Cos^2(d)) with > d being angle between residual vectors ( in space with dimension of number > of data points). Or, E=F*Sin^2(d) and depending on data is would seem > possible to move in such a way that F increases but just more slowly than > Sin^2(d). Any solution for which they are colinear would seem to be optimal. > > I guess if my starting point is not too far off I may see if > I can find some diagnostics to determine is the data set creates > a condition as I have outlined and optimizer. > > It might, for example, be interesting to see what happens to |e1||e2| > and Sin(d) at various points. > ( T1 is just theta components 1-4 and T2 is 5-8, I guess presuming "x0"=1 > LOL), > E1=Y1-T1*X , E2=Y2-T2*x > E1.E2=Y1.Y2-Y1.(T2*X)-(T1*X).Y2+(T1*X).(T2*X) > > I guess I keep working on the above and see if it points > to anything pathological or that doesn't play nice with > optimizer. I was too lazy to continue the above on paper but empirically this seems to be part of symptoms. For example, e1m=sum(e1*e1); e2m=sum(e2*e2); e12=sum(e1*e2); phit=e12/sqrt(e1m*e2m); dete=det(sigma); xxx=paste("e1m*e2m=",e1m*e2m," aphi=",phit, " dete=",dete,sep=""); print(xxx); on this call, print("doing BGFS from zero"); p <- optim(0*c(1:8), lnl, method="BFGS", hessian=TRUE, y1=data$y1, y2=data$y2, x 1=data$x1, x2=data$x2, x3=data$x3) shows that the the product of e1 and e2 explodes with the error vectors become parallel, ( since I got an answer I like, I haven't bothered to check for typos or coding errors LOL note in particular things like sqrt() etc ) [1] "e1m*e2m=41.1669479546532 aphi=0.960776458479452 dete=3.16609220303528" [1] "e1m*e2m=41.0289397617336 aphi=0.960734505863182 dete=3.15878562836847" [1] "e1m*e2m=41.3051898313859 aphi=0.960818247708638 dete=3.17340730403131" [1] "e1m*e2m=39.7867410499191 aphi=0.960397122114274 dete=3.08893784983331" [1] "e1m*e2m=42.5708634641545 aphi=0.961141202950479 dete=3.24422282329344" [1] "e1m*e2m=39.952815544519 aphi=0.960383767149008 dete=3.10285630456706" [1] "e1m*e2m=42.3994556719457 aphi=0.961155360764465 dete=3.23000632577225" [1] "e1m*e2m=40.6077431942131 aphi=0.960502159746792 dete=3.14448500444145" [1] "e1m*e2m=41.7300849780045 aphi=0.96104579811445 dete=3.18780183350788" [1] "e1m*e2m=40.8461025929321 aphi=0.960565688832973 dete=3.15795749888956" [1] "e1m*e2m=41.4890962339491 aphi=0.960985035909235 dete=3.17423784875441" [1] "e1m*e2m=38.0026479775113 aphi=0.958424486959574 dete=3.09427071121483"
Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
> From: rvarad...@jhmi.edu > To: marchy...@hotmail.com; pda...@gmail.com; alex.ols...@gmail.com > CC: r-help@r-project.org > Date: Sat, 21 May 2011 17:26:29 -0400 > Subject: RE: [R] maximum likelihood convergence reproducing Anderson Blundell > 1982 Econometrica R vs Stata > > Hi, > > I don't think the final verdict has been spoken. Peter's posts have hinted at > ill-conditioning as the crux of the problem. So, I decided to try a couple of > more things: (1) standardizing the covariates, (2) exact gradient, and (3) > both (1) and (2). Cool, I thought everyone lost interest. I'll get back to this then. Before launching into this, I was curious however if the STATA solution ( or whatever other proudct was used) was thought to reprsent a good solution or the actual global optimum. IIRC, your earlier post along these lines, > # exact solution as the starting value > start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3))) was what got me started. I guess my point was that the problem would not obviously have a nice surface to optimize and IIRC the title of the paper suggested the system was a bit difficult ( I won't pay for med papers, not going to pay for econ LOL). The function to be minimized, and this is stated as fact but only for sake of eliciting criticism, is the determinant of 2 unrelated residuals vectors. And, from memory, > e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3 > e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3 this gives something like E=|e1|^2*|e2|^2*(1-Cos^2(d)) with d being angle between residual vectors ( in space with dimension of number of data points). Or, E=F*Sin^2(d) and depending on data is would seem possible to move in such a way that F increases but just more slowly than Sin^2(d). Any solution for which they are colinear would seem to be optimal. I guess if my starting point is not too far off I may see if I can find some diagnostics to determine is the data set creates a condition as I have outlined and optimizer. It might, for example, be interesting to see what happens to |e1||e2| and Sin(d) at various points. ( T1 is just theta components 1-4 and T2 is 5-8, I guess presuming "x0"=1 LOL), E1=Y1-T1*X , E2=Y2-T2*x E1.E2=Y1.Y2-Y1.(T2*X)-(T1*X).Y2+(T1*X).(T2*X) I guess I keep working on the above and see if it points to anything pathological or that doesn't play nice with optimizer. > > I compute the "exact" gradient using a complex-step derivative approach. This > works just like the standard first-order, forward differencing. The only > (but, essential) difference is that an imaginary increment, i*dx, is used. > This, incredibly, gives exact gradients (up to machine precision). > > Here are the code and the results of my experiments: > > data <- read.table("h:/computations/optimx_example.dat", header=TRUE, sep=",") > attach(data) > require(optimx) > > ## model 18 > lnl <- function(theta,y1, y2, x1, x2, x3) { > n <- length(y1) > beta <- theta[1:8] > e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3 > e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3 > e <- cbind(e1, e2) > sigma <- t(e)%*%e > det.sigma <- sigma[1,1] * sigma[2,2] - sigma[1,2] * sigma[2,1] > logl <- -1*n/2*(2*(1+log(2*pi)) + log(det.sigma)) > return(-logl) > } > > csd <- function(fn, x, ...) { > # Complex step derivatives; yields exact derivatives > h <- .Machine$double.eps > n <- length(x) > h0 <- g <- rep(0, n) > for (i in 1:n) { > h0[i] <- h * 1i > g[i] <- Im(fn(x+h0, ...))/h > h0[i] <- 0 > } > g > } > > gr.csd <- function(theta,y1, y2, x1, x2, x3) { > csd(lnl, theta, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3) > } > > # exact solution as the starting value > start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3))) > p1 <- optimx(start, lnl, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, > control=list(all.methods=TRUE, maxit=1500)) > > # numerical gradient > p2 <- optimx(rep(0,8), lnl, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, > control=list(all.methods=TRUE, maxit=1500)) > > # exact gradient > p2g <- optimx(rep(0,8), lnl, gr.csd, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, > control=list(all.methods=TRUE, maxit=1500)) > > # comparing p2 and p2g, we see the dramatic improvement in BFGS when exact > gradient is used, we also see a major difference for L-BFGS-B > # Exact gradient did not affect the gradient methods, CG and spg, much. > However, convergence of Rcgmin improved when exact gradient was used > > x1s <- scale(x1) > x2s <- scale(x2) > x3s <- scale(x3) > > p3 <- optimx(rep(0,8),lnl, y1=y1, y2=y2, x
Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
Hi, I don't think the final verdict has been spoken. Peter's posts have hinted at ill-conditioning as the crux of the problem. So, I decided to try a couple of more things: (1) standardizing the covariates, (2) exact gradient, and (3) both (1) and (2). I compute the "exact" gradient using a complex-step derivative approach. This works just like the standard first-order, forward differencing. The only (but, essential) difference is that an imaginary increment, i*dx, is used. This, incredibly, gives exact gradients (up to machine precision). Here are the code and the results of my experiments: data <- read.table("h:/computations/optimx_example.dat", header=TRUE, sep=",") attach(data) require(optimx) ## model 18 lnl <- function(theta,y1, y2, x1, x2, x3) { n <- length(y1) beta <- theta[1:8] e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3 e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3 e <- cbind(e1, e2) sigma <- t(e)%*%e det.sigma <- sigma[1,1] * sigma[2,2] - sigma[1,2] * sigma[2,1] logl <- -1*n/2*(2*(1+log(2*pi)) + log(det.sigma)) return(-logl) } csd <- function(fn, x, ...) { # Complex step derivatives; yields exact derivatives h <- .Machine$double.eps n <- length(x) h0 <- g <- rep(0, n) for (i in 1:n) { h0[i] <- h * 1i g[i] <- Im(fn(x+h0, ...))/h h0[i] <- 0 } g } gr.csd <- function(theta,y1, y2, x1, x2, x3) { csd(lnl, theta, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3) } # exact solution as the starting value start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3))) p1 <- optimx(start, lnl, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500)) # numerical gradient p2 <- optimx(rep(0,8), lnl, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500)) # exact gradient p2g <- optimx(rep(0,8), lnl, gr.csd, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500)) # comparing p2 and p2g, we see the dramatic improvement in BFGS when exact gradient is used, we also see a major difference for L-BFGS-B # Exact gradient did not affect the gradient methods, CG and spg, much. However, convergence of Rcgmin improved when exact gradient was used x1s <- scale(x1) x2s <- scale(x2) x3s <- scale(x3) p3 <- optimx(rep(0,8),lnl, y1=y1, y2=y2, x1=x1s, x2=x2s, x3=x3s, control=list(all.methods=TRUE, maxit=1500)) # both scaling and exact gradient p3g <- optimx(rep(0,8),lnl, gr.csd, y1=y1, y2=y2, x1=x1s, x2=x2s, x3=x3s, control=list(all.methods=TRUE, maxit=1500)) # Comparing p3 and p3g, use of exact gradient improved spg dramatically. However, it made CG worse! # Of course, derivative-free methods newuoa, and Nelder-Mead are improved by scaling, but not by the availability of exact gradients. I don't know what is wrong with bobyqa in this example. In short, even with scaling and exact gradients, this optimization problem is recalcitrant. Best, Ravi. From: Mike Marchywka [marchy...@hotmail.com] Sent: Thursday, May 12, 2011 8:30 AM To: Ravi Varadhan; pda...@gmail.com; alex.ols...@gmail.com Cc: r-help@r-project.org Subject: RE: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata So what was the final verdict on this discussion? I kind of lost track if anyone has a minute to summarize and critique my summary below. Apparently there were two issues, the comparison between R and Stata was one issue and the "optimum" solution another. As I understand it, there was some question about R numerical gradient calculation. This would suggest some features of the function may be of interest to consider. The function to be optimized appears to be, as OP stated, some function of residuals of two ( unrelated ) fits. The residual vectors e1 and e2 are dotted in various combinations creating a matrix whose determinant is (e1.e1)(e2.e2)-(e1.e2)^2 which is the result to be minimized by choice of theta. Theta it seems is an 8 component vector, 4 components determine e1 and the other 4 e2. Presumably a unique solution would require that e1 and e2, both n-component vectors, point in different directions or else both could become aribtarily large while keeping the error signal at zero. For fixed magnitudes, colinearity would reduce the "Error." The intent would appear to be to keep the residuals distributed similarly in the two ( unrelated) fits. I guess my question is, " did anyone determine that there is a unique solution?" or am I totally wrong here ( I haven't used these myself to any extent and just try to run some simple teaching examples, asking for my own clarification as much as anything). Thanks. ---- &
Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
So what was the final verdict on this discussion? I kind of lost track if anyone has a minute to summarize and critique my summary below. Apparently there were two issues, the comparison between R and Stata was one issue and the "optimum" solution another. As I understand it, there was some question about R numerical gradient calculation. This would suggest some features of the function may be of interest to consider. The function to be optimized appears to be, as OP stated, some function of residuals of two ( unrelated ) fits. The residual vectors e1 and e2 are dotted in various combinations creating a matrix whose determinant is (e1.e1)(e2.e2)-(e1.e2)^2 which is the result to be minimized by choice of theta. Theta it seems is an 8 component vector, 4 components determine e1 and the other 4 e2. Presumably a unique solution would require that e1 and e2, both n-component vectors, point in different directions or else both could become aribtarily large while keeping the error signal at zero. For fixed magnitudes, colinearity would reduce the "Error." The intent would appear to be to keep the residuals distributed similarly in the two ( unrelated) fits. I guess my question is, " did anyone determine that there is a unique solution?" or am I totally wrong here ( I haven't used these myself to any extent and just try to run some simple teaching examples, asking for my own clarification as much as anything). Thanks. > From: rvarad...@jhmi.edu > To: pda...@gmail.com; alex.ols...@gmail.com > Date: Sat, 7 May 2011 11:51:56 -0400 > CC: r-help@r-project.org > Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell > 1982 Econometrica R vs Stata > > There is something strange in this problem. I think the log-likelihood is > incorrect. See the results below from "optimx". You can get much larger > log-likelihood values than for the exact solution that Peter provided. > > ## model 18 > lnl <- function(theta,y1, y2, x1, x2, x3) { > n <- length(y1) > beta <- theta[1:8] > e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3 > e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3 > e <- cbind(e1, e2) > sigma <- t(e)%*%e > logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma))) # it looks like there is > something wrong here > return(-logl) > } > > data <- read.table("e:/computing/optimx_example.dat", header=TRUE, sep=",") > > attach(data) > > require(optimx) > > start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3))) > > # the warnings can be safely ignored in the "optimx" calls > p1 <- optimx(start, lnl, hessian=TRUE, y1=y1, y2=y2, > + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500)) > > p2 <- optimx(rep(0,8), lnl, hessian=TRUE, y1=y1, y2=y2, > + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500)) > > p3 <- optimx(rep(0.5,8), lnl, hessian=TRUE, y1=y1, y2=y2, > + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500)) > > Ravi. > ____________ > From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf > Of peter dalgaard [pda...@gmail.com] > Sent: Saturday, May 07, 2011 4:46 AM > To: Alex Olssen > Cc: r-help@r-project.org > Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell > 1982 Econometrica R vs Stata > > On May 6, 2011, at 14:29 , Alex Olssen wrote: > > > Dear R-help, > > > > I am trying to reproduce some results presented in a paper by Anderson > > and Blundell in 1982 in Econometrica using R. > > The estimation I want to reproduce concerns maximum likelihood > > estimation of a singular equation system. > > I can estimate the static model successfully in Stata but for the > > dynamic models I have difficulty getting convergence. > > My R program which uses the same likelihood function as in Stata has > > convergence properties even for the static case. > > > > I have copied my R program and the data below. I realise the code > > could be made more elegant - but it is short enough. > > > > Any ideas would be highly appreciated. > > Better starting values would help. In this case, almost too good values are > available: > > start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3))) > > which appears to be the _exact_ solution. > > Apart from that, it seems that the conjugate gradient methods have > difficulties with this likelihood, for some less than obvious reason. > Increasing the maxit gets you closer but still not satisfactory. > > I would suggest trying out the experimental optimx package. Appar
Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
Alex Olssen wrote: > > Wow that is really interesting, > > Sorry I was asleep when you emailed these. > > And yes, of course, I had been trying to implement model 18, not 18s, > that was a typo, sorry. > > I will have a look at the code you posted. > > Thanks, > > Alex > I have run nlm with a zero starting vector on model 18. > start.zero <- rep(0,8) > > q.zero <- nlm(p=start.zero, f=lnl, hessian=TRUE, gradtol=1e-15, + y1=y1, y2=y2, x1=x1, x2=x2, x3=x3) > > q.zero $minimum [1] -76.73878 $estimate [1] -0.68124025 0.28686058 -0.15394903 -0.06173779 1.30304998 -0.25446946 [7] 0.14531856 0.05349810 $gradient [1] 0.0001356761 0.0014407493 0.0013240913 0.0006367333 -0.0005172628 [6] -0.0051911383 -0.0044429086 -0.0019081432 $hessian [,1] [,2] [,3] [,4] [,5] [,6] [,7] [1,] 197993.1 1982843 1746582 807261.9 135628.5 1355944 1194854 [2,] 1982842.8 19766730 17430209 8084282.7 1357883.8 13506353 11915748 [3,] 1746581.8 17430209 15376070 7138194.2 1196134.9 11911149 10512490 [4,] 807261.9 8084283 7138194 3323204.7 552932.9 5526861 4882154 [5,] 135628.5 1357884 1196135 552932.9 271571.7 2713953 2391658 [6,] 1355943.5 13506353 11911149 5526860.9 2713953.0 26956003 23785984 [7,] 1194854.2 11915748 10512490 4882154.2 2391658.1 23785984 20996480 [8,] 552860.7 5534816 4887292 2275676.3 1106862.5 11057031 9768028 [,8] [1,] 552860.7 [2,] 5534816.0 [3,] 4887292.4 [4,] 2275676.3 [5,] 1106862.5 [6,] 11057030.7 [7,] 9768027.6 [8,] 4554570.6 $code [1] 2 $iterations [1] 68 > > q.zero$estimate [1] -0.68124025 0.28686058 -0.15394903 -0.06173779 1.30304998 -0.25446946 [7] 0.14531856 0.05349810 Quite different from what you get if one starts with AB's model 18 results of Table II. I have also used Gretl's FIML procedure to estimate the model (I don't know what starting values Gretl uses). The results are very similar to those obtained with nlm as above. Berend -- View this message in context: http://r.789695.n4.nabble.com/maximum-likelihood-convergence-reproducing-Anderson-Blundell-1982-Econometrica-R-vs-Stata-tp3502516p3512807.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
Wow that is really interesting, Sorry I was asleep when you emailed these. And yes, of course, I had been trying to implement model 18, not 18s, that was a typo, sorry. I will have a look at the code you posted. Thanks, Alex On 10 May 2011 02:18, peter dalgaard wrote: > Hmm, I tried replacing the x's in the model with their principal component > scores, and suddenly everything converges as a greased lightning: > >> Z <- princomp(cbind(x1,x2,x3))$scores >> Z <- as.data.frame(princomp(cbind(x1,x2,x3))$scores) >> names(Z)<- letters[1:3] >> optimx(rep(0,8),lnl, hessian=TRUE, y1=y1, y2=y2, > + x1=Z$a, x2=Z$b, x3=Z$c, method="BFGS") > > par > 1 0.59107682, 0.01043568, -0.20639153, 0.25902746, 0.24675162, -0.01426477, > 0.18045177, -0.23657327 > fvalues method fns grs itns conv KKT1 KKT2 xtimes > 1 -76.73878 BFGS 157 37 NULL 0 TRUE TRUE 0.055 > Warning messages: > 1: In max(logpar) : no non-missing arguments to max; returning -Inf > 2: In min(logpar) : no non-missing arguments to min; returning Inf > > The likelihood appears to be spot on, but, obviously, the parameter estimates > need to be back-transformed to the original x1,x2,x3 system. I'll leave that > as the proverbial "exercise for the reader"... > > However, the whole thing leads me to a suspicion that maybe our numerical > gradients are misbehaving in cases with high local collinearity? > > -pd > > On May 9, 2011, at 13:40 , Alex Olssen wrote: > >> Hi Mike, >> >> Mike said >> "is this it, page 1559?" >> >> That is the front page yes, page 15*6*9 has the table, of which the >> model labelled 18s is the one I replicated. >> >> "did you actually cut/paste code anywhere and is your first >> coefficient -.19 or -.019? >> Presumably typos would be one possible problem." >> >> -0.19 is not a typo, it is pi10 in the paper, and a1 in my Stata >> estimation - as far as I can tell cutting and pasting is not the >> problem. >> >> "generally it helps if we could at least see the equations to check your >> code against typos ( note page number ?) in lnl that may fix part of the >> mystery. Is full text available >> on author's site, doesn't come up on citeseer AFAICT," >> >> Unfortunately I do not know of any place to get the full version of >> this paper that doesn't require access to a database such as JSTOR. >> The fact that this likelihood function reproduces the published >> results in Stata makes me confident that it is correct - I also have >> read a lot on systems estimation and it is a pretty standard >> likelihood function. >> >> "I guess one question would be " what is beta" in lnl supposed to be - >> it isn't used anywhere but I will also mentioned I'm not that familiar >> with the R code ( I'm trying to work through this to learn R and the >> optimizers). >> >> maybe some words would help, is sigma supposed to be 2x2 or 8x8 and what are >> e1 and e2 supposed to be?" >> >> The code above is certainly not as elegant as it could be - in this >> case the beta is rather superfluous. It is just a vector to hold >> parameters - but since I called all the parameters theta anyway there >> is no need for it. e1 and e2 are the residuals from the first and >> second equations of the system. Sigma is a 2x2 matrix which is the >> outer product of the two vectors of residuals. >> >> Kind regards, >> >> Alex >> >> >> >> On 9 May 2011 23:12, Mike Marchywka wrote: >>> >>> >>> >>> >>> >>> >>> >>>> Date: Mon, 9 May 2011 22:06:38 +1200 >>>> From: alex.ols...@gmail.com >>>> To: pda...@gmail.com >>>> CC: r-help@r-project.org; da...@otter-rsch.com >>>> Subject: Re: [R] maximum likelihood convergence reproducing Anderson >>>> Blundell 1982 Econometrica R vs Stata >>>> >>>> Peter said >>>> >>>> "Ahem! You might get us interested in your problem, but not to the >>>> level that we are going to install Stata and Tsp and actually dig out >>>> and study the scientific paper you are talking about. Please cite the >>>> results and explain the differences." >>>> >>>> Apologies Peter, will do, >>>> >>>> The results which I can emulate in Stata b
Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
Hmm, I tried replacing the x's in the model with their principal component scores, and suddenly everything converges as a greased lightning: > Z <- princomp(cbind(x1,x2,x3))$scores > Z <- as.data.frame(princomp(cbind(x1,x2,x3))$scores) > names(Z)<- letters[1:3] > optimx(rep(0,8),lnl, hessian=TRUE, y1=y1, y2=y2, + x1=Z$a, x2=Z$b, x3=Z$c, method="BFGS") par 1 0.59107682, 0.01043568, -0.20639153, 0.25902746, 0.24675162, -0.01426477, 0.18045177, -0.23657327 fvalues method fns grs itns conv KKT1 KKT2 xtimes 1 -76.73878 BFGS 157 37 NULL0 TRUE TRUE 0.055 Warning messages: 1: In max(logpar) : no non-missing arguments to max; returning -Inf 2: In min(logpar) : no non-missing arguments to min; returning Inf The likelihood appears to be spot on, but, obviously, the parameter estimates need to be back-transformed to the original x1,x2,x3 system. I'll leave that as the proverbial "exercise for the reader"... However, the whole thing leads me to a suspicion that maybe our numerical gradients are misbehaving in cases with high local collinearity? -pd On May 9, 2011, at 13:40 , Alex Olssen wrote: > Hi Mike, > > Mike said > "is this it, page 1559?" > > That is the front page yes, page 15*6*9 has the table, of which the > model labelled 18s is the one I replicated. > > "did you actually cut/paste code anywhere and is your first > coefficient -.19 or -.019? > Presumably typos would be one possible problem." > > -0.19 is not a typo, it is pi10 in the paper, and a1 in my Stata > estimation - as far as I can tell cutting and pasting is not the > problem. > > "generally it helps if we could at least see the equations to check your > code against typos ( note page number ?) in lnl that may fix part of the > mystery. Is full text available > on author's site, doesn't come up on citeseer AFAICT," > > Unfortunately I do not know of any place to get the full version of > this paper that doesn't require access to a database such as JSTOR. > The fact that this likelihood function reproduces the published > results in Stata makes me confident that it is correct - I also have > read a lot on systems estimation and it is a pretty standard > likelihood function. > > "I guess one question would be " what is beta" in lnl supposed to be - > it isn't used anywhere but I will also mentioned I'm not that familiar > with the R code ( I'm trying to work through this to learn R and the > optimizers). > > maybe some words would help, is sigma supposed to be 2x2 or 8x8 and what are > e1 and e2 supposed to be?" > > The code above is certainly not as elegant as it could be - in this > case the beta is rather superfluous. It is just a vector to hold > parameters - but since I called all the parameters theta anyway there > is no need for it. e1 and e2 are the residuals from the first and > second equations of the system. Sigma is a 2x2 matrix which is the > outer product of the two vectors of residuals. > > Kind regards, > > Alex > > > > On 9 May 2011 23:12, Mike Marchywka wrote: >> >> >> >> >> >> >> >>> Date: Mon, 9 May 2011 22:06:38 +1200 >>> From: alex.ols...@gmail.com >>> To: pda...@gmail.com >>> CC: r-help@r-project.org; da...@otter-rsch.com >>> Subject: Re: [R] maximum likelihood convergence reproducing Anderson >>> Blundell 1982 Econometrica R vs Stata >>> >>> Peter said >>> >>> "Ahem! You might get us interested in your problem, but not to the >>> level that we are going to install Stata and Tsp and actually dig out >>> and study the scientific paper you are talking about. Please cite the >>> results and explain the differences." >>> >>> Apologies Peter, will do, >>> >>> The results which I can emulate in Stata but not (yet) in R are reported >>> below. >> >> did you actually cut/paste code anywhere and is your first coefficient -.19 >> or -.019? >> Presumably typos would be one possible problem. >> >>> They come from Econometrica Vol. 50, No. 6 (Nov., 1982), pp. 1569 >> >> >> is this it, page 1559? >> >> http://www.jstor.org/pss/1913396 >> >> generally it helps if we could at least see the equations to check your >> code against typos ( note page number ?) in lnl that may fix part of the >> mystery. Is full text available >> on author's site, doesn't come
Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
On May 9, 2011, at 13:40 , Alex Olssen wrote: > Hi Mike, > > Mike said > "is this it, page 1559?" > > That is the front page yes, page 15*6*9 has the table, of which the > model labelled 18s is the one I replicated. > However, the R code you posted will at best replicate model 18. For 18s, you need to impose symmetry conditions. The fit of model 18 does seem to give similar results to those in Table II, but not exactly. Even more puzzling, if you start from the published coefficients, nlm seems to be getting stuck at a non-concave point of the negative likelihood. > nlm(p=ab, f=lnl, hessian=TRUE, y1=y1, y2=y2, + x1=x1, x2=x2, x3=x3, gradtol=1e-15) $minimum [1] -76.60681 $estimate [1] -0.64701504 0.27284992 -0.13713084 -0.07205753 1.28200342 -0.24596766 0.13502787 0.06001181 $gradient [1] 2.782425 12.648629 -1.830244 -29.296795 -2.577734 -4.954042 1.495517 11.016190 $hessian [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [1,] 197435.9 1977197 1741724 805132.7 135821.9 1358013 1196714 553746 [2,] 1977197.4 19712076 17383474 8064540.2 1359856.0 13527720 11935305 5544903 [3,] 1741724.0 17383474 15336008 7121047.7 1197868.5 11929667 10529439 4896008 [4,] 805132.7 8064540 7121048 3315377.8 553733.1 5534923 4889523 2279404 [5,] 135821.9 1359856 1197868 553733.1 271511.7 2713733 2391332 1106572 [6,] 1358012.7 13527720 11929667 5534923.0 2713732.9 26955636 23784039 11054174 [7,] 1196714.3 11935305 10529439 4889522.6 2391331.9 23784039 20993413 9765022 [8,] 553746.0 5544903 4896008 2279404.1 1106571.9 11054174 9765022 4552693 $code [1] 2 $iterations [1] 4 -- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
> Date: Mon, 9 May 2011 22:06:38 +1200 > From: alex.ols...@gmail.com > To: pda...@gmail.com > CC: r-help@r-project.org; da...@otter-rsch.com > Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell > 1982 Econometrica R vs Stata > > Peter said > > "Ahem! You might get us interested in your problem, but not to the > level that we are going to install Stata and Tsp and actually dig out > and study the scientific paper you are talking about. Please cite the > results and explain the differences." > > Apologies Peter, will do, > > The results which I can emulate in Stata but not (yet) in R are reported > below. did you actually cut/paste code anywhere and is your first coefficient -.19 or -.019? Presumably typos would be one possible problem. > They come from Econometrica Vol. 50, No. 6 (Nov., 1982), pp. 1569 is this it, page 1559? http://www.jstor.org/pss/1913396 generally it helps if we could at least see the equations to check your code against typos ( note page number ?) in lnl that may fix part of the mystery. Is full text available on author's site, doesn't come up on citeseer AFAICT, http://citeseerx.ist.psu.edu/search?q=blundell+1982&sort=ascdate I guess one question would be " what is beta" in lnl supposed to be - it isn't used anywhere but I will also mentioned I'm not that familiar with the R code ( I'm trying to work through this to learn R and the optimizers). maybe some words would help, is sigma supposed to be 2x2 or 8x8 and what are e1 and e2 supposed to be? > > TABLE II - model 18s > > coef std err > p10 -0.19 0.078 > p11 0.220 0.019 > p12 -0.148 0.021 > p13 -0.072 > p20 0.893 0.072 > p21 -0.148 > p22 0.050 0.035 > p23 0.098 > > The results which I produced in Stata are reported below. > I spent the last hour rewriting the code to reproduce this - since I > am now at home and not at work :( > My results are "identical" to those published. The estimates are for > a 3 equation symmetrical singular system. > I have not bothered to report symmetrical results and have backed out > an extra estimate using adding up constraints. > I have also backed out all standard errors using the delta method. > > . ereturn display > -- > | Coef. Std. Err. z P>|z| [95% Conf. Interval] > -+ > a | > a1 | -.0188115 .0767759 -0.25 0.806 -.1692895 .1316664 > a2 | .8926598 .0704068 12.68 0.000 .7546651 1.030655 > a3 | .1261517 .0590193 2.14 0.033 .010476 .2418275 > -+ > g | > g11 | .2199442 .0184075 11.95 0.000 .183866 .2560223 > g12 | -.1476856 .0211982 -6.97 0.000 -.1892334 -.1061378 > g13 | -.0722586 .0145154 -4.98 0.000 -.1007082 -.0438089 > g22 | .0496865 .0348052 1.43 0.153 -.0185305 .1179034 > g23 | .0979991 .0174397 5.62 0.000 .0638179 .1321803 > g33 | -.0257405 .0113869 -2.26 0.024 -.0480584 -.0034226 > -- > > In R I cannot get results like this - I think it is probably to do > with my inability at using the optimisers well. > Any pointers would be appreciated. > > Peter said "Are we maximizing over the same parameter space? You say > that the estimates from the paper gives a log-likelihood of 54.04, but > the exact solution clocked in at 76.74, which in my book is rather > larger." > > I meant +54.04 > -76.74. It is quite common to get positive > log-likelihoods in these system estimation. > > Kind regards, > > Alex > > On 9 May 2011 19:04, peter dalgaard wrote: > > > > On May 9, 2011, at 06:07 , Alex Olssen wrote: > > > >> Thank you all for your input. > >> > >> Unfortunately my problem is not yet resolved. Before I respond to > >> individual comments I make a clarification: > >> > >> In Stata, using the same likelihood function as above, I can reproduce > >> EXACTLY (to 3 decimal places or more, which is exactly considering I > >> am using different software) the results from model 8 of the paper. > >> > >> I take this as an indication that I am using the same likelihood > >> function as the authors, and that it does indeed work. > >> The reason I am trying to estimate the model in R is because while > >> Stata reproduces model 8 perfectly it has convergence > >> difficulties for some of the other models. > >> > >> Peter Dalgaard, > >> > >> &qu
Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
Hi Mike, Mike said "is this it, page 1559?" That is the front page yes, page 15*6*9 has the table, of which the model labelled 18s is the one I replicated. "did you actually cut/paste code anywhere and is your first coefficient -.19 or -.019? Presumably typos would be one possible problem." -0.19 is not a typo, it is pi10 in the paper, and a1 in my Stata estimation - as far as I can tell cutting and pasting is not the problem. "generally it helps if we could at least see the equations to check your code against typos ( note page number ?) in lnl that may fix part of the mystery. Is full text available on author's site, doesn't come up on citeseer AFAICT," Unfortunately I do not know of any place to get the full version of this paper that doesn't require access to a database such as JSTOR. The fact that this likelihood function reproduces the published results in Stata makes me confident that it is correct - I also have read a lot on systems estimation and it is a pretty standard likelihood function. "I guess one question would be " what is beta" in lnl supposed to be - it isn't used anywhere but I will also mentioned I'm not that familiar with the R code ( I'm trying to work through this to learn R and the optimizers). maybe some words would help, is sigma supposed to be 2x2 or 8x8 and what are e1 and e2 supposed to be?" The code above is certainly not as elegant as it could be - in this case the beta is rather superfluous. It is just a vector to hold parameters - but since I called all the parameters theta anyway there is no need for it. e1 and e2 are the residuals from the first and second equations of the system. Sigma is a 2x2 matrix which is the outer product of the two vectors of residuals. Kind regards, Alex On 9 May 2011 23:12, Mike Marchywka wrote: > > > > > > > >> Date: Mon, 9 May 2011 22:06:38 +1200 >> From: alex.ols...@gmail.com >> To: pda...@gmail.com >> CC: r-help@r-project.org; da...@otter-rsch.com >> Subject: Re: [R] maximum likelihood convergence reproducing Anderson >> Blundell 1982 Econometrica R vs Stata >> >> Peter said >> >> "Ahem! You might get us interested in your problem, but not to the >> level that we are going to install Stata and Tsp and actually dig out >> and study the scientific paper you are talking about. Please cite the >> results and explain the differences." >> >> Apologies Peter, will do, >> >> The results which I can emulate in Stata but not (yet) in R are reported >> below. > > did you actually cut/paste code anywhere and is your first coefficient -.19 > or -.019? > Presumably typos would be one possible problem. > >> They come from Econometrica Vol. 50, No. 6 (Nov., 1982), pp. 1569 > > > is this it, page 1559? > > http://www.jstor.org/pss/1913396 > > generally it helps if we could at least see the equations to check your > code against typos ( note page number ?) in lnl that may fix part of the > mystery. Is full text available > on author's site, doesn't come up on citeseer AFAICT, > > > http://citeseerx.ist.psu.edu/search?q=blundell+1982&sort=ascdate > > I guess one question would be " what is beta" in lnl supposed to be - > it isn't used anywhere but I will also mentioned I'm not that familiar > with the R code ( I'm trying to work through this to learn R and the > optimizers). > > maybe some words would help, is sigma supposed to be 2x2 or 8x8 and what are > e1 and e2 supposed to be? > > > > >> >> TABLE II - model 18s >> >> coef std err >> p10 -0.19 0.078 >> p11 0.220 0.019 >> p12 -0.148 0.021 >> p13 -0.072 >> p20 0.893 0.072 >> p21 -0.148 >> p22 0.050 0.035 >> p23 0.098 >> >> The results which I produced in Stata are reported below. >> I spent the last hour rewriting the code to reproduce this - since I >> am now at home and not at work :( >> My results are "identical" to those published. The estimates are for >> a 3 equation symmetrical singular system. >> I have not bothered to report symmetrical results and have backed out >> an extra estimate using adding up constraints. >> I have also backed out all standard errors using the delta method. >> >> . ereturn display >> -- >> | Coef. Std. Err. z P>|z| [95% Conf. Interval] >> -+ >> a | >> a1 | -.0188115 .0767759 -0.25 0.806 -.1692895 .1316664 >> a2
Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
Peter said "Ahem! You might get us interested in your problem, but not to the level that we are going to install Stata and Tsp and actually dig out and study the scientific paper you are talking about. Please cite the results and explain the differences." Apologies Peter, will do, The results which I can emulate in Stata but not (yet) in R are reported below. They come from Econometrica Vol. 50, No. 6 (Nov., 1982), pp. 1569 TABLE II - model 18s coef std err p10 -0.19 0.078 p11 0.2200.019 p12 -0.148 0.021 p13 -0.072 p20 0.8930.072 p21 -0.148 p22 0.0500.035 p23 0.098 The results which I produced in Stata are reported below. I spent the last hour rewriting the code to reproduce this - since I am now at home and not at work :( My results are "identical" to those published. The estimates are for a 3 equation symmetrical singular system. I have not bothered to report symmetrical results and have backed out an extra estimate using adding up constraints. I have also backed out all standard errors using the delta method. . ereturn display -- | Coef. Std. Err. zP>|z| [95% Conf. Interval] -+ a| a1 | -.0188115 .0767759-0.25 0.806-.1692895.1316664 a2 | .8926598 .070406812.68 0.000 .75466511.030655 a3 | .1261517 .0590193 2.14 0.033 .010476.2418275 -+ g| g11 | .2199442 .018407511.95 0.000 .183866.2560223 g12 | -.1476856 .0211982-6.97 0.000-.1892334 -.1061378 g13 | -.0722586 .0145154-4.98 0.000-.1007082 -.0438089 g22 | .0496865 .0348052 1.43 0.153-.0185305.1179034 g23 | .0979991 .0174397 5.62 0.000 .0638179.1321803 g33 | -.0257405 .0113869-2.26 0.024-.0480584 -.0034226 -- In R I cannot get results like this - I think it is probably to do with my inability at using the optimisers well. Any pointers would be appreciated. Peter said "Are we maximizing over the same parameter space? You say that the estimates from the paper gives a log-likelihood of 54.04, but the exact solution clocked in at 76.74, which in my book is rather larger." I meant +54.04 > -76.74. It is quite common to get positive log-likelihoods in these system estimation. Kind regards, Alex On 9 May 2011 19:04, peter dalgaard wrote: > > On May 9, 2011, at 06:07 , Alex Olssen wrote: > >> Thank you all for your input. >> >> Unfortunately my problem is not yet resolved. Before I respond to >> individual comments I make a clarification: >> >> In Stata, using the same likelihood function as above, I can reproduce >> EXACTLY (to 3 decimal places or more, which is exactly considering I >> am using different software) the results from model 8 of the paper. >> >> I take this as an indication that I am using the same likelihood >> function as the authors, and that it does indeed work. >> The reason I am trying to estimate the model in R is because while >> Stata reproduces model 8 perfectly it has convergence >> difficulties for some of the other models. >> >> Peter Dalgaard, >> >> "Better starting values would help. In this case, almost too good >> values are available: >> >> start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3))) >> >> which appears to be the _exact_ solution." >> >> Thanks for the suggestion. Using these starting values produces the >> exact estimate that Dave Fournier emailed me. >> If these are the exact solution then why did the author publish >> different answers which are completely reproducible in >> Stata and Tsp? > > > Ahem! You might get us interested in your problem, but not to the level that > we are going to install Stata and Tsp and actually dig out and study the > scientific paper you are talking about. Please cite the results and explain > the differences. > > Are we maximizing over the same parameter space? You say that the estimates > from the paper gives a log-likelihood of 54.04, but the exact solution > clocked in at 76.74, which in my book is rather larger. > > Confused > > -p > > >> >> Ravi, >> >> Thanks for introducing optimx to me, I am new to R. I completely >> agree that you can get higher log-likelihood values >> than what those obtained with optim and the starting values suggested >> by Peter. In fact, in Stata, when I reproduce >> the results of model 8 to more than 3 dp I get a log-likelihood of 54.039139. >> >> Furthermore if I estimate model 8 without symmetry imposed on the >> system I reproduce the Likelihood Ratio reported >> in the paper to 3 deci
Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
On May 9, 2011, at 06:07 , Alex Olssen wrote: > Thank you all for your input. > > Unfortunately my problem is not yet resolved. Before I respond to > individual comments I make a clarification: > > In Stata, using the same likelihood function as above, I can reproduce > EXACTLY (to 3 decimal places or more, which is exactly considering I > am using different software) the results from model 8 of the paper. > > I take this as an indication that I am using the same likelihood > function as the authors, and that it does indeed work. > The reason I am trying to estimate the model in R is because while > Stata reproduces model 8 perfectly it has convergence > difficulties for some of the other models. > > Peter Dalgaard, > > "Better starting values would help. In this case, almost too good > values are available: > > start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3))) > > which appears to be the _exact_ solution." > > Thanks for the suggestion. Using these starting values produces the > exact estimate that Dave Fournier emailed me. > If these are the exact solution then why did the author publish > different answers which are completely reproducible in > Stata and Tsp? Ahem! You might get us interested in your problem, but not to the level that we are going to install Stata and Tsp and actually dig out and study the scientific paper you are talking about. Please cite the results and explain the differences. Are we maximizing over the same parameter space? You say that the estimates from the paper gives a log-likelihood of 54.04, but the exact solution clocked in at 76.74, which in my book is rather larger. Confused -p > > Ravi, > > Thanks for introducing optimx to me, I am new to R. I completely > agree that you can get higher log-likelihood values > than what those obtained with optim and the starting values suggested > by Peter. In fact, in Stata, when I reproduce > the results of model 8 to more than 3 dp I get a log-likelihood of 54.039139. > > Furthermore if I estimate model 8 without symmetry imposed on the > system I reproduce the Likelihood Ratio reported > in the paper to 3 decimal places as well, suggesting that the > log-likelihoods I am reporting differ from those in the paper > only due to a constant. > > Thanks for your comments, > > I am still highly interested in knowing why the results of the > optimisation in R are so different to those in Stata? > > I might try making my convergence requirements more stringent. > > Kind regards, > > Alex -- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
Thank you all for your input. Unfortunately my problem is not yet resolved. Before I respond to individual comments I make a clarification: In Stata, using the same likelihood function as above, I can reproduce EXACTLY (to 3 decimal places or more, which is exactly considering I am using different software) the results from model 8 of the paper. I take this as an indication that I am using the same likelihood function as the authors, and that it does indeed work. The reason I am trying to estimate the model in R is because while Stata reproduces model 8 perfectly it has convergence difficulties for some of the other models. Peter Dalgaard, "Better starting values would help. In this case, almost too good values are available: start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3))) which appears to be the _exact_ solution." Thanks for the suggestion. Using these starting values produces the exact estimate that Dave Fournier emailed me. If these are the exact solution then why did the author publish different answers which are completely reproducible in Stata and Tsp? Ravi, Thanks for introducing optimx to me, I am new to R. I completely agree that you can get higher log-likelihood values than what those obtained with optim and the starting values suggested by Peter. In fact, in Stata, when I reproduce the results of model 8 to more than 3 dp I get a log-likelihood of 54.039139. Furthermore if I estimate model 8 without symmetry imposed on the system I reproduce the Likelihood Ratio reported in the paper to 3 decimal places as well, suggesting that the log-likelihoods I am reporting differ from those in the paper only due to a constant. Thanks for your comments, I am still highly interested in knowing why the results of the optimisation in R are so different to those in Stata? I might try making my convergence requirements more stringent. Kind regards, Alex __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
On May 7, 2011, at 17:51 , Ravi Varadhan wrote: > There is something strange in this problem. I think the log-likelihood is > incorrect. See the results below from "optimx". You can get much larger > log-likelihood values than for the exact solution that Peter provided. > > ## model 18 > lnl <- function(theta,y1, y2, x1, x2, x3) { > n <- length(y1) > beta <- theta[1:8] > e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3 > e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3 > e <- cbind(e1, e2) > sigma <- t(e)%*%e > logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma))) # it looks like there is > something wrong here > return(-logl) > } > > data <- read.table("e:/computing/optimx_example.dat", header=TRUE, sep=",") > > attach(data) > > require(optimx) > > start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3))) > > # the warnings can be safely ignored in the "optimx" calls > p1 <- optimx(start, lnl, hessian=TRUE, y1=y1, y2=y2, > + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500)) > > p2 <- optimx(rep(0,8), lnl, hessian=TRUE, y1=y1, y2=y2, > + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500)) > > p3 <- optimx(rep(0.5,8), lnl, hessian=TRUE, y1=y1, y2=y2, > + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500)) > Hm? I get stuff like below (for p3). Some of the entries have a considerably larger NEGATIVE log likelihood, but that's hardly a problem with the likelihood per se. fvalues method fns grs itns conv KKT1 KKT2 xtimes 12 8.988466e+307 newuoaNA NA NA NANA 0.002 11 8.988466e+307 bobyqaNA NA NA NANA 0.001 3 23.66768 Nelder-Mead 1501 NA NULL1 FALSE FALSE 0.18 7 -51.76068 spg 1925 NA 15011 FALSE FALSE 2.322 4 -55.78708L-BFGS-B 2093 2093 NULL0 FALSE FALSE 4.176 2 -70.57023 CG 5360 1501 NULL1 FALSE TRUE 3.465 1 -70.66286BFGS 21481 1500 NULL1 FALSE TRUE 5.383 8 -76.73765 ucminf 1500 1500 NULL0 FALSE TRUE 0.067 9 -76.73871 Rcgmin 2434 867 NULL0 FALSE TRUE 1.514 10 -76.73877 Rvmmin 231 45 NULL0 FALSE TRUE 0.101 6 -76.73878 nlminb 130 581 670 TRUE TRUE 0.085 5 -76.73878 nlmNA NA 460 TRUE TRUE 0.058 I must admit that I didn't check the likelihood in detail, but minimizing the determinant of the residual SSD matrix is generally what you end up doing in this sort of models. (Of course, you can safely lose the constants, and you can also think up more stable ways of computing the log-determinant, but it's only 2x2 for cryin' out loud.) -- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
There is something strange in this problem. I think the log-likelihood is incorrect. See the results below from "optimx". You can get much larger log-likelihood values than for the exact solution that Peter provided. ## model 18 lnl <- function(theta,y1, y2, x1, x2, x3) { n <- length(y1) beta <- theta[1:8] e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3 e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3 e <- cbind(e1, e2) sigma <- t(e)%*%e logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma))) # it looks like there is something wrong here return(-logl) } data <- read.table("e:/computing/optimx_example.dat", header=TRUE, sep=",") attach(data) require(optimx) start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3))) # the warnings can be safely ignored in the "optimx" calls p1 <- optimx(start, lnl, hessian=TRUE, y1=y1, y2=y2, + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500)) p2 <- optimx(rep(0,8), lnl, hessian=TRUE, y1=y1, y2=y2, + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500)) p3 <- optimx(rep(0.5,8), lnl, hessian=TRUE, y1=y1, y2=y2, + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500)) Ravi. From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of peter dalgaard [pda...@gmail.com] Sent: Saturday, May 07, 2011 4:46 AM To: Alex Olssen Cc: r-help@r-project.org Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata On May 6, 2011, at 14:29 , Alex Olssen wrote: > Dear R-help, > > I am trying to reproduce some results presented in a paper by Anderson > and Blundell in 1982 in Econometrica using R. > The estimation I want to reproduce concerns maximum likelihood > estimation of a singular equation system. > I can estimate the static model successfully in Stata but for the > dynamic models I have difficulty getting convergence. > My R program which uses the same likelihood function as in Stata has > convergence properties even for the static case. > > I have copied my R program and the data below. I realise the code > could be made more elegant - but it is short enough. > > Any ideas would be highly appreciated. Better starting values would help. In this case, almost too good values are available: start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3))) which appears to be the _exact_ solution. Apart from that, it seems that the conjugate gradient methods have difficulties with this likelihood, for some less than obvious reason. Increasing the maxit gets you closer but still not satisfactory. I would suggest trying out the experimental optimx package. Apparently, some of the algorithms in there are much better at handling this likelihood, notably "nlm" and "nlminb". > > ## model 18 > lnl <- function(theta,y1, y2, x1, x2, x3) { > n <- length(y1) > beta <- theta[1:8] > e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3 > e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3 > e <- cbind(e1, e2) > sigma <- t(e)%*%e > logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma))) > return(-logl) > } > p <- optim(0*c(1:8), lnl, method="BFGS", hessian=TRUE, y1=y1, y2=y2, > x1=x1, x2=x2, x3=x3) > > "year","y1","y2","x1","x2","x3" > 1929,0.554779,0.266051,9.87415,8.60371,3.75673 > 1930,0.516336,0.297473,9.68621,8.50492,3.80692 > 1931,0.508201,0.324199,9.4701,8.27596,3.80437 > 1932,0.500482,0.33958,9.24692,7.99221,3.76251 > 1933,0.501695,0.276974,9.35356,7.98968,3.69071 > 1934,0.591426,0.287008,9.42084,8.0362,3.63564 > 1935,0.565047,0.244096,9.53972,8.15803,3.59285 > 1936,0.605954,0.239187,9.6914,8.32009,3.56678 > 1937,0.620161,0.218232,9.76817,8.42001,3.57381 > 1938,0.592091,0.243161,9.51295,8.19771,3.6024 > 1939,0.613115,0.217042,9.68047,8.30987,3.58147 > 1940,0.632455,0.215269,9.78417,8.49624,3.57744 > 1941,0.663139,0.184409,10.0606,8.69868,3.6095 > 1942,0.698179,0.164348,10.2892,8.84523,3.4 > 1943,0.70459,0.146865,10.4731,8.93024,3.65388 > 1944,0.694067,0.161722,10.4465,8.96044,3.62434 > 1945,0.674668,0.197231,10.279,8.82522,3.61489 > 1946,0.635916,0.204232,10.1536,8.77547,3.67562 > 1947,0.642855,0.187224,10.2053,8.77481,3.82632 > 1948,0.641063,0.186566,10.2227,8.83821,3.96038 > 1949,0.646317,0.203646,10.1127,8.82364,4.0447 > 1950,0.645476,0.187497,10.2067,8.84161,4.08128 > 1951,0.63803,0.197361,10.2773,8.9401,4.10951 > 1952,0.634626,0.209992,10.283,9.01603,4.1693 > 1953,0.631144,0.219287,10.3217,9.06317,4.21727 > 1954,0.593088,0.235335,10.2101,9.05664,4.2567 > 1955,0.60
Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
On May 6, 2011, at 14:29 , Alex Olssen wrote: > Dear R-help, > > I am trying to reproduce some results presented in a paper by Anderson > and Blundell in 1982 in Econometrica using R. > The estimation I want to reproduce concerns maximum likelihood > estimation of a singular equation system. > I can estimate the static model successfully in Stata but for the > dynamic models I have difficulty getting convergence. > My R program which uses the same likelihood function as in Stata has > convergence properties even for the static case. > > I have copied my R program and the data below. I realise the code > could be made more elegant - but it is short enough. > > Any ideas would be highly appreciated. Better starting values would help. In this case, almost too good values are available: start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3))) which appears to be the _exact_ solution. Apart from that, it seems that the conjugate gradient methods have difficulties with this likelihood, for some less than obvious reason. Increasing the maxit gets you closer but still not satisfactory. I would suggest trying out the experimental optimx package. Apparently, some of the algorithms in there are much better at handling this likelihood, notably "nlm" and "nlminb". > > ## model 18 > lnl <- function(theta,y1, y2, x1, x2, x3) { > n <- length(y1) > beta <- theta[1:8] > e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3 > e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3 > e <- cbind(e1, e2) > sigma <- t(e)%*%e > logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma))) > return(-logl) > } > p <- optim(0*c(1:8), lnl, method="BFGS", hessian=TRUE, y1=y1, y2=y2, > x1=x1, x2=x2, x3=x3) > > "year","y1","y2","x1","x2","x3" > 1929,0.554779,0.266051,9.87415,8.60371,3.75673 > 1930,0.516336,0.297473,9.68621,8.50492,3.80692 > 1931,0.508201,0.324199,9.4701,8.27596,3.80437 > 1932,0.500482,0.33958,9.24692,7.99221,3.76251 > 1933,0.501695,0.276974,9.35356,7.98968,3.69071 > 1934,0.591426,0.287008,9.42084,8.0362,3.63564 > 1935,0.565047,0.244096,9.53972,8.15803,3.59285 > 1936,0.605954,0.239187,9.6914,8.32009,3.56678 > 1937,0.620161,0.218232,9.76817,8.42001,3.57381 > 1938,0.592091,0.243161,9.51295,8.19771,3.6024 > 1939,0.613115,0.217042,9.68047,8.30987,3.58147 > 1940,0.632455,0.215269,9.78417,8.49624,3.57744 > 1941,0.663139,0.184409,10.0606,8.69868,3.6095 > 1942,0.698179,0.164348,10.2892,8.84523,3.4 > 1943,0.70459,0.146865,10.4731,8.93024,3.65388 > 1944,0.694067,0.161722,10.4465,8.96044,3.62434 > 1945,0.674668,0.197231,10.279,8.82522,3.61489 > 1946,0.635916,0.204232,10.1536,8.77547,3.67562 > 1947,0.642855,0.187224,10.2053,8.77481,3.82632 > 1948,0.641063,0.186566,10.2227,8.83821,3.96038 > 1949,0.646317,0.203646,10.1127,8.82364,4.0447 > 1950,0.645476,0.187497,10.2067,8.84161,4.08128 > 1951,0.63803,0.197361,10.2773,8.9401,4.10951 > 1952,0.634626,0.209992,10.283,9.01603,4.1693 > 1953,0.631144,0.219287,10.3217,9.06317,4.21727 > 1954,0.593088,0.235335,10.2101,9.05664,4.2567 > 1955,0.60736,0.227035,10.272,9.07566,4.29193 > 1956,0.607204,0.246631,10.2743,9.12407,4.32252 > 1957,0.586994,0.256784,10.2396,9.1588,4.37792 > 1958,0.548281,0.271022,10.1248,9.14025,4.42641 > 1959,0.553401,0.261815,10.2012,9.1598,4.4346 > 1960,0.552105,0.275137,10.1846,9.19297,4.43173 > 1961,0.544133,0.280783,10.1479,9.19533,4.44407 > 1962,0.55382,0.281286,10.197,9.21544,4.45074 > 1963,0.549951,0.28303,10.2036,9.22841,4.46403 > 1964,0.547204,0.291287,10.2271,9.23954,4.48447 > 1965,0.55511,0.281313,10.2882,9.26531,4.52057 > 1966,0.558182,0.280151,10.353,9.31675,4.58156 > 1967,0.545735,0.294385,10.3351,9.35382,4.65983 > 1968,0.538964,0.294593,10.3525,9.38361,4.71804 > 1969,0.542764,0.299927,10.3676,9.40725,4.76329 > 1970,0.534595,0.315319,10.2968,9.39139,4.81136 > 1971,0.545591,0.315828,10.2592,9.34121,4.84082 > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
Dear R-help, I am trying to reproduce some results presented in a paper by Anderson and Blundell in 1982 in Econometrica using R. The estimation I want to reproduce concerns maximum likelihood estimation of a singular equation system. I can estimate the static model successfully in Stata but for the dynamic models I have difficulty getting convergence. My R program which uses the same likelihood function as in Stata has convergence properties even for the static case. I have copied my R program and the data below. I realise the code could be made more elegant - but it is short enough. Any ideas would be highly appreciated. ## model 18 lnl <- function(theta,y1, y2, x1, x2, x3) { n <- length(y1) beta <- theta[1:8] e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3 e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3 e <- cbind(e1, e2) sigma <- t(e)%*%e logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma))) return(-logl) } p <- optim(0*c(1:8), lnl, method="BFGS", hessian=TRUE, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3) "year","y1","y2","x1","x2","x3" 1929,0.554779,0.266051,9.87415,8.60371,3.75673 1930,0.516336,0.297473,9.68621,8.50492,3.80692 1931,0.508201,0.324199,9.4701,8.27596,3.80437 1932,0.500482,0.33958,9.24692,7.99221,3.76251 1933,0.501695,0.276974,9.35356,7.98968,3.69071 1934,0.591426,0.287008,9.42084,8.0362,3.63564 1935,0.565047,0.244096,9.53972,8.15803,3.59285 1936,0.605954,0.239187,9.6914,8.32009,3.56678 1937,0.620161,0.218232,9.76817,8.42001,3.57381 1938,0.592091,0.243161,9.51295,8.19771,3.6024 1939,0.613115,0.217042,9.68047,8.30987,3.58147 1940,0.632455,0.215269,9.78417,8.49624,3.57744 1941,0.663139,0.184409,10.0606,8.69868,3.6095 1942,0.698179,0.164348,10.2892,8.84523,3.4 1943,0.70459,0.146865,10.4731,8.93024,3.65388 1944,0.694067,0.161722,10.4465,8.96044,3.62434 1945,0.674668,0.197231,10.279,8.82522,3.61489 1946,0.635916,0.204232,10.1536,8.77547,3.67562 1947,0.642855,0.187224,10.2053,8.77481,3.82632 1948,0.641063,0.186566,10.2227,8.83821,3.96038 1949,0.646317,0.203646,10.1127,8.82364,4.0447 1950,0.645476,0.187497,10.2067,8.84161,4.08128 1951,0.63803,0.197361,10.2773,8.9401,4.10951 1952,0.634626,0.209992,10.283,9.01603,4.1693 1953,0.631144,0.219287,10.3217,9.06317,4.21727 1954,0.593088,0.235335,10.2101,9.05664,4.2567 1955,0.60736,0.227035,10.272,9.07566,4.29193 1956,0.607204,0.246631,10.2743,9.12407,4.32252 1957,0.586994,0.256784,10.2396,9.1588,4.37792 1958,0.548281,0.271022,10.1248,9.14025,4.42641 1959,0.553401,0.261815,10.2012,9.1598,4.4346 1960,0.552105,0.275137,10.1846,9.19297,4.43173 1961,0.544133,0.280783,10.1479,9.19533,4.44407 1962,0.55382,0.281286,10.197,9.21544,4.45074 1963,0.549951,0.28303,10.2036,9.22841,4.46403 1964,0.547204,0.291287,10.2271,9.23954,4.48447 1965,0.55511,0.281313,10.2882,9.26531,4.52057 1966,0.558182,0.280151,10.353,9.31675,4.58156 1967,0.545735,0.294385,10.3351,9.35382,4.65983 1968,0.538964,0.294593,10.3525,9.38361,4.71804 1969,0.542764,0.299927,10.3676,9.40725,4.76329 1970,0.534595,0.315319,10.2968,9.39139,4.81136 1971,0.545591,0.315828,10.2592,9.34121,4.84082 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.