Re: [R] genoud problem
Hi Shubha,

genoud does not return the initial fit value, but you can easily obtain it by passing your starting values to your function directly. Alternatively, you can have genoud print out the entire initial population (or the entire population as it evolves), and then report whatever summary of it you would like. Note that the best fit in generation zero is printed by default. See the project.path and print.level options for details.

Cheers,
Jas.

===
Jasjeet S. Sekhon
Associate Professor
Travers Department of Political Science
Survey Research Center, UC Berkeley
http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===

Shubha Vishwanath Karanth writes:

Hi R users,

The genoud function of the rgenoud package will optimize my function. If

    opt = genoud(fn, 2, max=TRUE, starting.values=c(1,10))

then opt$value gives the optimized value of the function fn. My problem: from the same opt, can I get the value of the function at the initial parameter values? I need the initial value of the function for reporting purposes.

BR, Shubha
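A minimal sketch of the first suggestion, evaluating the objective at the starting values yourself before calling genoud (the objective fn here is a stand-in, not Shubha's function):

    library(rgenoud)
    fn <- function(x) -sum((x - c(1, 10))^2)      # stand-in objective
    start <- c(1, 10)
    initial.value <- fn(start)                    # fit at the initial parameters
    opt <- genoud(fn, nvars=2, max=TRUE, starting.values=start)
    c(initial = initial.value, optimized = opt$value)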
Re: [R] Increasing precision of rgenoud solutions
Hi Paul,

solution.tolerance is the right way to increase precision. In your example, extra precision *is* being obtained; it is just not displayed, because the number of digits printed is controlled by options("digits"). The requested solution precision is in the object returned by genoud(). For example, if I run

    a <- genoud(myfunc, nvars=2, Domains=rbind(c(0,1), c(0,1)), max=TRUE,
                boundary.enforcement=2, solution.tolerance=0.01)

I get

    a$par
    [1] 0.7062930 0.7079196

But if I set options(digits=12) and, without rerunning anything, check a$par again, I observe:

    a$par
    [1] 0.706293049455 0.707919577559

Cheers,
Jas.

===
Jasjeet S. Sekhon
Associate Professor
Survey Research Center, UC Berkeley
http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===

Paul Smith writes:

Dear All,

I am using rgenoud to solve the following maximization problem:

    myfunc <- function(x) {
      x1 <- x[1]
      x2 <- x[2]
      if (x1^2 + x2^2 > 1) return(-999) else x1 + x2
    }
    genoud(myfunc, nvars=2, Domains=rbind(c(0,1), c(0,1)), max=TRUE,
           boundary.enforcement=2, solution.tolerance=0.01)

How can one increase the precision of the solution

    $par
    [1] 0.7072442 0.7069694

? I have tried solution.tolerance but without a significant improvement. Any ideas?

Thanks in advance,
Paul
Re: [R] Increasing precision of rgenoud solutions
Hi Paul,

I see. You want to increase the population size (the pop.size option); of lesser importance are the max.generations, wait.generations and P9 options. For more details, see http://sekhon.berkeley.edu/papers/rgenoudJSS.pdf. For example, if I run

    a <- genoud(myfunc, nvars=2, Domains=rbind(c(0,1), c(0,1)), max=TRUE,
                boundary.enforcement=2, solution.tolerance=0.001,
                pop.size=6000, P9=50)
    options(digits=12)

I obtain:

    # approximate analytical solution
    sum(c(0.707106781186548, 0.707106781186548))
    [1] 1.41421356237

    # genoud solution
    a$value
    [1] 1.41421344205

    # difference
    a$value - sum(c(0.707106781186548, 0.707106781186548))
    [1] -2.91195978441e-09

If that's not enough precision, increase the options (and the run time). This would be faster with analytical derivatives.

Cheers,
Jas.

===
Jasjeet S. Sekhon
Associate Professor
Travers Department of Political Science
Survey Research Center, UC Berkeley
http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===

Paul Smith writes:

Thanks, Jasjeet, for your reply, but maybe I was not clear enough. The analytical solution of the optimization problem is the pair (sqrt(2)/2, sqrt(2)/2), which is approximately (0.707106781186548, 0.707106781186548). The solution provided by rgenoud with solution.tolerance=0.1 was

    $par
    [1] 0.7090278 0.7051806

which is not very precise compared with the (analytical) solution. Is it possible to increase the degree of closeness of the rgenoud solutions to the analytical ones?

Paul
Re: [R] Bad optimization solution
The issue is that you are using a derivative-based optimizer on a problem for which it is well known that such optimizers will not perform well. You should consider using a global optimizer. For example, rgenoud combines a genetic search algorithm with a BFGS optimizer, and it works well for your problem:

    library(rgenoud)

    myfunc <- function(x) {
      x1 <- x[1]
      x2 <- x[2]
      abs(x1 - x2)
    }
    optim(c(0.5, 0.5), myfunc, lower=c(0,0), upper=c(1,1),
          method="L-BFGS-B", control=list(fnscale=-1))
    genoud(myfunc, nvars=2, Domains=rbind(c(0,1), c(0,1)), max=TRUE,
           boundary.enforcement=2)

    myfunc <- function(x) {
      x1 <- x[1]
      x2 <- x[2]
      (x1 - x2)^2
    }
    optim(c(0.2, 0.2), myfunc, lower=c(0,0), upper=c(1,1),
          method="L-BFGS-B", control=list(fnscale=-1))
    genoud(myfunc, nvars=2, Domains=rbind(c(0,1), c(0,1)), max=TRUE,
           boundary.enforcement=2)

Cheers,
Jas.

===
Jasjeet S. Sekhon
Associate Professor
Travers Department of Political Science
Survey Research Center, UC Berkeley
http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===

Paul Smith writes:

It seems that there is a problem of reliability here, as one never knows whether the solution provided by R is correct or not. In the case that I reported, it is fairly simple to see that the solution provided by R (without any warning!) is incorrect, but in general that is not so simple, and one may take a wrong solution to be a correct one.

Paul

On 5/8/07, Ravi Varadhan wrote:

    Your function, (x1-x2)^2, has zero gradient at all starting values such
    that x1 = x2, which means that gradient-based search methods will
    terminate there because they have found a critical point, i.e. a point
    at which the gradient is zero (which can be a maximum, a minimum, or a
    saddle point). However, I do not know why optim converges to the
    boundary maximum when the analytic gradient is supplied (as shown by
    Sundar).

    Ravi.

    ---
    Ravi Varadhan, Ph.D.
    Assistant Professor, The Center on Aging and Health
    Division of Geriatric Medicine and Gerontology
    Johns Hopkins University
    Ph: (410) 502-2619  Fax: (410) 614-9625
    Email: [EMAIL PROTECTED]
    Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
    ---

-----Original Message-----
From: Paul Smith
Sent: Monday, May 07, 2007 6:26 PM
To: R-help
Subject: Re: [R] Bad optimization solution

On 5/7/07, Paul Smith wrote:

    I think the problem is the starting point. I do not remember the
    details of the BFGS method, but I am almost sure the (0.5, 0.5)
    starting point is suspect, since the abs function is not differentiable
    at 0. If you perturb the starting point even slightly you will have no
    problem.

[R] Bad optimization solution (Paul Smith, 05/07/2007 04:30 PM):

    Dear All,

    I am trying to perform the optimization problem below, but I am getting
    (0.5, 0.5) as the optimal solution, which is wrong; the correct
    solution should be (1,0) or (0,1). Am I doing something wrong? I am
    using R 2.5.0 on Fedora Core 6 (Linux).

    Thanks in advance,
    Paul

        myfunc <- function(x) {
          x1 <- x[1]
          x2 <- x[2]
          abs(x1 - x2)
        }
        optim(c(0.5, 0.5), myfunc, lower=c(0,0), upper=c(1,1),
              method="L-BFGS-B", control=list(fnscale=-1))

Yes, with (0.2, 0.9) a correct solution comes out. However, how can one be sure in general that the solution obtained by optim is correct? The help page ?optim says:

    Method 'L-BFGS-B' is that of Byrd et al. (1995) which allows box
    constraints, that is each variable can be given a lower and/or upper
    bound. The initial value must satisfy the constraints. This uses a
    limited-memory modification of the BFGS quasi-Newton method. If
    non-trivial bounds are supplied, this method will be selected, with a
    warning.

which only demands that the initial value satisfy the constraints. Furthermore, x^2 is everywhere differentiable, and notwithstanding, the reported problem occurs with
Re: [R] The confidence level of p-value of ks.boot
Hi Gala,

The default p-value is the bootstrap p-value for the KS test. Bootstrapping is highly recommended because the bootstrapped Kolmogorov-Smirnov test, unlike the standard test, provides correct coverage even when there are point masses in the distributions being compared. The bootstrap p-value is returned in the ks.boot.pvalue object; in your example code, ks.b$ks.boot.pvalue. The results from the standard ks.test function are contained in the ks object, i.e., ks.b$ks.

For the theorem on correct coverage even with point masses, see:

Abadie, Alberto. 2002. "Bootstrap Tests for Distributional Treatment Effects in Instrumental Variable Models." Journal of the American Statistical Association 97:457 (March), 284-292.

For the algorithm, see: http://sekhon.berkeley.edu/papers/GenMatch.pdf

Cheers,
Jas.

===
Jasjeet S. Sekhon
Associate Professor
Travers Department of Political Science
Survey Research Center, UC Berkeley
http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===

Gala writes:

Hello! I need to compare two datasets to see whether they come from the same distribution. I use the function ks.boot from the Matching package. What is the confidence level of the p-value returned by ks.boot? The code is:

    set <- read.table("http://stella.sai.msu.ru:8080/~gala/data/testsets.csv",
                      header=TRUE, sep=",")
    set1 <- set[!is.na(set$set1), "set1"]
    set2 <- set[!is.na(set$set2), "set2"]
    library(Matching)
    ks.b <- ks.boot(set1, set2, 1000)
    ks.b

Thank you!
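A self-contained sketch of the advice above; the simulated, rounded data stand in for the poster's CSV (rounding introduces the kind of ties and point masses that the bootstrap handles correctly):

    library(Matching)
    set.seed(1)
    set1 <- round(rexp(200), 1)   # rounding creates ties
    set2 <- round(rexp(250), 1)
    ks.b <- ks.boot(set1, set2, nboots=1000)
    ks.b$ks.boot.pvalue           # bootstrap p-value (valid with ties)
    ks.b$ks                       # standard ks.test() result, for comparison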
Re: [R] multinomial logistic regression with equality constraints?
As we noted earlier, and as is clearly stated in the docs, multinomRob estimates an OVERDISPERSED multinomial model, and in your models here the overdispersion parameter is not identified; you need more observations. Walter pointed out the print.level trick to get the coefficients for the standard MNL model, but when the model the function is actually trying to estimate is not identified, that trick will not work.

As I also noted previously, it is a simple matter to change the multinomMLE function to estimate the standard MNL model. Since you don't want to change that file, and since nnet's multinom function doesn't have some functionality people need, we'll add an MLEonly option to multinomRob which will allow you to do what you want. We'll post a new version on my webpage later tonight: http://sekhon.berkeley.edu/robust. After some testing, we'll forward the new version to CRAN.

Jas.

===
Jasjeet S. Sekhon
Associate Professor
Travers Department of Political Science
Survey Research Center, UC Berkeley
http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===

Roger Levy writes:

Walter Mebane wrote:

    Roger,

        Error in if (logliklambda < loglik) bvec <- blambda :
          missing value where TRUE/FALSE needed
        In addition: Warning message:
        NaNs produced in: sqrt(sigma2GN)

    That message comes from the Newton algorithm (defined in source file
    multinomMLE.R). It would be better if we bullet-proofed it a bit more.
    The first thing is to check the data. I don't have the multinomLogis()
    function, so I can't run your code.

Whoops, sorry about that -- I'm putting revised code at the end of this message.

    But do you really mean

        for (i in 1:length(choice)) {

    and

        dim(counts) <- c(length(choice), length(choice))

    Should that be

        for (i in 1:n) {

    and

        dim(counts) <- c(n, length(choice))

    or, instead of n, some number m > length(choice)? As it is, it seems to
    me you have three observations for three categories, which isn't going
    to work (there are five coefficient parameters, plus sigma for the
    dispersion).

I really did mean for (i in 1:length(choice)) -- once again, the proper code is at the end of this message. Also, I notice that I get the same error with another kind of data, which works for multinom from nnet:

    library(nnet)
    library(multinomRob)
    dtf <- data.frame(y1=c(1,1), y2=c(2,1), y3=c(1,2), x=c(0,1))
    summary(multinom(as.matrix(dtf[,1:3]) ~ x, data=dtf))
    summary(multinomRob(list(y1 ~ 0, y2 ~ x, y3 ~ x), data=dtf, print.level=128))

The call to multinom fits the following coefficients:

    Coefficients:
        (Intercept)          x
    y2 0.6933809622 -0.6936052
    y3 0.0001928603  0.6928327

but the call to multinomRob gives me the following error:

    multinomRob(): Grouped MNL Estimation
    [1] multinomMLE: -loglik initial: 9.48247391895106
    Error in if (logliklambda < loglik) bvec <- blambda :
      missing value where TRUE/FALSE needed
    In addition: Warning message:
    NaNs produced in: sqrt(sigma2GN)

Does this shed any light on things?

Thanks again,
Roger

***

    set.seed(10)
    library(multinomRob)

    multinomLogis <- function(vector) {
      x <- exp(vector)
      z <- sum(x)
      x / z
    }

    n <- 20
    choice <- c("A", "B", "C")
    intercepts <- c(0.5, 0.3, 0.2)
    prime.strength <- rep(0.4, length(intercepts))
    counts <- c()
    for (i in 1:length(choice)) {
      u <- intercepts[1:length(choice)]
      u[i] <- u[i] + prime.strength[i]
      counts <- c(counts, rmultinomial(n = n, pr = multinomLogis(u)))
    }
    dim(counts) <- c(length(choice), length(choice))
    counts <- t(counts)
    row.names(counts) <- choice
    colnames(counts) <- choice
    data <- data.frame(Prev.Choice = choice, counts)
    for (i in 1:length(choice)) {
      data[[paste("last", choice[i], sep=".")]] <-
        ifelse(data$Prev.Choice == choice[i], 1, 0)
    }
    multinomRob(list(A ~ last.A, B ~ last.B, C ~ last.C - 1),
                data = data, print.level = 128)

I obtained this output:

    Your Model (xvec):
                                   A B C
    (Intercept)/(Intercept)/last.C 1 1 1
    last.A/last.B/NA               1 1 0

    [1] multinomRob: WARNING. Limited regressor variation...
    [1] WARNING. ... A regressor has a distinct value for only one observation.
    [1] WARNING. ... I'm using a modified estimation algorithm (i.e., preventing LQD
    [1] WARNING. ... from modifying starting values for the affected parameters).
    [1] WARNING. ... Affected parameters are TRUE in the following table.
                                       A     B     C
    (Intercept)/(Intercept)/last.C FALSE FALSE  TRUE
    last.A/last.B/NA
Re: [R] multinomial logistic regression with equality constraints?
Hi Roger,

Yes, multinomRob can handle equality constraints of this type; see the 'equality' option. But the function assumes that the outcomes are multinomial counts, and it estimates overdispersed multinomial logistic models via MLE, a robust redescending M-estimator, and LQD, which is another high-breakdown-point estimator. It would be a simple matter to edit the 'multinomMLE' function to work without counts and to do straight MNL instead, but right now it estimates an overdispersed MNL model.

Cheers,
Jas.

===
Jasjeet S. Sekhon
Associate Professor
Travers Department of Political Science
Survey Research Center, UC Berkeley
http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===

Roger Levy writes:

I'm interested in doing multinomial logistic regression with equality constraints on some of the parameter values. For example, with categorical outcomes Y_1 (baseline), Y_2, and Y_3, and covariates X_1 and X_2, I might want to impose the equality constraint that

    \beta_{2,1} = \beta_{3,2}

that is, that the effect of X_1 on the logit of Y_2 is the same as the effect of X_2 on the logit of Y_3. Is there an existing facility or package in R for doing this? Would multinomRob fit the bill?

Many thanks,
Roger

--
Roger Levy                    Email: [EMAIL PROTECTED]
Assistant Professor           Phone: 858-534-7219
Department of Linguistics     Fax: 858-534-4789
UC San Diego                  Web: http://ling.ucsd.edu/~rlevy
Re: [R] multinomial logistic regression with equality constraints?
Hi Roger,

Walter's command is correct. To match the exact normalization used by nnet's multinom(), however, you would need to make the coefficients zero for the first class (i.e., y1) and not the last (i.e., y3):

    mr <- multinomRob(list(y2 ~ x1 + x2, y3 ~ x1 + x2, y1 ~ 0),
                      data=d, print.level=1)

The results are:

    MNL Estimates:
                               y1         y2         y3
    NA/(Intercept)/(Intercept)  0 -0.6931462 -0.6931462
    NA/x1/x1                    0  1.3862936  0.6931474
    NA/x2/x2                    0  0.6931474  1.3862936

Compare to the output from nnet's multinom:

    summary(m1)
    Call:
    multinom(formula = y ~ x1 + x2, data = d)

    Coefficients:
      (Intercept)        x1        x2
    b  -0.6931475 1.3862975 0.6931499
    c  -0.6931475 0.6931499 1.3862975

Also, the MLE MNL objects are in mr$mnl.

To constrain the x1 coefficients to be equal, do:

    emr <- multinomRob(list(y2 ~ x1 + x2, y3 ~ x1 + x2, y1 ~ 0),
                       data=d, print.level=1,
                       equality=list(list(y2 ~ x1 + 0, y3 ~ x1 + 0)))

To constrain y2's x1 to be equal to y3's x2:

    emr2 <- multinomRob(list(y2 ~ x1 + x2, y3 ~ x1 + x2, y1 ~ 0),
                        data=d, print.level=1,
                        equality=list(list(y2 ~ x1 + 0, y3 ~ x2 + 0)))

See the multinomRob help file for more details:
http://sekhon.berkeley.edu/robust/multinomRob.html

Any BFGS warnings can be ignored because you are not interested in the robust estimates (they are coming from the LQD estimation and require changing 'genoud.parms').

Cheers,
Jas.

===
Jasjeet S. Sekhon
Associate Professor
Travers Department of Political Science
Survey Research Center, UC Berkeley
http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===

Walter Mebane writes:

    Roger,

    summary(multinomRob(list(y1 ~ x1 + x2, y2 ~ x1 + x2, y3 ~ 0),
                        data=d, print.level=1))

    Walter Mebane

Roger Levy writes:

Many thanks for pointing this out to me! I'm still a bit confused, however, as to how to use multinomRob. For example, I tried to translate the following example using nnet:

    x1 <- c(1,1,1,1,0,0,0,0,0,0,0,0)
    x2 <- c(0,0,0,0,1,1,1,1,0,0,0,0)
    y <- factor(c("a","b","b","c","a","b","c","c","a","a","b","c"))
    library(nnet)
    d <- data.frame(x1, x2, y)
    summary(multinom(y ~ x1 + x2, data=d))

into multinomRob as follows:

    x1 <- c(1,1,1,1,0,0,0,0,0,0,0,0)
    x2 <- c(0,0,0,0,1,1,1,1,0,0,0,0)
    y <- factor(c("a","b","b","c","a","b","c","c","a","a","b","c"))
    y1 <- ifelse(y == "a", 1, 0)
    y2 <- ifelse(y == "b", 1, 0)
    y3 <- ifelse(y == "c", 1, 0)
    d <- data.frame(x1, x2, y, y1, y2, y3)
    summary(multinomRob(list(y1 ~ x1 + x2, y2 ~ x1 + x2, y3 ~ x1 + x2), data=d))

but the last command gives me the error message:

    [1] multinomMLE: Hessian is not positive definite
    Error in obsformation %*% opg : non-conformable arguments

though it's not obvious to me why. I also tried a couple of other variants:

    summary(multinomRob(list(y1 ~ 0, y2 ~ x1 + x2, y3 ~ x1 + x2), data=d))
    Error in multinomT(Yp = Yp, Xarray = X, xvec = xvec, jacstack = jacstack, :
      (multinomT): invalid specification of Xarray (regressors not allowed
      for last category

    summary(multinomRob(list(y1 ~ 0, y2 ~ x1, y3 ~ x2), data=d))
    Error in multinomT(Yp = Yp, Xarray = X, xvec = xvec, jacstack = jacstack, :
      (multinomT): invalid specification of Xarray (regressors not allowed
      for last category

Any advice would be much appreciated!

Many thanks,
Roger

Walter Mebane wrote:

    By default, with print.level=0 or greater, the multinomRob program
    prints the maximum likelihood estimates with conventional standard
    errors before going on to compute the robust estimates.

    Walter Mebane

Jasjeet Singh Sekhon writes:

    Hi Roger,

    Yes, multinomRob can handle equality constraints of this type---see
    the 'equality' option.
Re: [R] ks.test not working?
    cannot compute correct p-values with ties in:
      ks.test(x, pgev, fit$mle[1], fit$mle[2], fit$mle[3])

You may want to use the ks.boot function in the Matching package, which implements a bootstrap KS test that provides consistent p-values (achieved significance levels) when there are ties.

Cheers,
Jas.

===
Jasjeet S. Sekhon
Associate Professor
Survey Research Center, UC Berkeley
http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===

Benjamin Dickgiesser writes:

Hi,

I am trying the following:

    library(ismev)
    library(evd)
    fit <- gev.fit(x, show=FALSE)
    ks.test(x, pgev, fit$mle[1], fit$mle[2], fit$mle[3])

but I am getting:

    Warning message:
    cannot compute correct p-values with ties in:
      ks.test(x, pgev, fit$mle[1], fit$mle[2], fit$mle[3])

where x is:

      [1] 239 381 43 22159 15619 156 253 1006
     [18] 5 100 10 103 25512 118 68 13 154 67 125 15
     [35] 5 130 47 143 176 573 592 213 54 10 179 198 293 77 11 44
     [52] 6 222 10812 164 70 1247 134 41 5158
     [69] 200 1692 13 49 218 48 34 74 19 44 1286 96 238 17 167
     [86] 308 204 416 32 77 14 62 103642 1114
    [103] 22 15 13 12 34 14 1331122 52 3469 31
    [120] 342 34827 52 39 795 88 238 40 294 69 878 7516
    [137] 5 381 58 84 588 345 161 12936 403 516 161 1112 54 3812
    [154] 526 38 17 20 17 800 1891 57 90 92 16 17 31 114 17
    [171] 129 10 46 14 23111 313

Can anyone tell me why that could be?

Thank you very much,
Benjamin
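Since ks.boot is a two-sample test, one way to apply it in this one-sample setting is to draw a comparison sample from the fitted GEV and test it against the data. This is only a sketch of that idea, not part of the original exchange; it assumes the poster's x and the fit object returned by ismev::gev.fit:

    library(Matching)
    library(evd)
    # simulate from the fitted GEV as the comparison sample
    y <- rgev(length(x), loc=fit$mle[1], scale=fit$mle[2], shape=fit$mle[3])
    ks.boot(x, y, nboots=1000)   # bootstrap p-value remains valid despite ties in x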
Re: [R] Var.calc in Match()
    Does anyone else find that using the Var.calc option (for
    heteroscedasticity-consistent std. errors) in Match() (from the
    Matching library) slows down computation of the matching estimator by a
    lot?

The Var.calc option to Match() slows down the function because an additional loop through the data is required (with vector operations in it). This loop, unlike the loop that finds the matches themselves, is not written in C, so it is much slower. With this option, the standard errors do not assume a constant causal effect, so there are a lot of extra calculations to be done. It would be easy enough to move this loop into C as well; it's on the list.

Cheers,
JS.

===
Jasjeet S. Sekhon
Associate Professor
Travers Department of Political Science
Survey Research Center, UC Berkeley
http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===

Brian Quinif writes:

Does anyone else find that using the Var.calc option (for heteroscedasticity-consistent std. errors) in Match() (from the Matching library) slows down computation of the matching estimator by a lot? I don't really understand why it slows down so much when I use this option, but for me it does, significantly. I want to use the heteroscedasticity-consistent std. errors in my project, but given how long they take to compute, I don't know if I will be able to.

Thanks,
BQ
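A quick way to see the cost on simulated data; the timings, not the estimates, are the point here, and Var.calc=1 requests the heteroskedasticity-consistent standard errors discussed above:

    library(Matching)
    set.seed(1)
    X  <- matrix(rnorm(5000 * 5), ncol=5)
    Tr <- rbinom(5000, 1, 0.5)
    Y  <- rnorm(5000)
    system.time(Match(Y=Y, Tr=Tr, X=X, Var.calc=0))  # default: constant-effect SEs
    system.time(Match(Y=Y, Tr=Tr, X=X, Var.calc=1))  # extra R-level loop for HC SEs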
Re: [R] number of matches when using Match()
    How do you go about deciding how many matches you will use? With my
    data, my standard errors generally get smaller if I use more matches.

Generally, select the maximum number of matches that results in good or acceptable balance (hence bounding bias due to the observed confounders). See the MatchBalance() function for some measures of balance, and GenMatch() for automatically maximizing (observed) covariate balance.

How to measure good balance is an open research question. I will note that the degree of covariate balance usually thought to be acceptable in the applied literature isn't enough to get reliable estimates in practice. We can evaluate this by comparing an observational estimate (with matching adjustment) with a known experimental benchmark. See: http://sekhon.berkeley.edu/papers/GenMatch.pdf

    Speaking of standard errors, when correcting for heteroscedasticity,
    how many matches do you use (this is the Var.calc option)? It seems to
    me that it might make sense to use the same number of matches as above,
    but that's just a guess...

These are related but separate issues. The number of matches is all about covariate balance (bias reduction), whereas the Var.calc option relates to the heterogeneity of the causal effect. It could be that the data are such that one needs to do 1-to-1 matching to get good covariate balance, but that the causal effect is homogeneous, so Var.calc can be set to 0, etc.

    One more question about Match()... I am calculating a number of SATTs
    that all have the same covariates (X's) and treatment variables (Tr's).
    I would like to take advantage of the matching that I do the first time
    to then quickly calculate the SATT for various different Y's. How can I
    do that? It would save serious computational time.

The following code expands on your code and will estimate the mean causal effect and the naive standard errors without a second call to Match(). Doing this for the Abadie-Imbens SEs instead of the naive SEs is left as an exercise (take the code from the Matching.R file of the package). In a future version of the package, I'll make a separate function to make all of this transparent by using the predict() setup.

    ###
    library(Matching)
    set.seed(30)

    # make up some data
    X  <- matrix(rnorm(1000 * 5), ncol=5)
    Tr <- c(rep(1, 500), rep(0, 500))
    Y1 <- as.vector(rnorm(1000))
    Y2 <- as.vector(rnorm(1000))

    satt.Y1 <- Match(Y=Y1, X=X, Tr=Tr, M=1)
    summary(satt.Y1, full=TRUE)

    cat("** Estimate Y2 by calling Match() \n")
    satt.Y2 <- Match(Y=Y2, X=X, Tr=Tr, M=1)
    summary(satt.Y2, full=TRUE)

    cat("** Estimate without calling Match() \n")
    index.treated <- satt.Y1$index.treated
    index.control <- satt.Y1$index.control
    weights <- satt.Y1$weights

    Y <- Y2
    mest <- sum((Y[index.treated] - Y[index.control]) * weights) / sum(weights)
    cat("estimate for Y2:", mest, "\n")

    v1 <- Y[index.treated] - Y[index.control]
    varest <- sum(((v1 - mest)^2) * weights) / (sum(weights) * sum(weights))
    se.naive <- sqrt(varest)
    cat("naive SE Y2:", se.naive, "\n")
    ###

Cheers,
JS.

===
Jasjeet S. Sekhon
Associate Professor
Survey Research Center, UC Berkeley
http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===

Brian Quinif writes:

To anyone who uses the Match() function in the Matching library...

How do you go about deciding how many matches you will use? With my data, my standard errors generally get smaller if I use more matches.

Speaking of standard errors, when correcting for heteroscedasticity, how many matches do you use (this is the Var.calc option)? It seems to me that it might make sense to use the same number of matches as above, but that's just a guess...

One more question about Match()... I am calculating a number of SATTs that all have the same covariates (X's) and treatment variables (Tr's). I would like to take advantage of the matching that I do the first time to then quickly calculate the SATT for various different Y's. How can I do that? It would save serious computational time.

In case I'm not explaining myself well: in the example below, I would like to calculate satt.Y2 without having to perform the matching all over again, since with more data the process can be very slow.

    # make up some data
    X  <- matrix(rnorm(1000 * 5), ncol=5)
    Tr <- c(rep(1, 500), rep(0, 500))
    Y1 <- as.vector(rnorm(1000))
    Y2 <- as.vector(rnorm(1000))
    satt.Y1 <- Match(Y=Y1, X=X, Tr=Tr, M=1)
    satt.Y2 <- Match(Y=Y2, X=X, Tr=Tr, M=1)

Thanks,
BQ
Re: [R] R performance: different CPUs
Hi,

64-bit CPUs, such as Opterons, help significantly with large databases or if you are running multiple processes, but there is a speed penalty if you are not. Some packages can make use of multiple processors, such as my rgenoud (genetic optimization using derivatives) and Matching packages, but most do not. For those packages the speed-up is significant. There are also multithreaded BLAS libraries which can be used reliably under Linux, but the speed benefit is usually small. You may want to check out some benchmarks at http://sekhon.berkeley.edu/macosx/ (Linux does very well).

Cheers,
JS.

===
Jasjeet S. Sekhon
Associate Professor
Survey Research Center, UC Berkeley
http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===

Toby Muhlhofer writes:

Hello!

I need to purchase a new box, which I would like to optimize for good R performance. For the record, I will run Fedora Core 5 as an OS, and I wanted to know whether anyone has experience with how the following affect R performance:

- Is there a big advantage to having a 64-bit CPU over having a 32-bit one?

- Does an Opteron offer any advantages over an Athlon, and if yes, does it justify an investment of about US $75 more for equivalent listed speeds?

- Have people successfully multithreaded R computations, such as to justify a dual-core CPU? I understand R itself does not multithread, but of course it should be possible to write code that parallelizes computations, and I wanted to know whether anyone has experience doing so and gained large speed advantages by it.

Thanks,
Toby Muhlhofer
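For illustration, a sketch of the parallel mode mentioned above, assuming rgenoud's cluster argument and the snow package (two R workers on the local machine stand in for a real cluster; the objective is a placeholder):

    library(rgenoud)
    library(snow)
    fn <- function(x) sum((x - 1)^2)              # stand-in objective
    cl <- makeSOCKcluster(c("localhost", "localhost"))
    out <- genoud(fn, nvars=4, cluster=cl)        # fitness evaluations are distributed
    stopCluster(cl)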
Re: [R] R performance: different CPUs
    This would be true of 64-bit builds of R, not 64-bit CPUs. [...] This
    doesn't mean that a 32-bit build of R on a 64-bit processor will be
    slower than a 32-bit build of R on a 32-bit processor.

There is the issue, however, of running a 32-bit application on a 64-bit OS. Under Red Hat and SuSE this works transparently, and I've not noticed a performance issue, but under Debian's or Ubuntu's chroot setup there is, in my experience, a measurable performance hit. Of course, one could simply run 32-bit Debian along with 32-bit R on an Opteron.

Cheers,
Jas.

===
Jasjeet S. Sekhon
Associate Professor
Travers Department of Political Science
Survey Research Center, UC Berkeley
http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===

Thomas Lumley writes:

On Wed, 5 Apr 2006, Jasjeet Singh Sekhon wrote:

    Hi,

    64-bit CPUs, such as Opterons, help significantly with large databases
    or if you are running multiple processes. But there is a speed penalty
    if you are not.

This would be true of 64-bit builds of R, not 64-bit CPUs. On a 64-bit processor you can usually run either 64-bit or 32-bit builds of R, and the 64-bit one will be able to access more memory but will be slower. This doesn't mean that a 32-bit build of R on a 64-bit processor will be slower than a 32-bit build of R on a 32-bit processor.

-thomas

Thomas Lumley
Assoc. Professor, Biostatistics
[EMAIL PROTECTED]
University of Washington, Seattle
Re: [R] any more direct-search optimization method in R
    Given that information, I think a genetic algorithm should probably do
    well with your problem.

You may want to try the rgenoud package (R GENetic Optimization Using Derivatives), which is on CRAN. For more information see: http://sekhon.berkeley.edu/rgenoud/

It works well for these kinds of problems, and the package is relatively flexible. For time-consuming problems it can be run in parallel if you have multiple cores/CPUs or machines.

Cheers,
JS.

===
Jasjeet S. Sekhon
Associate Professor
Survey Research Center, UC Berkeley
http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===

Patrick Burns writes:

Given that information, I think a genetic algorithm should probably do well with your problem. Standard derivative-based optimizers are going to get frustrated and give up. I can believe that Nelder-Mead could get confused as well, though I'm not sure that it will. 'genopt' from S Poetry does have box constraints for the parameters. I'm not sure what the other genetic algorithms in R are like.

Patrick Burns
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and A Guide for the Unwilling S User)

Weijie Cai wrote:

Hi All,

Thanks for all your replies, especially Graves's suggestions. You are right, I should give more information about my function. My responses to your questions are:

1. and 2. The function itself is not continuous/smooth. The evaluation at each point is a random number with a non-constant variance. When it approaches the global minimum, the variance is very small. There is some kind of structure in the surface plot of my function, but its form is, unfortunately, intractable.

3. and 4. Each evaluation of my function is not slow. The results returned by constrOptim() are just not very close to the true global minimum (the error can be as large as 0.2). Of course I can ignore the message of nonconvergence, but the precision is really not satisfying. Every time, Nelder-Mead will use up the default 300 iterations when doing the optimization. I guess the essential reason is the randomness of the function surface.

5. Yes, I am sure there is a global minimum. I did a lengthy computation on a rough grid, and its minimum is very close to the true minimum.

6. Do you mean I should start from a minimum found by grid searching? That's what I did. I never tried using smooth functions to approximate my function, though.

WC

Spencer Graves wrote:

WC:

What do you mean by noisy in this context?

1. You say "gradient, hessian not available." Is it continuous, with perhaps discontinuities in the first derivative?

2. Or is it something you can compute only to, say, 5 significant digits, and some numerical optimizers get lost trying to estimate derivatives from so fine a grid that the gradient and hessian are mostly noise?

3. Also, why do you think constrOptim is too slow? Does it call your function too many times, or does your function take too long to compute each time it's called?

4. What's not satisfactory about the results of constrOptim?

5. Do you know whether it has only one local minimum in the region, or might it have more?

6. Regardless of the answers to the above, have you considered using expand.grid to get starting values and narrow the search (with possibly system.time or proc.time to find out how much time is required for each function evaluation)?

I haven't tried this, but I would think it would be possible to fit a spline (either exactly or a smoothing spline) to a set of points, then optimize the spline.

Hope this helps.
Spencer Graves

Ingmar Visser wrote:

If you have only boundary constraints on parameters, you can use method "L-BFGS-B" in optim.

Hth,
Ingmar

Weijie Cai wrote (original post):

Hello list,

I am dealing with a noisy function (gradient/Hessian not available) with simple boundary constraints (x_i > 0). I've tried constrOptim() using Nelder-Mead to minimize it, but it is way too slow and the returned results are not satisfying. Simulated annealing is hard to tune, and in my case it always crashes R. I wonder whether there are any packages or functions that can do direct-search optimization? A rough search of the literature suggests that multidirectional search and the DIRECT algorithm may help. Is there any other satisfying algorithm?

Thanks,
WC
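A minimal sketch of the rgenoud suggestion above for a noisy objective with positivity constraints; the objective below is a placeholder, and the box (Domains) together with boundary.enforcement=2 is one way to encode x_i > 0:

    library(rgenoud)
    set.seed(1)
    # stand-in for a noisy surface with no usable derivatives
    noisy <- function(x) sum((x - 2)^2) + rnorm(1, sd=0.05)
    D <- cbind(rep(1e-6, 3), rep(10, 3))   # per-parameter lower/upper bounds
    genoud(noisy, nvars=3, Domains=D, boundary.enforcement=2,
           pop.size=1000, solution.tolerance=1e-4)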
Re: [R] Parallel computing in R for dummies--how to optimize an external model?
Hi Scott,

It is difficult to debug your issue without more information. Would it be possible to email me the code of a simple example?

Cheers,
Jas.

===
Jasjeet S. Sekhon
Associate Professor
Travers Department of Political Science
Survey Research Center, UC Berkeley
http://sekhon.polisci.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===

Waichler, Scott R writes:

I am trying to use the optimizing function genoud() with the snow package on a couple of i686 machines running Red Hat Linux WS4. I don't know anything about PVM or MPI, so I just followed the directions in snow and rgenoud for the simplest method and started a socket cluster.

My function fn for genoud involves updating an input file for a separate numerical model with the latest parameter set from the optimizer, running the (compiled) model on the input file with system(), and processing the output, including calculation of the objective function. The whole process works on the localhost machine with one CPU, and I can see that an R session is created on the second, non-localhost machine, but it doesn't seem to be doing anything. All of the model runs generated by the application of genoud take place on the first CPU. What am I missing?

I can see where there would be a conflict in having multiple processors trying to access the same model input file at once, but I don't see any indication of that type of problem.

Grateful for any help,
Scott Waichler
Pacific Northwest National Laboratory
[EMAIL PROTECTED]
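For context, a sketch of the kind of setup being discussed, assuming rgenoud's cluster argument and per-worker scratch directories so that concurrent runs do not clobber one shared model input file. The run.model wrapper and the host names are hypothetical, not from the original exchange:

    library(rgenoud)
    library(snow)
    cl <- makeSOCKcluster(c("localhost", "otherhost"))
    # hypothetical objective: write the parameters to a worker-private
    # directory, run the external model there, and score its output;
    # everything fn touches must exist on every worker machine
    fn <- function(par) {
      dir <- file.path(tempdir(), "model-run")   # tempdir() differs per worker
      dir.create(dir, showWarnings = FALSE)
      run.model(par, workdir = dir)              # hypothetical wrapper around system()
    }
    ans <- genoud(fn, nvars=3, cluster=cl)
    stopCluster(cl)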
[R] [R-pkgs] New Package for Multivariate and Propensity Score Matching
Matching version 0.48 is now available on CRAN.

Matching provides functions for estimating causal effects by multivariate and propensity score matching. The package includes a variety of univariate and multivariate tests to determine whether balance has been obtained by the matching procedure. These tests can also be used to determine whether an experiment or quasi-experiment is balanced on baseline covariates. The functions provide valid standard errors and allow one to estimate various estimands.

For documentation and further details see: http://jsekhon.fas.harvard.edu/matching

Cheers,
Jas.

==
Jasjeet S. Sekhon
Associate Professor
Harvard University
Center for Basic Research in the Social Sciences
[EMAIL PROTECTED]
http://jsekhon.fas.harvard.edu/
Office: 617.496.2426  Fax: 617.507.5524
==
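A quick-start sketch of the package's core calls, not from the announcement itself; it uses the lalonde data shipped with Matching, and the covariate list is illustrative:

    library(Matching)
    data(lalonde)
    # propensity score model
    ps <- glm(treat ~ age + educ + black + hisp + married + nodegr + re74 + re75,
              family = binomial, data = lalonde)
    # one-to-one propensity score matching, estimating the ATT
    m <- Match(Y = lalonde$re78, Tr = lalonde$treat, X = ps$fitted, estimand = "ATT")
    summary(m)
    # check covariate balance before and after matching
    MatchBalance(treat ~ age + educ + black + hisp + married + nodegr + re74 + re75,
                 data = lalonde, match.out = m)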
[R] [R-pkgs] New Package: multinomRob
We would like to announce the availability on CRAN of a new package, multinomRob. It does robust estimation of overdispersed multinomial regression models. The package is also able to estimate overdispersed grouped multinomial logistic and multivariate-t logistic models. The code is relatively general; for example, it allows for equality constraints across parameters, and it can handle datasets in which the number of categories varies by observation.

DESCRIPTION:

    Package: multinomRob
    Version: 1.0
    Date: 2004/02/18
    Title: Robust Estimation of Overdispersed Multinomial Regression Models
    Author: Walter R. Mebane, Jr. [EMAIL PROTECTED],
            Jasjeet Singh Sekhon [EMAIL PROTECTED]
    Maintainer: Jasjeet Singh Sekhon [EMAIL PROTECTED]
    Description: overdispersed multinomial regression using robust (LQD
            and tanh) estimation
    Depends: R (>= 1.7.0), rgenoud (>= 1.22), MASS (>= 7.1-8), mvtnorm (>= 0.6-3)
    License: GPL version 2 or later
    URL: http://jsekhon.fas.harvard.edu/robust/

We look forward to receiving questions, comments and suggestions.

Cheers,
Jas.

==
Jasjeet S. Sekhon
Associate Professor
Harvard University
Center for Basic Research in the Social Sciences
[EMAIL PROTECTED]
http://jsekhon.fas.harvard.edu/
Office: 617.496.2426  Fax: 617.496.5149
==
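A minimal sketch of the kind of call the package supports, with simulated multinomial counts; the data-generating step is illustrative, not from the announcement, and the reference-category convention (a ~ 0 formula for the last listed outcome) follows the package's usage:

    library(multinomRob)
    set.seed(1)
    n <- 100
    x <- rnorm(n)
    # category probabilities from a simple MNL data-generating process
    p <- exp(cbind(0, 0.5 * x, -0.5 * x))
    p <- p / rowSums(p)
    ycounts <- t(sapply(1:n, function(i) rmultinom(1, size = 50, prob = p[i, ])))
    dat <- data.frame(y1 = ycounts[, 1], y2 = ycounts[, 2], y3 = ycounts[, 3], x = x)
    # overdispersed MNL with robust (LQD/tanh) estimation; y1 is the reference
    fit <- multinomRob(list(y2 ~ x, y3 ~ x, y1 ~ 0), data = dat)
    summary(fit)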