Re: [R] genoud problem

2007-09-08 Thread Jasjeet Singh Sekhon

Hi Shubha,

genoud does not return the initial fit value.  But you could easily
obtain it by passing your starting values to your function directly.
Alternatively, one can have genoud print out the entire initial
population (or the entire population as it evolves), and one can then
decide to report whatever summary of this one would like.  Note that
the best fit in generation zero is printed by default.  See the
project.path and print.level options for details.
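
For illustration, a minimal sketch (assuming the poster's function fn and
the starting values quoted in the question below) of the first suggestion,
i.e., evaluating the function at the starting values directly:

  fn.initial <- fn(c(1, 10))   # value of fn at the values passed via starting.values
  fn.initial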

Cheers,
Jas.

===
Jasjeet S. Sekhon 
  
Associate Professor 
Travers Department of Political Science
Survey Research Center  
UC Berkeley 

http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===


Shubha Vishwanath Karanth writes:
  Hi R users,
  
   
  
  The genoud function of the rgenoud package will optimize my function. If 
  
   
  
  opt = genoud(fn, 2, max=TRUE, starting.values=c(1,10))
  
   
  
  opt$value will give the optimized value of the function fn. My
  question is: from the same opt object, can I get the value of the function
  at the initial parameter values? I need the initial value of the function
  for reporting purposes.
  
   
  
   
  
   
  
  BR, Shubha
  
  
  
 

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Increasing precision of rgenoud solutions

2007-05-10 Thread Jasjeet Singh Sekhon

Hi Paul,

solution.tolerance is the right way to increase precision.  In your
example, extra precision *is* being obtained; it is just not
displayed, because the number of digits that get printed is controlled
by the options(digits) setting.  The requested solution
precision is in the object returned by genoud().

For example, if I run

a <- genoud(myfunc, nvars=2,
            Domains=rbind(c(0,1),c(0,1)), max=TRUE,
            boundary.enforcement=2, solution.tolerance=0.01)

I get

> a$par
[1] 0.7062930 0.7079196

But if I set options(digits=12), and without rerunning anything I check
a$par again, I observe that:

> a$par
[1] 0.706293049455 0.707919577559

Cheers,
Jas.

===
Jasjeet S. Sekhon 
  
Associate Professor 
Survey Research Center  
UC Berkeley 

http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===


Paul Smith writes:
  Dear All
  
  I am using rgenoud to solve the following maximization problem:
  
  myfunc <- function(x) {
    x1 <- x[1]
    x2 <- x[2]
    if (x1^2 + x2^2 > 1)
      return(-999)
    else x1 + x2
  }
  
  genoud(myfunc, nvars=2,
  Domains=rbind(c(0,1),c(0,1)),max=TRUE,boundary.enforcement=2,solution.tolerance=0.01)
  
  How can one increase the precision of the solution
  
  $par
  [1] 0.7072442 0.7069694
  
  ?
  
  I have tried solution.tolerance but without a significant improvement.
  
  Any ideas?
  
  Thanks in advance,
  
  Paul
  
 

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Increasing precision of rgenoud solutions

2007-05-10 Thread Jasjeet Singh Sekhon

Hi Paul,

I see.  You want to increase the population size (pop.size)
option---of lesser importance are the max.generations,
wait.generations and P9 options.  For more details, see
http://sekhon.berkeley.edu/papers/rgenoudJSS.pdf.

For example, if I run

a <- genoud(myfunc, nvars=2,
            Domains=rbind(c(0,1),c(0,1)), max=TRUE, boundary.enforcement=2,
            solution.tolerance=0.001, pop.size=6000, P9=50)

options(digits=12)

I obtain:

# approx analytical solution
> sum(c(0.707106781186548,0.707106781186548))
[1] 1.41421356237

# genoud solution
> a$value
[1] 1.41421344205

# difference
> a$value - sum(c(0.707106781186548,0.707106781186548))
[1] -2.91195978441e-09

If that's not enough precision, increase the options (and the
run-time).  This would be faster with analytical derivatives.
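
As an illustration, a minimal sketch (not from the original exchange) of
supplying an analytical gradient through genoud()'s gr argument; the flat
subgradient used inside the penalty region is an assumption made only for
this example:

  mygrad <- function(x) {
    # gradient of x1 + x2 inside the feasible disc; the penalty value -999
    # is flat, so a zero gradient is returned there (illustrative assumption)
    if (x[1]^2 + x[2]^2 > 1) c(0, 0) else c(1, 1)
  }

  a <- genoud(myfunc, nvars=2, gr=mygrad,
              Domains=rbind(c(0,1),c(0,1)), max=TRUE, boundary.enforcement=2,
              solution.tolerance=0.001, pop.size=6000, P9=50)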

Cheers,
Jas.

===
Jasjeet S. Sekhon 
  
Associate Professor 
Travers Department of Political Science
Survey Research Center  
UC Berkeley 

http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===



Paul Smith writes:
Thanks, Jasjeet, for your reply, but maybe I was not clear enough.

The analytical solution for the optimization problem is the pair

(sqrt(2)/2,sqrt(2)/2),

which, approximately, is

(0.707106781186548,0.707106781186548).

The solution provided by rgenoud, with

solution.tolerance=0.1

was

$par
[1] 0.7090278 0.7051806

which is not very precise compared with the (analytical)
solution. Is it possible to increase the degree of
closeness of the rgenoud solutions to the analytical ones?

Paul

Paul Smith writes:
  Dear All
  
  I am using rgenoud to solve the following maximization problem:
  
  myfunc <- function(x) {
    x1 <- x[1]
    x2 <- x[2]
    if (x1^2 + x2^2 > 1)
      return(-999)
    else x1 + x2
  }
  
  genoud(myfunc, nvars=2,
  Domains=rbind(c(0,1),c(0,1)),max=TRUE,boundary.enforcement=2,solution.tolerance=0.01)
  
  How can one increase the precision of the solution
  
  $par
  [1] 0.7072442 0.7069694
  
  ?
  
  I have tried solution.tolerance but without a significant improvement.
  
  Any ideas?
  
  Thanks in advance,
  
  Paul
  
  


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bad optimization solution

2007-05-08 Thread Jasjeet Singh Sekhon

The issue is that you are using a derivative-based optimizer for a
problem for which it is well known that such optimizers will not
perform well.  You should consider using a global optimizer.  For
example, rgenoud combines a genetic search algorithm with a BFGS
optimizer, and it works well for your problem:

library(rgenoud)

myfunc <- function(x) {
  x1 <- x[1]
  x2 <- x[2]
  abs(x1 - x2)
}

optim(c(0.5,0.5), myfunc, lower=c(0,0), upper=c(1,1), method="L-BFGS-B",
      control=list(fnscale=-1))

genoud(myfunc, nvars=2,
       Domains=rbind(c(0,1),c(0,1)), max=TRUE, boundary.enforcement=2)

myfunc <- function(x) {
  x1 <- x[1]
  x2 <- x[2]
  (x1 - x2)^2
}

optim(c(0.2,0.2), myfunc, lower=c(0,0), upper=c(1,1), method="L-BFGS-B",
      control=list(fnscale=-1))
genoud(myfunc, nvars=2,
       Domains=rbind(c(0,1),c(0,1)), max=TRUE, boundary.enforcement=2)

Cheers,
Jas.

===
Jasjeet S. Sekhon 
  
Associate Professor 
Travers Department of Political Science
Survey Research Center  
UC Berkeley 

http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===






Paul Smith writes:
  It seems that there is here a problem of reliability, as one never
  knows whether the solution provided by R is correct or not. In the
  case that I reported, it is fairly simple to see that the solution
  provided by R (without any warning!) is incorrect, but, in general,
  that is not so simple and one may take a wrong solution as a correct
  one.
  
  Paul
  
  
  On 5/8/07, Ravi Varadhan [EMAIL PROTECTED] wrote:
   Your function, (x1-x2)^2, has zero gradient at all the starting values such
   that x1 = x2, which means that the gradient-based search methods will
   terminate there because they have found a critical point, i.e. a point at
   which the gradient is zero (which can be a maximum or a minimum or a saddle
   point).
  
   However, I do not know why optim converges to the boundary maximum when the
   analytic gradient is supplied (as shown by Sundar).
  
   Ravi.
  
   
   ---
  
   Ravi Varadhan, Ph.D.
  
   Assistant Professor, The Center on Aging and Health
  
   Division of Geriatric Medicine and Gerontology
  
   Johns Hopkins University
  
   Ph: (410) 502-2619
  
   Fax: (410) 614-9625
  
   Email: [EMAIL PROTECTED]
  
   Webpage:  http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
  
  
  
   
   
  
  
   -Original Message-
   From: [EMAIL PROTECTED]
   [mailto:[EMAIL PROTECTED] On Behalf Of Paul Smith
   Sent: Monday, May 07, 2007 6:26 PM
   To: R-help
   Subject: Re: [R] Bad optimization solution
  
   On 5/7/07, Paul Smith [EMAIL PROTECTED] wrote:
 I think the problem is the starting point.  I do not remember the
   details
 of the BFGS method, but I am almost sure the (.5, .5) starting point is
 suspect, since the abs function is not differentiable at 0.  If you
   perturb
 the starting point even slightly you will have no problem.

  From:     Paul Smith [EMAIL PROTECTED]
  Sent by:  [EMAIL PROTECTED]
  To:       R-help r-help@stat.math.ethz.ch
  Subject:  [R] Bad optimization solution
  Date:     05/07/2007 04:30 PM


 Dear All

 I am trying to perform the below optimization problem, but getting
 (0.5,0.5) as optimal solution, which is wrong; the correct solution
 should be (1,0) or (0,1).

 Am I doing something wrong? I am using R 2.5.0 on Fedora Core 6 
 (Linux).

 Thanks in advance,

 Paul

 --
 myfunc <- function(x) {
   x1 <- x[1]
   x2 <- x[2]
   abs(x1 - x2)
 }

 optim(c(0.5,0.5), myfunc, lower=c(0,0), upper=c(1,1), method="L-BFGS-B",
       control=list(fnscale=-1))
   
Yes, with (0.2,0.9), a correct solution comes out. However, how can
one be sure in general that the solution obtained by optim is correct?
The help page for ?optim says:
   
 Method 'L-BFGS-B' is that of Byrd _et. al._ (1995) which allows
 _box constraints_, that is each variable can be given a lower
 and/or upper bound. The initial value must satisfy the
 constraints. This uses a limited-memory modification of the BFGS
 quasi-Newton method. If non-trivial bounds are supplied, this
 method will be selected, with a warning.
   
which only demands that the initial value must satisfy the constraints.
  
   Furthermore, X^2 is everywhere differentiable and notwithstanding the
   reported problem occurs with
 

Re: [R] The confidence level of p-value of ks.boot

2007-04-29 Thread Jasjeet Singh Sekhon

Hi Gala,

The default p-value is the bootstrap p-value for the ks-test.
Bootstrapping is highly recommended because the bootstrapped
Kolmogorov-Smirnov test, unlike the standard test, provides correct
coverage even when there are point masses in the distributions being
compared.  The bootstrap p-value is returned in the ks.boot.pvalue
object; so in your example code ks.b$ks.boot.pvalue.  And the results
from the standard ks.test function are contained in the ks
object--i.e., ks.b$ks.
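
For concreteness, a minimal sketch (reusing the ks.b object from your code,
quoted below) of where the two sets of results live:

  ks.b$ks.boot.pvalue   # bootstrap p-value (recommended when there are ties)
  ks.b$ks               # results from the standard ks.test()
  ks.b$ks$p.value       # the standard (non-bootstrap) p-value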

For the theorem of correct coverage even with point masses see:
Abadie, Alberto. 2002. ``Bootstrap Tests for Distributional Treatment
Effects in Instrumental Variable Models.'' Journal of the American
Statistical Association, 97:457 (March) 284-292.

For the algorithm see:
http://sekhon.berkeley.edu/papers/GenMatch.pdf

Cheers,
Jas.

===
Jasjeet S. Sekhon 
  
Associate Professor 
Travers Department of Political Science
Survey Research Center  
UC Berkeley 

http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===


Gala writes:
  Hello!
  I need to test whether 2 datasets come from the same distribution. I 
  use the function ks.boot{Matching}. What is the confidence level of the 
  p-value returned by the ks.boot function?
  
  The code is:
  
  set=read.table("http://stella.sai.msu.ru:8080/~gala/data/testsets.csv",
                 header=T, sep=',')
  set1=set[!is.na(set$set1),'set1']
  set2=set[!is.na(set$set2),'set2']
  library(Matching)
  ks.b=ks.boot(set1,set2,1000)
  ks.b
  
  Thank you!
  
 

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] multinomial logistic regression with equality constraints?

2007-02-08 Thread Jasjeet Singh Sekhon

As we noted earlier and as is clearly stated in the docs, multinomRob
is estimating an OVERDISPERSED multinomial model.  And in your models
here the overdispersion parameter is not identified; you need more
observations.  Walter pointed out using the print.level trick to get
the coefs for the standard MNL model, but when the model the function
is actually trying to estimate is not identified, that trick will not
work.

As I also previously noted, it is a simple matter to change the
multinomMLE function to estimate the standard MNL model.  Since you
don't want to change that file and since nnet's multinom function
doesn't have some functionality people need, we'll add a MLEonly
function to multinomRob which will allow you to do what you want.
We'll post a new version on my webpage later tonight:
http://sekhon.berkeley.edu/robust.  And after some testing, we'll
forward the new version to CRAN.

Jas.

===
Jasjeet S. Sekhon 
  
Associate Professor 
Travers Department of Political Science
Survey Research Center  
UC Berkeley 

http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===




Roger Levy writes:
  Walter Mebane wrote:
   Roger,
   
  Error in if (logliklambda < loglik) bvec <- blambda :
  missing value where TRUE/FALSE needed
  In addition: Warning message:
  NaNs produced in: sqrt(sigma2GN)
   
   That message comes from the Newton algorithm (defined in source file
   multinomMLE.R).  It would be better if we bullet-proofed it a bit
   more.  The first thing is to check the data.  I don't have the
   multinomLogis() function, so I can't run your code.  
  
  Whoops, sorry about that -- I'm putting revised code at the end of the 
  message.
  
   But do you really
   mean
   
  for(i in 1:length(choice)) {
   and
  dim(counts) <- c(length(choice),length(choice))
   
   Should that be
   
  for(i in 1:n) {
   and
  dim(counts) <- c(n, length(choice))
   
   or instead of n, some number m < length(choice).  As it is it seems to
   me you have three observations for three categories, which isn't going
   to work (there are five coefficient parameters, plus sigma for the
   dispersion).
  
  I really did mean for(i in 1:length(choice)) -- once again, the proper 
  code is at the end of this message.
  
  Also, I notice that I get the same error with another kind of data, 
  which works for multinom from nnet:
  
  
  library(nnet)
  library(multinomRob)
  dtf <- data.frame(y1=c(1,1), y2=c(2,1), y3=c(1,2), x=c(0,1))
  summary(multinom(as.matrix(dtf[,1:3]) ~ x, data=dtf))
  summary(multinomRob(list(y1 ~ 0, y2 ~ x, y3 ~ x), data=dtf,print.level=128))
  
  
  The call to multinom fits the following coefficients:
  
  Coefficients:
   (Intercept)  x
  y2 0.6933809622 -0.6936052
  y3 0.0001928603  0.6928327
  
  but the call to multinomRob gives me the following error:
  
  multinomRob(): Grouped MNL Estimation
  [1] multinomMLE: -loglik initial: 9.48247391895106
  Error in if (logliklambda < loglik) bvec <- blambda :
   missing value where TRUE/FALSE needed
  In addition: Warning message:
  NaNs produced in: sqrt(sigma2GN)
  
  
  Does this shed any light on things?
  
  
  Thanks again,
  
  Roger
  
  
  
  
  
  ***
  
  set.seed(10)
  library(multinomRob)
  multinomLogis <- function(vector) {
    x <- exp(vector)
    z <- sum(x)
    x/z
  }
  
  n <- 20
  choice <- c("A","B","C")
  intercepts <- c(0.5,0.3,0.2)
  prime.strength <- rep(0.4, length(intercepts))
  counts <- c()
  for(i in 1:length(choice)) {
    u <- intercepts[1:length(choice)]
    u[i] <- u[i] + prime.strength[i]
    counts <- c(counts, rmultinomial(n = n, pr = multinomLogis(u)))
  }
  dim(counts) <- c(length(choice), length(choice))
  counts <- t(counts)
  row.names(counts) <- choice
  colnames(counts) <- choice
  data <- data.frame(Prev.Choice=choice, counts)
  
  for(i in 1:length(choice)) {
    data[[paste("last", choice[i], sep=".")]] <-
      ifelse(data$Prev.Choice==choice[i], 1, 0)
  }
  
  multinomRob(list(A ~ last.A,
                   B ~ last.B,
                   C ~ last.C - 1),
              data=data,
              print.level=128)
  
  
  
  I obtained this output:
  
  
  Your Model (xvec):
  A B C
  (Intercept)/(Intercept)/last.C 1 1 1
  last.A/last.B/NA   1 1 0
  
  [1] multinomRob:  WARNING.  Limited regressor variation...
  [1] WARNING.  ... A regressor has a distinct value for only one 
  observation.
  [1] WARNING.  ... I'm using a modified estimation algorithm (i.e., 
  preventing LQD
  [1] WARNING.  ... from modifying starting values for the affected 
  parameters).
  [1] WARNING.  ... Affected parameters are TRUE in the following table.
  
  A B C
  (Intercept)/(Intercept)/last.C FALSE FALSE  TRUE
  last.A/last.B/NA 

Re: [R] multinomial logistic regression with equality constraints?

2007-02-03 Thread Jasjeet Singh Sekhon

Hi Roger,

Yes, multinomRob can handle equality constraints of this type---see
the 'equality' option.  But the function assumes that the outcomes are
multinomial counts and it estimates overdispersed multinomial logistic
models via MLE, a robust redescending-M estimator, and LQD which is
another high breakdown point estimator.  It would be a simple matter
to edit the 'multinomMLE' function to work without counts and to do
straight MNL instead, but right now it estimates an overdispersed MNL
model.

Cheers,
Jas.

===
Jasjeet S. Sekhon 
  
Associate Professor 
Travers Department of Political Science
Survey Research Center  
UC Berkeley 

http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===



Roger Levy writes:
  I'm interested in doing multinomial logistic regression with equality 
  constraints on some of the parameter values.  For example, with 
  categorical outcomes Y_1 (baseline), Y_2, and Y_3, and covariates X_1 
  and X_2, I might want to impose the equality constraint that
  
 \beta_{2,1} = \beta_{3,2}
  
  that is, that the effect of X_1 on the logit of Y_2 is the same as the 
  effect of X_2 on the logit of Y_3.
  
  Is there an existing facility or package in R for doing this?  Would 
  multinomRob fit the bill?
  
  Many thanks,
  
  Roger
  
  
  -- 
  
  Roger Levy  Email: [EMAIL PROTECTED]
  Assistant Professor Phone: 858-534-7219
  Department of Linguistics   Fax:   858-534-4789
  UC San Diego                Web:   http://ling.ucsd.edu/~rlevy
  
 

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] multinomial logistic regression with equality constraints?

2007-02-03 Thread Jasjeet Singh Sekhon

Hi Roger,

Walter's command is correct.  To match the exact normalization used by
nnet's multinom(), however, you would need to make the coefficients
zero for the first class (i.e., y1) and not the last (i.e., y3).

mr <- multinomRob(list(y2 ~ x1 + x2, y3 ~ x1 + x2, y1 ~ 0), data=d,
                  print.level=1)

The results are:

MNL Estimates:
   y1 y2 y3
NA/(Intercept)/(Intercept)  0 -0.6931462 -0.6931462
NA/x1/x10  1.3862936  0.6931474
NA/x2/x20  0.6931474  1.3862936

Compare to the output from nnet's multinom:

> summary(m1)
Call:
multinom(formula = y ~ x1 + x2, data = d)

Coefficients:
  (Intercept)x1x2
b  -0.6931475 1.3862975 0.6931499
c  -0.6931475 0.6931499 1.3862975

Also, the MLE MNL objects are in:

mr$mnl

To constrain the x1 coeffs to be equal do:

emr <- multinomRob(list(y2 ~ x1 + x2, y3 ~ x1 + x2, y1 ~ 0), data=d, print.level=1,
                   equality=list(list(y2~x1+0, y3~x1+0)))

To constrain y2's x1 to be equal to y3's x2:

emr2 <- multinomRob(list(y2 ~ x1 + x2, y3 ~ x1 + x2, y1 ~ 0), data=d, print.level=1,
                    equality=list(list(y2~x1+0, y3~x2+0)))


See the multinomRob help file for more details:
http://sekhon.berkeley.edu/robust/multinomRob.html

Any BFGS warnings can be ignored because you are not interested in the
robust estimates (they are coming from the LQD estimation and would require
changing 'genoud.parms').

Cheers,
Jas.

===
Jasjeet S. Sekhon 
  
Associate Professor 
Travers Department of Political Science
Survey Research Center  
UC Berkeley 

http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===

Walter Mebane writes:
  Roger,
  
  summary(multinomRob(list(y1 ~ x1 + x2,y2 ~ x1 + x2, y3 ~ 0),data=d,
print.level=1))
  
  Walter Mebane
  
  Roger Levy writes:
Many thanks for pointing this out to me!

I'm still a bit confused, however, as to how to use multinomRob.  For 
example I tried to translate the following example using nnet:


 x1 <- c(1,1,1,1,0,0,0,0,0,0,0,0)
 x2 <- c(0,0,0,0,1,1,1,1,0,0,0,0)
 y  <- factor(c("a","b","b","c","a","b","c","c","a","a","b","c"))
 library(nnet)
 d <- data.frame(x1,x2,y)
 summary(multinom(y ~ x1 + x2, data=d))


into multinomRob as follows:


 x1 <- c(1,1,1,1,0,0,0,0,0,0,0,0)
 x2 <- c(0,0,0,0,1,1,1,1,0,0,0,0)
 y  <- factor(c("a","b","b","c","a","b","c","c","a","a","b","c"))
 y1 <- ifelse(y=="a", 1, 0)
 y2 <- ifelse(y=="b", 1, 0)
 y3 <- ifelse(y=="c", 1, 0)
 d  <- data.frame(x1,x2,y,y1,y2,y3)
 summary(multinomRob(list(y1 ~ x1 + x2, y2 ~ x1 + x2, y3 ~ x1 + x2), data=d))

but the last command gives me the error message:


[1] multinomMLE: Hessian is not positive definite
Error in obsformation %*% opg : non-conformable arguments


though it's not obvious to me why.  I also tried a couple other variants:


  summary(multinomRob(list(y1 ~ 0,y2 ~ x1 + x2,y3 ~ x1 + x2),data=d))
Error in multinomT(Yp = Yp, Xarray = X, xvec = xvec, jacstack = 
jacstack,  :
 (multinomT): invalid specification of Xarray (regressors not 
allowed for last category
  summary(multinomRob(list(y1 ~ 0,y2 ~ x1 ,y3 ~ x2),data=d))
Error in multinomT(Yp = Yp, Xarray = X, xvec = xvec, jacstack = 
jacstack,  :
 (multinomT): invalid specification of Xarray (regressors not 
allowed for last category


Any advice would be much appreciated!


Many thanks,

Roger

Walter Mebane wrote:
 By default, with print.level=0 or greater, the multinomRob program
 prints the maximum likelihood estimates with conventional standard
 errors before going on to compute the robust estimates.
 
 Walter Mebane
 
 Jasjeet Singh Sekhon writes:
   
   Hi Roger,
   
   Yes, multinomRob can handle equality constraints of this type---see
   the 'equality' option.  But the function assumes that the outcomes 
  are
   multinomial counts and it estimates overdispersed multinomial 
  logistic
   models via MLE, a robust redescending-M estimator, and LQD which is
   another high breakdown point estimator.  It would be a simple matter
   to edit the 'multinomMLE' function to work without counts and to do
   straight MNL instead, but right now it estimates an overdispersed MNL
   model.
   
   Cheers,
   Jas.
   
   ===
   Jasjeet S. Sekhon 
 
   Associate Professor 
   Travers Department of Political Science
   Survey Research Center  
   UC Berkeley 
   
   http://sekhon.berkeley.edu/
   V: 510-642-9974  F: 617-507-5524

Re: [R] ks.test not working?

2007-01-15 Thread Jasjeet Singh Sekhon

 cannot compute correct p-values with ties in: ks.test(x, "pgev",
 fit$mle[1], fit$mle[2], fit$mle[3])

You may want to use the ks.boot function in the Matching package, which
implements a bootstrap ks-test that provides consistent p-values
(achieved significance levels) when there are ties.
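
A minimal sketch (not from the original exchange) of one way to apply it
here: compare the data with a large sample simulated from the fitted GEV.
The use of evd's rgev() and the size of the simulated sample are
assumptions made only for illustration.

  library(ismev)
  library(evd)
  library(Matching)

  fit <- gev.fit(x, show=FALSE)
  # simulated sample from the fitted GEV (illustrative assumption)
  sim <- rgev(10000, loc=fit$mle[1], scale=fit$mle[2], shape=fit$mle[3])

  kb <- ks.boot(x, sim, nboots=1000)
  kb$ks.boot.pvalue   # bootstrap p-value, valid in the presence of ties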

Cheers,
Jas.

===
Jasjeet S. Sekhon 
  
Associate Professor 
Survey Research Center  
UC Berkeley 

http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===

Benjamin Dickgiesser writes:
  Hi,
  
  I am trying the following:
  
  library(ismev)
  library(evd)
  
  fit <- gev.fit(x, show=FALSE)
  ks.test(x, "pgev", fit$mle[1], fit$mle[2], fit$mle[3])
  
  
  but I am getting:
  Warning message:
  cannot compute correct p-values with ties in: ks.test(x, "pgev",
  fit$mle[1], fit$mle[2], fit$mle[3])
  
  where x is:
   [1]  239   381   43   22159   15619  156
   253  1006
   [18]5  100   10  103   25512   118   68   13  154
67   125   15
   [35]5  130   47  143  176  573  592  213   54   10   179  198
   293   77   11   44
   [52]6   222   10812   164   70   1247
   134   41  5158
   [69]  200  1692   13   49  218   48   34   74   19   44  1286
96  238   17  167
   [86]  308  204   416   32   77   14   62   103642
 1114
  [103]   22   15   13   12   34   14   1331122   52
3469   31
  [120]  342  34827   52   39   795   88  238   40  294   69
   878   7516
  [137]5  381   58   84  588  345  161 12936  403  516   161
  1112   54  3812
  [154]  526   38   17   20   17  800  1891   57   90   92   16   17
31   114   17
  [171]   129   10   46   14   23111  313
  
  
  Can anyone tell me why that could be?
  
  Thank you very much,
  Benjamin
  
 

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Var.calc in Match()

2006-04-22 Thread Jasjeet Singh Sekhon

 Does anyone else find that using the Var.calc option (for
 heteroscedasticity consistent std. errors) in Match() (from the
 Matching library) slows down computation of the matching estimator by
 a lot?

The Var.calc option to Match() slows down the function because an
additional loop through the data is required (with vector operations
in it).  And this loop, unlike the loop to find the matches
themselves, is not written in C so it is much slower.  With this
option, the standard errors do not assume a constant causal effect so
there are a lot of extra calculations to be done.
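
For reference, a minimal sketch (using the lalonde data shipped with
Matching; the covariate choice and the value of Var.calc are illustrative)
of the option under discussion:

  library(Matching)
  data(lalonde)

  X <- cbind(lalonde$age, lalonde$educ, lalonde$re74, lalonde$re75)

  # Var.calc=2: heteroskedasticity-consistent SEs based on 2 matches per unit
  m <- Match(Y=lalonde$re78, Tr=lalonde$treat, X=X, M=1, Var.calc=2)
  summary(m)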

I guess it would be easy enough to move this loop into C also.  It's
on the list.

Cheers,
JS.

===
Jasjeet S. Sekhon 
  
Associate Professor 
Travers Department of Political Science
Survey Research Center  
UC Berkeley 

http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===


Brian Quinif writes:
  Does anyone else find that using the Var.calc option (for
  heteroscedasticity consistent std. errors) in Match() (from the
  Matching library) slows down computation of the matching estimator by
  a lot?
  
  I don't really understand why when I use this option it slows down so
  much, but for me it does significantly. I want to use the
  heteroscedasticity consistent std. errors in my project, but as long
  as it takes to compute them, I don't know if I will be able to.
  
  Thanks,
  
  BQ
  
 

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] number of matches when using Match()

2006-04-22 Thread Jasjeet Singh Sekhon

 How do you go about deciding how many matches you will use?  With my
 data, my standard errors generally get smaller if I use more
 matches.

Generally, select the max number of matches that result in good or
acceptable balance (hence bounding bias due to the observed
confounders).  See the MatchBalance() function to get some measures of
balance.  And GenMatch() for automatically maximizing (observed)
covariate balance.
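
A minimal sketch (made-up data, not from this thread) of checking balance on
each covariate before and after a Match() adjustment:

  library(Matching)

  set.seed(30)
  X  <- matrix(rnorm(1000*5), ncol=5)
  Tr <- c(rep(1,500), rep(0,500))
  Y  <- as.vector(rnorm(1000))
  dta <- data.frame(Tr, X1=X[,1], X2=X[,2], X3=X[,3], X4=X[,4], X5=X[,5])

  m <- Match(Y=Y, Tr=Tr, X=X, M=1)
  MatchBalance(Tr ~ X1 + X2 + X3 + X4 + X5, data=dta, match.out=m, nboots=500)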

How to measure good balance is an open research question.  I will note
that the degree of covariate balance that is usually thought to be
acceptable in the applied literature isn't enough to get reliable
estimates in practice.  We can evaluate this by comparing an
observational estimate (with matching adjustment) with a known
experimental benchmark.  See:

http://sekhon.berkeley.edu/papers/GenMatch.pdf

 Speaking of standard errors, when correcting for heteroscedasticity,
 how many matches do you use (this is the Var.cal option).  It seems to
 me that it might make sense to use the same number of matches as
 above, but that's just a guess...

These are related but separate issues.  The number of matches is all
about covariate balance (bias reduction).  And the Var.calc option is
related to the heterogeneity of the causal effect.  It could be that
the data are such that one needs to do 1-to-1 matching to get good
covariate balance, but that the causal effect is homogeneous so
Var.calc can be set to 0, etc.

 One more question about Match()...
 I am calculating a number of SATT's that all have the same covariates
 (X's) and treatment variables (Tr's).  I would like to take advantage
 of the matching that I do the first time to then quickly calculate the
 SATT for various different Y's?  How can I do that?  It would save
 serious computational time.

The following code expands on your code and will estimate the mean
causal effect and the naive standard errors without a second call to
Match().  Doing this for the Abadie-Imbens SEs instead of the naive
SEs is left as an exercise (take the code from the Matching.R file of
the package).  In a future version of the package, I'll make a
separate function to make all of this transparent by using the
predict() setup.

###
library(Matching)

set.seed(30)
#make up some data
X  <- matrix(rnorm(1000*5), ncol=5)
Tr <- c(rep(1,500), rep(0,500))
Y1 <- as.vector(rnorm(1000))
Y2 <- as.vector(rnorm(1000))

satt.Y1 <- Match(Y=Y1, X=X, Tr=Tr, M=1)
summary(satt.Y1, full=TRUE)

cat("** Estimate Y2 BY Calling Match() \n")
satt.Y2 <- Match(Y=Y2, X=X, Tr=Tr, M=1)
summary(satt.Y2, full=TRUE)

cat("** Estimate Without Calling Match() \n")
index.treated <- satt.Y1$index.treated
index.control <- satt.Y1$index.control
weights <- satt.Y1$weights
Y <- Y2

mest <- sum((Y[index.treated]-Y[index.control])*weights)/sum(weights)
cat("estimate for Y2:", mest, "\n")

v1 <- Y[index.treated] - Y[index.control]
varest <- sum(((v1-mest)^2)*weights)/(sum(weights)*sum(weights))
se.naive <- sqrt(varest)
cat("naive SE Y2:", se.naive, "\n")

###

Cheers,
JS.

===
Jasjeet S. Sekhon 
  
Associate Professor 
Survey Research Center  
UC Berkeley 

http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===



Brian Quinif writes:
  To anyone who uses the Match() function in the Matching library...
  
  How do you go about deciding how many matches you will use?  With my
  data, my standard errors generally get smaller if I use more matches.
  
  Speaking of standard errors, when correcting for heteroscedasticity,
  how many matches do you use (this is the Var.cal option).  It seems to
  me that it might make sense to use the same number of matches as
  above, but that's just a guess...
  
  One more question about Match()...
  I am calculating a number of SATT's that all have the same covariates
  (X's) and treatment variables (Tr's).  I would like to take advantage
  of the matching that I do the first time to then quickly calculate the
  SATT for various different Y's?  How can I do that?  It would save
  serious computational time.
  
  In case I'm not explaining myself well, in the example below, I would
  like to calculate satt.Y2 without having to perform the matching all
  over again, since with more data, the process can be very slow.
  
  #make up some data
  X  <- matrix(rnorm(1000*5), ncol=5)
  Tr <- c(rep(1,500), rep(0,500))
  Y1 <- as.vector(rnorm(1000))
  Y2 <- as.vector(rnorm(1000))
  
  satt.Y1 <- Match(Y=Y1, X=X1, Tr=Tr, M=1)
  satt.Y2 <- Match(Y=Y2, X=X1, Tr=Tr, M=1)
  
  Thanks,
  
  BQ
  
 

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] R performance: different CPUs

2006-04-05 Thread Jasjeet Singh Sekhon

Hi,

64-bit CPUs, such as Opterons, help significantly with large databases
or if you are running multiple processes.  But there is a speed
penalty if you are not.

Some packages can make use of multiple processors, such as my rgenoud
(genetic optimization using derivatives) and Matching packages, but
most do not.  For these packages the speed-up is significant.  There
are also multithreaded BLAS libraries which can be used reliably under
Linux, but the speed benefit is usually small.
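
As an illustration, a minimal sketch (not from the original message; the host
names, the toy objective, and the population size are placeholders) of running
genoud() over a snow socket cluster:

  library(snow)
  library(rgenoud)

  cl <- makeCluster(c("localhost", "localhost"), type="SOCK")

  fn <- function(x) sum((x - 3)^2)   # toy objective to minimize
  clusterExport(cl, "fn")            # make sure the objective exists on each node

  res <- genoud(fn, nvars=4, pop.size=2000, cluster=cl)
  res$par
  stopCluster(cl)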

You may want to check out some benchmarks at:

http://sekhon.berkeley.edu/macosx/ (Linux does very well).

Cheers,
JS.

===
Jasjeet S. Sekhon 
  
Associate Professor 
Survey Research Center  
UC Berkeley 

http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===







Toby Muhlhofer writes:
  Hello!
  
  I need to purchase a new box, which I would like to optimize for good R 
  performance.
  
  For the record, I will run Fedora Core 5 as the OS, and I wanted to know 
  if anyone has experience with how the following affects R performance:
  
  - Is there a big advantage to having a 64-bit CPU over having a 32-bit?
  
  - Does an Opteron offer any advantages over an Athlon, and if yes, does 
  it justify an investment of about US $75 more for equivalent listed speeds?
  
  - Have people successfully multithreaded R computations, such as to 
  justify a dual-core CPU? I understand R itself does not multithread, but 
  of course it should be possible to write code that parallelizes 
  computations, and I wanted to know if anyone has experience doing so and 
  gained large speed advantages by it.
  
  Thanks,
   Toby Muhlhofer
  
 

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] R performance: different CPUs

2006-04-05 Thread Jasjeet Singh Sekhon

 This would be true of 64-bit builds of R, not 64-bit CPUs.
 [...]
 This doesn't mean that a 32-bit build of R on a 64-bit processor will be 
 slower than a 32-bit build of R on a 32-bit processor.

There is the issue, however, of running a 32-bit application on a 64-bit
OS.  Under RedHat and SuSE this works transparently and I've not
noticed a performance issue, but under Debian's or Ubuntu's chroot
setup there is, in my experience, a measurable performance hit.  Of
course, one could simply run 32-bit Debian along with 32-bit R on an
Opteron.

Cheers,
Jas.

===
Jasjeet S. Sekhon 
  
Associate Professor 
Travers Department of Political Science
Survey Research Center  
UC Berkeley 

http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===



Thomas Lumley writes:
  On Wed, 5 Apr 2006, Jasjeet Singh Sekhon wrote:
  
  
   Hi,
  
   64bit CPUs, such as opterons, help significantly with large databases
   or if you are running multiple processes.  But there is a speed
   penalty if you are not.
  
  This would be true of 64-bit builds of R, not 64-bit CPUs.
  
  On a 64-bit processor you can usually run either 64-bit or 32-bit builds 
  of R, and the 64-bit one will be able to access more memory but will be 
  slower.
  
  This doesn't mean that a 32-bit build of R on a 64-bit processor will be 
  slower than a 32-bit build of R on a 32-bit processor.
  
   -thomas
  
  Thomas LumleyAssoc. Professor, Biostatistics
  [EMAIL PROTECTED]University of Washington, Seattle

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] any more direct-search optimization method in R

2006-03-01 Thread Jasjeet Singh Sekhon

 Given that information, I think a genetic algorithm
 should probably do well with your problem. 

You may want to try the rgenoud package (R-GENetic Optimization Using
Derivatives) which is on CRAN.  For more information see:

http://sekhon.berkeley.edu/rgenoud/

It works well for these kinds of problems, and the package is
relatively flexible.  For time-consuming problems it can be run in
parallel if you have multiple cores/CPUs or machines.
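
A minimal sketch (not from the thread; the number of variables, the bounds,
and the stand-in noisy objective are placeholders) of a box-constrained
genoud() run of the kind the question describes:

  library(rgenoud)

  noisy.fn <- function(x) sum((x - 1)^2) + rnorm(1, sd=0.01)  # stand-in noisy objective

  res <- genoud(noisy.fn, nvars=3,
                Domains=cbind(rep(1e-6, 3), rep(10, 3)),  # box constraints: x_i > 0
                boundary.enforcement=2,                   # never evaluate outside the box
                pop.size=2000, max.generations=50)
  res$par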

Cheers,
JS.

===
Jasjeet S. Sekhon 
  
Associate Professor 
Survey Research Center  
UC Berkeley 

http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===



 Given that information, I think a genetic algorithm
 should probably do well with your problem.  Standard
 derivative-based optimizers are going to get frustrated
 and give up.  I can believe that Nelder-Mead could
 get confused as well, though I'm not sure that it will.
 
 'genopt' from S Poetry does have box constraints for
 the parameters.  I'm not sure what other genetic algorithms
 that are in R are like.
 
 Patrick Burns
 [EMAIL PROTECTED]
 +44 (0)20 8525 0696
 http://www.burns-stat.com
 (home of S Poetry and A Guide for the Unwilling S User)
 
 Weijie Cai wrote:
 
 Hi All,
 Thanks for all your replies, especially for Graves's suggestions. You are right 
 that I should give more information about my function. So my responses to your 
 questions are:
 1. 2. the function itself is not continuous/smooth. The evaluation at each 
 point is a random number with a non-constant variance. When it approaches 
 the global minimum, the variance is very small. There is some kind of 
 structure from the surface plot of my function but its form is intractable, 
 unfortunately.
 
 3. 4. Each evaluation of my function is not slow. The results returned by 
 constrOptim() are just not very close to the true global minimum (the error can 
 be as large as 0.2). Of course I can ignore the nonconvergence message, but the 
 precision is really not satisfying. Every time, Nelder-Mead uses up the 300 
 default iterations when doing the optimization. I guess the essential reason is 
 the randomness of the function surface.
 
 5. Yes, I am sure there is a global minimum. I did a lengthy computation on a 
 rough grid, and the grid minimum is very close to the true minimum.
 
 6. Do you mean I start from a minimum found by grid searching? That's what 
 I did. I never tried using smooth functions to approximate my function 
 though.
 
 WC
 
 
   
 
 From: Spencer Graves [EMAIL PROTECTED]
 To: Ingmar Visser [EMAIL PROTECTED]
 CC: Weijie Cai [EMAIL PROTECTED], r-help@stat.math.ethz.ch
 Subject: Re: [R] any more direct-search optimization method in R
 Date: Tue, 28 Feb 2006 09:33:35 -0800
 
 WC:
 
    What do you mean by "noisy" in this context?
 
    1.  You say, "gradient, hessian not available".  Is it continuous, with 
 perhaps discontinuities in the first derivative?
 
   2.  Or is it something you can compute only to, say, 5 significant 
 digits, and some numerical optimizers get lost trying to estimate 
 derivatives from so fine a grid that the gradient and hessian are mostly 
 noise?
 
   3.  Also, why do you think constrOptim is too slow?  Does it call 
  your 
 function too many times or does your function take too long to compute each 
 time it's called?
 
   4.  What's not satisfactory about the results of constrOptim?
 
    5.  Do you know if it has only one local minimum in the region, 
 or might it have more?
 
   6.  Regardless of the answers to the above, have you considered using 
 expand.grid to get starting values and narrow the search (with possibly 
 system.time or proc.time to find out how much time is required for each 
 function evaluation)?  I haven't tried this, but I would think it would be 
 possible to fit a spline (either exactly or a smoothing spline) to a set of 
 points, then optimize the spline.
 
   hope this helps.
   spencer graves
 
 Ingmar Visser wrote:
 
 
 
 If you have only boundary constraints on parameters you can use method
 "L-BFGS-B" in optim.
 Hth, ingmar
 
 
 
   
 
 From: Weijie Cai [EMAIL PROTECTED]
 Date: Tue, 28 Feb 2006 11:48:32 -0500
 To: r-help@stat.math.ethz.ch
 Subject: [R] any more direct-search optimization method in R
 
 Hello list,
 
 I am dealing with a noisy function (gradient, hessian not available) with
 simple boundary constraints (x_i > 0). I've tried constrOptim() using
 Nelder-Mead to minimize it, but it is way too slow and the returned results are
 not satisfying. Simulated annealing is so hard to tune, and it always crashes
 R in my case. I wonder if there are any packages or functions that can do
 direct-search optimization?
 
 A rough search of the literature shows that multidirectional search and the
 DIRECT algorithm may help. Is there any other satisfactory algorithm?
 
 Thanks,
 WC
 
 

Re: [R] Parallel computing in R for dummies--how to optimize an external model?

2006-02-14 Thread Jasjeet Singh Sekhon

Hi Scott,

It is difficult to debug your issue without more information.  Would
it be possible to email me code of a simple example?

Cheers,
Jas.

===
Jasjeet S. Sekhon 
  
Associate Professor 
Travers Department of Political Science
Survey Research Center  
UC Berkeley 

http://sekhon.polisci.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===


Waichler, Scott R writes:
  
  I am trying to use the optimizing function genoud() with the snow
  package on a couple of i686 machines running Redhat Linux WS4.  I don't
  know anything about PVM or MPI, so I just followed the directions in
  snow and rgenoud for the simplest method and started a socket cluster.
  My function fn for genoud involves updating an input file for a separate
  numerical model with the latest parameter set from the optimizer,
  running the (compiled) model on the input file with system(), and
  processing the output including calculation of objective function.  The
  whole process works on the localhost machine in one cpu, and I can see
  that an R session is created in the second, non-localhost machine, but
  it doesn't seem to be doing anything.  All of the model runs generated
  by the application of genoud take place on the localhost cpu.  What am I missing?
  I can see where there would be a conflict in having multiple processors
  trying to access the same system model input file at once, but I don't
  see any indication of that type of problem.
  
  Grateful for any help,
  
  Scott Waichler
  Pacific Northwest National Laboratory
  [EMAIL PROTECTED]
  
 

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] [R-pkgs] New Package for Multivariate and Propensity Score Matching

2004-10-22 Thread Jasjeet Singh Sekhon

Matching version 0.48 is now available on CRAN.

Matching provides functions for estimating causal effects by
multivariate and propensity score matching. The package includes a
variety of univariate and multivariate tests to determine if balance
has been obtained by the matching procedure. These tests can also be
used to determine if an experiment or quasi-experiment is balanced on
baseline covariates.  The functions provide valid standard errors and
allow one to estimate various estimands.
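
A minimal sketch (using the lalonde data shipped with the package; the
covariate specification is purely illustrative) of the basic workflow:

  library(Matching)
  data(lalonde)

  # propensity score model
  ps <- glm(treat ~ age + educ + black + hisp + married + nodegr + re74 + re75,
            family=binomial, data=lalonde)

  # match on the estimated propensity score and estimate the ATT for re78
  m <- Match(Y=lalonde$re78, Tr=lalonde$treat, X=ps$fitted, estimand="ATT")
  summary(m)

  # univariate balance checks before and after matching
  MatchBalance(treat ~ age + educ + black + hisp + married + nodegr + re74 + re75,
               data=lalonde, match.out=m, nboots=500)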

For documentation and further details see:

http://jsekhon.fas.harvard.edu/matching

Cheers,
Jas.

==
Jasjeet S. Sekhon
Associate Professor
Harvard University
Center for Basic Research in the 
  Social Sciences
[EMAIL PROTECTED]
http://jsekhon.fas.harvard.edu/
Office: 617.496.2426 Fax: 617.507.5524

___
R-packages mailing list
[EMAIL PROTECTED]
https://stat.ethz.ch/mailman/listinfo/r-packages

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] [R-pkgs] New Package: multinomRob

2004-02-19 Thread Jasjeet Singh Sekhon

We would like to announce the availability on CRAN of a new package
multinomRob.  It does robust estimation of overdispersed multinomial
regression models. The package is also able to estimate overdispersed
grouped multinomial logistic and multivariate-t logistic models.  The
code is relatively general; for example, it allows for equality
constraints across parameters and it can handle datasets in which the
number of categories varies by observation.

DESCRIPTION:

Package: multinomRob
Version: 1.0
Date: 2004/02/18
Title: Robust Estimation of Overdispersed Multinomial Regression Models
Author: Walter R. Mebane, Jr. [EMAIL PROTECTED], Jasjeet Singh Sekhon [EMAIL PROTECTED]
Maintainer: Jasjeet Singh Sekhon [EMAIL PROTECTED]
Description: overdispersed multinomial regression using robust (LQD and tanh) estimation
Depends: R (>= 1.7.0), rgenoud (>= 1.22), MASS (>= 7.1-8), mvtnorm (>= 0.6-3)
License: GPL version 2 or later
URL: http://jsekhon.fas.harvard.edu/robust/


We look forward to receiving questions, comments and suggestions.

Cheers,
Jas.

==
Jasjeet S. Sekhon
Associate Professor
Harvard University
Center for Basic Research in the 
  Social Sciences
[EMAIL PROTECTED]
http://jsekhon.fas.harvard.edu/
Office: 617.496.2426 Fax: 617.496.5149

___
R-packages mailing list
[EMAIL PROTECTED]
https://www.stat.math.ethz.ch/mailman/listinfo/r-packages

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html