Re: [R] integrate
You can divide your domain of integration into smaller intervals and then add up the individual contributions. This could improve the speed of the adaptive Gauss-Kronrod quadrature used in integrate().

Ravi.

---
Ravi Varadhan, Ph.D., Assistant Professor, The Center on Aging and Health, Division of Geriatric Medicine and Gerontology, Johns Hopkins University
Ph: (410) 502-2619  Fax: (410) 614-9625  Email: [EMAIL PROTECTED]
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Santanu Pramanik
Sent: Wednesday, August 22, 2007 6:41 PM
To: [EMAIL PROTECTED]
Cc: r-help@stat.math.ethz.ch; [EMAIL PROTECTED]
Subject: Re: [R] integrate

Hi Andy,

Thank you very much for your input. I also tried something like that, which gives a value close to 20, basically using the same trapezoidal rule:

sum(apply(as.matrix(seq(-10, 10, by = 0.1)), 1, my.fcn)) * 0.1
[1] 20.17385

Actually my function is much more complicated than the posted example and it is taking a long time...

Anyway, thanks again,
Santanu

JPSM, 1218J Lefrak Hall, University of Maryland, College Park
Phone: 301-314-9916
[EMAIL PROTECTED] 8/22/2007 5:20:05 PM

As Duncan Murdoch mentioned in his reply, the problem is that your function is not really a properly defined function, in the sense of assigning a unique y to each x. The integrate function uses an adaptive quadrature routine which makes multiple calls to the function being integrated and expects to get the same y's for the same x's every time. If you want to get a number close to 20 (for your example) you need an integration routine that uses a single evaluation of your function at each value of x. A simple method like rectangular approximation on a grid, or the so-called trapezoidal rule, will do just that.
Here is a very crude prototype of such an integrator:

integrate1 <- function(f, lower, upper) {
    f <- match.fun(f)
    xx <- seq(lower, upper, length = 100)
    del <- xx[2] - xx[1]
    yy <- f(xx[-100])
    return(del * sum(yy))
}

Now when you run integrate1(my.fun, -10, 10) you will get a number close to 20 but, of course, every time you do it you will get a different value.

Hope this helps,
Andy

__
Andy Jaworski, 518-1-01 Process Laboratory, 3M Corporate Research Laboratory
E-mail: [EMAIL PROTECTED]  Tel: (651) 733-6092  Fax: (651) 736-3122

From: Santanu Pramanik
To: r-help@stat.math.ethz.ch
Subject: [R] integrate
08/22/2007 02:56 PM

Hi,

I am trying to integrate a function which is approximately constant over the range of the integration. The function is as follows:

my.fcn = function(mu) {
    m = 1000
    z = 0
    z.mse = 0
    for (i in 1:m) {
        z[i] = rnorm(1, mu, 1)
        z.mse = z.mse + (z[i] - mu)^2
    }
    return(z.mse / m)
}

my.fcn(-10)
[1] 1.021711
my.fcn(10)
[1] 0.9995235
my.fcn(-5)
[1] 1.012727
my.fcn(5)
[1] 1.033595
my.fcn(0)
[1] 1.106282

The function takes the value (approximately) 1 over the range of mu, so the integration result should be close to 20 if we integrate over (-10, 10). But R gives:

integrate(my.fcn, -10, 10)
685.4941 with absolute error 7.6e-12

integrate(Vectorize(my.fcn), -10, 10)  # this call never terminates

I have seen that the ?integrate page clearly says: "If the function is approximately constant (in particular, zero) over nearly all its range it is possible that the result and error estimate may be seriously wrong." But this doesn't help in solving the problem.

Thanks,
Santanu

JPSM, 1218J Lefrak Hall, University of Maryland, College Park
Phone: 301-314-9916
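The root cause Andy describes, a stochastic integrand, can also be removed at the source. Below is a sketch (my own code, not from the thread) that freezes the RNG seed inside the integrand (the seed value 42 is an arbitrary assumption) and vectorizes it, after which integrate() itself behaves:

```r
## Sketch: make the integrand deterministic and vectorized so that
## integrate() sees the same y for the same x on every call.
my.fcn.det <- function(mu) {
  sapply(mu, function(m) {
    set.seed(42)                      # arbitrary fixed seed (assumption)
    mean((rnorm(1000, m, 1) - m)^2)   # same estimator as my.fcn
  })
}
integrate(my.fcn.det, -10, 10)$value  # close to 20
```

Whether freezing the seed is statistically acceptable depends on the application; it merely makes repeated evaluations consistent, which is what the quadrature code assumes.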
Re: [R] Linear models over large datasets
The simplest trick is to use the QR decomposition: the OLS solution (X'X)^{-1}X'y can be easily computed as qr.solve(X, y). Here is an illustration:

set.seed(123)
X <- matrix(round(rnorm(100), 1), 20, 5)
b <- c(1, 1, 2, 2, 3)
y <- X %*% b + rnorm(20)
ans1 <- solve(t(X) %*% X, t(X) %*% y)
ans2 <- qr.solve(X, y)
all.equal(ans1, ans2)
[1] TRUE

Ravi.

---
Ravi Varadhan, Ph.D., Johns Hopkins University

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of dave fournier
Sent: Friday, August 17, 2007 12:43 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Linear models over large datasets

It's actually only a few lines of code to do this from first principles. The coefficients depend only on the cross products X'X and X'y, and you can build them up easily by extending this example to read files or a database holding x and y instead of getting them from the args. Here we process incr rows of the builtin matrix state.x77 at a time, building up the two cross products, xtx and xty, regressing Income (variable 2) on the other variables:

mylm <- function(x, y, incr = 25) {
    start <- xtx <- xty <- 0
    while (start < nrow(x)) {
        idx <- seq(start + 1, min(start + incr, nrow(x)))
        x1 <- cbind(1, x[idx, ])
        xtx <- xtx + crossprod(x1)
        xty <- xty + crossprod(x1, y[idx])
        start <- start + incr
    }
    solve(xtx, xty)
}
mylm(state.x77[, -2], state.x77[, 2])

On 8/16/07, Alp ATICI alpatici at gmail.com wrote:

I'd like to fit linear models on very large datasets. My data frames are about 200 rows x 200 columns of doubles and I am using a 64 bit build of R. I've googled about this extensively and went over the R Data Import/Export guide.
My primary issue is although my data represented in ascii form is 4Gb in size (therefore much smaller considered in binary), R consumes about 12Gb of virtual memory. What exactly are my options to improve this? I looked into the biglm package, but the problem with it is that it uses the update() function and is therefore not transparent (I am using a sophisticated script which is hard to modify). I really liked the concept behind the LM package here: http://www.econ.uiuc.edu/~roger/research/rq/RMySQL.html But it is no longer available. How could one fit linear models to very large datasets without loading the entire set into memory but from a file/database (possibly through a connection) using a relatively simple modification of standard lm()? Alternatively, how could one improve the memory usage of R given a large dataset (by changing some default parameters of R or even using on-the-fly compression)? I don't mind much higher levels of CPU time required. Thank you in advance for your help.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

If your design matrix X is very well behaved, this approach may work for you. Often just doing solve(X'X, X'y) will fail for numerical reasons. The right way to do it is to factor the matrix X as X = A * B, where B is 200x200 in your case and A is 200 x 200 with A'*A = I (that is, A is orthogonal), so X'*X = B'*B, and you use solve(B'*B, X'*y). To find A and B you can use modified Gram-Schmidt, which is very easy to program and works well when you wish to store the columns of X on a hard disk and just read in a bit at a time. Some people claim that modified Gram-Schmidt is unstable, but it has always worked well for me. In any event you can check to ensure that A'*A = I and X = A*B.

Cheers,
Dave

--
David A. Fournier
P.O. Box 2040, Sidney, B.C.
V8l 3S3 Canada
Phone/FAX 250-655-3364
http://otter-rsch.com
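Modified Gram-Schmidt as Dave describes it fits in a few lines of R. This is my own illustration, not Dave's code; in a genuinely large problem the column updates would read and write blocks on disk rather than hold X in memory:

```r
## Modified Gram-Schmidt sketch: factor X = A %*% B with A'A = I and
## B upper triangular, then solve the normal equations through B.
mgs <- function(X) {
  n <- ncol(X)
  A <- X
  B <- matrix(0, n, n)
  for (k in 1:n) {
    B[k, k] <- sqrt(sum(A[, k]^2))            # normalize column k
    A[, k]  <- A[, k] / B[k, k]
    if (k < n) {
      j <- (k + 1):n
      B[k, j] <- crossprod(A[, k], A[, j])    # project column k out of the rest
      A[, j]  <- A[, j] - A[, k] %*% B[k, j, drop = FALSE]
    }
  }
  list(A = A, B = B)
}

set.seed(1)
X <- matrix(rnorm(200), 40, 5)
y <- rnorm(40)
f <- mgs(X)
all.equal(crossprod(f$A), diag(5))   # the A'A = I check Dave suggests
all.equal(f$A %*% f$B, X)            # and the X = A B check
## OLS coefficients: B'B b = X'y reduces to B b = A'y since B is invertible
beta <- solve(f$B, crossprod(f$A, y))
```

The point of the factorization is that B is only p x p, so once the streaming pass over X is done, the solve is cheap regardless of the number of rows.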
Re: [R] Help with matrices
An even simpler solution is:

mat2 <- 1 * (mat1 > 0.25)

Ravi.

---
Ravi Varadhan, Ph.D., Johns Hopkins University

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Lanre Okusanya
Sent: Friday, August 10, 2007 2:20 PM
To: jim holtman
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Help with matrices

that was ridiculously simple. duh. Thanks

Lanre

On 8/10/07, jim holtman [EMAIL PROTECTED] wrote:

Is this what you want:

x <- matrix(runif(100), 10)
round(x, 3)
       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
 [1,] 0.268 0.961 0.262 0.347 0.306 0.762 0.524 0.062 0.028 0.226
 [2,] 0.219 0.100 0.165 0.131 0.578 0.933 0.317 0.109 0.527 0.131
 [3,] 0.517 0.763 0.322 0.374 0.910 0.471 0.278 0.382 0.880 0.982
 [4,] 0.269 0.948 0.510 0.631 0.143 0.604 0.788 0.169 0.373 0.327
 [5,] 0.181 0.819 0.924 0.390 0.415 0.485 0.702 0.299 0.048 0.507
 [6,] 0.519 0.308 0.511 0.690 0.211 0.109 0.165 0.192 0.139 0.681
 [7,] 0.563 0.650 0.258 0.689 0.429 0.248 0.064 0.257 0.321 0.099
 [8,] 0.129 0.953 0.046 0.555 0.133 0.499 0.755 0.181 0.155 0.119
 [9,] 0.256 0.954 0.418 0.430 0.460 0.373 0.620 0.477 0.132 0.050
[10,] 0.718 0.340 0.854 0.453 0.943 0.935 0.170 0.771 0.221 0.929

ifelse(x > .5, 1, 0)
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    1    0    0    0    1    1    0    0     0
 [2,]    0    0    0    0    1    1    0    0    1     0
 [3,]    1    1    0    0    1    0    0    0    1     1
 [4,]    0    1    1    1    0    1    1    0    0     0
 [5,]    0    1    1    0    0    0    1    0    0     1
 [6,]    1    0    1    1    0    0    0    0    0     1
 [7,]    1    1    0    1    0    0    0    0    0     0
 [8,]    0    1    0    1    0    0    1    0    0     0
 [9,]    0    1    0    0    0    0    1    0    0     0
[10,]    1    0    1    0    1    1    0    1    0     1

On 8/10/07, Lanre Okusanya [EMAIL PROTECTED] wrote:

Hello all,

I am working with a 1000x1000 matrix, and I would like to return a 1000x1000 matrix that tells me which value in the matrix is greater than a threshold value (1 or 0 indicator). I have tried

mat2 <- as.matrix(as.numeric(mat1 > 0.25))

but that returns a single-column matrix.
I have also tried for loops, but they are grossly inefficient. Thanks for all your help in advance.

Lanre

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?
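The reason the original attempt collapsed to a single column is that as.numeric() strips the dim attribute, while arithmetic on the logical matrix preserves it. A quick check (my own toy example):

```r
mat1 <- matrix(runif(9), 3, 3)
m1 <- as.matrix(as.numeric(mat1 > 0.25))  # as.numeric() drops dim: 9 x 1 result
m2 <- 1 * (mat1 > 0.25)                   # arithmetic keeps dim: 3 x 3 result
dim(m1)  # 9 1
dim(m2)  # 3 3
```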
Re: [R] Mixture of Normals with Large Data
Another possibility is to use data squashing methods. Relevant papers are: (1) DuMouchel et al. (1999), (2) Madigan et al. (2002), and (3) Owen (1999).

Ravi.

Ravi Varadhan, Ph.D., Assistant Professor, Division of Geriatric Medicine and Gerontology, School of Medicine, Johns Hopkins University
Ph. (410) 502-2619  email: [EMAIL PROTECTED]

- Original Message -
From: Charles C. Berry [EMAIL PROTECTED]
Date: Saturday, August 4, 2007 8:01 pm
Subject: Re: [R] Mixture of Normals with Large Data
To: [EMAIL PROTECTED]
Cc: r-help@stat.math.ethz.ch

On Sat, 4 Aug 2007, Tim Victor wrote:

All: I am trying to fit a mixture of 2 normals with 110 million observations. I am running R 2.5.1 on a box with 1 GB RAM running 32-bit Windows and I continue to run out of memory. Does anyone have any suggestions?

If the first few million observations can be regarded as an SRS of the rest, then just use them. Or read in blocks of a convenient size and sample some observations from each block. You can repeat this process a few times to see if the results are sufficiently accurate.

Otherwise, read in blocks of a convenient size (perhaps 1 million observations at a time), quantize the data to a manageable number of intervals - maybe a few thousand - and tabulate it. Add the counts over all the blocks. Then use mle() to fit a multinomial likelihood whose probabilities are the masses associated with each bin under a mixture of normals law.

Chuck

Thanks so much, Tim

Charles C. Berry  (858) 534-2098
Dept of Family/Preventive Medicine
UC San Diego, La Jolla, San Diego 92093-0901
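Chuck's tabulate-then-fit idea can be sketched as follows. This is my own illustration with made-up data, using optim() in place of mle(); the binned log-likelihood uses the mixture density at bin midpoints, a simplification of the exact multinomial bin probabilities:

```r
## Bin the data once; memory then scales with the number of bins, not n.
set.seed(1)
x <- c(rnorm(5e4, 0, 1), rnorm(5e4, 4, 1))   # small stand-in for the 110M obs
br     <- seq(min(x), max(x), length = 2001)
counts <- tabulate(cut(x, br, include.lowest = TRUE), nbins = 2000)
mids   <- (br[-1] + br[-2001]) / 2

## Negative binned log-likelihood of a two-component normal mixture;
## parameters: logit of mixing weight, two means, two log-sds.
negll <- function(p) {
  w <- plogis(p[1])
  f <- w * dnorm(mids, p[2], exp(p[4])) + (1 - w) * dnorm(mids, p[3], exp(p[5]))
  -sum(counts * log(f + 1e-300))
}
fit <- optim(c(0, 0.5, 3.5, 0, 0), negll, method = "BFGS")
round(c(w = plogis(fit$par[1]), mu1 = fit$par[2], mu2 = fit$par[3],
        s1 = exp(fit$par[4]), s2 = exp(fit$par[5])), 3)
```

With a few thousand bins the loss of information relative to the raw data is usually negligible for a smooth two-component mixture, and each block of raw data only has to be touched once.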
Re: [R] Computing the rank of a matrix.
Hi Martin,

I just realized (courtesy: ?qr) that LAPACK=TRUE always gives full rank, no matter what the matrix and tolerance are. So clearly my results using LAPACK=TRUE should be ignored, and the real comparison is only between rankMat and qr(., LAPACK=FALSE)$rank. I can't help but feel that there can be no correct solution to an ill-posed problem. Furthermore, I haven't come across a real application where the numerical estimate of a rank is directly useful.

Best,
Ravi.

---
Ravi Varadhan, Ph.D., Johns Hopkins University

-Original Message-
From: Martin Maechler [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 24, 2007 1:43 PM
To: RAVI VARADHAN
Cc: Martin Maechler; r-help@stat.math.ethz.ch
Subject: Re: [R] Computing the rank of a matrix.

RV == RAVI VARADHAN [EMAIL PROTECTED] on Sat, 07 Apr 2007 18:39:36 -0400 writes:

this is a bit a late reply... better late than never

RV Hi Martin,

Hi Ravi, thanks for your research.

RV I played a bit with rankMat. I will first state the conclusions of my numerical experiments and then present the results to support them:

RV 1. I don't believe that rankMat, or equivalently Matlab's rank computation, is necessarily better than qr(.)$rank or qr(., LAPACK=TRUE)$rank. In fact, for the notorious Hilbert matrix, rankMat can give poor estimates of rank.

RV 2. There exists no universally optimal (i.e. optimal for all A) tol in qr(A, tol)$rank that would be optimally close to rankMat. The discrepancy in the ranks computed by qr(A)$rank and rankMat(A) obviously depends on the matrix A. This is evident from the tol used in rankMat:

RV tol <- max(d) * .Machine$double.eps * abs(singValA[1])

RV So, this value of tol in qr will also minimize the discrepancy.

RV 3.
The tol parameter is irrelevant in qr(A, tol, LAPACK=TRUE)$rank, i.e. LAPACK doesn't seem to utilize the tol parameter when computing the rank. However, qr(A, tol, LAPACK=FALSE)$rank does depend on tol.

Yes, indeed! The help page of qr() actually says so {at least now, don't know about 3 months ago}.

RV Now, here are the results:

RV 1.

hilbert.rank <- matrix(NA, 20, 3)
hilbert <- function(n) { i <- 1:n; 1 / outer(i - 1, i, "+") }
for (i in 1:20) {
    himat <- hilbert(i)
    hilbert.rank[i, 1] <- rankMat(himat)
    hilbert.rank[i, 2] <- qr(himat, tol = 1.0e-14)$rank
    hilbert.rank[i, 3] <- qr(himat, tol = .Machine$double.eps, LAPACK = TRUE)$rank
}
hilbert.rank
      [,1] [,2] [,3]
 [1,]    1    1    1
 [2,]    2    2    2
 [3,]    3    3    3
 [4,]    4    4    4
 [5,]    5    5    5
 [6,]    6    6    6
 [7,]    7    7    7
 [8,]    8    8    8
 [9,]    9    9    9
[10,]   10   10   10
[11,]   10   11   11
[12,]   11   12   12
[13,]   11   12   13
[14,]   11   13   14
[15,]   12   13   15
[16,]   12   15   16
[17,]   12   16   17
[18,]   12   16   18
[19,]   13   17   19
[20,]   13   17   20

RV We see that rankMat underestimates the rank for n > 10, but qr(himat, tol = .Machine$double.eps, LAPACK=TRUE)$rank gets it right.

Yes, indeed; and that seems a bit against the "common knowledge" that svd() is more reliable than qr(). Hmm, I'm still baffled a bit.

Note that with the Hilbert matrices, one might argue that hilbert(20) might really not have a correct estimated rank of 20, but at least for hilbert(13) or so, the rank should be == n.

BTW, there's a nice plot, slightly related to this problem, using rcond() from the Matrix package:

library(Matrix)
hilbert <- function(n) { i <- 1:n; 1 / outer(i - 1, i, "+") }
rcHilb <- sapply(1:20, function(n) rcond(Matrix(hilbert(n))))
plot(rcHilb, xlab = "n", log = "y", col = 2, type = "b",
     main = "reciprocal condition numbers of Hilbert(n)")

where one sees that the LAPACK code that underlies Matrix::rcond() also gets into problems when estimating the condition number for hilbert(n) when n >~ 16.
I think if we wanted to make real progress here, we'd have to consult with numerical analysis specialists. But for me the topic is too remote to be followed up further at the moment. One conclusion for me is that to estimate the rank of a matrix in current versions of R, one should use

rankMat <- function(x) qr(x, LAPACK = TRUE)$rank

(as was suggested as one possibility in the original
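For readers without the earlier thread: rankMat is the Matlab-style rank based on the SVD. A minimal reconstruction (my own code, following the tolerance formula quoted above, not Martin's original):

```r
## Matlab-style rank: count singular values above a relative tolerance.
rankMat <- function(A, tol) {
  d <- svd(A, nu = 0, nv = 0)$d
  if (missing(tol)) tol <- max(dim(A)) * .Machine$double.eps * d[1]
  sum(d > tol)
}

hilbert <- function(n) { i <- 1:n; 1 / outer(i - 1, i, "+") }
rankMat(hilbert(7))            # 7: still well within the resolvable range
rankMat(hilbert(15))           # compare with column 1 of the table above
rankMat(diag(c(1, 1, 1e-20)))  # 2: the tiny singular value falls below tol
```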
Re: [R] Linear programming question
Tobias,

Adding the first constraints yields: S1 + S2 = -2A. Similarly, adding the second set of constraints: S3 + S4 = 2B. If A and B are positive (which you didn't specify), then the minimum of S1+S2 is -2A, and the maximum of S3+S4 is 2B. Thus, the minimum of S1+S2-S3-S4 is -2(A+B).

Ravi.

---
Ravi Varadhan, Ph.D., Johns Hopkins University

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tobias Schlottmann
Sent: Wednesday, July 18, 2007 1:24 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Linear programming question

Hi everybody,

please consider an optimization problem:

minimize S1 + S2
subject to:
y - x = A + S1
x - y = A + S2

and we want to add two more constraints:

y - x = B - S3
x - y = B - S4

where A is a small constant value and B is a large constant value, S1 and S2 are surplus variables, and S3 and S4 are slack variables. S3 and S4 have to be maximized in the objective function. As the objective function, is this correct?

minimize S1 + S2 - S3 - S4

where actually we want to minimize S1 and S2, and maximize S3 and S4.
Re: [R] Linear programming question
Tobias,

Just a clarification/correction to my solution: it makes no difference whether A and B are positive or negative. The minimum of S1+S2-S3-S4 is always -2(A+B).

Ravi.

---
Ravi Varadhan, Ph.D., Johns Hopkins University

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ravi Varadhan
Sent: Wednesday, July 18, 2007 1:52 PM
To: 'Tobias Schlottmann'; r-help@stat.math.ethz.ch
Subject: Re: [R] Linear programming question
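Ravi's algebra is easy to confirm numerically (my own check, with arbitrary A and B): under the stated equality constraints, S1 + S2 is identically -2A and S3 + S4 identically 2B, whatever x and y are, so the objective is always -2(A + B).

```r
A <- 3; B <- 100
set.seed(1)
x <- rnorm(5); y <- rnorm(5)            # arbitrary (x, y) pairs
S1 <- (y - x) - A; S2 <- (x - y) - A    # from y - x = A + S1, x - y = A + S2
S3 <- B - (y - x); S4 <- B - (x - y)    # from y - x = B - S3, x - y = B - S4
unique(round(S1 + S2 - S3 - S4, 10))    # -2 * (A + B) = -206 for every pair
```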
Re: [R] Optimization
Hi,

Your problem can be solved analytically. Eliminate one of the variables, say x3, from the problem by using the equality x1 + x2 + x3 = 1. Then solve for the intersection of the circle (in x1 and x2) defined by the radical constraint with the straight line defined by the objective function. There will be, at most, two intersection points. The extremum has to be one of these two points, provided they also satisfy the other inequalities. (To me, this sounds an awful lot like a homework problem.)

Ravi.

---
Ravi Varadhan, Ph.D., Johns Hopkins University

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of massimiliano.talarico
Sent: Monday, July 16, 2007 4:50 PM
To: r-help
Subject: [R] Optimization

Dear all,

I need a suggestion to obtain the max of this function:

Max x1*0.021986 + x2*0.000964 + x3*0.02913

with these conditions:

x1 + x2 + x3 = 1;
sqrt((x1*0.114434)^2 + (x2*0.043966)^2 + (x3*0.100031)^2) = 0.04;
0 <= x1 <= 1;
0 <= x2 <= 1;
0 <= x3 <= 1;

Any suggestions?

Thanks in advance,
Massimiliano
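Ravi's recipe can be carried out numerically. The sketch below is my own code (the constants come from the post): eliminate x3, substitute the objective level L into the quadratic constraint, and push L up until the resulting quadratic in x1 has a double root (tangency). The grid used to bracket L is my assumption.

```r
cc <- c(0.021986, 0.000964, 0.02913)   # objective coefficients
ss <- c(0.114434, 0.043966, 0.100031)  # constraint coefficients
r  <- 0.04
d1 <- cc[1] - cc[3]; d2 <- cc[2] - cc[3]

## After x3 = 1 - x1 - x2 and x2 = (L - cc[3] - d1*x1)/d2, the constraint
## becomes A*x1^2 + B*x1 + C = 0; tangency means its discriminant is zero.
quadABC <- function(L) {
  K <- L - cc[3]
  A <- ss[1]^2 + (ss[2]^2 * d1^2 + ss[3]^2 * (d1 - d2)^2) / d2^2
  B <- 2 * (-ss[2]^2 * d1 * K + ss[3]^2 * (d1 - d2) * (d2 - K)) / d2^2
  C <- (ss[2]^2 * K^2 + ss[3]^2 * (d2 - K)^2) / d2^2 - r^2
  c(A = A, B = B, C = C)
}
disc <- function(L) { q <- quadABC(L); unname(q["B"]^2 - 4 * q["A"] * q["C"]) }

grid <- seq(0, 0.05, by = 1e-4)             # assumed search range for L
hi   <- max(grid[sapply(grid, disc) >= 0])  # last feasible level on the grid
Lmax <- uniroot(disc, c(hi, hi + 1e-4), tol = 1e-12)$root
q  <- quadABC(Lmax)
x1 <- unname(-q["B"] / (2 * q["A"]))        # double root at tangency
x2 <- (Lmax - cc[3] - d1 * x1) / d2
x3 <- 1 - x1 - x2
c(L = Lmax, x1 = x1, x2 = x2, x3 = x3)
```

The smaller root of the discriminant gives the minimum the same way, and the box constraints 0 <= xi <= 1 still have to be checked on the resulting point, as Ravi notes.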
Re: [R] nearest correlation to polychoric
Martin,

I sent you the Matlab code for this, which I had obtained from Nick Higham:

Cheng, Sheung Hun and Higham, Nick (1998). A Modified Cholesky Algorithm Based on a Symmetric Indefinite Factorization. SIAM J. Matrix Anal. Appl., 19, 1097-1110.

Do you remember?

Ravi.

---
Ravi Varadhan, Ph.D., Johns Hopkins University

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Martin Maechler
Sent: Friday, July 13, 2007 10:29 AM
To: Dimitris Rizopoulos
Cc: Jens Oehlschlägel; r-help@stat.math.ethz.ch
Subject: Re: [R] nearest correlation to polychoric

DR == Dimitris Rizopoulos [EMAIL PROTECTED] on Fri, 13 Jul 2007 14:43:08 +0200 writes:

DR you could also have a look at function posdefify() from package `sfsmisc'.
DR I hope it helps.

Yes, thanks, Dimitris; note that my posdefify() function uses a pretty arbitrary fudge value for posdefification, namely eps.ev = 1e-07.

As a matter of fact, earlier this week (at the first international R/Rmetrics workshop), I talked to people from finance who also need that (or something better?). Jordi Molins Coronado (Madrid) drew my attention to an idea he found in the book (English re-edition of the 1996 French original) Jean-Philippe Bouchaud (2000), "Theory of Financial Risk and Derivative Pricing: From Statistical Physics to Risk Management", which supposedly uses the theory of random matrices and the entailing distribution of random eigenvalues in order to find a more sensible cutoff than my 'eps.ev' default of 1e-7. Unfortunately that book is checked out from our library and I can't have a look.
Googling and Wikipedia seem to indicate to me that most of the random matrix theory does not directly apply here, since I'm really interested in the spectrum of X'X where X is a de-meaned n x p random matrix.

OTOH, help(posdefify) already mentions more sophisticated approaches to the problem, the one that (as I vaguely remember) should be made available being:

Cheng, Sheung Hun and Higham, Nick (1998). A Modified Cholesky Algorithm Based on a Symmetric Indefinite Factorization. SIAM J. Matrix Anal. Appl., 19, 1097-1110.

Jens, could you make your code (mentioned below) available to the community, or even donate it to be included as a new method of posdefify()?

Regards,
Martin Maechler, ETH Zurich

DR Best, Dimitris

DR Dimitris Rizopoulos, Ph.D. Student, Biostatistical Centre, School of Public Health, Catholic University of Leuven
DR Address: Kapucijnenvoer 35, Leuven, Belgium
DR Tel: +32/(0)16/336899  Fax: +32/(0)16/337015
DR Web: http://med.kuleuven.be/biostat/  http://www.student.kuleuven.be/~m0390867/dimitris.htm

DR - Original Message -
DR From: Jens Oehlschlägel [EMAIL PROTECTED]
DR To: r-help@stat.math.ethz.ch
DR Sent: Friday, July 13, 2007 2:25 PM
DR Subject: [R] nearest correlation to polychoric

DR Dear all,
DR Has someone implemented in R (or any other language)
DR Knol DL, ten Berge JMF. Least-squares approximation of an improper correlation matrix by a proper one. Psychometrika, 1989, 54, 53-61.
DR or any other similar algorithm?
DR Best regards
DR Jens Oehlschlägel

DR Background:
DR I want to factanal() matrices of polychoric correlations which have negative eigenvalues. I coded
DR Higham (2002). Computing the nearest correlation matrix - a problem from finance. IMA Journal of Numerical Analysis, 22, 329-343.
DR which basically works but still leaves very small negative eigenvalues, which causes factanal() to fail with:

DR factanal(cov=ncor$cor, factors=2)
DR Error in optim(start, FAfn, FAgr, method = "L-BFGS-B", lower = lower, ...): L-BFGS-B needs finite values of 'fn'
DR In addition: Warning message: NaNs produced in: log(x)

DR > version
DR platform: i386-pc-mingw32, arch: i386, os: mingw32, system: i386, mingw32
DR version.string: R version 2.4.1 (2006-12-18)
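For readers who just need something workable, the eigenvalue-clipping idea behind posdefify() fits in a few lines. This is my own minimal sketch (not Jens's Higham-2002 implementation and not the sfsmisc code): clip the spectrum at a small positive floor, reassemble, and rescale back to unit diagonal.

```r
nearPDcor <- function(C, eps.ev = 1e-7) {
  e  <- eigen(C, symmetric = TRUE)
  ev <- pmax(e$values, eps.ev * max(e$values))  # clip negative/tiny eigenvalues
  C2 <- e$vectors %*% (ev * t(e$vectors))       # reassemble V diag(ev) V'
  d  <- 1 / sqrt(diag(C2))
  C2 * outer(d, d)                              # rescale so the diagonal is 1
}

## an indefinite "correlation" matrix:
bad <- matrix(c(   1,  0.9, -0.6,
                 0.9,    1,  0.9,
                -0.6,  0.9,    1), 3, 3)
min(eigen(bad)$values)              # negative
fixed <- nearPDcor(bad)
min(eigen(fixed)$values)            # positive after clipping
```

Unlike Higham's alternating-projections algorithm, this single pass does not give the nearest correlation matrix in any norm; it is just a cheap repair that is positive definite by construction.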
Re: [R] Stepwise GLM selection by LRT?
Check out the stepAIC function in the MASS package. It is a nice tool where you can actually implement any penalty, even though the function's name has "AIC" in it (because that is the default). Although this doesn't do LRT-based variable selection, you can approximate it by using a penalty of k = qchisq(1-p, df=1), where p is the p-value threshold for variable selection. This penalty means that a variable enters/exits the model when twice the absolute change in log-likelihood is greater than qchisq(1-p, df=1). For p = 0.1, k = 2.71, and for p = 0.05, k = 3.84. Is this what you'd like to do?

Ravi.

---
Ravi Varadhan, Ph.D., Johns Hopkins University

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Lutz Ph. Breitling
Sent: Wednesday, July 11, 2007 3:06 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Stepwise GLM selection by LRT?

Dear List,

having searched the help and archives, I have the impression that there is no automatic model selection procedure implemented in R that includes/excludes predictors in logistic regression models based on LRT P-values. Is that true, or is someone aware of an appropriate function somewhere in a custom package? Even if automatic model selection and LRT might not be the most appropriate methods, I actually would like to use these in order to simulate someone else's modeling approach...

Many thanks for all comments-
Lutz

-
Lutz Ph. Breitling
German Cancer Research Center, Heidelberg/Germany
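A small worked example of the k trick (my own simulated data; stepAIC() is in MASS). With k = qchisq(0.95, df = 1), roughly 3.84, a term survives backward elimination only if dropping it changes the deviance by more than 3.84, mimicking an LRT at p = 0.05:

```r
library(MASS)

set.seed(1)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)   # x3 is pure noise
y  <- rbinom(n, 1, plogis(1.5 * x1 - x2))

full <- glm(y ~ x1 + x2 + x3, family = binomial)
sel  <- stepAIC(full, k = qchisq(0.95, df = 1), trace = FALSE)
formula(sel)   # x1 and x2 are retained; the noise term x3 is usually dropped
```

Note this mimics, rather than exactly reproduces, forward/backward selection on LRT p-values: all comparisons are made through the single penalized criterion rather than per-step tests.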
Re: [R] integration over a simplex
Hi Robin,

A Monte-Carlo approach could be attempted if one could generate samples that are uniformly distributed over the simplex. There is a small section in Luc Devroye's book (Non-Uniform Random Variate Generation) on uniform random sampling from a simplex, if I remember correctly. Another approach is importance sampling, where the sampling points have a characterized distribution. I have seen a technique called polyEDA, based on Gibbs sampling and the truncated multivariate normal distribution. I had previously emailed the authors of this approach for the code, but haven't received a reply yet. You can google polyEDA for more info.

I am interested in various computational problems related to polyhedra (e.g. enumeration of vertices, locating extrema, random sampling). I would appreciate it if you'd keep me posted on how you solve this problem.

Best,
Ravi.

- Original Message -
From: Robin Hankin [EMAIL PROTECTED]
Date: Tuesday, July 10, 2007 6:58 am
Subject: [R] integration over a simplex
To: r-help@stat.math.ethz.ch

Hello

The excellent adapt package integrates over multi-dimensional hypercubes. I want to integrate over a multidimensional simplex. Has anyone implemented such a thing in R? I can transform an n-simplex to a hyperrectangle, but the Jacobian is a rapidly-varying (and very lopsided) function, and this is making adapt() slow.

[A simplex is an n-dimensional analogue of a triangle or tetrahedron. It is the convex hull of (n+1) points in an n-dimensional Euclidean space. My application is a variant of the Dirichlet distribution: with p ~ D(a), if length(p) = n+1 then the requirement that all(p > 0) and sum(p) = 1 means that the support of the Dirichlet distribution is an n-simplex.
] -- Robin Hankin Uncertainty Analyst National Oceanography Centre, Southampton European Way, Southampton SO14 3ZH, UK tel 023-8059-7743 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
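As a concrete sketch of the Monte-Carlo route (my own assumed construction, not Devroye's exact algorithm): uniform points on the unit simplex can be generated by normalizing i.i.d. Exp(1) draws, a special case of the Dirichlet construction, after which the integral is the sample mean times the simplex volume:

```r
## Uniform sampling on the n-simplex: normalize n+1 i.i.d. Exp(1) draws.
rsimplex <- function(nsim, n) {
  e <- matrix(rexp(nsim * (n + 1)), nrow = nsim)
  e / rowSums(e)          # each row: n+1 coordinates summing to 1
}
## Crude MC integral over the n-simplex: sample mean of f times volume 1/n!.
mc_simplex <- function(f, n, nsim = 1e4) {
  x <- rsimplex(nsim, n)
  mean(apply(x, 1, f)) / factorial(n)
}
set.seed(1)
mc_simplex(function(p) 1, 2)  # the area of the 2-simplex, 1/2
```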
Re: [R] No convergence using ADAPT
Since the covariance is zero (i.e. you have independent normals), you can simplify your problem so that you just need to perform one-dimensional integration. Here is how you can do it: trial2 <- function(input) { # pmvnorm(lower = c(0,0), upper = c(10, 10), mean = input, sigma = matrix(c(.01, 0, 0, .01), nrow = 2, ncol = 2, byrow = FALSE)) pnorm(q = 10, mean = input, sd = sqrt(.01)) - pnorm(q = 0, mean = input, sd = sqrt(.01)) } bottomB <- -5*sqrt(.01) topB <- 10 + 5*sqrt(.01) areaB <- (topB - bottomB)^2 new.ans <- 1/areaB * (integrate(f = trial2, lower = bottomB, upper = topB)$value)^2 new.ans [1] 0.8264463 Hope this is helpful, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Philip Turk Sent: Saturday, July 07, 2007 3:20 PM To: r-help@stat.math.ethz.ch Subject: [R] No convergence using ADAPT I am trying to calculate a probability using numerical integration. The first program I ran spit out an answer in a very short time. The program is below: ## START PROGRAM trial <- function(input) { pmvnorm(lower = c(0,0), upper = c(2, 2), mean = input, sigma = matrix(c(.1, 0, 0, .1), nrow = 2, ncol = 2, byrow = FALSE)) } require(mvtnorm) require(adapt) bottomB <- -5*sqrt(.1) topB <- 2 + 5*sqrt(.1) areaB <- (topB - bottomB)^2 unscaled.Po.in.a <- adapt(2, lo = c(bottomB, bottomB), up = c(topB, topB), minpts = 1000, eps = 1e-4, functn = trial) (1/areaB)*unscaled.Po.in.a$value ## FINISH PROGRAM I tried to run the program again changing a.) sigma in the trial function, b.) upper in the trial function, and c.) the bounds of integration; that is, bottomB and topB.
The new program is below: ## START PROGRAM trial <- function(input) { pmvnorm(lower = c(0,0), upper = c(10, 10), mean = input, sigma = matrix(c(.01, 0, 0, .01), nrow = 2, ncol = 2, byrow = FALSE)) } require(mvtnorm) require(adapt) bottomB <- -5*sqrt(.01) topB <- 10 + 5*sqrt(.01) areaB <- (topB - bottomB)^2 unscaled.Po.in.a <- adapt(2, lo = c(bottomB, bottomB), up = c(topB, topB), minpts = 1000, eps = 1e-4, functn = trial) (1/areaB)*unscaled.Po.in.a$value ## FINISH PROGRAM Now, the program just runs and runs (48 hours at last count!). By playing around with the program, I have deduced that the program is highly sensitive to changing the upper option in the trial function. For example, using a vector like (4, 4) causes no problems and the program quickly yields an answer. I have a couple of other programs where I can easily obtain a simulation-based answer, but I would ultimately like to know what's up with this program before I give up on it so I can learn a thing or two. Does anyone have any clues or tricks to get around this problem? My guess is that it will simply be very difficult (impossible?) to obtain this type of relative error (eps = 1e-4) and I will have no choice but to pursue the simulation approach. Thanks for any responses ([EMAIL PROTECTED])! -- Phil Philip Turk Assistant Professor of Statistics Northern Arizona University Department of Mathematics and Statistics PO Box 5717 Flagstaff, AZ 86011 Phone: 928-523-6884 Fax: 928-523-5847 E-mail: [EMAIL PROTECTED] Web Site: http://jan.ucc.nau.edu/~stapjt-p/ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Incidence estimated from Kaplan-Meier
The 1-Pr(disease-free survival) estimate from KM is not appropriate if competing risks of mortality (from causes other than the disease of interest) are present. In that case, 1-Pr(disease-free survival) over-estimates the cumulative incidence of disease. The larger the hazard of mortality, the larger the over-estimation. This is a well-known phenomenon in the competing-risks literature. See, for example, Gooley et al. (Stats in Med 1999). Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Frank E Harrell Jr Sent: Thursday, July 05, 2007 8:48 AM To: Nguyen Dinh Nguyen Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Incidence estimated from Kaplan-Meier Nguyen Dinh Nguyen wrote: Dear all, I have a stat question that may not be related to R, but I would like to have your advice. I have just read a medical paper in which the authors report the 1-p (where p is the cumulative survival probability from the Kaplan-Meier curve) as incidence of disease. Specifically, the study followed ~12000 women on drug A and ~2 women on drug B for 12 months. During that period 29 women on drug A and 80 on drug B had the disease. The incidence of disease for A and B was 0.24% and 0.30% respectively. However, instead of reporting these numbers, they report the 1-p figure, which was 0.3% for A and 0.6% for B. So, the incidence from 1-p was substantially higher than the actual incidence. My question is: is it appropriate to use 1-p estimated from Kaplan-Meier as the incidence of disease? If not, why not? Regards, Nguyen Yes it's appropriate, and it makes you state the cumulative incidence by time t rather than leaving time unspecified.
In your example it is likely that all women weren't followed completely, so simple incidences are not appropriate to compute because the denominator is not constant. Frank Nguyen Dinh Nguyen, Bone and Mineral Research Program Garvan Institute of Medical Research St Vincent's Hospital 384 Victoria Street, Darlinghurst Sydney, NSW 2010 Australia Tel: 61-2-9295 8274 Fax: 61-2-9295 8241 E-mail: [EMAIL PROTECTED] [[alternative HTML version deleted]] -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] unexpected result in function valuation
The problem is that you are dividing two numbers that are both very small. Any small imprecision in the denominator creates a big error. Here you can re-write your function using a trig identity to get accurate results: sinca <- function(N,th) { # return(sin((N+0.5)*th)/sin(0.5*th)) return( (sin(N*th)/tan(th/2)) + cos(N*th) ) } This function works well, as you had expected. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of James Foadi Sent: Thursday, July 05, 2007 1:39 PM To: r-help@stat.math.ethz.ch Subject: [R] unexpected result in function valuation Dear all, I have a very small script to plot a function. Here it is: ## sinca <- function(N,th) { return(sin((N+0.5)*th)/sin(0.5*th)) } plot_sinca <- function(N) { x <- seq(-5*pi,5*pi,by=pi/100) y <- rep(0,length=length(x)) for (i in 1:length(x)) y[i] <- sinca(N,x[i]) plot(x,y,type="l",ylim=c(0,2*N+4)) return(c(x,y)) } ## When I load the script and run the function like this: ### data <- plot_sinca(4) y <- data[1002:2002] ### I notice a spike on the plot which should not be there. In fact I have checked and: ### y[701] [1] 10.07404 sinca(4,2*pi) [1] 9 ### The second result is the correct one. Why, then, do I get y[701]=10.07404? This function is not supposed to be higher than 9... Any help is greatly appreciated. Regards, J Dr James Foadi Membrane Protein Laboratory Diamond Light Source Ltd Chilton, Didcot Oxfordshire OX11 0DE --- [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
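To see the cancellation fix at work, both forms from the thread can be compared directly at the troublesome point th = 2*pi:

```r
sinca_naive  <- function(N, th) sin((N + 0.5) * th) / sin(0.5 * th)
sinca_stable <- function(N, th) sin(N * th) / tan(th / 2) + cos(N * th)
## At th = 2*pi the true value is 2*N + 1 = 9.  The naive form divides two
## near-zero numbers, so tiny rounding in the denominator is amplified;
## in the rewritten form the rounding errors of sin() and tan() largely
## cancel in the ratio.
sinca_stable(4, 2 * pi)   # very close to 9
```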
Re: [R] how to solve a min problem
Whether you can use optim or not depends on the nature of the constraints on S. If you have simple box constraints, you can use the L-BFGS-B method in optim. If not, optim may not be directly applicable, unless you can somehow transform your problem into an unconstrained minimization problem. Ravi. - Original Message - From: domenico pestalozzi [EMAIL PROTECTED] Date: Wednesday, July 4, 2007 11:26 am Subject: Re: [R] how to solve a min problem To: R-help r-help@stat.math.ethz.ch S is a 1-dimensional array, for example 1 x 10, and mean(S) is the mean of these 10 elements. So, I want to: minimize mean(S) subject to 0 < b_func(S) < 800. That is, there are some boundaries on S according to b_func. The function applies an iterative convergence criterion: f_1 = g(S), f_2 = g(f_1), f_3 = g(f_2), etc. The function stops when |f_n - f_{n-1}| <= 0.1e-09, where g(S) is a non-linear function of S and the convergence is mathematically assured. Is it possible to use 'optimize'? thanks domenico 2007/7/3, Spencer Graves [EMAIL PROTECTED]: Do you mean minimize mu with 0 < b_func(S+mu) < 800? For this kind of problem, I'd first want to know the nature of b_func. Without knowing more, I might try to plot b_func(S+mu) vs. mu, then maybe use 'optimize'. If this is not what you mean, please be more specific: I'm confused. Hope this helps. Spencer Graves domenico pestalozzi wrote: I know it's possible to solve max and min problems by using these functions: nlm, optimize, optim, but I don't know how to use them (...if possible...) to solve this problem. I have a personal function called b_func(S) where S is an input array (1 x n) and I'd like to: minimize mean(S) with 0 < b_func < 800. I know that the solution exists, but is it possible to calculate it in R? The b_func is non-linear: it calculates a particular value using S as input and applying a convergent iterative algorithm.
thanks domenico [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
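If the constraint on S really can be reduced to simple bounds, the L-BFGS-B route Ravi mentions looks like this (the bounds, dimension, and starting values below are invented for illustration; domenico's b_func is not reproduced here):

```r
## Minimize mean(S) subject to box constraints 1 <= S_i <= 10.
res <- optim(par   = rep(5, 10),
             fn    = function(S) mean(S),
             method = "L-BFGS-B",
             lower = rep(1, 10), upper = rep(10, 10))
res$par   # driven down to the lower bound, rep(1, 10)
```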
Re: [R] how to solve a min problem
If the constraints on S are linear inequalities, then linear programming methods would work. See function solveLP in package linprog, or simplex in package boot, or package lpSolve. Ravi. - Original Message - From: domenico pestalozzi [EMAIL PROTECTED] Date: Wednesday, July 4, 2007 11:26 am Subject: Re: [R] how to solve a min problem To: R-help r-help@stat.math.ethz.ch [earlier messages quoted in full snipped]
__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
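A minimal illustration of the LP route with simplex() from the boot package, which ships with R (the toy objective and constraints below are invented, not domenico's b_func):

```r
library(boot)   # provides simplex()
## minimize x1 + x2  subject to  x1 + 2*x2 >= 3,  3*x1 + x2 >= 4,  x >= 0
res <- simplex(a  = c(1, 1),
               A2 = rbind(c(1, 2), c(3, 1)), b2 = c(3, 4),
               maxi = FALSE)
res$soln   # the optimum sits at the vertex (1, 1)
```

Note that an LP solution always lands on a vertex of the feasible region, which matters later in this thread when an interior starting point is needed.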
Re: [R] Fine tunning rgenoud
Paul, You had indicated in your previous email that you are having trouble finding a feasible starting value for constrOptim(). So, you basically need to solve a system of linear inequalities to obtain a starting point. Have you considered using linear programming? Either simplex() in the boot package or solveLP() in linprog would work. It seems to me that you could use any linear objective function in solveLP to obtain a feasible starting point. This is not the most efficient solution, but it might be worth a try. I am aware of other methods for generating n-tuples that satisfy linear inequality constraints, but AFAIK those are not available in R. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Paul Smith Sent: Tuesday, July 03, 2007 4:10 PM To: R-help Subject: [R] Fine tunning rgenoud Dear All, I am trying to solve the following maximization problem, but I cannot have rgenoud giving me a reliable solution. Any ideas? 
Thanks in advance, Paul library(rgenoud) v <- 0.90 O1 <- 10 O2 <- 20 O0 <- v*O1+(1-v)*O2 myfunc <- function(x) { U0 <- x[1] U1 <- x[2] U2 <- x[3] q0 <- x[4] q1 <- x[5] q2 <- x[6] p <- x[7] if (U0 < 0) return(-1e+200) else if (U1 < 0) return(-1e+200) else if (U2 < 0) return(-1e+200) else if ((U0-(U1+(O1-O0)*q1)) < 0) return(-1e+200) else if ((U0-(U2+(O2-O0)*q2)) < 0) return(-1e+200) else if ((U1-(U0+(O0-O1)*q0)) < 0) return(-1e+200) else if ((U1-(U2+(O2-O1)*q2)) < 0) return(-1e+200) else if ((U2-(U0+(O0-O2)*q0)) < 0) return(-1e+200) else if ((U2-(U1+(O1-O2)*q1)) < 0) return(-1e+200) else if (p < 0) return(-1e+200) else if (p > 1) return(-1e+200) else if (q0 < 0) return(-1e+200) else if (q1 < 0) return(-1e+200) else if (q2 < 0) return(-1e+200) else return(p*(sqrt(q0)-(O0*q0+U0)) + (1-p)*(v*(sqrt(q1)-(O1*q1+U1)) + (1-v)*(sqrt(q2)-(O2*q2+U2)))) } genoud(myfunc, nvars=7, max=TRUE, pop.size=6000, starting.values=runif(7), wait.generations=150, max.generations=300, boundary.enforcement=2) __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fine tunning rgenoud
Paul, It should be easy enough to check that your solution is valid (i.e. a local minimum): first, check to see if the solution satisfies all the constraints; secondly, check to see if it is an interior point (i.e. none of the constraints become equality); and finally, if the solution is an interior point, check to see whether the gradient there is close to zero. Note that if the solution is one of the vertices of the polyhedron, then the gradient may not be zero. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Paul Smith Sent: Tuesday, July 03, 2007 5:10 PM To: R-help Subject: Re: [R] Fine tunning rgenoud On 7/3/07, Ravi Varadhan [EMAIL PROTECTED] wrote: You had indicated in your previous email that you are having trouble finding a feasible starting value for constrOptim(). So, you basically need to solve a system of linear inequalities to obtain a starting point. Have you considered using linear programming? Either simplex() in the boot package or solveLP() in linprog would work. It seems to me that you could use any linear objective function in solveLP to obtain a feasible starting point. This is not the most efficient solution, but it might be worth a try. I am aware of other methods for generating n-tuples that satisfy linear inequality constraints, but AFAIK those are not available in R. Thanks, Ravi. I had already conceived the solution that you suggest, actually using lpSolve. I am able to get a solution for my problem with constrOptim, but I am not enough confident that the solution is right. That is why I am trying to get a solution with rgenoud, but unsuccessfully until now. 
Paul -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Paul Smith Sent: Tuesday, July 03, 2007 4:10 PM To: R-help Subject: [R] Fine tunning rgenoud [earlier message quoted in full snipped] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
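The gradient check Ravi describes can be done with a simple central-difference approximation (a generic sketch with no extra packages; the toy objective f stands in for whatever constrOptim returned a candidate solution for):

```r
## Central-difference gradient: at an interior local optimum every
## component should be close to zero.
num_grad <- function(f, x, h = 1e-6) {
  sapply(seq_along(x), function(i) {
    e <- replace(numeric(length(x)), i, h)
    (f(x + e) - f(x - e)) / (2 * h)
  })
}
f <- function(x) sum((x - 1)^2)   # toy objective with minimum at (1, 1)
num_grad(f, c(1, 1))              # ~ c(0, 0)
```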
Re: [R] Fine tunning rgenoud
Paul, Here is another approach: I wrote an R function that would generate interior points as starting values for constrOptim. This might work better than the LP approach, since the LP approach gives you a starting value that is on the boundary of the feasible region, i.e. a vertex of the polyhedron, whereas this new approach gives you points in the interior. You can generate as many points as you wish, but the approach is brute-force and very inefficient - it takes on the order of 1000 tries to find one feasible point. Hope this helps, Ravi. - Original Message - From: Paul Smith [EMAIL PROTECTED] Date: Tuesday, July 3, 2007 7:32 pm Subject: Re: [R] Fine tunning rgenoud To: R-help r-help@stat.math.ethz.ch On 7/4/07, Ravi Varadhan [EMAIL PROTECTED] wrote: It should be easy enough to check that your solution is valid (i.e. a local minimum): first, check to see if the solution satisfies all the constraints; secondly, check to see if it is an interior point (i.e. none of the constraints become equality); and finally, if the solution is an interior point, check to see whether the gradient there is close to zero. Note that if the solution is one of the vertices of the polyhedron, then the gradient may not be zero. I am having bad luck: all constraints are satisfied, but the solution given by constrOptim is not interior; the gradient is not equal to zero. Paul [earlier messages quoted in full snipped]
__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
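Ravi's brute-force interior-point generator is not preserved in the archive; a sketch of the idea he describes (rejection sampling from a bounding box, with an invented constraint A x <= b) might look like:

```r
## Draw uniform points in a box and keep the first one satisfying all the
## linear inequalities A %*% x <= b; it can then seed constrOptim().
feasible_start <- function(A, b, lower, upper, tries = 10000) {
  n <- ncol(A)
  for (i in seq_len(tries)) {
    x <- runif(n, lower, upper)
    if (all(A %*% x <= b)) return(x)
  }
  stop("no feasible point found; increase 'tries'")
}
set.seed(1)
x0 <- feasible_start(A = rbind(c(1, 1)), b = 1.5, lower = 0, upper = 1)
sum(x0) <= 1.5   # TRUE: x0 lies strictly inside the feasible region
```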
Re: [R] Question about PCA with prcomp
Mark, What you are referring to deals with the selection of covariates, since PCA doesn't do dimensionality reduction in the sense of covariate selection. But what James is asking for is to identify how much each data point contributes to individual PCs. I don't think that query makes much sense, unless he meant to ask: which individuals have high/low scores on PC1/PC2. Here are some comments that may be tangentially related to the question: 1. If one is worried about a few data points contributing heavily to the estimation of PCs, then one can use robust PCA, for example, using robust covariance matrices. MASS has some tools for this. 2. The biplot for the first 2 PCs can give some insights. 3. PCs, especially the last few PCs, can be used to identify outliers. Hope this is helpful, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mark Difford Sent: Monday, July 02, 2007 1:55 PM To: r-help@stat.math.ethz.ch Subject: Re: [R] Question about PCA with prcomp Hi James, Have a look at Cadima et al.'s subselect package [Cadima worked with/was a student of Prof Jolliffe, one of _the_ experts on PCA; Jolliffe devotes part of a chapter to this question in his text (Principal Component Analysis, pub. Springer)]. Then you should look at psychometric stuff: a good place to start would be Professor Revelle's psych package. Best, Mark. James R. Graham wrote: Hello All, The basic premise of what I want to do is the following: I have 20 entities for which I have ~500 measurements each. So, I have a matrix of 20 rows by ~500 columns. The 20 entities fall into two classes: good and bad.
I eventually would like to derive a model that would then be able to classify new entities as being in good territory or bad territory based upon my existing data set. I know that not all ~500 measurements are meaningful, so I thought the best place to begin would be to do a PCA in order to reduce the amount of data with which I have to work. I did this using the prcomp function and found that nearly 90% of the variance in the data is explained by PC1 and 2. So far, so good. I would now like to find out which of the original ~500 measurements contribute to PC1 and 2 and by how much. Any tips would be greatly appreciated! And apologies in advance if this turns out to be an idiotic question. james __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
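For James's actual question - which of the original variables drive PC1 and PC2, and by how much - the loadings matrix returned by prcomp answers it directly (USArrests stands in here for his 20 x ~500 matrix):

```r
pc <- prcomp(USArrests, scale. = TRUE)
summary(pc)                    # proportion of variance per component
round(pc$rotation[, 1:2], 2)   # loadings: each variable's weight in PC1, PC2
```

Variables with large-magnitude loadings in a column are the ones contributing most to that component; each column of pc$rotation is a unit vector, so the squared loadings in a column sum to 1.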
Re: [R] Question about PCA with prcomp
The PCs that are associated with the smaller eigenvalues. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: Patrick Connolly [mailto:[EMAIL PROTECTED] Sent: Monday, July 02, 2007 4:23 PM To: Ravi Varadhan Cc: 'Mark Difford'; r-help@stat.math.ethz.ch Subject: Re: [R] Question about PCA with prcomp On Mon, 02-Jul-2007 at 03:16PM -0400, Ravi Varadhan wrote: | Mark, | | What you are referring to deals with the selection of covariates, since PC | doesn't do dimensionality reduction in the sense of covariate selection. | But what Mark is asking for is to identify how much each data point | contributes to individual PCs. I don't think that Mark's query makes much | sense, unless he meant to ask: which individuals have high/low scores on | PC1/PC2. Here are some comments that may be tangentially related to Mark's | question: | | 1. If one is worried about a few data points contributing heavily to the | estimation of PCs, then one can use robust PCA, for example, using robust | covariance matrices. MASS has some tools for this. | 2. The biplot for the first 2 PCs can give some insights | 3. PCs, especially, the last few PCs, can be used to identify outliers. What is meant by last few PCs? -- ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. ___Patrick Connolly {~._.~} Great minds discuss ideas _( Y )_Middle minds discuss events (:_~*~_:)Small minds discuss people (_)-(_) . Anon ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
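Concretely, the "last few PCs" are the score columns associated with the smallest eigenvalues; observations with unusually large scores there sit off the main data plane and are outlier candidates (a sketch, again using USArrests as stand-in data):

```r
pc <- prcomp(USArrests, scale. = TRUE)
last_pc <- pc$x[, ncol(pc$x)]        # scores on the smallest-variance PC
head(order(abs(last_pc), decreasing = TRUE), 3)   # candidate outlier rows
```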
Re: [R] Dominant eigenvector displayed as third (Marco Visser)
Yes, Spencer, your observation is correct, because the characteristic equation det(A - \lambda*I) = 0 is a sixth-degree polynomial: \lambda^6 - 5 = 0. So the eigenvalues are the (generally complex) numbers located at equal angles on the circle of radius 5^(1/6), at angles 2*pi*k/6, where k runs from 0 to 5. Thus, the roots are: z_k = 5^(1/6) * exp(i * 2*pi*k/6), k = 0, 1, ..., 5, where i = sqrt(-1). Ravi. - Original Message - From: Spencer Graves [EMAIL PROTECTED] Date: Friday, June 29, 2007 6:51 pm Subject: Re: [R] Dominant eigenvector displayed as third (Marco Visser) To: Marco Visser [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch There is no dominant eigenvalue: the eigenvalues of that matrix are the 6 different sixth roots of 5. All have modulus (or absolute value) = 1.307660. When I raised them all to the 6th power, all 6 were 5+0i. Someone else can tell us why this is, but this should suffice as an initial answer to your question. Hope this helps. Spencer Graves Marco Visser wrote: Dear R users and experts, This is just a curiosity: I was wondering why the dominant eigenvector and eigenvalue of the following matrix are given as the third. I guess this could complicate automatic selection procedures. 000005 100000 010000 001000 000100 000010 Please copy-paste the following into R: a=c(0,0,0,0,0,5,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0) mat=matrix(a, ncol=6,byrow=T) eigen(mat) The matrix is a population matrix for a plant pathogen (Powell et al 2005). Basically I would really like to know why this happens so I will know if it can occur again. Thanks for any comments, Marco Visser Comment: In Matlab the dominant eigenvector and eigenvalue of the described matrix are given as the sixth. Again, no idea why. Reference: J. A. Powell, I. Slapnicar and W. van der Werf. Epidemic spread of a lesion-forming plant pathogen - analysis of a mechanistic model with infinite age structure. (2005) Linear Algebra and its Applications 298. p 117-140.
__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Solving a system of nonlinear equations
Hi All, Here is a modification of nlsolve() that I had submitted before. The new version of nlsolve() has the option to generate multiple random starting values in order to improve the chances of locating the zeros of the nonlinear system. The last test example in the attached file, the Freudenstein-Roth function, shows the usefulness of the multiple random starts. Hope this is useful, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

##
nlsolve <- function(par, fn, method="BFGS", nstarts = 1, ...) {
###
# A solver for finding a zero of a system of non-linear equations
# Actually minimizes the squared norm of the set of functions by calling optim()
# Uses the BFGS algorithm within optim()
# All the control parameters can be passed as in the call to optim()
# Uses a random multiple-start approach to locate the zeros of the system
#
# Author: Ravi Varadhan, Center on Aging and Health, Johns Hopkins University, [EMAIL PROTECTED]
#
# June 21, 2007
#
func <- function(x, ...) sum(fn(x, ...)^2)
ans <- optim(par, fn=func, method=method, ...)
err <- sqrt(ans$val/length(par))
istart <- 1
while (err > 0.01 && istart < nstarts) {
  par <- par * (1 + runif(length(par), -1, 1))  # generating random starting values
  ans <- try(optim(par, fn=func, method=method, ...))
  if (class(ans) != "try-error") err <- sqrt(ans$val/length(par))
  istart <- istart + 1
}
if (err > 0.01) cat("\n Caution: Zero of the function has not been located!!! \n Increase nstarts \n\n")
ans
}
##
# Some examples illustrating the use of nlsolve()
#
expo1 <- function(p) {
# From La Cruz and Raydan, Optim Methods and Software 2003, 18 (583-599)
n <- length(p)
f <- rep(NA, n)
f[1] <- exp(p[1] - 1) - 1
f[2:n] <- (2:n) * (exp(p[2:n] - 1) - p[2:n])
f
}
n <- 50
p0 <- runif(n)
nlsolve(par=p0, fn=expo1)

expo3 <- function(p) {
# From La Cruz and Raydan, Optim Methods and Software 2003, 18 (583-599)
n <- length(p)
f <- rep(NA, n)
onm1 <- 1:(n-1)
f[onm1] <- onm1/10 * (1 - p[onm1]^2 - exp(-p[onm1]^2))
f[n] <- n/10 * (1 - exp(-p[n]^2))
f
}
n <- 15
p0 <- (1:n)/(4*n^2)
nlsolve(par=p0, fn=expo3)

func1 <- function(x) {
f <- rep(0, 3)
f[1] <- 1/3*cos(x[2]*x[3]) - x[1] + 1/6
f[2] <- 1/9 * sqrt(x[1]^2 + sin(x[3]) + 1.06) - x[2] - 0.1
f[3] <- -1/20*exp(-x[1]*x[2]) - x[3] - (10*pi - 3)/60
f
}
p0 <- runif(3)
nlsolve(par=p0, fn=func1)

# This example illustrates the use of multiple random starts to obtain the global minimum of zero (i.e. the roots of the system)
#
froth <- function(p){
# Freudenstein and Roth function (Broyden, Mathematics of Computation 1965, p. 577-593)
f <- rep(NA, length(p))
f[1] <- -13 + p[1] + (p[2]*(5 - p[2]) - 2) * p[2]
f[2] <- -29 + p[1] + (p[2]*(1 + p[2]) - 14) * p[2]
f
}
p0 <- c(3,2)  # this gives the zero of the system
nlsolve(par=p0, fn=froth)
p0 <- c(1,1)  # this gives the local minimum that is not the zero of the system
nlsolve(par=p0, fn=froth)
nlsolve(par=p0, fn=froth, nstarts=100)
p0 <- rpois(2,10)
nlsolve(par=p0, fn=froth)
nlsolve(par=p0, fn=froth, nstarts=10)
Re: [R] fisher information matrix
Hi, You can use the function hessian() in the package numDeriv. This will yield a very accurate estimate of the observed Fisher information matrix. library(numDeriv) ?hessian Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Federico Calboli Sent: Tuesday, June 26, 2007 7:58 AM To: r-help Cc: Gambhir, Manoj Subject: [R] fisher information matrix Hi All, a colleague wants to calculate the Fisher information matrix for a model he wrote (not in R). He can easily get the neg-log-likelihood and the best fit parameters at the minimum. He can also get negLLs for other parameter values too. Given these data, is there a way in R to calculate the Fisher information matrix? Best, Federico -- Federico C. F. Calboli Department of Epidemiology and Public Health Imperial College, St Mary's Campus Norfolk Place, London W2 1PG Tel +44 (0)20 7594 1602 Fax (+44) 020 7594 3193 f.calboli [.a.t] imperial.ac.uk f.calboli [.a.t] gmail.com
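For readers without numDeriv installed, a rougher base-R sketch of the same idea: optim(..., hessian = TRUE) returns a finite-difference Hessian of the negative log-likelihood at the optimum, which is the observed information matrix (numDeriv's hessian() is more accurate). The normal-sample likelihood below is only an illustration, not Federico's model:

```r
set.seed(1)
x <- rnorm(200, mean = 3, sd = 2)

# Negative log-likelihood of N(mu, sigma), with sigma on the log scale
# so that the optimization is unconstrained.
negll <- function(p) -sum(dnorm(x, mean = p[1], sd = exp(p[2]), log = TRUE))

fit <- optim(c(0, 0), negll, method = "BFGS", hessian = TRUE)
obs_info <- fit$hessian               # observed Fisher information matrix
se <- sqrt(diag(solve(obs_info)))     # asymptotic standard errors
fit$par[1]   # close to mean(x)
se[1]        # close to sd(x)/sqrt(200)
```

Inverting the observed information gives the usual asymptotic variance-covariance matrix of the MLE.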
Re: [R] Creating an R package for solving nonlinear system of equations was: RE: finding roots of multivariate equation
Local minima, other than the actual roots, will be present only when the Jacobian of the system is singular. If the Jacobian is well-behaved then there should be no problem, although this is hard to verify in practice. Furthermore, as I had pointed out in one of my previous emails, if convergence to a local optimum takes place, you simply restart the procedure with another initial value. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: Rob Creecy [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 26, 2007 1:01 PM To: Ravi Varadhan Cc: r-help@stat.math.ethz.ch; 'Bill Shipley' Subject: Re: [R] Creatiing an R package for solving nonlinear system ofequations was: RE: finding roots of multivariate equation This seems useful, but it is important to note that the approach may not work well unless the system of nonlinear equations is very well behaved and a good starting point is chosen. A good explanation of the problems with this exact approach, that is adding up the sums of squares of the individual functions, is described in Numerical Recipes for C, second edition, p 382 (see http://www.nrbook.com/a/bookcpdf.php) Briefly there will often be a great number of local minima even when there is only a single root of the original equations. Rob Ravi Varadhan wrote: Hi, I have written a simple function to solve a system of nonlinear equations. I have called it nlsolve(). It actually minimizes the squared-norm of the set of functions by calling optim(). It uses the BFGS algorithm within optim(). Apart from this restriction, the user can pass all the arguments available in optim(). All the control parameters can be passed as in the call to optim(). 
I have attached a text file containing the source for nlsolve() and also a number of test problems illustrating the use of nlsolve(). Any feedback and suggestions to improve it are welcome. Hope this is useful. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ravi Varadhan Sent: Wednesday, June 20, 2007 5:23 PM To: r-help@stat.math.ethz.ch Subject: [R] Creatiing an R package for solving nonlinear system of equations was: RE: finding roots of multivariate equation Hi All, Replying to this and numerous other requests in the past has made me realize that a nonlinear solver is very much needed for R users. I have successfully used a nonlinear solver based on the spectral gradient method, in FORTRAN. I can readily translate that to R and make it available as an R function, but what I would really like to do is to make that into a package. I can provide the R function and several test examples. But I am not good at creating a good/reliable package. So, it would be ideal if one of the R gurus is interested in collaborating with me on this project. Any one interested? Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bill Shipley Sent: Wednesday, June 20, 2007 1:37 PM To: r-help@stat.math.ethz.ch Subject: [R] finding roots of multivariate equation Hello, I want to find the roots of an equation in two variables. 
I am aware of the uniroot function, which can do this for a function of a single variable (as I understand it...) but cannot find a function that does this for an equation with more than one variable. I am looking for an implementation of something similar to a Newton-Raphson algorithm. Thanks
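For completeness, here is a bare-bones sketch of the Newton-Raphson idea Bill asks about, using a forward-difference Jacobian (newton2 is a hypothetical helper, not a standard R function, and the two-equation system is just an illustration):

```r
# Newton-Raphson for a system fsys: R^n -> R^n, with the Jacobian
# approximated column-by-column via forward differences.
newton2 <- function(fsys, x0, tol = 1e-10, maxit = 50, h = 1e-7) {
  x <- x0
  for (it in seq_len(maxit)) {
    fx <- fsys(x)
    if (max(abs(fx)) < tol) break
    J <- matrix(0, length(x), length(x))   # forward-difference Jacobian
    for (j in seq_along(x)) {
      xh <- x
      xh[j] <- xh[j] + h
      J[, j] <- (fsys(xh) - fx) / h
    }
    x <- x - solve(J, fx)                  # Newton step
  }
  x
}

# Example: x^2 + y^2 = 2 and x = y, which has a root at (1, 1)
fsys <- function(x) c(x[1]^2 + x[2]^2 - 2, x[1] - x[2])
newton2(fsys, c(2, 0.5))
```

As with any Newton method, convergence depends on a reasonable starting value and a nonsingular Jacobian near the root.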
Re: [R] fft and the derivative
Todd, Your idea is correct for the continuous Fourier transform, but I am not sure how one could apply that to fft, which corresponds to the discrete Fourier transform. For instance, what values of omega would you use for the term i*omega to get the discrete Fourier transform of the derivative of f(t)? Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Todd Remund Sent: Monday, June 25, 2007 5:16 PM To: r-help@stat.math.ethz.ch Subject: [R] fft and the derivative Can one take f(t) and transform to F(omega) in the frequency domain using fft(), and use the properties of the fft to find the derivative of f(t)? For example, f(t) -> F(omega) implies f^(n)(t) -> (i*omega)^n * F(omega). Use this and get f^(n)(t) = F^(-1)[ (i*omega)^n * F(omega) ] to get the nth derivative of f(t)? Todd Remund
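One common answer to Ravi's question, for a periodic signal sampled on a uniform grid of N points with spacing dt: take omega_k = 2*pi*k/(N*dt), with k arranged in fft()'s output order 0, 1, ..., N/2, -(N/2 - 1), ..., -1. A sketch (for a general real signal one would usually also zero the Nyquist term for odd-order derivatives; it vanishes here anyway):

```r
# Spectral derivative of sin(t) on a periodic grid; the result should
# match cos(t) to roundoff because sin is band-limited on this grid.
N  <- 256
dt <- 2 * pi / N
t  <- (0:(N - 1)) * dt
f  <- sin(t)

k     <- c(0:(N/2), -((N/2 - 1):1))   # frequency indices in fft() order
omega <- 2 * pi * k / (N * dt)

# multiply by i*omega in the frequency domain, transform back
df <- Re(fft((1i * omega) * fft(f), inverse = TRUE)) / N
max(abs(df - cos(t)))                 # tiny: close to machine precision
```

The division by N is needed because R's fft(..., inverse = TRUE) is unnormalized.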
Re: [R] Creating an R package for solving nonlinear system of equations was: RE: finding roots of multivariate equation
Hi, I have written a simple function to solve a system of nonlinear equations. I have called it nlsolve(). It actually minimizes the squared-norm of the set of functions by calling optim(). It uses the BFGS algorithm within optim(). Apart from this restriction, the user can pass all the arguments available in optim(). All the control parameters can be passed as in the call to optim(). I have attached a text file containing the source for nlsolve() and also a number of test problems illustrating the use of nlsolve(). Any feedback and suggestions to improve it are welcome. Hope this is useful. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ravi Varadhan Sent: Wednesday, June 20, 2007 5:23 PM To: r-help@stat.math.ethz.ch Subject: [R] Creatiing an R package for solving nonlinear system of equations was: RE: finding roots of multivariate equation Hi All, Replying to this and numerous other requests in the past has made me realize that a nonlinear solver is very much needed for R users. I have successfully used a nonlinear solver based on the spectral gradient method, in FORTRAN. I can readily translate that to R and make it available as an R function, but what I would really like to do is to make that into a package. I can provide the R function and several test examples. But I am not good at creating a good/reliable package. So, it would be ideal if one of the R gurus is interested in collaborating with me on this project. Any one interested? Ravi. --- Ravi Varadhan, Ph.D. 
Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bill Shipley Sent: Wednesday, June 20, 2007 1:37 PM To: r-help@stat.math.ethz.ch Subject: [R] finding roots of multivariate equation Hello, I want to find the roots of an equation in two variables. I am aware of the uniroot function, which can do this for a function of a single variable (as I understand it...) but cannot find a function that does this for an equation with more than one variable. I am looking for an implementation of something similar to a Newton-Raphson algorithm. Thanks. -- Bill Shipley North American Editor for Annals of Botany Subject Editor for Ecology Département de biologie Université de Sherbrooke Sherbrooke (Québec) J1K 2R9 Canada
Re: [R] finding roots of multivariate equation
R does not really have a dedicated solver for nonlinear systems of equations, but instead you can use optim(), which is a minimizer. Suppose your system is F(x) = 0, where x \in R^p and F is a mapping from R^p to R^p; then you minimize the norm of F. The problem with this approach is that it can sometimes yield local minima which are not the zeros of the original system. However, this can often be remedied by using different starting values. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bill Shipley Sent: Wednesday, June 20, 2007 1:37 PM To: r-help@stat.math.ethz.ch Subject: [R] finding roots of multivariate equation Hello, I want to find the roots of an equation in two variables. I am aware of the uniroot function, which can do this for a function of a single variable (as I understand it...) but cannot find a function that does this for an equation with more than one variable. I am looking for an implementation of something similar to a Newton-Raphson algorithm. Thanks. -- Bill Shipley North American Editor for Annals of Botany Subject Editor for Ecology Département de biologie Université de Sherbrooke Sherbrooke (Québec) J1K 2R9 Canada
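A minimal sketch of the recipe Ravi describes: minimize the squared norm of F with optim(). The two-equation system here is just an illustration (its root is at x = 0, y = 1, since exp(0) = 1):

```r
# Solve {exp(-x) - y = 0, x + y - 1 = 0} by minimizing sum(F(p)^2).
fsys <- function(p) c(exp(-p[1]) - p[2], p[1] + p[2] - 1)
obj  <- function(p) sum(fsys(p)^2)

ans <- optim(c(0.5, 0.5), obj, method = "BFGS")
ans$par    # close to c(0, 1)
ans$value  # near 0, so this minimizer really is a zero of the system
```

Checking that the minimized value is (numerically) zero is the safeguard against the local-minimum problem mentioned above: a local minimum that is not a root leaves a visibly nonzero residual.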
[R] Creating an R package for solving nonlinear system of equations was: RE: finding roots of multivariate equation
Hi All, Replying to this and numerous other requests in the past has made me realize that a nonlinear solver is very much needed for R users. I have successfully used a nonlinear solver based on the spectral gradient method, in FORTRAN. I can readily translate that to R and make it available as an R function, but what I would really like to do is to make that into a package. I can provide the R function and several test examples. But I am not good at creating a good/reliable package. So, it would be ideal if one of the R gurus is interested in collaborating with me on this project. Anyone interested? Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bill Shipley Sent: Wednesday, June 20, 2007 1:37 PM To: r-help@stat.math.ethz.ch Subject: [R] finding roots of multivariate equation Hello, I want to find the roots of an equation in two variables. I am aware of the uniroot function, which can do this for a function of a single variable (as I understand it...) but cannot find a function that does this for an equation with more than one variable. I am looking for an implementation of something similar to a Newton-Raphson algorithm. Thanks. -- Bill Shipley North American Editor for Annals of Botany Subject Editor for Ecology Département de biologie Université de Sherbrooke Sherbrooke (Québec) J1K 2R9 Canada
Re: [R] Normal and Poisson tail area expectations in R
Perfect, Chuck. I got a closed-form solution after some algebraic labor, but your solution is simple and elegant. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: Charles C. Berry [mailto:[EMAIL PROTECTED] Sent: Thursday, June 14, 2007 1:36 PM To: Ravi Varadhan Cc: 'kavindra malik'; r-help@stat.math.ethz.ch Subject: RE: [R] Normal and Poisson tail area expectations in R Ravi, This looks simple to me.
km_G <- function(lambda, k) lambda*ppois(k-1, lambda, lower=FALSE) - k*ppois(k, lambda, lower=FALSE)
Am I confused here? Chuck On Wed, 13 Jun 2007, Ravi Varadhan wrote: More interesting is the Poisson convolution. I don't know if there is an analytic solution to this. I looked at Jolley's Summation of Series and Abramowitz and Stegun, but no help there. It seems that the discrete FFT technique should work. Does anyone know the answer? Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of kavindra malik Sent: Wednesday, June 13, 2007 5:45 PM To: Charles C. Berry Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Normal and Poisson tail area expectations in R Thank you very much. This solves the problem I was trying to solve. I am new to R and am learning. A great lesson in the power of R... Charles C.
Berry [EMAIL PROTECTED] wrote: On Wed, 13 Jun 2007, kavindra malik wrote: I am interested in R functions for the following integrals / sums (expressed as best I can in text) - Normal: G_u(k) = Integral_{lower limit = k}^{upper limit = infinity} (u - k) f(u) du, where u is N(0,1) and f(u) is the density function. Poisson: G(lambda, k) = Sum_{lower limit = k}^{upper limit = infinity} (x - k) p(x, lambda), where p(x, lambda) is the Poisson probability function with parameter lambda. The Normal expression is very commonly used in inventory management to determine safety stocks (and its tabular values can be found in some texts) - and I am also looking for Poisson and/or Gamma as that would fit the situation better. I am wondering if there are standard functions in R that might allow me to get these values, instead of needing to do the numerical integration, etc. myself. Not that I know of, but it is not difficult to do the integration:
k <- 1.1 # for example
integrate(function(x) (x-k)*dnorm(x), lower=k, upper=Inf)
0.06861951 with absolute error < 5.5e-07
see ?integrate ?qnorm ?qpois ?qgamma Thank you very much. Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:[EMAIL PROTECTED] UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
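Chuck's closed form for the Poisson tail expectation can be checked against a brute-force (truncated) sum; a quick sketch:

```r
# G(lambda, k) = sum_{x >= k} (x - k) * dpois(x, lambda).
# Closed form: use x*dpois(x, lambda) = lambda*dpois(x-1, lambda).
km_G <- function(lambda, k)
  lambda * ppois(k - 1, lambda, lower = FALSE) -
  k * ppois(k, lambda, lower = FALSE)

# Direct sum, truncated far enough into the tail to be negligible.
brute <- function(lambda, k) {
  x <- k:(k + 1000)
  sum((x - k) * dpois(x, lambda))
}

km_G(4, 6)   # agrees with brute(4, 6)
```

The identity follows because x*p(x, lambda) = lambda*p(x-1, lambda), so the partial mean collapses to lambda*P(X >= k) minus k*P(X >= k+1).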
Re: [R] Normal and Poisson tail area expectations in R
Inspired by Chuck's elegant solution to the Poisson tail area problem, here is a simple solution to the normal tail area expectation that does not use integrate().
Gu.k <- function(k) {1/sqrt(2*pi) * exp(-k*k/2) - k * pnorm(k, lower=FALSE)}
k <- 1:10
Gu.k(k)
[1] 8.331547e-02 8.490703e-03 3.821543e-04 7.145258e-06 5.346166e-08 1.563570e-10 1.760326e-13 7.550262e-17 1.224779e-20 7.474560e-25
sapply(k, function(k) integrate(function(x) (x-k)*dnorm(x), lower=k, upper=Inf)$val)
[1] 8.331547e-02 8.490702e-03 3.821543e-04 7.145259e-06 5.346163e-08 1.563570e-10 1.760326e-13 7.550262e-17 1.224779e-20 7.474560e-25
Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ravi Varadhan Sent: Thursday, June 14, 2007 2:08 PM To: 'Charles C. Berry' Cc: 'kavindra malik'; r-help@stat.math.ethz.ch Subject: Re: [R] Normal and Poisson tail area expectations in R Perfect, Chuck. I got a closed-form solution after some algebraic labor, but your solution is simple and elegant. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: Charles C. Berry [mailto:[EMAIL PROTECTED] Sent: Thursday, June 14, 2007 1:36 PM To: Ravi Varadhan Cc: 'kavindra malik'; r-help@stat.math.ethz.ch Subject: RE: [R] Normal and Poisson tail area expectations in R Ravi, This looks simple to me.
km_G <- function(lambda, k) lambda*ppois(k-1, lambda, lower=FALSE) - k*ppois(k, lambda, lower=FALSE)
Am I confused here?
Chuck On Wed, 13 Jun 2007, Ravi Varadhan wrote: More interesting is the Poisson convolution. I don't know if there is an analytic solution to this. I looked at Jolley's Summation of Series and Abramowitz and Stegun, but no help there. It seems that the discrete FFT technique should work. Does anyone know the answer? Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of kavindra malik Sent: Wednesday, June 13, 2007 5:45 PM To: Charles C. Berry Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Normal and Poisson tail area expectations in R Thank you very much. This solves the problem I was trying to solve. I am new to R and am learning. A great lesson in the power of R... Charles C. Berry [EMAIL PROTECTED] wrote: On Wed, 13 Jun 2007, kavindra malik wrote: I am interested in R functions for the following integrals / sums (expressed as best I can in text) - Normal: G_u(k) = Integral_{lower limit = k}^{upper limit = infinity} (u - k) f(u) du, where u is N(0,1) and f(u) is the density function. Poisson: G(lambda, k) = Sum_{lower limit = k}^{upper limit = infinity} (x - k) p(x, lambda), where p(x, lambda) is the Poisson probability function with parameter lambda. The Normal expression is very commonly used in inventory management to determine safety stocks (and its tabular values can be found in some texts) - and I am also looking for Poisson and/or Gamma as that would fit the situation better. I am wondering if there are standard functions in R that might allow me to get these values, instead of needing to do the numerical integration, etc. myself.
Not that I know of, but it is not difficult to do the integration:
k <- 1.1 # for example
integrate(function(x) (x-k)*dnorm(x), lower=k, upper=Inf)
0.06861951 with absolute error < 5.5e-07
see ?integrate ?qnorm ?qpois ?qgamma Thank you very much.
Re: [R] Normal and Poisson tail area expectations in R
More interesting is the Poisson convolution. I don't know if there is an analytic solution to this. I looked at Jolley's Summation of Series and Abramowitz and Stegun, but no help there. It seems that the discrete FFT technique should work. Does anyone know the answer? Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of kavindra malik Sent: Wednesday, June 13, 2007 5:45 PM To: Charles C. Berry Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Normal and Poisson tail area expectations in R Thank you very much. This solves the problem I was trying to solve. I am new to R and am learning. A great lesson in the power of R... Charles C. Berry [EMAIL PROTECTED] wrote: On Wed, 13 Jun 2007, kavindra malik wrote: I am interested in R functions for the following integrals / sums (expressed as best I can in text) - Normal: G_u(k) = Integral_{lower limit = k}^{upper limit = infinity} (u - k) f(u) du, where u is N(0,1) and f(u) is the density function. Poisson: G(lambda, k) = Sum_{lower limit = k}^{upper limit = infinity} (x - k) p(x, lambda), where p(x, lambda) is the Poisson probability function with parameter lambda. The Normal expression is very commonly used in inventory management to determine safety stocks (and its tabular values can be found in some texts) - and I am also looking for Poisson and/or Gamma as that would fit the situation better. I am wondering if there are standard functions in R that might allow me to get these values, instead of needing to do the numerical integration, etc. myself.
Not that I know of, but it is not difficult to do the integration:
k <- 1.1 # for example
integrate(function(x) (x-k)*dnorm(x), lower=k, upper=Inf)
0.06861951 with absolute error < 5.5e-07
see ?integrate ?qnorm ?qpois ?qgamma Thank you very much. Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:[EMAIL PROTECTED] UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
Re: [R] how to find how many modes in 2 dimensions case
Hi Patrick, Here is simple R code for locating ALL the local maxima of a bivariate function which is known on a rectangular grid. I have illustrated it with a function called the Branin function, which is commonly used as a test function in the global optimization literature. It has 6 local maxima, two of which are global.

branin <- function(x1, x2, p) {
  p[1]*x1^2 + p[2]*x1^4 + p[3]*x1^6 - x1*x2 + p[4]*x2^2 + p[5]*x2^4
}
x <- seq(-2, 2, length=100)
y <- seq(-1, 1, length=100)
p <- c(-4, 2.1, -1/3, 4, -4)
z <- outer(x, y, branin, p=p)
persp(x, y, z, theta=30, phi=30, col="lightblue")
# here is a brute-force algorithm to locate ALL the local maxima
for (i in 2:(nrow(z)-1)) {
  for (j in 2:(ncol(z)-1)) {
    lmax <- (z[i,j] > z[i-1,j]) & (z[i,j] > z[i+1,j]) &
            (z[i,j] > z[i,j-1]) & (z[i,j] > z[i,j+1])
    if (lmax) cat("x: ", x[i], " y: ", y[j], " function: ", z[i,j], "\n")
  }
}

x: -1.72 y: 0.798 function: 0.214
x: -1.60 y: -0.576 function: -2.10
x: -0.101 y: 0.717 function: 1.03
x: 0.101 y: -0.717 function: 1.03
x: 1.60 y: 0.576 function: -2.10
x: 1.72 y: -0.798 function: 0.214

Of course, this brute-force grid search is highly inefficient for dimensions greater than 2. Hope this is helpful, Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619 Fax: (410) 614-9625
Email: [EMAIL PROTECTED]
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Patrick Wang
Sent: Friday, June 08, 2007 3:35 PM
To: Bert Gunter
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] how to find how many modes in 2 dimensions case

Thanks for the reply, maybe I shall say bumps. I can use persp to show a density on the X and Y dimensions; one peak is one mode, I think. I am trying to find an automatic way to detect how many peaks the density has. Pat

Note that the number of modes (local maxima??) is a function of the bandwidth, so I'm not sure your question is even meaningful.

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Patrick Wang
Sent: Friday, June 08, 2007 11:54 AM
To: R-help@stat.math.ethz.ch
Subject: [R] how to find how many modes in 2 dimensions case

Hi, Does anyone know how to count the number of modes in 2 dimensions using the kde2d function? Thanks Pat

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
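A vectorized sketch of the same neighbor test (not from the original posts; `count_local_maxima` is a hypothetical helper name) avoids the double loop by shifting the grid in each direction and comparing whole submatrices:

```r
# Hedged sketch: vectorized version of the brute-force neighbor comparison.
# Each interior cell is compared against its 4 grid neighbors at once.
count_local_maxima <- function(z) {
  nr <- nrow(z); nc <- ncol(z)
  core <- z[2:(nr-1), 2:(nc-1)]
  lmax <- (core > z[1:(nr-2), 2:(nc-1)]) &  # neighbor above
          (core > z[3:nr,     2:(nc-1)]) &  # neighbor below
          (core > z[2:(nr-1), 1:(nc-2)]) &  # neighbor to the left
          (core > z[2:(nr-1), 3:nc])        # neighbor to the right
  sum(lmax)
}

# Same surface as in the post above
branin <- function(x1, x2, p)
  p[1]*x1^2 + p[2]*x1^4 + p[3]*x1^6 - x1*x2 + p[4]*x2^2 + p[5]*x2^4
x <- seq(-2, 2, length=100)
y <- seq(-1, 1, length=100)
z <- outer(x, y, branin, p = c(-4, 2.1, -1/3, 4, -4))
count_local_maxima(z)  # 6, matching the six maxima listed above
```

The logic is identical to the double loop, so it finds the same six peaks, but it scales much better on fine grids.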
Re: [R] how to find how many modes in 2 dimensions case
Patrick, Here is an example closer to what you are interested in - locating bumps in a kernel density estimate. I am using the example from package KernSmooth, with the function bkde2D().

# Another example: locating maxima in kernel density estimation
library(KernSmooth)
data(geyser, package="MASS")
x <- cbind(geyser$duration, geyser$waiting)
est <- bkde2D(x, bandwidth=c(0.7, 7))
persp(est$fhat)
x <- est$x1
y <- est$x2
z <- est$fhat
# here is a brute-force algorithm to locate ALL the local maxima
for (i in 2:(nrow(z)-1)) {
  for (j in 2:(ncol(z)-1)) {
    lmax <- (z[i,j] > z[i-1,j]) & (z[i,j] > z[i+1,j]) &
            (z[i,j] > z[i,j-1]) & (z[i,j] > z[i,j+1])
    if (lmax) cat("x: ", x[i], " y: ", y[j], " function: ", z[i,j], "\n")
  }
}

x: 0.724 y: 41.1 function: 2.71e-20
x: 0.858 y: 39.4 function: 1.08e-19
x: 0.992 y: 35.9 function: 2.17e-19
x: 2.07 y: 82.4 function: 0.00795
x: 4.08 y: 77.2 function: 0.00722
x: 4.35 y: 54.9 function: 0.00778

Of these, you can ignore the first 3, which have essentially zero density. Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619 Fax: (410) 614-9625
Email: [EMAIL PROTECTED]
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Patrick Wang
Sent: Friday, June 08, 2007 3:35 PM
To: Bert Gunter
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] how to find how many modes in 2 dimensions case

Thanks for the reply, maybe I shall say bumps. I can use persp to show a density on the X and Y dimensions; one peak is one mode, I think. I am trying to find an automatic way to detect how many peaks the density has. Pat

Note that the number of modes (local maxima??) is a function of the bandwidth, so I'm not sure your question is even meaningful.

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Patrick Wang
Sent: Friday, June 08, 2007 11:54 AM
To: R-help@stat.math.ethz.ch
Subject: [R] how to find how many modes in 2 dimensions case

Hi, Does anyone know how to count the number of modes in 2 dimensions using the kde2d function? Thanks Pat

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
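Since the original question asked about MASS::kde2d rather than bkde2D, here is a hedged sketch (not from the thread) applying the same neighbor test to a kde2d estimate. As Bert notes, the count depends on the bandwidth, so no particular number is claimed:

```r
# Hedged sketch: count peaks in a MASS::kde2d density estimate with the
# brute-force neighbor test from the posts above.
library(MASS)
data(geyser, package = "MASS")
est <- kde2d(geyser$duration, geyser$waiting, n = 50)
z <- est$z
n_modes <- 0
for (i in 2:(nrow(z)-1)) {
  for (j in 2:(ncol(z)-1)) {
    if ((z[i,j] > z[i-1,j]) && (z[i,j] > z[i+1,j]) &&
        (z[i,j] > z[i,j-1]) && (z[i,j] > z[i,j+1]))
      n_modes <- n_modes + 1
  }
}
n_modes  # number of detected peaks; varies with the bandwidth argument h
```

Re-running with different `h` values shows the bandwidth dependence directly: a very small bandwidth produces many spurious peaks, a very large one smooths the surface down to a single mode.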
Re: [R] Fwd: Using odesolve to produce non-negative solutions
Spencer, lsoda does not estimate any parameters (nlmeODE does parameter estimation). It just computes the solution trajectory, at discrete times, of a dynamical system (i.e., a set of differential equations). It only works with real numbers, as far as I know. Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619 Fax: (410) 614-9625
Email: [EMAIL PROTECTED]
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Spencer Graves
Sent: Monday, June 11, 2007 12:53 PM
To: Martin Henry H. Stevens
Cc: Jeremy Goldhaber-Fiebert; r-help@stat.math.ethz.ch
Subject: Re: [R] Fwd: Using odesolve to produce non-negative solutions

in line

Martin Henry H. Stevens wrote: Hi Jeremy, First, setting hmax to a small number could prevent a large step, if you think that is a problem. Second, however, I don't see how you can get a negative population size when using the log trick.

SG: Can lsoda estimate complex or imaginary parameters?

I would think that that would prevent completely any negative values of N (i.e. e^-10 > 0). Can you explain? Or do you want to avoid that trick? The only other solver I know of is rk4, and it is not recommended. Hank

On Jun 11, 2007, at 11:46 AM, Jeremy Goldhaber-Fiebert wrote: Hi Spencer, Thank you for your response. I also did not see anything on the lsoda help page, which is the reason that I wrote to the list. From your response, I am not sure if I asked my question clearly. I am modeling a group of people (in a variety of health states) moving through time (and getting infected with an infectious disease). This means that the count of the number of people in each state should be positive at all times. What appears to happen is that lsoda asks for a derivative at a given point in time t and then adjusts the state of the population. However, perhaps due to numerical instability, it occasionally lowers the population count below 0 for one of the health states (perhaps because its step size is too big or something). I have tried both the logarithm trick snip

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fwd: Using odesolve to produce non-negative solutions
Hi Jeremy, A smaller step size may or may not help. If the issue is simply truncation error, that is, the error involved in discretizing the differential equations, then a smaller step size would help. If, however, the true solution to the differential equation is negative for some t, then the numerical solution should also be negative. If the negative solution does not make sense, then the system of equations needs to be examined to see when and why negative solutions arise. Perhaps (I am just speculating here) there needs to be a dampening function that slows down the trajectory as it approaches zero from its initial value. It is also possible that only certain regions of the parameter space (note that initial conditions are also parameters) are allowed, in the sense that only there is the solution feasible for all t. So, in your example, the parameters might not be realistic. In short, if you are sure that the numerical solution is accurate, then you need to go back to your system of equations and analyze them carefully. Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619 Fax: (410) 614-9625
Email: [EMAIL PROTECTED]
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeremy Goldhaber-Fiebert
Sent: Monday, June 11, 2007 11:47 AM
To: Spencer Graves
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Fwd: Using odesolve to produce non-negative solutions

Hi Spencer, Thank you for your response. I also did not see anything on the lsoda help page, which is the reason that I wrote to the list. From your response, I am not sure if I asked my question clearly. I am modeling a group of people (in a variety of health states) moving through time (and getting infected with an infectious disease).
This means that the count of the number of people in each state should be positive at all times. What appears to happen is that lsoda asks for a derivative at a given point in time t and then adjusts the state of the population. However, perhaps due to numerical instability, it occasionally lowers the population count below 0 for one of the health states (perhaps because its step size is too big or something). I have tried both the logarithm trick and also changing the relative and absolute tolerance inputs, but I still get the problem for certain combinations of parameters and initial conditions. It occurs both under MS Windows XP Service Pack 2 and on a Linux cluster, so I am pretty sure it is not platform specific. My real question to the group is: if there is no workaround in lsoda, are there other ODE solvers in R that will allow constraining the solutions of the ODEs to remain non-negative? Best regards, Jeremy

Spencer Graves [EMAIL PROTECTED] 6/8/2007 9:51 AM

On the 'lsoda' help page, I did not see any option to force some or all parameters to be nonnegative. Have you considered replacing the parameters that must be nonnegative with their logarithms? This effectively moves the lower limit of 0 to (-Inf) and seems to have worked well for me in the past. Often, it can even make the log-likelihood or sum-of-squares surface more elliptical, which means that the standard normal approximation for the sampling distribution of parameter estimates will likely be more accurate. Hope this helps. Spencer Graves

p.s. Your example seems not to be self-contained. If I could have easily copied it from your email and run it myself, I might have been able to offer more useful suggestions.

Jeremy Goldhaber-Fiebert wrote: Hello, I am using odesolve to simulate a group of people moving through time and transmitting infections to one another. In Matlab, there is a NonNegative option which tells the Matlab solver to keep the vector elements of the ODE solution non-negative at all times.
What is the right way to do this in R? Thanks, Jeremy

P.S. Below is a simplified version of the code I use to try to do this, but I am not sure that it is theoretically right:

dynmodel <- function(t, y, p) {
  ## Initialize parameter values
  birth <- p$mybirth(t)
  death <- p$mydeath(t)
  recover <- p$myrecover
  beta <- p$mybeta
  vaxeff <- p$myvaxeff
  vaccinated <- p$myvax(t)
  vax <- vaxeff*vaccinated/100
  ## If the state currently has negative quantities (it shouldn't have),
  ## then reset to reasonable values for computing meaningful derivatives
  for (i in 1:length(y)) {
    if (y[i] < 0) {
      y[i] <- 0
    }
  }
  S <- y[1]
  I <- y[2]
  R <- y[3]
  N <- y[4]
  shat <- (birth
Re: [R] Fwd: Using odesolve to produce non-negative solutions
Hi Jeremy, A smaller step size may or may not help. If the issue is simply truncation error, that is, the error involved in discretizing the differential equations, then a smaller step size would help. If, however, the true solution to the differential equation is negative for some t, then the numerical solution should also be negative. If the negative solution does not make sense, then the system of equations needs to be examined to see when and why negative solutions arise. Perhaps (I am just speculating here) there needs to be a barrier function that slows down the trajectory as it approaches zero from its initial value. It is also possible that only certain regions of the parameter space are allowed, in the sense that only there is the solution feasible for all t. So, in your example, the parameters might not be realistic. In short, if you are sure that the numerical solution is accurate, then you need to go back to your system of equations and analyze them carefully. Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619 Fax: (410) 614-9625
Email: [EMAIL PROTECTED]
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Martin Henry H. Stevens
Sent: Monday, June 11, 2007 1:03 PM
To: Spencer Graves
Cc: Jeremy Goldhaber-Fiebert; R-Help; [EMAIL PROTECTED]
Subject: Re: [R] Fwd: Using odesolve to produce non-negative solutions

Hi Spencer, I have copied Woody Setzer. I have no idea whether lsoda can estimate parameters that could take imaginary values. Hank

On Jun 11, 2007, at 12:52 PM, Spencer Graves wrote: in line

Martin Henry H. Stevens wrote: Hi Jeremy, First, setting hmax to a small number could prevent a large step, if you think that is a problem. Second, however, I don't see how you can get a negative population size when using the log trick.
SG: Can lsoda estimate complex or imaginary parameters? Hmm. I have no idea.

I would think that that would prevent completely any negative values of N (i.e. e^-10 > 0). Can you explain? Or do you want to avoid that trick? The only other solver I know of is rk4, and it is not recommended. Hank

On Jun 11, 2007, at 11:46 AM, Jeremy Goldhaber-Fiebert wrote: Hi Spencer, Thank you for your response. I also did not see anything on the lsoda help page, which is the reason that I wrote to the list. From your response, I am not sure if I asked my question clearly. I am modeling a group of people (in a variety of health states) moving through time (and getting infected with an infectious disease). This means that the count of the number of people in each state should be positive at all times. What appears to happen is that lsoda asks for a derivative at a given point in time t and then adjusts the state of the population. However, perhaps due to numerical instability, it occasionally lowers the population count below 0 for one of the health states (perhaps because its step size is too big or something). I have tried both the logarithm trick snip

Dr. Hank Stevens, Assistant Professor
338 Pearson Hall
Botany Department
Miami University
Oxford, OH 45056
Office: (513) 529-4206 Lab: (513) 529-4262 FAX: (513) 529-4243
http://www.cas.muohio.edu/~stevenmh/
http://www.muohio.edu/ecology/
http://www.muohio.edu/botany/
"E Pluribus Unum"

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
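The "log trick" discussed in this thread can be shown in a few lines of base R. This is a hedged sketch, not from the original posts: a hand-rolled RK4 step stands in for odesolve/lsoda, and the model is a simple exponential decay chosen only for illustration. Solving for u = log(N) means N = exp(u) is strictly positive at every step, no matter what the solver does:

```r
# Hedged base-R sketch of the log trick: integrate du/dt instead of dN/dt,
# where u = log(N), so the back-transformed state can never go negative.
rk4_step <- function(u, t, h, f) {
  # one classical Runge-Kutta 4 step for du/dt = f(t, u)
  k1 <- f(t, u)
  k2 <- f(t + h/2, u + h*k1/2)
  k3 <- f(t + h/2, u + h*k2/2)
  k4 <- f(t + h, u + h*k3)
  u + h*(k1 + 2*k2 + 2*k3 + k4)/6
}

# dN/dt = -r*N, so by the chain rule du/dt = (dN/dt)/N = -r
r <- 0.5
dudt <- function(t, u) -r

u <- log(100)          # initial population N(0) = 100, in log space
h <- 0.1
for (step in 1:500) u <- rk4_step(u, (step - 1)*h, h, dudt)
N <- exp(u)            # N(50): tiny, but strictly positive by construction
```

The same transformation carries over to lsoda: pass log-states in `y`, exponentiate inside the derivative function, and return the chain-rule derivatives, at the cost that states which genuinely reach zero cannot be represented.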
Re: [R] Fwd: Using odesolve to produce non-negative solutions
Jeremy, You should examine the steady-state solution to your system of equations, by setting the time-derivatives to zero and then solving/analyzing the resulting algebraic equations. This should give you some insights. Let us say you have 3 groups, A, B, and C, with initial conditions N_A(t=0) = N_{A0}, N_B(t=0) = N_{B0}, and N_C(t=0) = N_{C0}, and that people transition in and out of these 3 states (one of the states could even be absorbing, e.g. death), but it is true for any time t that N_A(t) + N_B(t) + N_C(t) = N_{A0} + N_{B0} + N_{C0}. Furthermore, you have 3 differential equations that describe the rates of change of N_A, N_B, and N_C for t > 0. If it happens that one of the N's, say N_A, becomes negative, you could set it equal to zero. But then you have to figure out how to re-adjust N_B and N_C so that they add up to the initial total count. After re-adjustment, you also have to think about whether the resulting system of equations is valid when there are no A people. Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619 Fax: (410) 614-9625
Email: [EMAIL PROTECTED]
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeremy Goldhaber-Fiebert
Sent: Monday, June 11, 2007 11:47 AM
To: Spencer Graves
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Fwd: Using odesolve to produce non-negative solutions

Hi Spencer, Thank you for your response. I also did not see anything on the lsoda help page, which is the reason that I wrote to the list. From your response, I am not sure if I asked my question clearly. I am modeling a group of people (in a variety of health states) moving through time (and getting infected with an infectious disease). This means that the count of the number of people in each state should be positive at all times.
What appears to happen is that lsoda asks for a derivative at a given point in time t and then adjusts the state of the population. However, perhaps due to numerical instability, it occasionally lowers the population count below 0 for one of the health states (perhaps because its step size is too big or something). I have tried both the logarithm trick and also changing the relative and absolute tolerance inputs, but I still get the problem for certain combinations of parameters and initial conditions. It occurs both under MS Windows XP Service Pack 2 and on a Linux cluster, so I am pretty sure it is not platform specific. My real question to the group is: if there is no workaround in lsoda, are there other ODE solvers in R that will allow constraining the solutions of the ODEs to remain non-negative? Best regards, Jeremy

Spencer Graves [EMAIL PROTECTED] 6/8/2007 9:51 AM

On the 'lsoda' help page, I did not see any option to force some or all parameters to be nonnegative. Have you considered replacing the parameters that must be nonnegative with their logarithms? This effectively moves the lower limit of 0 to (-Inf) and seems to have worked well for me in the past. Often, it can even make the log-likelihood or sum-of-squares surface more elliptical, which means that the standard normal approximation for the sampling distribution of parameter estimates will likely be more accurate. Hope this helps. Spencer Graves

p.s. Your example seems not to be self-contained. If I could have easily copied it from your email and run it myself, I might have been able to offer more useful suggestions.

Jeremy Goldhaber-Fiebert wrote: Hello, I am using odesolve to simulate a group of people moving through time and transmitting infections to one another. In Matlab, there is a NonNegative option which tells the Matlab solver to keep the vector elements of the ODE solution non-negative at all times. What is the right way to do this in R?
Thanks, Jeremy

P.S. Below is a simplified version of the code I use to try to do this, but I am not sure that it is theoretically right:

dynmodel <- function(t, y, p) {
  ## Initialize parameter values
  birth <- p$mybirth(t)
  death <- p$mydeath(t)
  recover <- p$myrecover
  beta <- p$mybeta
  vaxeff <- p$myvaxeff
  vaccinated <- p$myvax(t)
  vax <- vaxeff*vaccinated/100
  ## If the state currently has negative quantities (it shouldn't have),
  ## then reset to reasonable values for computing meaningful derivatives
  for (i in 1:length(y)) {
    if (y[i] < 0) {
      y[i] <- 0
    }
  }
  S <- y[1]
  I <- y[2]
  R <- y[3]
  N <- y[4]
  shat <- (birth*(1-vax)) - (death*S) - (beta*S*I/N)
  ihat
[R] An extension of Gabriel's biplot
Hi All, I would like to extend the standard biplot of Gabriel (1971), which is produced by the R function biplot(). Rather than just showing the variables as vectors (with arrows at the end points), I would like to show them as line segments, where each segment runs from the minimum value of the variable to its maximum. In other words, I would like to produce a plot similar to that shown in Figure 1 of Gabriel and Odoroff (1990), rather than the classical biplot shown in Figure 2 of the same paper. Has anyone generated biplots similar to Figure 1 of Gabriel and Odoroff? Any help is greatly appreciated. Best, Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619 Fax: (410) 614-9625
Email: [EMAIL PROTECTED]
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
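One possible reading of the request can be sketched in base graphics (this is a hedged illustration, not a known solution from the archives): run a PCA, plot the scores, and then draw each variable as a segment spanning its observed (standardized) range along its loading direction, instead of the unit arrow drawn by biplot():

```r
# Hedged sketch: variables drawn as min-to-max segments in PC1-PC2 space.
# Uses the built-in USArrests data purely for illustration.
pc     <- prcomp(USArrests, scale. = TRUE)
scores <- pc$x[, 1:2]          # observation coordinates
loads  <- pc$rotation[, 1:2]   # variable loading directions
Xs     <- scale(USArrests)     # standardized data, one column per variable

plot(scores, pch = 20, col = "grey",
     main = "Biplot with variable segments (sketch)")
for (k in seq_len(ncol(Xs))) {
  rng <- range(Xs[, k])        # min and max of variable k (standardized)
  segments(rng[1]*loads[k, 1], rng[1]*loads[k, 2],
           rng[2]*loads[k, 1], rng[2]*loads[k, 2])
  text(rng[2]*loads[k, 1], rng[2]*loads[k, 2], colnames(Xs)[k], pos = 3)
}
```

Whether this matches Gabriel and Odoroff's Figure 1 exactly would need checking against the paper; the segment endpoints here are just the projections of each variable's extreme standardized values onto its loading axis.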
Re: [R] Comparing multiple distributions
Your data are compositional. The R package compositions might be useful. You might also want to consult the book by J. Aitchison, The Statistical Analysis of Compositional Data. Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619 Fax: (410) 614-9625
Email: [EMAIL PROTECTED]
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of jiho
Sent: Thursday, May 31, 2007 11:37 AM
To: R-help
Subject: Re: [R] Comparing multiple distributions

Nobody answered my first request. I am sorry if I did not explain my problem clearly. English is not my native language, and statistical English is even more difficult. I'll try to summarize my issue in more appropriate statistical terms: each of my observations is not a single number but a vector of 5 proportions (which add up to 1 for each observation). I want to compare the shape of those vectors between two treatments (i.e. how the quantities are distributed among the 5 values in treatment A with respect to treatment B). I was pointed to Hotelling's T-squared. Does it seem appropriate? Are there other possibilities (I read many discussions about Hotelling vs. MANOVA, but I could not see how any of those related to my particular case)? Thank you very much in advance for your insights. See below for my earlier, more detailed, e-mail.

On 2007-May-21, at 19:26, jiho wrote: I am studying the vertical distribution of plankton and want to study its variations relative to several factors (time of day, species, water column structure, etc.). So my data is special in that, at each sampling site (each observation), I don't have *one* number, I have *several* numbers (abundance of organisms in each depth bin; I sample 5 depth bins) which describe a vertical distribution.
Then let's say I want to compare speciesA with speciesB: I would end up trying to compare a group of several distributions with another group of several distributions (where a distribution is a vector of 5 numbers: an abundance for each depth bin). Does anyone know how I could do this (with R obviously ;) )? Currently I kind of get around the problem and:
- compute the mean abundance per depth bin within each group and compare the two mean distributions with a ks.test, but this obviously diminishes the power of the test (I only compare 5*2 observations)
- restrict the information at each sampling site to the mean depth weighted by the abundance of the species of interest. This way I have one observation per station, but I reduce the information to the mean depths, while the actual repartition is also important.
I know this is probably not directly R related, but I have already searched around for solutions and solicited my local statistics expert... to no avail. So I hope that the stats experts on this list will help me. Thank you very much in advance.

JiHO
---
http://jo.irisson.free.fr/

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
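Aitchison's approach can be sketched in base R without the compositions package (a hedged illustration, not from the thread): the centered log-ratio (clr) transform maps each 5-part composition to an unconstrained vector, after which ordinary multivariate tests such as Hotelling's T^2 (e.g. via a one-term MANOVA) can be applied:

```r
# Hedged sketch: centered log-ratio (clr) transform of a 5-part composition.
# clr(p) = log(p) - mean(log(p)); the result always sums to zero, so the
# unit-sum constraint of the raw proportions is removed.
clr <- function(p) log(p) - mean(log(p))

comp <- c(0.10, 0.25, 0.30, 0.20, 0.15)  # abundance proportions in 5 depth bins
z <- clr(comp)
round(sum(z), 12)   # 0: clr coordinates sum to zero (up to rounding)
```

One caveat worth flagging: clr requires strictly positive proportions, so depth bins with zero counts need a zero-replacement strategy before transforming.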
Re: [R] ratio distribution - missing attachment
Dear Martin and Vitto, Please find attached the R function to compute the density of the ratio of 2 dependent normal variates. Best, Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619 Fax: (410) 614-9625
Email: [EMAIL PROTECTED]
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

-----Original Message-----
From: Martin Maechler [mailto:[EMAIL PROTECTED]
Sent: Monday, May 28, 2007 5:37 AM
To: Ravi Varadhan
Cc: R-help@stat.math.ethz.ch
Subject: Re: [R] ratio distribution - missing attachment

Thank you, Ravi. You probably will have noticed that the attachment didn't make it to the mailing list. The reason is that we let the mailing list software strip binary attachments, which can easily be misused to spread viruses; see http://www.r-project.org/mail.html (search for "attachment") or the posting guide. OTOH, the software allows attachments with MIME type text/plain. If you use an e-mail program for sophisticated users, it allows you to specify the MIME type of your attachments; otherwise (as with most user-friendly, modern e-mail software), attach a *.txt file (Doug Bates uses foo_R.txt) and it should make it to the lists; as a third alternative, just cut & paste the corresponding text into your e-mail. I think your R function should make it to R-help (and its archives), so I'd be thankful for a repost. Martin

>>>>> Ravi Varadhan [EMAIL PROTECTED] on Fri, 25 May 2007 14:24:20 -0400 writes:

> Mike, Attached is an R function to do this, along with an example that will reproduce the MathCad plot shown in your attached paper. I haven't checked it thoroughly, but it seems to reproduce the MathCad example well.
> Ravi.
> ---
> Ravi Varadhan, Ph.D.
> Assistant Professor, The Center on Aging and Health
> Division of Geriatric Medicine and Gerontology
> Johns Hopkins University
> Ph: (410) 502-2619 Fax: (410) 614-9625
> Email: [EMAIL PROTECTED]
> Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mike Lawrence
Sent: Friday, May 25, 2007 1:55 PM
To: Lucke, Joseph F
Cc: Rhelp
Subject: Re: [R] Calculation of ratio distribution properties

According to the paper I cited, there is controversy over the sufficiency of Hinkley's solution, hence their proposed more complete solution.

On 25-May-07, at 2:45 PM, Lucke, Joseph F wrote: The exact ratio is given in "On the Ratio of Two Correlated Normal Random Variables", D. V. Hinkley, Biometrika, Vol. 56, No. 3 (Dec., 1969), pp. 635-639.

--
Mike Lawrence
Graduate Student, Department of Psychology, Dalhousie University
Website: http://myweb.dal.ca/mc973993
Public calendar: http://icalx.com/public/informavore/Public

"The road to wisdom? Well, it's plain and simple to express: Err and err and err again, but less and less and less."
- Piet Hein

ratio2normals <- function(x, mean1, mean2, sd1, sd2, rho) {
  # A function to compute the density of the ratio of 2 normals
  # R code written by Ravi Varadhan
  # May 25, 2007
  # Based on the paper by Pham-Gia et al., Comm. in Stats (2006)
  A <- 1 / (2*pi*sd1*sd2*sqrt(1-rho^2))
  exponent.num <- -sd2^2*mean1^2 - sd1^2*mean2^2 + 2*rho*sd1*sd2*mean1*mean2
  exponent.denom <- 2*(1-rho^2)*sd1^2*sd2^2
  K <- A * exp(exponent.num/exponent.denom)
  t2x.num <- -sd2^2*mean1*x - sd1^2*mean2 + rho*sd1*sd2*(mean2*x + mean1)
  t2x.denom <- sd1*sd2*sqrt(2*(1-rho^2)*(sd2^2*x^2 - 2*rho*x*sd1*sd2 + sd1^2))
  t2x <- t2x.num / t2x.denom
  erf.term <- 2 * pnorm(sqrt(2) * t2x) - 1
  Ft2x <- sqrt(pi) * t2x * exp(t2x^2) * erf.term + 1
  fx <- K * Ft2x * 2 * (1 - rho^2) * sd1^2 * sd2^2 /
        (sd2^2 * x^2 + sd1^2 - 2*x*rho*sd1*sd2)
  return(fx)
}

mean1 <- 75.25
mean2 <- 71.58
sd1 <- 6.25
sd2 <- 5.45
rho <- 0.76
x <- seq(0.5, 1.5, length=100)
y <- ratio2normals(x, mean1, mean2, sd1, sd2, rho)
plot(x, y, type="l")

# compute the mean and variance via quadrature
m1 <- function(x, mean1, mean2, sd1, sd2, rho) {
  x * ratio2normals(x, mean1, mean2, sd1, sd2, rho)
}
m2 <- function(x, mean1, mean2, sd1, sd2, rho) {
  x^2 * ratio2normals(x, mean1, mean2, sd1, sd2, rho)
}
m.1 <- integrate(m1, lower=-Inf, upper=Inf, mean1=mean1, mean2=mean2,
                 sd1=sd1, sd2=sd2, rho=rho, rel.tol=1.e
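A Monte Carlo sanity check (not part of the original exchange) is a quick, independent way to validate any closed-form ratio density: simulate correlated normals with the same parameters as the MathCad example and compare summary statistics of the simulated ratio against the plotted curve:

```r
# Hedged sanity check: simulate W = X/Y for correlated normals X, Y with the
# MathCad example's parameters, and check the sample mean against mu1/mu2.
set.seed(1)
n <- 1e5
mean1 <- 75.25; mean2 <- 71.58
sd1 <- 6.25; sd2 <- 5.45; rho <- 0.76

z1 <- rnorm(n)
z2 <- rho*z1 + sqrt(1 - rho^2)*rnorm(n)     # correlated standard normals
w  <- (mean1 + sd1*z1) / (mean2 + sd2*z2)   # the ratio variate

mean(w)   # close to mean1/mean2 (about 1.05); Y is far from 0, so no blow-ups
```

A histogram of `w` over (0.5, 1.5), scaled to density, can then be overlaid on the closed-form curve; with these parameters the denominator is essentially never near zero, so the simulation is well behaved.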
Re: [R] Problem with numerical integration and optimization with BFGS
Deepankar, If the problem seems to be in the numerical quadrature part, you might want to try quadrature methods that are better suited to integrands with strong peaks. The traditional Gaussian quadrature methods, even adaptive versions such as Gauss-Kronrod, are not well suited to such integrands because they do not explicitly account for the peakedness of the integrand, and hence can be inefficient and inaccurate. See the article below: http://citeseer.ist.psu.edu/cache/papers/cs/18996/http:zSzzSzwww.sci.wsu.edu zSzmathzSzfacultyzSzgenzzSzpaperszSzmvn.pdf/genz92numerical.pdf Alan Genz has worked on this problem a lot and has a number of computational tools available. I used some of them when I was working on computing Bayes factors for binomial regression models with different link functions. If you are interested, check the following: http://www.math.wsu.edu/faculty/genz/software/software.html. For your immediate needs, there is an R package called mnormt that has a function for computing integrals under multivariate normal (and multivariate t) densities, which is actually based on Genz's Fortran routines. You could try that. Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619 Fax: (410) 614-9625
Email: [EMAIL PROTECTED]
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Deepankar Basu
Sent: Friday, May 25, 2007 12:02 AM
To: Prof Brian Ripley
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Problem with numerical integration and optimization with BFGS

Prof. Ripley, The code that I provided with my question of course does not contain code for the derivatives; but I am supplying analytical derivatives in my full program.
I did not include that code with my question because it would have added about 200 more lines of code without adding any new information relevant to my question. The problem that I had pointed to occurs whether or not I provide analytical derivatives to the optimization routine. And the problem was that when I use the BFGS method in optim, I get an error message saying that the integrals are probably divergent; I know, on the other hand, that the integrals are convergent. The same problem does not arise when I instead use the Nelder-Mead method in optim. Your suggestion that the expression can be analytically integrated (which will involve pnorm) might be correct, though I do not see how to do that. The integrands are bivariate normal density functions with one variable replaced by known quantities while I integrate over the second. For instance, the first integral is as follows: the integrand is the bivariate normal density function (with a general covariance matrix) where the second variable has been replaced by y[i] - rho1*y[i-1] + delta, and I integrate over the first variable; the range of integration is

lower = -y[i] + rho1*y[i-1]
upper = y[i] - rho1*y[i-1]

The other two integrals are very similar. It would be of great help if you could point out how to integrate the expressions analytically using pnorm. Thanks. Deepankar

On Fri, 2007-05-25 at 04:22 +0100, Prof Brian Ripley wrote: You are trying to use a derivative-based optimization method without supplying derivatives. This will use numerical approximations to the derivatives, and your objective function will not be suitable as it is internally using adaptive numerical quadrature and hence is probably not close enough to a differentiable function (it may well have steps). I believe you can integrate analytically (the answer will involve pnorm), and that you can also find analytical derivatives. Using (each of) numerical optimization and integration is a craft, and it seems you need to know more about it.
The references on ?optim are too advanced I guess, so you could start with Chapter 16 of MASS and its references. On Thu, 24 May 2007, Deepankar Basu wrote: Hi R users, I have a couple of questions about some problems that I am facing with regard to numerical integration and optimization of likelihood functions. Let me provide a little background information: I am trying to do maximum likelihood estimation of an econometric model that I have developed recently. I estimate the parameters of the model using the monthly US unemployment rate series obtained from the Federal Reserve Bank of St. Louis. (The data is freely available from their web-based database called FRED-II). For my model, the likelihood function for each observation is the sum of three integrals. The integrand in each of these integrals is of the following form
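Prof. Ripley's pnorm suggestion follows from factoring the bivariate normal density as f(x1, x2) = f(x2) f(x1 | x2): integrating over x1 then leaves a univariate density times a difference of pnorm terms. Here is a small numeric check of that identity; the correlation, fixed value, and integration limits below are illustrative values of my own, not taken from Deepankar's model.

```r
# Factorization check: integrating the standard bivariate normal
# density over x1 equals dnorm(x2) times a pnorm difference.
# rho, a, l, u are illustrative values, not from the actual model.
rho <- 0.6; a <- 0.8              # correlation; fixed value of x2
l <- -1; u <- 1                   # limits of integration for x1
f <- function(x1)                 # bivariate normal density at (x1, a)
  exp(-(x1^2 - 2*rho*x1*a + a^2) / (2*(1 - rho^2))) /
    (2*pi*sqrt(1 - rho^2))
num <- integrate(f, l, u)$value
s <- sqrt(1 - rho^2)              # conditional sd of x1 given x2 = a
ana <- dnorm(a) * (pnorm((u - rho*a)/s) - pnorm((l - rho*a)/s))
all.equal(num, ana)               # TRUE
```

The same factorization also yields analytic derivatives via the chain rule, which is what the BFGS route needs; differentiating the closed form avoids the non-smoothness introduced by adaptive quadrature.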
Re: [R] Calculation of ratio distribution properties
Mike, Attached is an R function to do this, along with an example that will reproduce the MathCad plot shown in your attached paper. I haven't checked it thoroughly, but it seems to reproduce the MathCad example well. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mike Lawrence Sent: Friday, May 25, 2007 1:55 PM To: Lucke, Joseph F Cc: Rhelp Subject: Re: [R] Calculation of ratio distribution properties According to the paper I cited, there is controversy over the sufficiency of Hinkley's solution, hence their proposed more complete solution. On 25-May-07, at 2:45 PM, Lucke, Joseph F wrote: The exact ratio is given in On the Ratio of Two Correlated Normal Random Variables, D. V. Hinkley, Biometrika, Vol. 56, No. 3. (Dec., 1969), pp. 635-639. -- Mike Lawrence Graduate Student, Department of Psychology, Dalhousie University Website: http://myweb.dal.ca/mc973993 Public calendar: http://icalx.com/public/informavore/Public The road to wisdom? Well, it's plain and simple to express: Err and err and err again, but less and less and less. - Piet Hein __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Time series\optimization question not R question
Your approach obviously won't give you the same result as when the likelihood is optimized jointly over A and \beta. However, you can maximize the likelihood over \beta for different values of A, which gives you a profiled likelihood. Then you pick the \beta and A corresponding to the maximum of the profiled likelihood. However, this pair of A and \beta need not satisfy your constraints. If that happens, you could make a simple parameter transformation from (A, beta) to (P1, P2) that might resolve the problem: P1 <- beta P2 <- atanh(A + beta) Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Leeds, Mark (IED) Sent: Tuesday, May 22, 2007 10:29 AM To: r-help@stat.math.ethz.ch Subject: [R] Time series\optimization question not R question This is a time series\optimization rather than an R question: Suppose I have an ARMA(1,1) with restrictions such that the coefficient on the lagged epsilon term is related to the coefficient on the lagged z term as below. z_t = [A + beta]*z_t-1 + epsilon_t - A*epsilon_t-1 So, if I don't have a facility for optimizing with this restriction, is it legal to set A to something and then optimize just for beta given that A? Would this give me the same answer, likelihood-wise, as optimizing both jointly with the restriction? This methodology doesn't sound right to me. Thanks. P.S.: abs(A + beta) also has to be less than 1 but I was just going to hope for that and not worry about it right now.
This is not an offer (or solicitation of an offer) to buy/se...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Time series\optimization question not R question
In my previous email, I meant to say: P1 <- A P2 <- atanh(A + beta) So that the model becomes: z_t = tanh(P2)*z_t-1 + epsilon_t - P1*epsilon_t-1 Although I am not sure how readily the likelihood of the above model can be maximized. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ravi Varadhan Sent: Tuesday, May 22, 2007 11:27 AM To: 'Leeds, Mark (IED)'; r-help@stat.math.ethz.ch Subject: Re: [R] Time series\optimization question not R question Your approach obviously won't give you the same result as when the likelihood is optimized jointly with A and \beta. However, you can maximize the likelihood over \beta for different values of A, which would give you a profiled likelihood. Then you pick the \beta and A corresponding to the maximum of the profiled likelihood. However, this set of A and \beta need not necessarily satisfy your constraints. If this does happen, you could make a simple parameter transformation from (A, beta) to (P1, P2) that might resolve the problem: P1 <- beta P2 <- atanh(A + beta) Ravi. --- Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Leeds, Mark (IED) Sent: Tuesday, May 22, 2007 10:29 AM To: r-help@stat.math.ethz.ch Subject: [R] Time series\optimization question not R question This is a time series\optimization rather than an R question: Suppose I have an ARMA(1,1) with restrictions such that the coefficient on the lagged epsilon term is related to the coefficient on the lagged z term as below. z_t = [A + beta]*z_t-1 + epsilon_t - A*epsilon_t-1 So, if I don't have a facility for optimizing with this restriction, is it legal to set A to something and then optimize just for beta given that A? Would this give me the same answer, likelihood-wise, as optimizing both jointly with the restriction? This methodology doesn't sound right to me. Thanks. P.S.: abs(A + beta) also has to be less than 1 but I was just going to hope for that and not worry about it right now. This is not an offer (or solicitation of an offer) to buy/se...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
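Ravi's reparameterization can be used directly with optim: optimize freely over (P1, P2) and map back to (A, beta). A minimal sketch (the helper name and the test values are mine, not from the thread):

```r
# Map unconstrained (P1, P2) to (A, beta) with abs(A + beta) < 1
# guaranteed, since tanh() maps the whole real line into (-1, 1).
to_constrained <- function(p) {
  A    <- p[1]                  # P1 <- A
  beta <- tanh(p[2]) - A        # from P2 <- atanh(A + beta)
  c(A = A, beta = beta)
}
pars <- to_constrained(c(0.4, 5))   # any (P1, P2) values work
abs(sum(pars)) < 1                  # TRUE: constraint holds by design
```

After maximizing the likelihood over (P1, P2), transform the estimates back to (A, beta); standard errors on the original scale would need the delta method.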
Re: [R] Nonlinear constrains with optim
Paul, The general idea here is to transform a constrained problem into a sequence of unconstrained optimization problems. When only equality constraints are involved, a popular way to do this is to augment the objective function with a Lagrangian term and with a quadratic penalty term. When general inequalities are present, it is a lot harder, but various kinds of barrier functions are used. A canonical reference (quite theoretical) for constrained optimization is the book: Fiacco and McCormick (1968, but reprinted as classic by SIAM in 1990): Nonlinear programming: sequential unconstrained minimization techniques. Another good book is that by Roger Fletcher (1987): Practical methods of optimization. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Paul Smith Sent: Thursday, May 10, 2007 11:08 AM To: R-help Subject: [R] Nonlinear constrains with optim Dear All I am dealing at the moment with optimization problems with nonlinear constraints. Regenoud is quite apt to solve that kind of problems, but the precision of the optimal values for the parameters is sometimes far from what I need. Optim seems to be more precise, but it can only accept box-constrained optimization problems. I read in the list archives that optim can also be used with nonlinear constrains through penalizations. However, I am not familiar with the technique of penalizations. Could someone please indicate to me a site or a book to learn about that penalization technique? 
Thanks in advance, Paul __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
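To make the penalty idea concrete, here is a toy quadratic-penalty loop of my own (not from the books cited in the thread): minimize f subject to g(x) = 0 by solving a sequence of unconstrained problems with an increasing penalty weight, warm-starting each solve from the previous one.

```r
# Quadratic-penalty sketch: minimize the distance to (1, 2) subject
# to staying on the unit circle.  Each unconstrained solve
# warm-starts the next one with a larger penalty weight mu.
f   <- function(x) (x[1] - 1)^2 + (x[2] - 2)^2
g   <- function(x) x[1]^2 + x[2]^2 - 1        # equality constraint
pen <- function(x, mu) f(x) + mu * g(x)^2     # penalized objective
x0 <- c(0.5, 0.5)
for (mu in 10^(0:4))
  x0 <- optim(x0, pen, mu = mu, method = "BFGS")$par
round(x0, 3)    # approaches c(1, 2)/sqrt(5), about c(0.447, 0.894)
g(x0)           # near zero: constraint approximately satisfied
```

Augmented-Lagrangian methods improve on this scheme by adding a multiplier term, which avoids having to drive mu to infinity.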
Re: [R] Optim
Let us first assume that you have enumerated all the local maxima, which is by no means a trivial thing to assure. How different are the likelihood values? If they are significantly different, then take the parameter estimates corresponding to the largest likelihood. If they are not significantly different but the corresponding parameter estimates differ widely, then you may have identifiability issues. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Wassim Kamoum Sent: Thursday, May 10, 2007 3:46 PM To: r-help@stat.math.ethz.ch Subject: [R] Optim Hello, I'm maximizing a likelihood function with the function optim, but for different initial parameters (in the input of the optim function), I find different values for the likelihood function and the parameter estimates; the cause is that the algorithm has not found the global maximum for the function but only a local maximum. What must I do to obtain the global maximum for the likelihood function? Thanks - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
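There is no general-purpose guarantee of finding a global maximum, but a common pragmatic defence is multi-start: run optim from many dispersed starting values and keep the best fit. A sketch on a deliberately multimodal toy function (the function and the number of starts are illustrative, not a recipe):

```r
# Multi-start sketch: many random starts, keep the lowest objective.
# The test function is illustrative; in practice you would plug in
# your negative log-likelihood.
set.seed(1)
f <- function(x) sin(5 * x) * exp(-x^2)     # several local minima
starts <- runif(25, -2, 2)                  # dispersed starting values
fits <- lapply(starts, function(s) optim(s, f, method = "BFGS"))
vals <- sapply(fits, `[[`, "value")
best <- fits[[which.min(vals)]]
best$par      # near the global minimizer, about x = -0.29, here
```

Raising the number of starts raises the chance of landing in the global basin, but never to certainty; for hard problems, stochastic global methods (e.g. simulated annealing via optim's SANN method) are another option.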
Re: [R] A function for raising a matrix to a power?
Atte, Your matrix A is not symmetric; that is why exponentiation using the spectral decomposition, expM.sd, does not give you the correct answer. Convert A to a symmetric matrix: A <- (A + t(A))/2 and then the results will all match. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: Atte Tenkanen [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 09, 2007 10:07 AM To: Paul Gilbert Cc: Ravi Varadhan; r-help@stat.math.ethz.ch Subject: Re: [R] A function for raising a matrix to a power? Hello, Thanks for the many replies. I tested all the functions presented. I'm a beginner in linear algebra, but now I have a serious question ;-) Here is a matrix A, whose determinant is 3, so it is nonsingular. Then there are similar computer runs done with each function proposed. I have calculated powers of A between A^274 and A^277. In the first version (see the outputs below), when n=277, a zero appears in the upper right corner. Can you say whether the first function is more reliable than the others? What can you say about the accuracy of the calculations?
-Atte

#-#
# Matrix A:
A=rbind(c(-3,-4,-2),c(3,5,1),c(2,-1,4))
A
det(A)

# 1st version:
"%^%" <- function(A,n){
  if(n==1) A else {B <- A; for(i in (2:n)){A <- A%*%B}}; A
}
for(i in 274:277){print(i);print(A%^%i)}

# 2nd version:
"%^%" <- function(A, n) if(n == 1) A else A %*% (A %^% (n-1))
for(i in 274:277){print(i);print(A%^%i)}

# 3rd version:
mp <- function(mat,pow){
  ans <- mat
  for ( i in 1:(pow-1)){
    ans <- mat%*%ans
  }
  return(ans)
}
for(i in 274:277){print(i);print(mp(A,i))}

# 4th version:
library(Malmig)
mtx.exp
for(i in 274:277){print(i);print(mtx.exp(A,i))}

# 5th version:
matrix.power <- function(mat, n) {
  # test if mat is a square matrix
  # treat n < 0 and n == 0 -- this is left as an exercise
  # trap non-integer n and return an error
  if (n == 1) return(mat)
  result <- diag(1, ncol(mat))
  while (n > 0) {
    if (n %% 2 != 0) {
      result <- result %*% mat
      n <- n - 1
    }
    mat <- mat %*% mat
    n <- n / 2
  }
  return(result)
}
for(i in 274:277){print(i);print(matrix.power(A,i))}

# 6th version:
expM.sd <- function(X,e){Xsd <- eigen(X); Xsd$vec %*% diag(Xsd$val^e) %*% t(Xsd$vec)}
for(i in 274:277){print(i);print(expM.sd(A,i))}

#-OUTPUTS---#
> A=rbind(c(-3,-4,-2),c(3,5,1),c(2,-1,4))
> A
     [,1] [,2] [,3]
[1,]   -3   -4   -2
[2,]    3    5    1
[3,]    2   -1    4
> det(A)
[1] 3
# 1st version:
> "%^%" <- function(A,n){
+   if(n==1) A else {B <- A; for(i in (2:n)){A <- A%*%B}}; A
+ }
> for(i in 274:277){print(i);print(A%^%i)}
[1] 274
               [,1]           [,2]           [,3]
[1,]  1.615642e+131  3.231283e+131 -3.940201e+115
[2,] -5.385472e+130 -1.077094e+131  4.925251e+114
[3,] -3.769831e+131 -7.539661e+131  7.880401e+115
[1] 275
               [,1]           [,2]           [,3]
[1,]  4.846925e+131  9.693850e+131 -1.182060e+116
[2,] -1.615642e+131 -3.231283e+131   0.000000e+00
[3,] -1.130949e+132 -2.261898e+132  4.728241e+116
[1] 276
               [,1]           [,2]           [,3]
[1,]  1.454078e+132  2.908155e+132 -1.576080e+116
[2,] -4.846925e+131 -9.693850e+131   0.000000e+00
[3,] -3.392848e+132 -6.785695e+132  1.576080e+117
[1] 277
               [,1]           [,2]           [,3]
[1,]  4.362233e+132  8.724465e+132   0.000000e+00
[2,] -1.454078e+132 -2.908155e+132 -1.576080e+116
[3,] -1.017854e+133 -2.035709e+133  3.782593e+117
# 2nd version:
> "%^%" <- function(A, n) if(n == 1) A else A %*% (A %^% (n-1))
> for(i in 274:277){print(i);print(A%^%i)}
[1] 274
               [,1]           [,2]           [,3]
[1,]  1.615642e+131  3.231283e+131 -3.101125e+114
[2,] -5.385472e+130 -1.077094e+131  1.533306e+114
[3,] -3.769831e+131 -7.539661e+131  5.664274e+114
[1] 275
               [,1]           [,2]           [,3]
[1,]  4.846925e+131  9.693850e+131 -8.158398e+114
[2,] -1.615642e+131 -3.231283e+131  4.027430e+114
[3,] -1.130949e+132 -2.261898e+132  1.492154e+115
[1] 276
               [,1]           [,2]           [,3]
[1,]  1.454078e+132  2.908155e+132 -2.147761e+115
[2,] -4.846925e+131 -9.693850e+131  1.058350e+115
[3,] -3.392848e+132 -6.785695e+132  3.934193e+115
[1] 277
               [,1]           [,2]           [,3]
[1,]  4.362233e+132  8.724465e+132 -5.658504e+115
[2,] -1.454078e+132 -2.908155e+132  2.782660e+115
[3,] -1.017854e+133 -2.035709e+133  1.038290e+116
# 3rd version:
> mp <- function(mat,pow){
+   ans <- mat
+   for ( i in 1:(pow-1)){
+     ans <- mat%*%ans
+   }
+   return(ans)
+ }
> for(i in 274:277
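For what it's worth, the 5th version above (exponentiation by squaring) can be written a little more compactly; it needs only O(log n) matrix multiplications, so round-off has fewer chances to accumulate than in the n-multiplication loops. A cleaned-up sketch (the function name is mine):

```r
# Exponentiation by squaring for positive integer powers.
matpow <- function(mat, n) {
  stopifnot(is.matrix(mat), nrow(mat) == ncol(mat),
            n >= 1, n == round(n))
  result <- diag(nrow(mat))          # identity of matching size
  while (n > 0) {
    if (n %% 2 == 1) result <- result %*% mat
    mat <- mat %*% mat               # square, halve the exponent
    n <- n %/% 2
  }
  result
}
A <- rbind(c(-3, -4, -2), c(3, 5, 1), c(2, -1, 4))
all.equal(matpow(A, 3), A %*% A %*% A)   # TRUE
```

At powers like A^277 the entries are around 1e131, so any double-precision scheme loses the low-order digits; for an exact answer with integer matrices, exact big-integer arithmetic would be the only reliable route.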
Re: [R] Bad optimization solution
Paul, The problem lies neither with R nor with numerical methods. The onus is always on the user to understand what the numerical schemes can and cannot do. One should never blindly take the results given by a numerical scheme and run with them. In your example, the optimization method is doing what it was designed to do: find a critical point of a function, i.e. a point where the gradient is zero. It is your responsibility to ensure that the result makes sense, and if it doesn't, to understand why not. In your problem, the maxima ((1,0) and (0,1)) lie on the boundary of the parameter space, and the gradients at the maxima (defined as limits from within the domain) are clearly not zero. Another problem with your example is that the Hessian of your function is singular: it has eigenvalues 0 and 2. In short, there is no substitute for using your analytic powers! Ravi. - Original Message - From: Paul Smith [EMAIL PROTECTED] Date: Tuesday, May 8, 2007 4:33 am Subject: Re: [R] Bad optimization solution To: R-help r-help@stat.math.ethz.ch It seems that there is here a problem of reliability, as one never knows whether the solution provided by R is correct or not. In the case that I reported, it is fairly simple to see that the solution provided by R (without any warning!) is incorrect, but, in general, that is not so simple, and one may take a wrong solution as a correct one. Paul On 5/8/07, Ravi Varadhan [EMAIL PROTECTED] wrote: Your function, (x1-x2)^2, has zero gradient at all starting values such that x1 = x2, which means that the gradient-based search methods will terminate there because they have found a critical point, i.e. a point at which the gradient is zero (which can be a maximum or a minimum or a saddle point). However, I do not know why optim converges to the boundary maximum when the analytic gradient is supplied (as shown by Sundar). Ravi. --- Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: -Original Message- From: [EMAIL PROTECTED] [ On Behalf Of Paul Smith Sent: Monday, May 07, 2007 6:26 PM To: R-help Subject: Re: [R] Bad optimization solution On 5/7/07, Paul Smith [EMAIL PROTECTED] wrote: I think the problem is the starting point. I do not remember the details of the BFGS method, but I am almost sure the (.5, .5) starting point is suspect, since the abs function is not differentiable at 0. If you perturb the starting point even slightly you will have no problem. From: Paul Smith [EMAIL PROTECTED] To: R-help r-help@stat.math.ethz.ch Subject: [R] Bad optimization solution 05/07/2007 04:30 PM Dear All I am trying to perform the below optimization problem, but getting (0.5,0.5) as the optimal solution, which is wrong; the correct solution should be (1,0) or (0,1). Am I doing something wrong? I am using R 2.5.0 on Fedora Core 6 (Linux). Thanks in advance, Paul -- myfunc <- function(x) { x1 <- x[1]; x2 <- x[2]; abs(x1-x2) } optim(c(0.5,0.5), myfunc, lower=c(0,0), upper=c(1,1), method="L-BFGS-B", control=list(fnscale=-1)) Yes, with (0.2,0.9), a correct solution comes out. However, how can one be sure in general that the solution obtained by optim is correct? ?optim says: Method 'L-BFGS-B' is that of Byrd _et. al._ (1995) which allows _box constraints_, that is each variable can be given a lower and/or upper bound. The initial value must satisfy the constraints. This uses a limited-memory modification of the BFGS quasi-Newton method. If non-trivial bounds are supplied, this method will be selected, with a warning. which only demands that the initial value must satisfy the constraints.
Furthermore, x^2 is everywhere differentiable, and notwithstanding that, the reported problem occurs with myfunc <- function(x) { x1 <- x[1]; x2 <- x[2]; (x1-x2)^2 } optim(c(0.2,0.2), myfunc, lower=c(0,0), upper=c(1,1), method="L-BFGS-B", control=list(fnscale=-1)) Paul
Re: [R] A function for raising a matrix to a power?
Paul, Your solution based on the SVD does not work even for the matrix in your example (the reason it worked for e=3 is that 3 is an odd power and P is a permutation matrix; it will also work for all odd powers, but not for even powers). However, a spectral decomposition will work for all symmetric matrices (SVD-based exponentiation does not work for a general matrix). Here is the function to do exponentiation based on the spectral decomposition: expM.sd <- function(X,e){Xsd <- eigen(X); Xsd$vec %*% diag(Xsd$val^e) %*% t(Xsd$vec)} P <- matrix(c(.3,.7, .7, .3), ncol=2) P [,1] [,2] [1,] 0.3 0.7 [2,] 0.7 0.3 P %*% P [,1] [,2] [1,] 0.58 0.42 [2,] 0.42 0.58 expM(P,2) [,1] [,2] [1,] 0.42 0.58 [2,] 0.58 0.42 expM.sd(P,2) [,1] [,2] [1,] 0.58 0.42 [2,] 0.42 0.58 Exponentiation based on the spectral decomposition should generally be more accurate (I am not sure about the speed). A caveat is that it will only work for real symmetric or complex Hermitian matrices. It will not work for asymmetric matrices. It would work for an asymmetric matrix if the eigenvectors are made to be orthonormal. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Paul Gilbert Sent: Monday, May 07, 2007 5:16 PM To: Atte Tenkanen Cc: r-help@stat.math.ethz.ch Subject: Re: [R] A function for raising a matrix to a power? You might try this, from 9/22/2006 with subject line Exponentiate a matrix: I am getting a bit rusty on some of these things, but I seem to recall that there is a numerical advantage (speed and/or accuracy?)
to diagonalizing: expM <- function(X,e) { v <- La.svd(X); v$u %*% diag(v$d^e) %*% v$vt } P <- matrix(c(.3,.7, .7, .3), ncol=2) P %*% P %*% P [,1] [,2] [1,] 0.468 0.532 [2,] 0.532 0.468 expM(P,3) [,1] [,2] [1,] 0.468 0.532 [2,] 0.532 0.468 I think this also works for non-integer, negative, large, and complex exponents: expM(P, 1.5) [,1] [,2] [1,] 0.3735089 0.6264911 [2,] 0.6264911 0.3735089 expM(P, 1000) [,1] [,2] [1,] 0.5 0.5 [2,] 0.5 0.5 expM(P, -3) [,1] [,2] [1,] -7.3125 8.3125 [2,] 8.3125 -7.3125 expM(P, 3+.5i) [,1] [,2] [1,] 0.4713+0.0141531i 0.5287-0.0141531i [2,] 0.5287-0.0141531i 0.4713+0.0141531i Paul Gilbert Doran, Harold wrote: Suppose I have a square matrix P P <- matrix(c(.3,.7, .7, .3), ncol=2) I know that P * P returns the element-by-element product, whereas P %*% P returns the matrix product. Now, P^2 also returns the element-by-element product. But, is there a slick way to write P %*% P %*% P? Obviously, P^3 does not return the result I expect. Thanks, Harold Atte Tenkanen wrote: Hi, Is there a function for raising a matrix to a power? For example if you like to compute A%*%A%*%A, is there an abbreviation similar to A^3? Atte Tenkanen A=rbind(c(1,1),c(-1,-2)) A [,1] [,2] [1,] 1 1 [2,] -1 -2 A^3 [,1] [,2] [1,] 1 1 [2,] -1 -8 But: A%*%A%*%A [,1] [,2] [1,] 1 2 [2,] -2 -5 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. The French version follows the English text. This email may contain privileged and/or confidential inform...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Bad optimization solution
Your function, (x1-x2)^2, has zero gradient at all starting values such that x1 = x2, which means that the gradient-based search methods will terminate there because they have found a critical point, i.e. a point at which the gradient is zero (which can be a maximum or a minimum or a saddle point). However, I do not know why optim converges to the boundary maximum when the analytic gradient is supplied (as shown by Sundar). Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Paul Smith Sent: Monday, May 07, 2007 6:26 PM To: R-help Subject: Re: [R] Bad optimization solution On 5/7/07, Paul Smith [EMAIL PROTECTED] wrote: I think the problem is the starting point. I do not remember the details of the BFGS method, but I am almost sure the (.5, .5) starting point is suspect, since the abs function is not differentiable at 0. If you perturb the starting point even slightly you will have no problem. From: Paul Smith [EMAIL PROTECTED] To: R-help r-help@stat.math.ethz.ch Subject: [R] Bad optimization solution 05/07/2007 04:30 PM Dear All I am trying to perform the below optimization problem, but getting (0.5,0.5) as the optimal solution, which is wrong; the correct solution should be (1,0) or (0,1). Am I doing something wrong? I am using R 2.5.0 on Fedora Core 6 (Linux). Thanks in advance, Paul -- myfunc <- function(x) { x1 <- x[1]; x2 <- x[2]; abs(x1-x2) } optim(c(0.5,0.5), myfunc, lower=c(0,0), upper=c(1,1), method="L-BFGS-B", control=list(fnscale=-1)) Yes, with (0.2,0.9), a correct solution comes out. However, how can one be sure in general that the solution obtained by optim is correct?
?optim says: Method 'L-BFGS-B' is that of Byrd _et. al._ (1995) which allows _box constraints_, that is each variable can be given a lower and/or upper bound. The initial value must satisfy the constraints. This uses a limited-memory modification of the BFGS quasi-Newton method. If non-trivial bounds are supplied, this method will be selected, with a warning. which only demands that the initial value must satisfy the constraints. Furthermore, x^2 is everywhere differentiable, and notwithstanding that, the reported problem occurs with myfunc <- function(x) { x1 <- x[1]; x2 <- x[2]; (x1-x2)^2 } optim(c(0.2,0.2), myfunc, lower=c(0,0), upper=c(1,1), method="L-BFGS-B", control=list(fnscale=-1)) Paul __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
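The behaviour discussed in this thread is easy to inspect: along x1 = x2 the gradient of (x1 - x2)^2 is identically zero, so the symmetric start is itself a critical point and L-BFGS-B stops immediately, while an asymmetric start climbs to a boundary maximum. A quick check (my own sketch, with the analytic gradient supplied via optim's gr argument):

```r
# The gradient of (x1 - x2)^2 is zero everywhere on x1 == x2, so the
# symmetric start is itself a critical point.
myfunc <- function(x) (x[1] - x[2])^2
grad   <- function(x) c(2 * (x[1] - x[2]), -2 * (x[1] - x[2]))
grad(c(0.5, 0.5))                     # 0 0: nothing to climb
sol <- optim(c(0.2, 0.9), myfunc, gr = grad,
             lower = c(0, 0), upper = c(1, 1),
             method = "L-BFGS-B", control = list(fnscale = -1))
sol$par                               # a boundary maximum: (0, 1) here
```

This is the general caution behind Ravi's point: converging to a critical point is all the method promises, so symmetric or otherwise special starting values deserve suspicion.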
Re: [R] [SPAM] - Re: R package development in windows - BayesianFilter detected spam
Harold, I totally echo your sentiments on the difficulty of creating an R package in Windows. I really wish that this process could be made a bit less painful. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Doran, Harold Sent: Thursday, May 03, 2007 3:33 PM To: Duncan Murdoch Cc: r-help@stat.math.ethz.ch Subject: Re: [R] [SPAM] - Re: R package development in windows - Bayesian Filter detected spam Thanks, Duncan. I'll look into that. Is there an authoritative document that codifies the new package development procedures for 2.5.0 (windows-specific), or is that Writing R Extensions? In this thread alone I've received multiple emails pointing to multiple web sites with instructions for windows. Inasmuch as it's appreciated, I'm a bit confused as to which I should consider authoritative. I do hope I can resolve this and appreciate the help I've received. However, I feel a bit compelled to note how very difficult this process is. Harold -Original Message- From: Duncan Murdoch [mailto:[EMAIL PROTECTED] Sent: Thursday, May 03, 2007 3:24 PM To: Doran, Harold Cc: Gabor Grothendieck; r-help@stat.math.ethz.ch Subject: [SPAM] - Re: [R] R package development in windows - Bayesian Filter detected spam On 5/3/2007 3:04 PM, Doran, Harold wrote: Thanks Gabor, Sundar, and Tony. Indeed, Rtools was missing from the path. With that resolved, and another 10 minute Windows restart, I get the following below. The log suggests that hhc is not installed. It is, and, according to the directions I am following, I have placed it in the c:\cygwin directory. I think the problem is that you are following a real mix of instructions, and they don't make sense.
It would be nice if folks would submit patches to the R Admin manual (or to the Rtools web site) rather than putting together web sites with advice that is bad from day one, and quickly gets worse when it is not updated. BTW, package.skeleton() doesn't seem to create the correct DESCRIPTION template. I had to add the DEPENDS line. Without this, I get another error. C:\Program Files\R\R-2.4.1\bin> Rcmd build --force --binary g:\foo R 2.4.1 is no longer current; the package building instructions in R 2.5.0 have been simplified a bit. You might want to try those. Duncan Murdoch * checking for file 'g:\foo/DESCRIPTION' ... OK * preparing 'g:\foo': * checking DESCRIPTION meta-information ... OK * removing junk files * checking for LF line-endings in source files * checking for empty or unneeded directories * building binary distribution WARNING * some HTML links may not be found installing R.css in c:/TEMP/Rinst40061099 Using auto-selected zip options '' latex: not found latex: not found latex: not found -- Making package foo latex: not found adding build stamp to DESCRIPTION latex: not found latex: not found latex: not found installing R files latex: not found installing data files latex: not found installing man source files installing indices latex: not found not zipping data installing help Warning: \alias{foo} already in foo-package.Rd -- skipping the one in foo.Rd Building/Updating help pages for package 'foo' Formats: text html latex example chm foo-package text html latex example chm foo text html latex example chm mydata text html latex example chm hhc: not found cp: cannot stat `c:/TEMP/Rbuild40048815/foo/chm/foo.chm': No such file or directory make[1]: *** [chm-foo] Error 1 make: *** [pkg-foo] Error 2 *** Installation of foo failed *** Removing 'c:/TEMP/Rinst40061099/foo' ERROR * installation failed C:\Program Files\R\R-2.4.1\bin> -Original Message- From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] Sent: Thursday, May 03, 2007 2:50 PM To: Doran, Harold Cc:
r-help@stat.math.ethz.ch Subject: Re: [R] R package development in windows It can't find sh.exe, so you haven't installed Rtools. There are several HowTo's listed in the links section here that include pointers to R manuals and other step by step instructions: http://code.google.com/p/batchfiles/ On 5/3/07, Doran, Harold [EMAIL PROTECTED] wrote: I'm attempting to build an R package for distribution and am
Re: [R] Simple plot question
Check out the function mvrnorm in package MASS. library(MASS) ?mvrnorm Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of David Kaplan Sent: Thursday, April 26, 2007 2:49 PM To: r-help@stat.math.ethz.ch Subject: [R] Simple plot question Hi, I have been searching and can't seem to find a simple command that will allow me to sample from a multivariate normal distribution with known covariance matrix. I am likely missing something. Thanks in advance. David -- === David Kaplan, Ph.D. Professor Department of Educational Psychology University of Wisconsin - Madison Educational Sciences, Room 1061 1025 W. Johnson Street Madison, WI 53706 email: [EMAIL PROTECTED] homepage: http://www.education.wisc.edu/edpsych/facstaff/kaplan/kaplan.htm Phone: 608-262-0836 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
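In case a worked call helps, here is a minimal sketch of the suggested approach; the covariance matrix below is made up for illustration, not from David's data.

```r
library(MASS)  # provides mvrnorm()

# Illustrative covariance matrix: unit variances, correlation 0.5
Sigma <- matrix(c(1.0, 0.5,
                  0.5, 1.0), nrow = 2)

set.seed(1)
x <- mvrnorm(n = 1000, mu = c(0, 0), Sigma = Sigma)  # 1000 draws from N(mu, Sigma)

dim(x)   # 1000 rows, one per draw; 2 columns, one per dimension
cov(x)   # sample covariance should be close to Sigma
```

mvrnorm(n, mu, Sigma) returns an n-by-length(mu) matrix of draws, so summaries like cov() can be applied directly.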
Re: [R] Estimates at each iteration of optim()?
Deepankar, Here is an example using BFGS:

fr <- function(x) {   ## Rosenbrock Banana function
    x1 <- x[1]
    x2 <- x[2]
    100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}
grr <- function(x) {  ## Gradient of 'fr'
    x1 <- x[1]
    x2 <- x[2]
    c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1),
       200 * (x2 - x1 * x1))
}
optim(c(-1.2, 1), fr, grr, method = "BFGS", control = list(trace = TRUE))
initial  value 24.200000
iter  10 value 1.367383
iter  20 value 0.134560
iter  30 value 0.001978
iter  40 value 0.000000
final  value 0.000000
converged
$par
[1] 1 1

$value
[1] 9.594955e-18

$counts
function gradient
     110       43

$convergence
[1] 0

$message
NULL

This example shows that the objective function value (not the parameter estimates) is printed every 10 iterations. However, trying different integer values for trace from 2 to 10 (trace = 1 behaves the same as trace = TRUE) did not change anything. If you want to get estimates at every iteration, look at the source code for BFGS (which I assume is in FORTRAN). You may have to modify the source code and recompile it yourself to get a more detailed trace for BFGS. However, you can get parameter iterates at every step for L-BFGS-B using trace = 6, although this gives a lot more information than just the parameter estimates. Alternatively, you can use the CG methods with trace = TRUE or trace = 1, which are generally a lot slower than BFGS or L-BFGS-B. Why do you want to look at parameter estimates for each step, anyway? Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of DEEPANKAR BASU Sent: Monday, April 23, 2007 11:34 AM To: Peter Dalgaard Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Estimates at each iteration of optim()?
I read the description of the trace control parameter in ?optim and then also looked at the examples given at the end. In one of the examples I found that they had used trace=TRUE with the method SANN. I am using the method BFGS and I tried using trace=TRUE too, but I did not get the parameter estimates at each iteration. As you say, it might be method dependent. I tried reading the source code for optim but could not find what I was looking for. Hence, I was wondering if anyone could tell me what option to use with the method BFGS to get the parameter estimates at each iteration of the optimization. Deepankar - Original Message - From: Peter Dalgaard [EMAIL PROTECTED] Date: Monday, April 23, 2007 2:46 am Subject: Re: [R] Estimates at each iteration of optim()? DEEPANKAR BASU wrote: I am trying to maximise a complicated loglikelihood function with the optim command. Is there some way to get to know the estimates at each iteration? When I put control=list(trace=TRUE) as an option in optim, I just got the initial and final values of the loglikelihood, number of iterations and whether the routine has converged or not. I need to know the estimate values at each iteration. It might help if you actually _read_ the description of the trace control parameter (hint: it is not an on/off switch) in ?optim... And, as it says, this is method dependent, so you may have to study the source code. Deepankar
Re: [R] Estimates at each iteration of optim()?
Without knowing much about your problem, it is hard to suggest good strategies. However, if you are having trouble with the estimates of covariance matrix not being positive-definite, you can force them to be positive-definite after each iteration, before moving on to the next iteration. Look at the make.positive.definite function from corpcor package. This is just one approach. There may be better approaches - perhaps, an EM-like approach might be applicable that would automatically satisfy all parameter constraints. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: DEEPANKAR BASU [mailto:[EMAIL PROTECTED] Sent: Monday, April 23, 2007 1:09 PM To: Ravi Varadhan Cc: 'Peter Dalgaard'; r-help@stat.math.ethz.ch Subject: Re: RE: [R] Estimates at each iteration of optim()? Ravi, Thanks a lot for your detailed reply. It clarifies many of the confusions in my mind. I want to look at the parameter estimates at each iteration because the full model that I am trying to estimate is not converging; a smaller version of the model converges but the results are quite meaningless. The problem in the estimation of the full model is the following: my likelihood function contains the elements of a (bivariate normal) covariance matrix as parameters. To compute the likelihood, I have to draw random samples from the bivariate normal distribution. But no matter what starting values I give, I cannot ensure that the covariance matrix remains positive definite at each iteration of the optimization exercise. Moreover, as soon as the covariance matrix fails to be positive definite, I get an error message (because I can no longer draw from the bivariate normal distribution) and the program stops. 
Faced with this problem, I wanted to see exactly at which parameter estimates the covariance matrix fails to remain positive definite. From that I would think of devising a method to get around the problem, or at least I would try to. Probably there is some other way to solve this problem. I would like your opinion on the following question: is there some way I can transform the three parameters of my (2 by 2) covariance matrix (the two standard deviations and the correlation coefficient) to ensure that the covariance matrix remains positive definite at each iteration of the optimization? Is there any method other than transforming the parameters to ensure this? Deepankar - Original Message - From: Ravi Varadhan [EMAIL PROTECTED] Date: Monday, April 23, 2007 12:21 pm Subject: RE: [R] Estimates at each iteration of optim()? [...]
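One standard answer to Deepankar's transformation question (a sketch I am adding, not from the thread): optimize over the logs of the standard deviations and the inverse hyperbolic tangent (Fisher z-transform) of the correlation, and rebuild the covariance matrix inside the likelihood. The matrix is then positive definite by construction for any real parameter values.

```r
# Unconstrained parameterization of a 2x2 covariance matrix.
# theta = (log sd1, log sd2, atanh(rho)) may take any real values.
make.cov <- function(theta) {
    s1  <- exp(theta[1])    # sd1 > 0
    s2  <- exp(theta[2])    # sd2 > 0
    rho <- tanh(theta[3])   # -1 < rho < 1
    matrix(c(s1^2,      rho*s1*s2,
             rho*s1*s2, s2^2), nrow = 2)
}

# Any theta gives a symmetric positive-definite matrix:
S <- make.cov(c(0.3, -0.2, 1.5))
all(eigen(S, symmetric = TRUE)$values > 0)   # TRUE
```

The objective function would take theta, call make.cov(theta), and draw from the bivariate normal with that matrix; optim then never sees the constraint at all.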
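A simple way to see the parameter values at every evaluation, regardless of the optim method (a sketch I am adding; the logging wrapper is my own device, not part of optim): wrap the objective function so it records its argument before returning.

```r
fr <- function(x) {   # Rosenbrock Banana function, as in Ravi's example
    100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
}
grr <- function(x) {  # its gradient
    c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
       200 * (x[2] - x[1]^2))
}

par.trace <- list()   # grows by one entry per function evaluation
fr.logged <- function(x) {
    par.trace[[length(par.trace) + 1]] <<- x   # record the point visited
    fr(x)
}

out <- optim(c(-1.2, 1), fr.logged, grr, method = "BFGS")
length(par.trace)                     # equals out$counts["function"]
do.call(rbind, tail(par.trace, 3))    # the last few evaluation points
```

Note this records every function evaluation, including line-search trial points, not just the accepted iterates, but it needs no changes to optim's compiled code.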
Re: [R] Suggestions for statistical computing course
Hi Giovanni, You may want to consider: Numerical Analysis for Statisticians (Springer) by Ken Lange. We used it when I was taking a graduate-level (MS and PhD students) course in statistical computing. I really like it and still use it frequently. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Giovanni Petris Sent: Friday, April 20, 2007 9:34 AM To: r-help@stat.math.ethz.ch Subject: [R] Suggestions for statistical computing course Dear R-helpers, I am planning a course on Statistical Computing and Computational Statistics for the Fall semester, aimed at first year Masters students in Statistics. Among the topics that I would like to cover are linear algebra related to least squares calculations, optimization and root-finding, numerical integration, Monte Carlo methods (possibly including MCMC), bootstrap, smoothing and nonparametric density estimation. Needless to say, the software I will be using is R. 1. Does anybody have a suggestion about a book to follow that covers (most of) the topics above at a reasonable level for my audience? Are there any on-line publicly-available manuals, lecture notes, instructional documents that may be useful? 2. I do most of my work in R using Emacs and ESS. That means that I keep a file in an emacs window and I submit it to R one line at a time or one region at a time, making corrections and iterating as needed. When I am done, I just save the file with the last, working, correct (hopefully!) version of my code. Is there a way of doing something like that, or in the same spirit, without using Emacs/ESS? What approach would you use to polish and save your code in this case?
For my course I will be working in a Windows environment. While I am looking for simple and effective solutions that do not require installing emacs in our computer lab, the answer "you should teach your students emacs/ess on top of R" is perfectly acceptable. Thank you for your consideration, and thank you in advance for the useful replies. Have a good day, Giovanni -- Giovanni Petris [EMAIL PROTECTED] Department of Mathematical Sciences University of Arkansas - Fayetteville, AR 72701 Ph: (479) 575-6324, 575-8630 (fax) http://definetti.uark.edu/~gpetris/
Re: [R] convergence
No. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Alberto Monteiro Sent: Thursday, April 19, 2007 9:24 AM To: [EMAIL PROTECTED]; r-help@stat.math.ethz.ch Subject: Re: [R] convergence Ted Harding wrote: There are various ways round this, but a 'for' loop with a fixed number of iterations is not usually one of them! The simplest is to use while(). A possible strategy is

Y.old <- initial.Y
while(TRUE){
  Y <- compute.Y(Y.old, ...)
  if(abs(Y - Y.old) < small.number) break
  Y.old <- Y
}

This will loop indefinitely until the convergence criterion abs(Y - Y.old) < small.number is met, and then stop. I guess some precaution must be taken to prevent the loop from running forever. Those algorithms that must optimize something, but run the risk of running forever, sound like the chess playing engine: we know that a deterministic solution exists (there is a finite number of chess positions), but it's not practical to check all of them. I read somewhere that computer loop problems are treated as if the computer was playing chess against Murphy: it tries hard to solve the problem, but sometimes it must give up a path and backtrack to a less optimal but faster solution. Do I make any sense? Alberto Monteiro
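To address Alberto's worry about loops running forever, the usual safeguard (a sketch I am adding; compute.Y here is a toy contraction, purely illustrative) is to combine the convergence test with an iteration cap:

```r
# Fixed-point iteration with both a convergence test and a safety cap.
compute.Y <- function(y) cos(y)   # toy update: converges to the fixed point of cos

Y.old        <- 1        # initial value
maxiter      <- 1000     # safety cap: never loop more than this
small.number <- 1e-10
converged    <- FALSE

for (iter in seq_len(maxiter)) {
    Y <- compute.Y(Y.old)
    if (abs(Y - Y.old) < small.number) { converged <- TRUE; break }
    Y.old <- Y
}

converged   # TRUE here; the fixed point of cos is about 0.739
```

If the cap is hit without convergence, `converged` stays FALSE and the caller can report failure instead of hanging.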
Re: [R] Problems in programming a simple likelihood
Hi Deepankar, Dimitris' code works just fine. Your problem is that the output of optim does not have a corresponding summary method. Instead, you should simply type the name of the object returned by optim to look at the results.

out <- optim(mu.start, mlogl, method = "CG", y = women$J, X = cbind(1, women$M, women$S))
out
$par
[1] -3.0612277 -1.4567141  0.3659251

$value
[1] 13.32251

$counts
function gradient
     357      101

$convergence
[1] 1

$message
NULL

Hope this helps, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Deepankar Basu Sent: Thursday, April 19, 2007 12:42 PM To: Dimitris Rizopoulos Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Problems in programming a simple likelihood Dimitris, Thanks a lot for your suggestion and also for suggestions that others have provided. I am learning fast and, with the help of the R community, will be able to get this going pretty soon. Of course, right now I am just trying to learn the language; so I am trying to program a standard probit model for which I know the answers. I could easily get the estimates with glm. But I want to program the probit model to make sure I understand the subtleties of R. I tried running the alternate script that you provided, but it is still not working. Am I making some mistake?
Here is what I get when I run your script (which shows that the maximum number of iterations was reached without convergence):

source("probit1.R")
summary(out)
            Length Class  Mode
par         3      -none- numeric
value       1      -none- numeric
counts      2      -none- numeric
convergence 1      -none- numeric
message     0      -none- NULL

Here is the script (exactly what you had suggested):

mlogl <- function (mu, y, X) {
    zeta <- as.vector(X %*% mu)
    y.logic <- as.logical(y)
    lgLik <- numeric(length(y))
    lgLik[y.logic] <- pnorm(zeta[y.logic], log.p = TRUE)
    lgLik[!y.logic] <- pnorm(zeta[!y.logic], lower.tail = FALSE, log.p = TRUE)
    -sum(lgLik)
}
women <- read.table("http://wps.aw.com/wps/media/objects/2228/2281678/Data_Sets/ASCII/Women13.txt", header = TRUE)
mu.start <- c(-3, -1.5, 0.5)
out <- optim(mu.start, mlogl, method = "BFGS", y = women$J, X = cbind(1, women$M, women$S))
out
glm.fit(x = cbind(1, women$M, women$S), y = women$J, family = binomial(link = "probit"))$coefficients

Thanks. Deepankar

On Thu, 2007-04-19 at 09:26 +0200, Dimitris Rizopoulos wrote: try the following:

mlogl <- function (mu, y, X) {
    zeta <- as.vector(X %*% mu)
    y.logic <- as.logical(y)
    lgLik <- numeric(length(y))
    lgLik[y.logic] <- pnorm(zeta[y.logic], log.p = TRUE)
    lgLik[!y.logic] <- pnorm(zeta[!y.logic], lower.tail = FALSE, log.p = TRUE)
    -sum(lgLik)
}
women <- read.table("http://wps.aw.com/wps/media/objects/2228/2281678/Data_Sets/ASCII/Women13.txt", header = TRUE)
mu.start <- c(0, -1.5, 0.01)
out <- optim(mu.start, mlogl, method = "BFGS", y = women$J, X = cbind(1, women$M, women$S))
out
glm.fit(x = cbind(1, women$M, women$S), y = women$J, family = binomial(link = "probit"))$coefficients

I hope it helps. Best, Dimitris Dimitris Rizopoulos Ph.D.
Student Biostatistical Centre School of Public Health Catholic University of Leuven Address: Kapucijnenvoer 35, Leuven, Belgium Tel: +32/(0)16/336899 Fax: +32/(0)16/337015 Web: http://med.kuleuven.be/biostat/ http://www.student.kuleuven.be/~m0390867/dimitris.htm - Original Message - From: Deepankar Basu [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Sent: Thursday, April 19, 2007 12:38 AM Subject: [R] Problems in programming a simple likelihood As part of carrying out a complicated maximum likelihood estimation, I am trying to learn to program likelihoods in R. I started with a simple probit model but am unable to get the code to work. Any help or suggestions are most welcome. I give my code below:

mlogl <- function(mu, y, X) {
    n <- nrow(X)
    zeta <- X %*% mu
    llik <- 0
    for (i in 1:n) {
        if (y[i] == 1)
            llik <- llik + log(pnorm(zeta[i,], mean = 0, sd = 1))
        else
            llik <- llik + log(1 - pnorm(zeta[i,], mean = 0, sd = 1))
    }
    return(-llik)
}
women <- read.table("~/R/Examples/Women13.txt", header = TRUE)   # DATA
# THE DATA SET CAN BE ACCESSED HERE
# women <- read.table("http://wps.aw.com/wps/media/objects/2228/2281678/Data_Sets/ASCII/Women13.txt", header = TRUE)
# I HAVE CHANGED THE NAMES OF THE VARIABLES
# J is changed to work
Re: [R] Computing the rank of a matrix.
Hi Martin, I played a bit with rankMat. I will first state the conclusions of my numerical experiments and then present the results to support them:

1. I don't believe that rankMat, or equivalently Matlab's rank computation, is necessarily better than qr(.)$rank or qr(., LAPACK=TRUE)$rank. In fact, for the notorious Hilbert matrix, rankMat can give poor estimates of rank.

2. There exists no universally optimal (i.e. optimal for all A) tol in qr(A, tol)$rank that would be optimally close to rankMat. The discrepancy in the ranks computed by qr(A)$rank and rankMat(A) obviously depends on the matrix A. This is evident from the tol used in rankMat:

tol <- max(d) * .Machine$double.eps * abs(singValA[1])

So, this value of tol in qr will also minimize the discrepancy.

3. The tol parameter is irrelevant in qr(A, tol, LAPACK=TRUE)$rank, i.e. LAPACK doesn't seem to utilize the tol parameter when computing the rank. However, qr(A, tol, LAPACK=FALSE)$rank does depend on tol.

Now, here are the results:

1.
hilbert.rank <- matrix(NA, 20, 3)
hilbert <- function(n) { i <- 1:n; 1 / outer(i - 1, i, "+") }
for (i in 1:20) {
    himat <- hilbert(i)
    hilbert.rank[i,1] <- rankMat(himat)
    hilbert.rank[i,2] <- qr(himat, tol = 1.0e-14)$rank
    hilbert.rank[i,3] <- qr(himat, tol = .Machine$double.eps, LAPACK = TRUE)$rank
}
hilbert.rank
      [,1] [,2] [,3]
 [1,]    1    1    1
 [2,]    2    2    2
 [3,]    3    3    3
 [4,]    4    4    4
 [5,]    5    5    5
 [6,]    6    6    6
 [7,]    7    7    7
 [8,]    8    8    8
 [9,]    9    9    9
[10,]   10   10   10
[11,]   10   11   11
[12,]   11   12   12
[13,]   11   12   13
[14,]   11   13   14
[15,]   12   13   15
[16,]   12   15   16
[17,]   12   16   17
[18,]   12   16   18
[19,]   13   17   19
[20,]   13   17   20

We see that rankMat underestimates the rank for n > 10, but qr(himat, tol = .Machine$double.eps, LAPACK=TRUE)$rank gets it right.

2. Here I first created a random rectangular matrix, and then added a number of rows to it, where these new rows are the same as some of the original rows except for a tiny amount of noise, which I call eps. So, the degree of rank deficiency is controlled by eps.
I let eps take on 3 values: 1.e-07, 1.e-10, and 1.e-14, and show that the optimal tol (in terms of being close to rankMat) depends on the level of precision (eps) in the matrix.

set.seed(123)
nrow <- 15
ncol <- 20
nsim <- 5000
ndefic <- 4   # number of nearly dependent rows
eps <- 1.e-07
rnk <- matrix(NA, nsim, 5)
for (j in 1:nsim) {
    A <- matrix(rnorm(nrow*ncol), nrow, ncol)
    newrows <- matrix(NA, ndefic, ncol)
    for (i in 1:ndefic) {
        newrows[i,] <- A[nrow-i,] + rnorm(ncol, sd = min(abs(A[nrow-i+1,])) * eps)
    }
    B <- rbind(A, newrows)
    rnk[j,1] <- rankMat(B)
    rnk[j,2] <- qr(B, tol = 1.e-07)$rank
    rnk[j,3] <- qr(B, tol = 1.e-11)$rank
    rnk[j,4] <- qr(B, tol = 1.0e-14)$rank
    rnk[j,5] <- qr(B, tol = .Machine$double.eps, LAPACK = TRUE)$rank
}
apply(abs(rnk[,1] - rnk[,2:5]), 2, sum)
[1] 1935153 0 0

We observe that both qr(B, tol=1.e-14)$rank and qr(B, tol = .Machine$double.eps, LAPACK = TRUE)$rank give exactly the same rank as rankMat. Now, we change eps <- 1.e-10 and the results are:

apply(abs(rnk[,1] - rnk[,2:5]), 2, sum)
[1] 19778 14263 166 220

This means that a tolerance of 1.e-14 works best. Now we lower eps further, eps <- 1.e-14:

apply(abs(rnk[,1] - rnk[,2:5]), 2, sum)
[1] 0 3 665 2

Clearly, the smaller tolerances yield rank estimates that are higher than that given by rankMat. That is, rankMat underestimates the rank, as in the case of the Hilbert matrix.

3. Now we show that qr(., tol, LAPACK=TRUE)$rank is independent of tol:

exp <- -(7:16)
tol <- 10^exp
sapply(tol, A = hilbert(20), function(x, A) qr(A, tol = x, LAPACK = FALSE)$rank)
[1] 10 12 14 14 15 16 16 17 18 19
sapply(tol, A = hilbert(20), function(x, A) qr(A, tol = x, LAPACK = TRUE)$rank)
[1] 20 20 20 20 20 20 20 20 20 20

Looking forward to comments. Best, Ravi. - Original Message - From: Martin Maechler [EMAIL PROTECTED] Date: Saturday, April 7, 2007 10:57 am Subject: Re: [R] Computing the rank of a matrix. To: Ravi Varadhan [EMAIL PROTECTED] Cc: [EMAIL PROTECTED], 'José Luis Aznarte M.'
[EMAIL PROTECTED], r-help@stat.math.ethz.ch Ravi == Ravi Varadhan [EMAIL PROTECTED] on Fri, 6 Apr 2007 12:44:33 -0400 writes: Ravi Hi, qr(A)$rank will work, but just be wary of the Ravi tolerance parameter (default is 1.e-07), since the Ravi rank computation could be sensitive to the tolerance Ravi chosen. Yes, indeed. The point is that rank(Matrix) is well defined in pure math (linear algebra), as well as a singular matrix is. The same typically no longer makes sense as soon as you enter the real world: A matrix close to singular may have to be treated as if singular depending on its singularity closeness {{ learn about the condition number of a matrix }} and the same issues arise with rank(matrix). Of course
Re: [R] Computing the rank of a matrix.
Hi, qr(A)$rank will work, but just be wary of the tolerance parameter (default is 1.e-07), since the rank computation could be sensitive to the tolerance chosen. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: Friday, April 06, 2007 11:07 AM To: José Luis Aznarte M. Cc: r-help@stat.math.ethz.ch; [EMAIL PROTECTED] Subject: Re: [R] Computing the rank of a matrix. How about qr(A)$rank or perhaps qr(A, LAPACK=TRUE)$rank Cheers, Andy __ Andy Jaworski 518-1-01 Process Laboratory 3M Corporate Research Laboratory - E-mail: [EMAIL PROTECTED] Tel: (651) 733-6092 Fax: (651) 736-3122 José Luis Aznarte M. [EMAIL PROTECTED] To .ugr.es r-help@stat.math.ethz.ch Sent by: cc [EMAIL PROTECTED] at.math.ethz.ch Subject [R] Computing the rank of a matrix. 04/06/2007 06:39 AM Hi! Maybe this is a silly question, but I need the column rank (http://en.wikipedia.org/wiki/Rank_matrix) of a matrix and R function 'rank()' only gives me the ordering of the elements of my matrix. How can I compute the column rank of a matrix? Is there not an R equivalent to Matlab's 'rank()'? I've been browsing for a time now and I can't find anything, so any help will be greatly appreciated. Best regards! -- -- Jose Luis Aznarte M. http://decsai.ugr.es/~jlaznarte Department of Computer Science and Artificial Intelligence Universidad de Granada Tel. +34 - 958 - 24 04 67 GRANADA (Spain) Fax: +34 - 958 - 24 00 79 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
Re: [R] Likelihood returning inf values to optim(L-BFGS-B) other options?
Hi, In your code, the variables x (which I assume is the observed data), Tvec, and flag are not passed to the function as arguments. This could be a potential problem. Another problem could be that you have to use negative log-likelihood function as input to optim, since by default it minimizes the function, whereas you are interested in finding the argmax of log-likelihood. So, in your function you should return (-ll) instead of ll. If the above strategies don't work, I would try different initial values (it would be best if you have a data-driven strategy for picking a starting value) and different optimization methods (e.g. conjugate gradient with Polak-Ribiere steplength option, Nelder-Mead, etc.). Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: Thursday, April 05, 2007 6:12 AM To: r-help@stat.math.ethz.ch Subject: [R] Likelihood returning inf values to optim(L-BFGS-B) other options? Dear R-help list, I am working on an optimization with R by evaluating a likelihood function that contains lots of Gamma calculations (BGNBD: Hardie Fader Lee 2005 Management Science). Since I am forced to implement lower bounds for the four parameters included in the model, I chose the optim() function mith L-BFGS-B as method. But the likelihood often returns inf-values which L-BFGS-B can't deal with. Are there any other options to implement an optimization algorithm with R accounting for lower bounds and a four parameter-space? 
Here is the error message I receive (in German):

out = optim(c(.1,.1,.1,.1), fn, method = "L-BFGS-B", lower = c(.0001,.0001,.0001,.0001))
Fehler in optim(c(0.1, 0.1, 0.1, 0.1), fn, method = "L-BFGS-B", lower = c(1e-04, :
  L-BFGS-B benötigt endliche Werte von 'fn'
  [i.e., L-BFGS-B needs finite values of 'fn']
Zusätzlich: Es gab 50 oder mehr Warnungen (Anzeige der ersten 50 mit warnings())
  [In addition: there were 50 or more warnings (use warnings() to see the first 50)]

And this is the likelihood function:

fn <- function(p) {
    A1 <- (gamma(p[1]+x) * p[2]^p[1]) / gamma(p[1])
    A2 <- (gamma(p[3]+p[4]) * gamma(p[4]+x)) / (gamma(p[4]) * gamma(p[3]+p[4]+x))
    A3 <- (1/(p[2]+Tvec))^(p[1]+x)
    A4 <- (p[3]/(p[4]+x-1)) * ((1/(p[2]+t_x))^(p[1]+x))
    ll <- sum(log(A1 * A2 * (A3 + flag*A4)))
    return(ll)
}

Thank you very much for your help in advance! Best regards, Michael
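Beyond Ravi's points, the Inf values are very likely produced by gamma() overflowing: gamma(x) is Inf for x above roughly 171, and BGNBD-type likelihoods routinely feed it larger arguments. The standard remedy (a sketch of the idea, not the original BGNBD code; the variable names x, Tvec, t_x, flag follow the posted function and are assumed to be the data) is to keep everything on the log scale with lgamma(), using a log-sum-exp step for the one unavoidable addition:

```r
# Log-scale rewrite of the posted likelihood terms, using lgamma()
# so that large arguments cannot overflow.
fn.log <- function(p, x, Tvec, t_x, flag) {
    logA1 <- lgamma(p[1] + x) - lgamma(p[1]) + p[1] * log(p[2])
    logA2 <- lgamma(p[3] + p[4]) + lgamma(p[4] + x) -
             lgamma(p[4]) - lgamma(p[3] + p[4] + x)
    logA3 <- -(p[1] + x) * log(p[2] + Tvec)
    logA4 <- log(p[3]) - log(p[4] + x - 1) - (p[1] + x) * log(p[2] + t_x)
    # log(A3 + flag*A4) via log-sum-exp, so tiny A3, A4 don't underflow to 0:
    logA34 <- ifelse(flag == 1,
                     pmax(logA3, logA4) + log1p(exp(-abs(logA3 - logA4))),
                     logA3)
    -sum(logA1 + logA2 + logA34)   # negative log-likelihood, for minimization
}
```

With this version, `optim(start, fn.log, method = "L-BFGS-B", lower = ..., x = x, Tvec = Tvec, t_x = t_x, flag = flag)` receives finite values even where the direct gamma() formulation returns Inf.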
Re: [R] Large matrix into a vector
c(HR) will do it. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of A Ezhil Sent: Wednesday, March 28, 2007 1:28 PM To: r-help@stat.math.ethz.ch Subject: [R] Large matrix into a vector Hi, I have a matrix HR (9x27). I would like to make a single vector with elements: t(HR[,1]) followed by t(HR[,2]) and then t(HR[,3]), etc. Is there any neat way of converting this matrix into a vector rather than doing something like c(t(HR[,1]), t(HR[,2]), t(HR[,3]), ...)? Thanks in Advance. Kind regards, Ezhil
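A quick check that c() (equivalently, as.vector()) stacks the columns in exactly the requested order; the toy 3x4 matrix below stands in for the poster's 9x27 data:

```r
HR <- matrix(1:12, nrow = 3, ncol = 4)   # toy stand-in for the 9 x 27 matrix

v1 <- c(HR)                              # column-major flattening
v2 <- c(t(HR[, 1]), t(HR[, 2]), t(HR[, 3]), t(HR[, 4]))  # the manual version

identical(v1, v2)              # TRUE: same elements, same order
identical(v1, as.vector(HR))   # TRUE: as.vector() is equivalent here
```

This works because R stores matrices in column-major order, so dropping the dim attribute already yields column 1 followed by column 2, and so on.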
Re: [R] Gaussian Adaptive Quadrature
Hi, The function integrate is based on the QUADPACK Fortran package by Piessens. It does indeed perform adaptive Gauss-Kronrod quadrature, where Kronrod's method allows the re-use of abscissae from the previous iteration, thus enabling the estimation of the quadrature error and its control. In contrast, in the standard Gaussian quadrature methods this is not feasible. The terminology Gaussian quadrature is not restricted to Gauss-Hermite quadrature (where exp(-x^2) is the weight function), but applies more broadly to Gauss-Legendre, Gauss-Laguerre, etc., where the abscissae are chosen as the roots of the Legendre, Laguerre polynomials. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Douglas Bates Sent: Wednesday, March 21, 2007 9:25 AM To: Doran, Harold Cc: Help mailing list - R Subject: Re: [R] Gaussian Adaptive Quadrature On 3/21/07, Doran, Harold [EMAIL PROTECTED] wrote: The function integrate() uses AGQ. There are other functions for gaussian quadrature in the statmod() package that I really like. I think that integrate does adaptive quadrature but not adaptive Gaussian quadrature (which probably should have been called adaptive Gauss-Hermite quadrature to be more specific). In the first case the adaptive refers to a choice of mesh size. In the second case one is integrating a function that is close to a multivariate Gaussian density by first finding the conditional optimum of the integrand and using a quadratic approximation to the log-integrand to establish the location of the Gauss-Hermite quadrature points. The Laplace approximation to the log-likelihood for a generalized linear mixed model is a 1-point adaptive Gauss-Hermite quadrature evaluation.
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Caio Lucidius Naberezny Azevedo Sent: Wednesday, March 21, 2007 5:55 AM To: Help mailing list - R Subject: [R] Gaussian Adaptive Quadrature Hi all, Does anybody know of any function that performs Gaussian adaptive quadrature integration of univariate functions? Thanks in advance, Regards, Caio [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
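To make the distinction in the thread concrete, here is a minimal sketch (not from the original emails) of a plain, non-adaptive Gaussian rule: a Gauss-Legendre rule built with the Golub-Welsch algorithm. The nodes and weights are fixed in advance, so unlike the adaptive Gauss-Kronrod scheme inside integrate() there is no error estimate or mesh refinement.

```r
## n-point Gauss-Legendre rule on [a, b] via the Golub-Welsch algorithm:
## the nodes are the eigenvalues of the symmetric tridiagonal Jacobi
## matrix of the Legendre recurrence; the weights come from the first
## components of the eigenvectors (scaled by mu0 = 2, the integral of
## the weight function 1 over [-1, 1]).
gauss_legendre <- function(n, a = -1, b = 1) {
  i <- seq_len(n - 1)
  off <- i / sqrt(4 * i^2 - 1)        # off-diagonal recurrence coefficients
  J <- diag(0, n)
  J[cbind(i, i + 1)] <- off
  J[cbind(i + 1, i)] <- off
  e <- eigen(J, symmetric = TRUE)
  x <- e$values                       # nodes on [-1, 1]
  w <- 2 * e$vectors[1, ]^2           # weights on [-1, 1]
  list(nodes   = (b - a) / 2 * x + (a + b) / 2,
       weights = (b - a) / 2 * w)     # affine map to [a, b]
}

gl <- gauss_legendre(20, 0, 1)
sum(gl$weights * exp(gl$nodes))       # ~ e - 1, same as integrate(exp, 0, 1)
```

With only 20 fixed nodes this is exact to machine precision for smooth integrands, but it gives no error estimate; that is exactly what the Kronrod extension adds.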
Re: [R] MCMC logit
As the error message clearly indicates, the function MCMClogit is unable to find the variable x1 (possibly x2, x3, and x4 also) in the data frame c.df. Check the names of the variables in that data frame and make sure that the names correspond to the formula specification. Hope this helps, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Anamika Chaudhuri Sent: Friday, March 09, 2007 3:27 PM To: r-help@stat.math.ethz.ch Subject: [R] MCMC logit Hi, I have a dataset with the binary outcome Y (0,1) and 4 covariates (X1, X2, X3, X4). I am trying to use MCMClogit to fit a logistic regression using MCMC. I am getting an error where it doesn't identify the covariates, although they are read in correctly. The dataset is a sample of the actual dataset. Below is my code: ### # retrieve data # considering four covariates d.df = as.data.frame(read.table("c:/tina/phd/thesis/data/modified_data1.1.txt", header=TRUE, sep=",")) y = d.df[, ncol(d.df)] x = d.df[, 1:4] c.df = cbind(y, x) # x = cbind(1, x) p <- ncol(c.df) # marginal log-prior of beta[] logpriorfun <- function(beta, mu, gshape, grate) + { + logprior = -p*log(2) + log(gamma(p+gshape)) - log(gamma(gshape)) + + gshape*log(grate) - (p+gshape)*log(grate+sum(abs(beta))) + return(logprior) + } require(MCMCpack) Loading required package: MCMCpack Loading required package: coda Loading required package: lattice Loading required package: MASS ## ## Markov Chain Monte Carlo Package (MCMCpack) ## Copyright (C) 2003-2007 Andrew D. Martin and Kevin M. Quinn ## ## Support provided by the U.S.
National Science Foundation ## (Grants SES-0350646 and SES-0350613) ## [1] TRUE Warning message: package 'MASS' was built under R version 2.4.1 a0 = 0.5 b0 = 1 mu0 = 0 beta.init = list(c(0, rep(0.1,4)), c(0, rep(-0.1,4)), c(0, rep(0, 4))) burnin.cycles = 1000 mcmc.cycles = 25000 # three chains post.list <- lapply(beta.init, function(vec) + { + posterior <- MCMClogit(y~x1+x2+x3+x4, data=c.df, burnin=burnin.cycles, mcmc=mcmc.cycles, + thin=5, tune=0.5, beta.start=vec, user.prior.density=logpriorfun, logfun=TRUE, + mu=mu0, gshape=a0, grate=b0) + return(posterior) + }) Error in eval(expr, envir, enclos) : object 'x1' not found Any suggestions will be greatly appreciated. Thanks, Anamika [[alternative HTML version deleted]]
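The lookup failure above can be reproduced without MCMCpack, since MCMClogit uses the same formula/data machinery as any modeling function. A minimal sketch with glm() (the data here are made up for illustration): the terms x1..x4 must be column names of the data frame passed as `data`.

```r
## The formula interface looks up x1..x4 as columns of `data`; if the
## names differ, the model function stops with "object 'x1' not found".
set.seed(1)
d <- data.frame(matrix(rnorm(200), 50, 4))   # columns are X1..X4, not x1..x4
d$y <- rbinom(50, 1, 0.5)

try(glm(y ~ x1 + x2 + x3 + x4, data = d, family = binomial))
# Error ... object 'x1' not found  -- the same failure mode as MCMClogit

names(d)[1:4] <- paste0("x", 1:4)            # make names match the formula
fit <- glm(y ~ x1 + x2 + x3 + x4, data = d, family = binomial)
coef(fit)
```

Checking `names(c.df)` before writing the formula (and renaming columns after `cbind` if necessary) is usually enough to resolve this class of error.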
Re: [R] Help with faster optimization for large parameter problem
Hi James, There are a few things that immediately come to mind that you could try: 1. Profile your code using Rprof to detect which steps are the most time-consuming. 2. In your likelihood function you can make the code a bit faster by using outer instead of %*%. 3. Using conjugate-gradient type methods in optim might be a better option, since you are dealing with thousands of parameters and BFGS could use up a lot of memory working with a large Hessian matrix. CG methods need much less storage than quasi-Newton methods. The main caveat is that CG methods generally converge more slowly than QN-type methods, unless you can precondition the problem. Hope this is helpful, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of James Fowler Sent: Friday, March 02, 2007 4:34 AM To: r-help@stat.math.ethz.ch Subject: [R] Help with faster optimization for large parameter problem Hello all, I have a large parameter problem with the following very simple likelihood function: fn <- function(param) { x1 <- param[1:n] g1 <- param[(n+1):(2*n)] beta <- param[(2*n+1):(2*n+k)] sigma2 <- param[2*n+k+1]^2 meang1sp <- mean(g1[sp]) mu <- beta %*% matrix(x1,1,n) - (g1[sp]-meang1sp) %*% matrix(g1,1,n) return(sum((ydc-mu)^2)/(2*sigma2) + n*k*log(sqrt(sigma2)) + mean(x1)^2 + mean(g1)^2 + 1000*(x1[1] > x1[n])) } There's nothing special here -- it's just plain old OLS, except all the variables on the right-hand side are parameters (only the variable ydc is observed). I have no problems getting this to recover parameter estimates from data I myself have generated for smaller problems (e.g. where ydc is a k=500 by n=50 matrix and there are 601 parameters).
But the target problem with real data will be k=6000 by n=400 with 6801 parameters. I am currently using optim(method="BFGS") but it is slow. I can get good starting values for x1 and g1 with some SVD techniques, and these help me generate the starting values for the betas via lm(). I then use optim() on a modified likelihood function to find g1, x1, sigma2 while holding beta fixed, and then use optim() again to find beta while holding the other variables fixed. But eventually I have to run optim on the unmodified likelihood function above, and it is very slow, taking several days for large problems. I have also tried metrop() in mcmc, but I find this needs to start very close to the mode of the likelihood to be efficient (in fact, MCMCpack's metropolis function calls optim first and requires it to invert the Hessian before even starting the Metropolis algorithm, unless we can provide our own covariance matrix). I will probably use metrop() to generate standard errors once I find a mode. In the meantime, I can't help thinking that there is some easy way to make this much faster than what I am currently doing, especially since the likelihood is normal. I am sure I have missed something obvious, so I'd very much appreciate any advice you could give on packages in R or code that might help. Thanks! james -- James H. Fowler, Associate Professor web: http://jhfowler.ucsd.edu Department of Political Science email: [EMAIL PROTECTED] University of California, San Diego phone: (858) 534-6807 Social Sciences Building 383, #0521 fax: (858) 534-7130 9500 Gilman Drive, La Jolla, CA 92093
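Suggestions 1 and 3 from the reply can be sketched on a toy problem (the poster's actual likelihood is not reproduced here; the quadratic objective and all names below are made up for illustration). Rprof() locates the expensive calls, and method = "CG" in optim() avoids storing BFGS's approximate Hessian when the parameter vector is long.

```r
## Toy separable quadratic with minimum at 1:n; an analytic gradient
## helps both BFGS and CG considerably.
f  <- function(p) sum((p - seq_along(p))^2)
gr <- function(p) 2 * (p - seq_along(p))

p0  <- numeric(1000)                               # 1000 parameters
fit <- optim(p0, f, gr, method = "CG", control = list(maxit = 1000))
fit$value                                          # near 0 at convergence

## Profiling pattern (suggestion 1): run the slow code under Rprof()
## and inspect which functions dominate self-time.
Rprof(tmp <- tempfile())
for (i in 1:500) f(gr(p0))
Rprof(NULL)
head(summaryRprof(tmp)$by.self)                    # most expensive calls first
```

CG here trades a slower convergence rate for O(n) memory; with 6801 parameters, BFGS's internal workspace is what tends to hurt.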
[R] Questions regarding biplot
Hi, I am using biplot (in package stats) to visualize the results of some PCA analyses. When I use scale=0 in the biplot function, it draws the X (PC1) and Y (PC2) axes with the scales multiplied by sqrt(n), where n is the sample size. Is there a way to change these scales so that the actual values of PC1 and PC2 are shown, instead of being multiplied by sqrt(n)? Also, how can I plot the points in biplot with different colors or symbols corresponding to a grouping variable, instead of the sample row number? Thanks for any suggestions. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html [[alternative HTML version deleted]]
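No reply is archived for this question, but a common workaround (sketched here, not from the thread) is to skip biplot() and draw the score plot by hand: the raw scores in pc$x are then on their own scale, and colors and symbols can follow any grouping variable. The iris data and the arrow scale factor below are illustrative choices.

```r
## Manual "biplot": actual PC1/PC2 scores, points styled by group,
## loading arrows added on top (scaled by an arbitrary factor for
## visibility).
pc  <- prcomp(iris[, 1:4], scale. = TRUE)
grp <- iris$Species

plot(pc$x[, 1:2], col = as.integer(grp), pch = as.integer(grp),
     xlab = "PC1", ylab = "PC2")                 # no sqrt(n) rescaling
arrows(0, 0, pc$rotation[, 1] * 3, pc$rotation[, 2] * 3,
       length = 0.1, col = "grey40")             # variable loadings
text(pc$rotation[, 1] * 3.3, pc$rotation[, 2] * 3.3,
     labels = rownames(pc$rotation))
legend("topright", legend = levels(grp), col = 1:3, pch = 1:3)
```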
Re: [R] returns from dnorm and dmvnorm
I guarantee that it would also happen on all future versions of R. Why would you expect a density to be smaller than 1? The only constraints on a density are that (a) it is non-negative and (b) it integrates to one. The smaller the variance, the greater the density around its center. The density can be made arbitrarily large by letting the variance go to zero; in the limit you obtain Dirac's delta function. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of A Hailu Sent: Monday, February 26, 2007 3:04 PM To: r-help@stat.math.ethz.ch Subject: [R] returns from dnorm and dmvnorm Hi All, Why would calls to dnorm and dmvnorm return values that are above 1? For example, dnorm(0.3,mean=0, sd=0.1) [1] 3.989423 This is happening on two different installations of R that I have. Thank you. Hailu [[alternative HTML version deleted]]
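The two constraints in the reply can be checked directly in a short session (this snippet is an added illustration, not from the original thread):

```r
## A density may exceed 1 pointwise as long as it integrates to 1.
dnorm(0, mean = 0, sd = 0.1)                 # peak = 1/(0.1*sqrt(2*pi)) ~ 3.99
integrate(dnorm, -Inf, Inf, sd = 0.1)$value  # still integrates to ~1

## Even simpler: a uniform density on a short interval.
dunif(0.1, min = 0, max = 0.25)              # 1/(b - a) = 4
```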
Re: [R] using integrate in a function definition
Your function jjj is not vectorized. Try this: jjj <- function(www) sapply(www, function(x) 2*integrate(dnorm, 0, x)$value) plot(jjj, 0, 5) It should work. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of theo borm Sent: Friday, February 23, 2007 1:43 PM To: r-help@stat.math.ethz.ch Subject: [R] using integrate in a function definition Dear list members, I'm quite new to R, and though I tried to find the answer to my probably very basic question through the available resources (website, mailing list archives, docs, google), I've not found it. If I try to use the integrate function from within my own functions, my functions seem to misbehave in some contexts. The following example is a bit silly, but illustrates what's happening. The following behaves as expected: jjj <- function(www) {2*integrate(dnorm,0,www)$value} kkk <- function(www) {2*(pnorm(www)-0.5)} jjj(2) [1] 0.9544997 kkk(2) [1] 0.9544997 However, if I do: plot(jjj,0,5) Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' and 'y' lengths differ whereas plot(kkk,0,5) produces a nice plot. xxx <- seq(0:5) yyy <- jjj(xxx) zzz <- kkk(xxx) produces no errors, but: yyy [1] 0.6826895 zzz [1] 0.6826895 0.9544997 0.9973002 0.367 0.994 1.000 Why is this? Is this some R language feature that I've completely missed? Ultimately my problem is that I have a mathematical function describing a distribution, and I want to use this in a Kolmogorov-Smirnov test, where I need a cumulative distribution function.
If I use the following (synthetic) dataset with ks.test with either the jjj or kkk function, I get: ppp <- c(1.74865955,1.12220426,0.24760427,0.24351439,0.10870853, 0.72023272,0.40245392,0.16002948,0.24203950,0.44029851, 0.34830446,1.66967152,1.71609574,1.17540830,0.22306379, 0.64382628,0.37382795,0.84361547,1.92138362,0.02844235, 0.11238473,0.21237557,1.01058073,2.33108243,1.36034473, 1.88951679,0.18230647,0.96571916,0.91239760,2.05574766, 0.33681130,2.76006257,0.83952491,1.24038831,1.46199671, 0.24694163,0.01565159,0.94816108,1.04642673,0.36556444, 2.20760578,1.59518590,0.83951030,0.07113652,0.97422886, 0.59835793,0.08383890,1.09180853,0.43508943,0.35368637, 0.75805978,0.12790161,1.56798563,1.68669770,0.56168021) ks.test(ppp,kkk) One-sample Kolmogorov-Smirnov test data: ppp D = 0.1013, p-value = 0.5895 alternative hypothesis: two.sided [ which seems correct, as the samples come from abs(rnorm()) ] ks.test(ppp,jjj) One-sample Kolmogorov-Smirnov test data: ppp D = 0.9875, p-value < 2.2e-16 alternative hypothesis: two.sided [ which is *incorrect* ] Note: this example is artificial, as I have used a function for which I know the integral; my real problem involves a distribution that I can only integrate numerically. How would I go about solving this problem? With kind regards, Theo.
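Besides the sapply() wrapper suggested in the reply, base R's Vectorize() packages up the same trick; a sketch (added here for illustration):

```r
## Vectorize() wraps the scalar function so that plot() and ks.test(),
## which call it on whole vectors, get elementwise results back.
jjj <- Vectorize(function(w) 2 * integrate(dnorm, 0, w)$value)
kkk <- function(w) 2 * (pnorm(w) - 0.5)

w <- c(0.5, 1, 2)
all.equal(jjj(w), kkk(w))   # agree to numerical tolerance
```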
Re: [R] simulating from Langevin distributions
Hi Ranjan, I think that the following would work: library(MASS) rlangevin <- function(n, mu, K) { q <- length(mu) norm.sim <- mvrnorm(n, mu=mu, Sigma=diag(1/K, q)) cp <- apply(norm.sim, 1, function(x) sqrt(crossprod(x))) sweep(norm.sim, 1, cp, FUN="/") } mu <- runif(7) mu <- mu / c(sqrt(crossprod(mu))) K <- 1.2 ylang <- rlangevin(n=10, mu=mu, K=K) apply(ylang,1,crossprod) [1] 1 1 1 1 1 1 1 1 1 1 I hope that this helps. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ranjan Maitra Sent: Tuesday, February 13, 2007 1:04 PM To: r-help@stat.math.ethz.ch Subject: [R] simulating from Langevin distributions Dear all, I have been looking for a while for ways to simulate from Langevin distributions and I thought I would ask here. I am OK with finding an algorithmic reference, though of course an R package would be stupendous! Btw, just to clarify, the Langevin distribution with (mu, K), where mu is a vector and K > 0 the concentration parameter, is defined to be: f(x) = exp(K*mu'x) / const where both mu and x are p-variate vectors with norm 1. For p=2 this corresponds to the von Mises distribution (for which algorithms exist, including in R/S-plus), while for p=3 I believe it is called the Fisher distribution. I am looking for general p. Can anyone please help with this? Many thanks and best wishes, Ranjan
Re: [R] Multidimensional Integration over arbitrary sets
Hi, By defining your function appropriately (e.g. using indicator functions), you can make adapt work: myfunc <- function(x) { x[1]*x[2] * (x[1] <= x[2]) } # Exact answer is 1/8 library(adapt) adapt(2, lo=c(0,0), up=c(1,1), functn=myfunc) value relerr minpts lenwrk ifail 0.1250612 0.00999505459071123 0 Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Saptarshi Guha Sent: Tuesday, February 13, 2007 3:34 PM To: R-Help Subject: [R] Multidimensional Integration over arbitrary sets Hi, I need to integrate a 2D function over a range where one limit depends on the other variable, e.g. integrate f(x,y)=x*y over {x,0,1} and {y,x,1}, i.e. \int_0^1 \int_x^1 xy dydx. I checked adapt but it doesn't seem to help here. Are there any packages for this sort of thing? I tried RSiteSearch but couldn't find the answer to this. Many thanks for your help. Regards Saptarshi Saptarshi Guha | [EMAIL PROTECTED] | http://www.stat.purdue.edu/~sguha [[alternative HTML version deleted]]
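An alternative to the indicator trick (added here, not from the thread): iterate two one-dimensional calls to integrate(), letting the inner limits depend on the outer variable. This keeps each 1-D integrand smooth, whereas the indicator makes the 2-D integrand discontinuous along the line x = y, which is exactly what costs adapt accuracy.

```r
## \int_0^1 \int_x^1 x*y dy dx = 1/8, computed by iterated 1-D quadrature;
## the inner lower limit varies with the outer variable.
inner <- function(x) sapply(x, function(xi)
  integrate(function(y) xi * y, lower = xi, upper = 1)$value)

integrate(inner, 0, 1)$value   # ~0.125
```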
Re: [R] nls: missing value or an infinity (Error in numericDeriv) andsingular gradient matrixError in nlsModel
Hi Daniela, Please read the error message from nls. The problem is with the starting values for the parameters a and b. You haven't specified any, so nls uses default values of a=1 and b=1, which may not be very good. You should specify good starting values if you have a reasonable idea of what they should be. If you don't, then you could try selfStart, as the error message tells you to do. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Daniela Salvini Sent: Tuesday, February 13, 2007 4:09 PM To: r-help@stat.math.ethz.ch Subject: [R] nls: missing value or an infinity (Error in numericDeriv) and singular gradient matrix Error in nlsModel Hi, I am a non-expert user of R. I am attempting to fit two different functions to my data, but I receive two different error messages, so I suppose I have two different problems here... but of what nature? In the first instance I did try some different starting values for the parameters, but without success. If anyone could suggest a sensible way to proceed, I would be really thankful!
Daniela *** Details on data and formulas: polcurve=read.table("polcurve.txt", header=TRUE) polcurve dist fath 1 1 0.138908965 2 2 0.113768871 3 3 0.114361753 4 4 0.051065765 5 5 0.095787946 6 6 0.123601131 7 7 0.117340746 8 8 0.054049434 9 9 0.112000962 10 10 0.074843028 11 11 0.064735196 12 12 0.098210497 13 13 0.093786895 14 14 0.033752379 15 15 0.038961039 16 16 0.023048216 17 17 0.010018243 18 18 0.007177034 19 19 0.003787879 20 20 0.0 21 21 0.0 # Exponential power function s=nls(fath~((b^2)/(a^2))*exp((-dist/a)^b), polcurve) Error in numericDeriv(form[[3]], names(ind), env) : Missing value or an infinity produced when evaluating the model In addition: Warning message: No starting values specified for some parameters. Initializing 'b', 'a' to '1.'. Consider specifying 'start' or using a selfStart model in: nls(fath ~ ((b^2)/(a^2)) * exp((-dist/a)^b), polcurve) # Geometric function s=nls(fath~((b-1)*(b-2)/a)*((1+dist/a)^-b), polcurve) Error in nlsModel(formula, mf, start, wts) : singular gradient matrix at initial parameter estimates In addition: Warning message: No starting values specified for some parameters. Initializing 'b', 'a' to '1.'. Consider specifying 'start' or using a selfStart model in: nls(fath ~ ((b - 1) * (b - 2)/a) * ((1 + dist/a)^-b), polcurve) Daniela Salvini Ph.D. student University of Copenhagen, Faculty of Life Sciences, Centre for Forest and Landscape Hørsholm Kongevej 11 2970 Hørsholm, Denmark e-mail: [EMAIL PROTECTED] Tel.: (+45) 35281639 / 35281645 Fax: (+45) 35281517
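A sketch of supplying start values for the first model (synthetic data stand in for the poster's polcurve; a = 3 and b = 1.2 are made-up "true" values). One likely cause of the "missing value or an infinity" error is worth noting: as posted, exp((-dist/a)^b) raises a *negative* number to a fractional power, which is NaN in R for most b, whereas the usual exponential-power dispersal kernel is exp(-(dist/a)^b).

```r
## Synthetic data from the (corrected) model plus tiny noise; the "port"
## algorithm with bounds is a bit more forgiving of rough start values.
set.seed(42)
dist <- 1:21
fath <- (1.2^2 / 3^2) * exp(-(dist / 3)^1.2) + rnorm(21, sd = 1e-5)

s <- nls(fath ~ (b^2 / a^2) * exp(-(dist / a)^b),
         start = list(a = 2, b = 1),           # rough but finite starting point
         algorithm = "port", lower = c(a = 0.1, b = 0.1))
coef(s)   # recovers a ~ 3, b ~ 1.2
```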
Re: [R] Timings of function execution in R [was Re: R in Industry]
Hi Profs. Ripley and Bates, I also recollect from a Tom Lumley email that when he profiled an MCMC computation, he found that pmin/pmax was the bottleneck. That is why he suggested the function that I called fast.pmax. I think that it would be nice to have restricted alternative functions dealing exclusively with numeric mode. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Douglas Bates Sent: Friday, February 09, 2007 10:05 AM To: Prof Brian Ripley Cc: R-Help Subject: Re: [R] Timings of function execution in R [was Re: R in Industry] On 2/9/07, Prof Brian Ripley [EMAIL PROTECTED] wrote: The other reason why pmin/pmax are preferable to your functions is that they are fully generic. It is not easy to write C code which takes into account that , [, [- and is.na are all generic. That is not to say that it is not worth having faster restricted alternatives, as indeed we do with rep.int and seq.int. Anything that uses arithmetic is making strong assumptions about the inputs. It ought to be possible to write a fast C version that worked for atomic vectors (logical, integer, real and character), but is there any evidence of profiled real problems where speed is an issue? Yes. I don't have the profiled timings available now and one would need to go back to earlier versions of R to reproduce them but I did encounter a situation where the bottleneck in a practical computation was pmin/pmax. The binomial and poisson families for generalized linear models used pmin and pmax to avoid boundary conditions when evaluating the inverse link and other functions. 
When I profiled the execution of some generalized linear model and, more importantly for me, generalized linear mixed model fits, these calls to pmin and pmax were the bottleneck. That is why I moved some of the calculations for the binomial and poisson families in the stats package to compiled code. In that case I didn't rewrite the general form of pmin and pmax; I replaced specific calls in the compiled code. On Fri, 9 Feb 2007, Martin Maechler wrote: Ravi == Ravi Varadhan [EMAIL PROTECTED] on Thu, 8 Feb 2007 18:41:38 -0500 writes: Ravi Hi, Ravi greaterOf is indeed an interesting function. It is much faster than the Ravi equivalent R function, pmax, because pmax does a lot of checking for Ravi missing data and for recycling. Tom Lumley suggested a simple function to Ravi replace pmax, without these checks, that is analogous to greaterOf, which I Ravi call fast.pmax. Ravi fast.pmax <- function(x,y) {i <- x<y; x[i] <- y[i]; x} Ravi Interestingly, greaterOf is even faster than fast.pmax, although you have to Ravi be dealing with very large vectors (O(10^6)) to see any real difference. Yes. Indeed, I have a file, first version dated from 1992, where I explore the slowness of pmin() and pmax() (in S-plus 3.2 then). I have since added quite a few experiments and versions to that file.
As a consequence, in the robustbase CRAN package (which is only a bit more than a year old though), there's a file, available as https://svn.r-project.org/R-packages/robustbase/R/Auxiliaries.R with the very simple content {note line 3!}: - ### Fast versions of pmin() and pmax() for 2 arguments only: ### FIXME: should rather add these to R pmin2 <- function(k,x) (x+k - abs(x-k))/2 pmax2 <- function(k,x) (x+k + abs(x-k))/2 - {the funny argument name 'k' comes from the use of these to compute Huber's psi() fast: psiHuber <- function(x,k) pmin2(k, pmax2(-k, x)) curve(psiHuber(x, 1.35), -3, 3, asp = 1) } One point *is* that I think proper function names would be pmin2() and pmax2(), since they work with exactly 2 arguments, whereas IIRC the feature of working with '...' is exactly the reason that pmax() and pmin() are so much slower. I haven't checked whether Gabor's pmax2.G <- function(x,y) {z <- x > y; z * (x-y) + y} is even faster than the one using abs(). It may have the advantage of giving *identical* results (to the last bit!) to pmax(), which my version does not --- IIRC the only reason I did not follow my own 'FIXME' above. I had then planned to implement pmin2() and pmax2() in C code, trivially, and hence get identical (to the last bit!) behavior as pmin
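The two-argument shortcuts from the thread, with the comparison operators restored, can be checked against pmax() directly (this snippet is an added illustration). Note that both forms do arithmetic, so agreement is up to floating-point rounding rather than guaranteed bit-equality.

```r
## Two-argument pmin/pmax via abs() (Maechler) and via a 0/1 mask (Grothendieck).
pmin2   <- function(k, x) (x + k - abs(x - k)) / 2
pmax2   <- function(k, x) (x + k + abs(x - k)) / 2
pmax2.G <- function(x, y) { z <- x > y; z * (x - y) + y }

set.seed(1)
x <- rnorm(1e5); y <- rnorm(1e5)
max(abs(pmax2(x, y)   - pmax(x, y)))   # ~0
max(abs(pmax2.G(x, y) - pmax(x, y)))   # ~0
system.time(for (i in 1:50) pmax2(x, y))
system.time(for (i in 1:50) pmax(x, y))
```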
Re: [R] R in Industry
Here is a function to create a Toeplitz matrix of any size, and an example of a 220 x 220 Toeplitz matrix, which was created in almost no time: # Given a vector x, forms a Toeplitz matrix # toeplitz <- function(x, sym=TRUE) { if (!is.vector(x)) stop("x is not a vector") n <- length(x) if (!sym) { if (!n%%2) stop("length of vector must be odd") n2 <- (n+1)/2 A <- matrix(NA, n2, n2) mat <- matrix(x[col(A) - row(A) + n2], n2, n2) } else { A <- matrix(NA, n, n) mat <- matrix(x[abs(col(A) - row(A)) + 1], n, n) } mat } ### system.time(top.mat <- toeplitz(runif(220))) [1] 0.00 0.01 0.02 NA NA Hope this is fast enough! Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Stefan Grosse Sent: Thursday, February 08, 2007 12:09 PM To: Albrecht, Dr. Stefan (AZ Private Equity Partner) Cc: r-help@stat.math.ethz.ch Subject: Re: [R] R in Industry I just ran on my Windows PC the benchmark from http://www.sciviews.org/benchmark/index.html which is pretty old now. That's probably the reason for the errors, which I did not correct. As you see, R has some advantages, but Matlab also has some. However, the differences are not too big. There is only one big difference, which indeed involves loops (creation of a 220x220 Toeplitz matrix), where Matlab is much faster. But maybe a simple change in the programming can change that... Has someone on the list an updated script? Stefan Grosse The benchmarks: R 2.4.1 R Benchmark 2.3 === I.
Matrix calculation - Creation, transp., deformation of a 1500x1500 matrix (sec): 0.865 800x800 normal distributed random matrix ^1000__ (sec): 0.136 Sorting of 2,000,000 random values__ (sec): 0.615 700x700 cross-product matrix (b = a' * a)___ (sec): 0.557 Linear regression over a 600x600 matrix (c = a \ b') (sec): 0 #ERROR II. Matrix functions FFT over 800,000 random values__ (sec): 0.557 Eigenvalues of a 320x320 random matrix__ (sec): 0.495 Determinant of a 650x650 random matrix__ (sec): 0.276 Cholesky decomposition of a 900x900 matrix__ (sec): 0 #ERROR Inverse of a 400x400 random matrix__ (sec): 0 #ERROR III. Programmation -- 750,000 Fibonacci numbers calculation (vector calc)_ (sec): 0.469 Creation of a 2250x2250 Hilbert matrix (matrix calc) (sec): 1.016667 Grand common divisors of 70,000 pairs (recursion)___ (sec): 0.3966671 Creation of a 220x220 Toeplitz matrix (loops)___ (sec): 0.552 Escoufier's method on a 37x37 matrix (mixed) (sec): 2.66 --- End of test --- Matlab 7.0.4 Matlab Benchmark 2 == Number of times each test is run__: 3 I. Matrix calculation - Creation, transp., deformation of a 1500x1500 matrix (sec): 0.29047 800x800 normal distributed random matrix ^1000__ (sec): 0.42967 Sorting of 2,000,000 random values__ (sec): 0.71432 700x700 cross-product matrix (b = a' * a)___ (sec): 0.14748 Linear regression over a 600x600 matrix (c = a \ b') (sec): 0.12831 -- Trimmed geom. mean (2 extremes eliminated): 0.26403 II. Matrix functions FFT over 800,000 random values__ (sec): 0.24591 Eigenvalues of a 320x320 random matrix__ (sec): 0.38507 Determinant of a 650x650 random matrix__ (sec): 0.091612 Cholesky decomposition of a 900x900 matrix__ (sec): 0.11059 Inverse of a 400x400 random matrix__ (sec): 0.069414 -- Trimmed geom. mean (2 extremes eliminated): 0.13556 III. 
Programmation -- 750,000 Fibonacci numbers calculation (vector calc)_ (sec): 1.2386 Creation of a 2250x2250 Hilbert matrix (matrix calc) (sec): 3.0541 Grand common divisors of 70,000 pairs (recursion)___ (sec): 1.7637 Creation of a 220x220 Toeplitz matrix (loops)___ (sec): 0.0045972 Escoufier's method on a 37x37 matrix
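Worth noting alongside the vectorized constructor posted above: base R already ships a symmetric Toeplitz constructor, stats::toeplitz(), which is also loop-free and can serve as a cross-check (this snippet is an added illustration).

```r
## stats::toeplitz(x) builds the symmetric matrix A[i, j] = x[|i - j| + 1].
x <- runif(220)
A <- stats::toeplitz(x)
stopifnot(identical(A, t(A)),   # symmetric by construction
          all(A[1, ] == x))     # first row reproduces x
system.time(stats::toeplitz(runif(220)))   # essentially instantaneous
```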
Re: [R] Timings of function execution in R [was Re: R in Industry]
Hi, greaterOf is indeed an interesting function. It is much faster than the equivalent R function, pmax, because pmax does a lot of checking for missing data and for recycling. Tom Lumley suggested a simple function to replace pmax, without these checks, that is analogous to greaterOf, which I call fast.pmax. fast.pmax <- function(x,y) {i <- x<y; x[i] <- y[i]; x} Interestingly, greaterOf is even faster than fast.pmax, although you have to be dealing with very large vectors (O(10^6)) to see any real difference. n <- 2000000 x1 <- runif(n) x2 <- rnorm(n) system.time( ans1 <- greaterOf(x1,x2) ) [1] 0.17 0.06 0.23 NA NA system.time( ans2 <- pmax(x1,x2) ) [1] 0.72 0.19 0.94 NA NA system.time( ans3 <- fast.pmax(x1,x2) ) [1] 0.29 0.05 0.35 NA NA all.equal(ans1,ans2,ans3) [1] TRUE Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Douglas Bates Sent: Thursday, February 08, 2007 6:00 PM To: R-Help Subject: [R] Timings of function execution in R [was Re: R in Industry] On 2/8/07, Albrecht, Dr. Stefan (AZ Private Equity Partner) [EMAIL PROTECTED] wrote: Dear all, Thanks a lot for your comments. I very well agree with you that writing efficient code is about optimisation. The most important rules I know would be: - vectorization - pre-definition of vectors, etc.
- use matrix instead of data.frame
- do not use named objects
- use pure matrix instead of involved S4 (perhaps also S3) objects (can have enormous effects)
- use function instead of expression
- use compiled code
- I guess indexing with numbers (or better, variables) is also much faster than with text (names) (see also above)
- I even made, for example, my own min and max, since they are slow, e.g.,

greaterOf <- function(x, y){
  # Returns, for each element of x and y (numeric), the larger of the two
  # x or y may be a multiple of the other
  z <- x > y
  z*x + (!z)*y
}

That's an interesting function. I initially was tempted to respond that you have managed to reinvent a specialized form of the ifelse function, but then I decided to do the timings just to check (always a good idea). The enclosed timings show that your function is indeed faster than a call to ifelse. A couple of comments:
- I needed to make the number of components in the vectors x and y quite large before I could get reliable timings on the system I am using.
- The recommended way of doing timings is with the system.time function, which makes an effort to minimize the effects of garbage collection on the timings.
- Even when using system.time there is often a big difference in timing between the first execution of a function call that generates a large object and subsequent executions of the same function call.

[additional parts of the original message not relevant to this discussion have been removed]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
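A runnable sketch of the comparison in this thread (function bodies reconstructed from the posts above; the vector length and the timings will of course vary by machine):

```r
# greaterOf and fast.pmax as reconstructed from this thread: both skip
# the NA and recycling checks that pmax performs, so they assume clean,
# equal-length numeric inputs with no missing values.
greaterOf <- function(x, y) {
  z <- x > y
  z * x + (!z) * y
}
fast.pmax <- function(x, y) { i <- x < y; x[i] <- y[i]; x }

set.seed(1)
n <- 1e6                 # large vectors, O(10^6), as Ravi notes
x1 <- runif(n)
x2 <- rnorm(n)

ans1 <- greaterOf(x1, x2)
ans2 <- fast.pmax(x1, x2)
ans3 <- pmax(x1, x2)
stopifnot(all.equal(ans1, ans3), all.equal(ans2, ans3))

# system.time(greaterOf(x1, x2))   # compare against pmax on your machine
```

Neither shortcut handles NAs the way pmax does, which is exactly the checking pmax pays for.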
Re: [R] Help with efficient double sum of max (X_i, Y_i) (X Y vectors)
Thomas, you are absolutely correct. Here are some results comparing Jeff's original implementation, my suggestion using outer and pmax, and your clever trick using fast.pmax. Your fast.pmax function is more than 3 times faster than pmax. Thanks for this wonderful insight.

Best, Ravi.

x <- rnorm(2000)
y <- runif(500)
nx <- length(x)
ny <- length(y)
sum1 <- 0
sum3 <- 0
# Here is the straightforward way
system.time(
  for (i in 1:nx) {
    sum1 <- sum1 + sum(ifelse(x[i] > x, x[i], x))
    sum3 <- sum3 + sum(ifelse(x[i] > y, x[i], y))
  }
)
[1] 3.83 0.00 3.83 NA NA
# Here is a faster way using outer and pmax
system.time({
  sum11 <- sum(outer(x, x, FUN=pmax))
  sum33 <- sum(outer(x, y, FUN=pmax))
})
[1] 2.55 0.48 3.04 NA NA
# Here is an even faster method using Tom Lumley's suggestion:
fast.pmax <- function(x, y) { i <- x < y; x[i] <- y[i]; x }
system.time({
  sum111 <- sum(outer(x, x, FUN=fast.pmax))
  sum333 <- sum(outer(x, y, FUN=fast.pmax))
})
[1] 0.78 0.08 0.86 NA NA
all.equal(sum1, sum11, sum111)
[1] TRUE
all.equal(sum3, sum33, sum333)
[1] TRUE

-----Original Message-----
From: Thomas Lumley [mailto:[EMAIL PROTECTED]
Sent: Friday, February 02, 2007 9:06 AM
To: Ravi Varadhan
Cc: [EMAIL PROTECTED]; r-help@stat.math.ethz.ch
Subject: Re: [R] Help with efficient double sum of max (X_i, Y_i) (X Y vectors)

On Thu, 1 Feb 2007, Ravi Varadhan wrote:

Jeff,

Here is something which is a little faster:

sum1 <- sum(outer(x, x, FUN=pmax))
sum3 <- sum(outer(x, y, FUN=pmax))

This is the sort of problem where profiling can be useful.
My experience with pmax() is that it is surprisingly slow, presumably because it handles recycling and NAs. In the example I profiled (an MCMC calculation) it was measurably faster to use

function(x, y) { i <- x < y; x[i] <- y[i]; x }

-thomas

Best, Ravi.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeffrey Racine
Sent: Thursday, February 01, 2007 1:18 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Help with efficient double sum of max (X_i, Y_i) (X Y vectors)

Greetings.

For R gurus this may be a no brainer, but I could not find pointers to efficient computation of this beast in past help files.

Background - I wish to implement a Cramer-von Mises type test statistic which involves double sums of max(X_i,Y_j) where X and Y are vectors of differing length. I am currently using ifelse pointwise in a vector, but have a nagging suspicion that there is a more efficient way to do this. Basically, I require three sums:

sum1: \sum_i\sum_j max(X_i,X_j)
sum2: \sum_i\sum_j max(Y_i,Y_j)
sum3: \sum_i\sum_j max(X_i,Y_j)

Here is my current implementation - any pointers to more efficient computation greatly appreciated.

nx <- length(x)
ny <- length(y)
sum1 <- 0
sum3 <- 0
for (i in 1:nx) {
  sum1 <- sum1 + sum(ifelse(x[i] > x, x[i], x))
  sum3 <- sum3 + sum(ifelse(x[i] > y, x[i], y))
}
sum2 <- 0
sum4 <- sum3 # symmetric and identical
for (i in 1:ny) {
  sum2 <- sum2 + sum(ifelse(y[i] > y, y[i], y))
}

Thanks in advance for your help.

-- Jeff

--
Professor J. S. Racine  Phone: (905) 525 9140 x 23825
Department of Economics  FAX: (905) 521-8232
McMaster University  e-mail: [EMAIL PROTECTED]
1280 Main St. W., Hamilton, Ontario, Canada, L8S 4M4
URL: http://www.economics.mcmaster.ca/racine/

`The generation of random numbers is too important to be left to chance'
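Thomas's remark that "this is the sort of problem where profiling can be useful" can be sketched as follows (hypothetical hot function and output file name; Rprof() and summaryRprof() are base R):

```r
# Profile a loop whose cost is dominated by pmax, then look at the
# self-time table to confirm where the time is actually being spent.
slow.step <- function(x, y) sum(pmax(x, y))   # hypothetical hot spot

set.seed(1)
x <- runif(1e6)
y <- rnorm(1e6)

Rprof("prof.out")                  # start the sampling profiler
for (i in 1:100) invisible(slow.step(x, y))
Rprof(NULL)                        # stop profiling
head(summaryRprof("prof.out")$by.self)   # pmax should dominate self time
```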
[R] Need help writing a faster code
Hi,

I apologize for this repeat posting, which I first posted yesterday. I would appreciate any hints on solving this problem:

I have two matrices A (m x 2) and B (n x 2), where m and n are large integers (on the order of 10^4). I am looking for an efficient way to create another matrix, W (m x n), which can be defined as follows:

for (i in 1:m) {
  for (j in 1:n) {
    W[i,j] <- g(A[i,], B[j,])
  }
}

where g(x,y) is a function that takes two vectors and returns a scalar. The following works okay, but is not fast enough for my purpose. I am sure that I can do better:

for (i in 1:m) {
  W[i,] <- apply(B, 1, y=A[i,], function(x,y) g(y,x))
}

How can I do this in a faster manner? I attempted outer, kronecker, expand.grid, etc., but with no success. Here is an example:

m <- 2000
n <- 5000
A <- matrix(rnorm(2*m), ncol=2)
B <- matrix(rnorm(2*n), ncol=2)
W <- matrix(NA, m, n)
g <- function(x, y){
  theta <- atan((y[2]-x[2]) / (y[1]-x[1]))
  theta + 2*pi*(theta < 0)
}
for (i in 1:m) {
  W[i,] <- apply(B, 1, y=A[i,], function(x,y) g(y,x))
}

Thanks for any suggestions.

Best, Ravi.
Re: [R] Need help writing a faster code
Thank you, Dimitris and Robin. Dimitris - your solution(s) works very well. Although my g function is a lot more complicated than that in the simple example that I gave, I think that I can use your idea of taking the whole matrix inside the function and working directly with it. Robin - using two applys doesn't make the code any faster; it just produces a compact one-liner.

Best, Ravi.

-----Original Message-----
From: Dimitris Rizopoulos [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 01, 2007 10:33 AM
To: Ravi Varadhan
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Need help writing a faster code

the following seems to be a first improvement:

m <- 2000
n <- 5000
A <- matrix(rnorm(2*m), ncol=2)
B <- matrix(rnorm(2*n), ncol=2)
W1 <- W2 <- matrix(0, m, n)
##
g1 <- function(x, y){
  theta <- atan((y[2] - x[2]) / (y[1] - x[1]))
  theta + 2*pi*(theta < 0)
}
invisible({gc(); gc()})
system.time(for (i in 1:m) {
  W1[i, ] <- apply(B, 1, y = A[i,], function(x, y) g1(y, x))
})
##
g2 <- function(x){
  out <- tB - x
  theta <- atan(out[2, ] / out[1, ])
  theta + 2*pi*(theta < 0)
}
tB <- t(B)
invisible({gc(); gc()})
system.time(for (i in 1:m) {
  W2[i, ] <- g2(A[i, ])
})
## or
invisible({gc(); gc()})
system.time(W3 <- t(apply(A, 1, g2)))

all.equal(W1, W2)
all.equal(W1, W3)

I hope it helps.

Best, Dimitris

--
Dimitris Rizopoulos, Ph.D. Student
Biostatistical Centre, School of Public Health, Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899  Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
http://www.student.kuleuven.be/~m0390867/dimitris.htm

[the quoted original message, identical to Ravi's post earlier in this thread, has been removed]
Re: [R] Need help writing a faster code
Dear Dimitris,

I implemented your solution on my actual problem. I was able to generate my large transition matrix in 56 seconds, compared to the previous time of around 27 minutes. Wow!!! I thank you very much for the help. R and the R-user group are truly amazing!

Best regards, Ravi.

-----Original Message-----
From: Dimitris Rizopoulos [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 01, 2007 10:33 AM
To: Ravi Varadhan
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Need help writing a faster code

[the quoted text, identical to Dimitris' reply earlier in this thread, has been removed]
Re: [R] Help with efficient double sum of max (X_i, Y_i) (X Y vectors)
Jeff,

Here is something which is a little faster:

sum1 <- sum(outer(x, x, FUN=pmax))
sum3 <- sum(outer(x, y, FUN=pmax))

Best, Ravi.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeffrey Racine
Sent: Thursday, February 01, 2007 1:18 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Help with efficient double sum of max (X_i, Y_i) (X Y vectors)

[the quoted text, identical to Jeff's original message earlier in this digest, has been removed]
[R] Looking for a faster code
Hi,

I have two matrices A (m x 2) and B (n x 2), where m and n are large integers (on the order of 10^4). I am looking for an efficient way to create another matrix, W (m x n), which can be defined as follows:

for (i in 1:m) {
  for (j in 1:n) {
    W[i,j] <- g(A[i,], B[j,])
  }
}

I have tried the following and it works okay, but I am sure that I can do even better:

for (i in 1:m) {
  W[i,] <- apply(B, 1, y=A[i,], function(x,y) g(y,x))
}

How can I do this in a faster manner (for example, I feel that I should be able to use outer)? Thanks for any suggestions.

Best, Ravi.
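To see why outer() fails for this problem: outer(A, B, FUN=g) calls g once with whole vectors, not once per pair, so a g() that indexes x[1] and x[2] scalar-style returns garbage rather than an error. A slow-but-correct way to keep the outer() shape is to vectorize over row indices (g here is the example function from this thread; this is no faster than a loop, since Vectorize() still calls g once per pair):

```r
# outer() over row indices: correct, though not fast, because
# Vectorize() still invokes g once per (i, j) combination.
g <- function(x, y) {
  theta <- atan((y[2] - x[2]) / (y[1] - x[1]))
  theta + 2 * pi * (theta < 0)
}

set.seed(1)
m <- 40; n <- 60                       # small sizes for illustration
A <- matrix(rnorm(2 * m), ncol = 2)
B <- matrix(rnorm(2 * n), ncol = 2)

gij <- Vectorize(function(i, j) g(A[i, ], B[j, ]))
W <- outer(seq_len(m), seq_len(n), gij)

stopifnot(dim(W) == c(m, n),
          all.equal(W[3, 7], g(A[3, ], B[7, ])))
```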
Re: [R] maximum likelihood estimation of 5 parameters
Franco,

Is it possible that you have failed to provide the negative of the log-likelihood to optim, since optim, by default, minimizes a function? If you want to do this without redefining the log-likelihood, you should set fnscale = -1 (as hinted by Prof. Ripley), which turns the problem into a maximization problem. If this doesn't work, you should provide more details (reproducible code with the actual error message).

Ravi.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of francogrex
Sent: Friday, January 05, 2007 10:42 AM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] maximum likelihood estimation of 5 parameters

Franco, You can provide lower and upper bounds on the parameters if you use optim with method="L-BFGS-B". Hth, Ingmar

Thanks, but when I use L-BFGS-B it tells me that there is an error in optim(start, f, method = method, hessian = TRUE, ...) : L-BFGS-B needs finite values of 'fn'

--
View this message in context: http://www.nabble.com/maximum-likelihood-estimation-of-5-parameters-tf2925364.html#a8180120
Sent from the R help mailing list archive at Nabble.com.
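A minimal sketch of the fnscale advice (a toy one-parameter normal log-likelihood, not Franco's five-parameter model): with control = list(fnscale = -1), optim maximizes fn directly, so the log-likelihood need not be negated.

```r
# Maximize a toy log-likelihood in mu; with unit variance the MLE of mu
# is the sample mean, which gives us something to check against.
set.seed(1)
x <- rnorm(100, mean = 3, sd = 1)
loglik <- function(mu) sum(dnorm(x, mean = mu, sd = 1, log = TRUE))

fit <- optim(par = 0, fn = loglik, method = "BFGS",
             control = list(fnscale = -1))   # maximize, don't minimize
stopifnot(fit$convergence == 0,
          abs(fit$par - mean(x)) < 1e-4)
```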
Re: [R] na.action and simultaneous regressions
No, Bert, lm doesn't produce a list each of whose components is a separate fit using all the nonmissing data in the column. It is true that the regressions are independently performed, but when the response matrix is passed from lm on to lm.fit, only the complete rows are passed, i.e. rows with no missing values. I looked at the lm function, but it was not obvious to me how to fix it.

In the following toy example, the degrees of freedom for the y1 regression should be 18 and that for y2 should be 15, but both degrees of freedom are only 15.

y1 <- runif(20)
y2 <- c(runif(17), rep(NA,3))
x <- rnorm(20)
summary(lm(cbind(y1,y2) ~ x))

Response y1 :

Call: lm(formula = y1 ~ x)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52592 -0.22632 -0.00964  0.25117  0.31227

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.56989    0.06902   8.257 5.82e-07 ***
x           -0.12325    0.06516  -1.891    0.078 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2798 on 15 degrees of freedom
Multiple R-Squared: 0.1926, Adjusted R-squared: 0.1387
F-statistic: 3.577 on 1 and 15 DF, p-value: 0.07804

Response y2 :

Call: lm(formula = y2 ~ x)

Residuals:
     Min       1Q   Median       3Q      Max
-0.48880 -0.28552 -0.06022  0.23167  0.54425

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.43712    0.07686   5.687 4.31e-05 ***
x            0.10278    0.07257   1.416    0.177
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3115 on 15 degrees of freedom
Multiple R-Squared: 0.118, Adjusted R-squared: 0.05915
F-statistic: 2.006 on 1 and 15 DF, p-value: 0.1771

Ravi.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bert Gunter
Sent: Wednesday, January 03, 2007 4:46 PM
To: 'Talbot Katz'; r-help@stat.math.ethz.ch
Subject: Re: [R] na.action and simultaneous regressions

As the Help page says: "If response is a matrix a linear model is fitted separately by least-squares to each column of the matrix." So there's nothing hidden going on behind the scenes, and

apply(cbind(y1,y2), 2, function(z) lm(z~x))

(or an explicit loop, of course) will produce a list each of whose components is a separate fit using all the nonmissing data in the column.

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Talbot Katz
Sent: Wednesday, January 03, 2007 11:56 AM
To: r-help@stat.math.ethz.ch
Subject: [R] na.action and simultaneous regressions

Hi.

I am running regressions of several dependent variables using the same set of independent variables. The independent variable values are complete, but each dependent variable has some missing values for some observations; by default, lm(y1~x) will carry out the regression using only the observations without missing values of y1. If I do lm(cbind(y1,y2)~x), the default will be to use only the observations for which neither y1 nor y2 is missing. I'd like to have the regression for each separate dependent variable use all the non-missing cases for that variable. I would think that there should be a way to do that using the na.action option, but I haven't seen this in the documentation or figured out how to do it on my own. Can it be done this way, or do I have to code the regressions in a loop?

(By the way, since it restricts to non-missing values in all the variables simultaneously, is this because it's doing some sort of SUR or other simultaneous equation estimation behind the scenes?)

Thanks!

-- TMK --
212-460-5430 home
917-656-5351 cell
Re: [R] na.action and simultaneous regressions
Sorry, Bert. I didn't notice your use of apply, which will indeed give you separate regression results using all available data. But I was wondering if there was a way to modify lm to accomplish this, since it is doing separate regressions anyway.

Ravi.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bert Gunter
Sent: Wednesday, January 03, 2007 4:46 PM
To: 'Talbot Katz'; r-help@stat.math.ethz.ch
Subject: Re: [R] na.action and simultaneous regressions

[the quoted text, identical to Bert's reply and Talbot's original message earlier in this digest, has been removed]
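The behaviour Ravi is after, with each response keeping all of its own non-missing cases, falls out of looping (or lapply-ing) over the columns instead of giving lm a matrix response; a sketch using the toy data from this thread:

```r
# Fit each response separately so each regression retains its own
# complete cases, then check the residual degrees of freedom Ravi expected.
set.seed(1)
y1 <- runif(20)
y2 <- c(runif(17), rep(NA, 3))
x  <- rnorm(20)

fits <- lapply(list(y1 = y1, y2 = y2),
               function(y) lm(y ~ x, na.action = na.omit))

stopifnot(fits$y1$df.residual == 18,   # all 20 observations used
          fits$y2$df.residual == 15)   # only the 17 non-missing used
```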
Re: [R] RuleFit quantreg: partial dependence plots; showing an effect
Dear Roger,

Is it possible to combine the two ideas that you mentioned: (1) the algorithmic approaches of Breiman, Friedman, and others that achieve flexibility in the predictor space, and (2) robust and flexible regression like QR that achieves flexibility in the response space, so as to achieve complete flexibility? If it is possible, are you or anyone else in the R community working on this?

Thanks, Ravi.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of roger koenker
Sent: Wednesday, December 20, 2006 8:57 AM
To: Mark Difford
Cc: R-help list
Subject: Re: [R] RuleFit quantreg: partial dependence plots; showing an effect

They are entirely different: Rulefit is a fiendishly clever combination of decision tree formulation of models and L1-regularization intended to select parsimonious fits to very complicated responses, yielding, e.g., piecewise constant functions. Rulefit estimates the conditional mean of the response over the covariate space, but permits very flexible, though linear in parameters, specifications of the covariate effects on the conditional mean. The quantile regression plotting you refer to adopts a fixed, linear specification for conditional quantile functions and, given that specification, depicts how the covariates influence the various conditional quantiles of the response. Thus, roughly speaking, Rulefit is focused on flexibility in the x-space, maintaining the classical conditional mean objective, while QR is trying to be more flexible in the y-direction, maintaining a fixed, linear in parameters specification for the covariate effects at each quantile.

url: www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]    Department of Economics
vox: 217-333-4558    University of Illinois
fax: 217-244-6678    Champaign, IL 61820

On Dec 20, 2006, at 4:17 AM, Mark Difford wrote:

Dear List,

I would greatly appreciate help on the following matter: The RuleFit program of Professor Friedman uses partial dependence plots to explore the effect of an explanatory variable on the response variable, after accounting for the average effects of the other variables. The plot method [plot(summary(rq(y ~ x1 + x2, tau=seq(.1,.9,.05))))] of Professor Koenker's quantreg program appears to do the same thing.

Question: Is there a difference between these two types of plot in the manner in which they depict the relationship between explanatory variables and the response variable?

Thank you in advance for your help.

Regards, Mark Difford.

-----
Mark Difford, Ph.D. candidate, Botany Department, Nelson Mandela Metropolitan University, Port Elizabeth, SA.
Re: [R] RuleFit quantreg: partial dependence plots; showing an effect
Thanks, Roger. These should be very useful tools. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: roger koenker [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 20, 2006 10:59 AM To: Ravi Varadhan Cc: 'Mark Difford'; 'R-help list' Subject: Re: [R] RuleFit quantreg: partial dependence plots; showing an effect On Dec 20, 2006, at 8:43 AM, Ravi Varadhan wrote: Dear Roger, Is it possible to combine the two ideas that you mentioned ... [rest of quoted message trimmed; see above] There are some tentative steps in this direction. One is the rqss() fitting in my quantreg package, which does QR fitting of additive models using total variation as a roughness penalty for nonlinear terms. Another, along more tree-structured lines, is Nicolai Meinshausen's quantregforest package.
Re: [R] any way to make the code more efficient ?
Using rbind in a loop almost always slows things down significantly. You should define the objects aggfxdata and fullaggfxdata before the loop and then assign values to the corresponding rows and/or columns. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Leeds, Mark (IED) Sent: Friday, December 08, 2006 4:17 PM To: r-help@stat.math.ethz.ch Subject: [R] any way to make the code more efficient ? The code below works, so this is why I didn't include the data to reproduce it. The loop runs about 500 times, and each time a zoo object with 1400 rows and 4 columns gets created (the rows represent minutes, so each file is one day's worth of data). Inside the loop, I keep rbinding the newly created zoo object to the current zoo object, so that it gets bigger and bigger over time. Eventually the new zoo object, fullaggfxdata, containing all the days of data is created. I was just wondering if there is a more efficient way of doing this. I do know at the beginning the number of times the loop will be run, so maybe creating a matrix or data frame at the beginning and putting the daily ones into something like that would make it faster. But the problem with this is that I eventually do need a zoo object. I ask this question because at around the 250 mark of the loop, things start to slow down significantly, and I think I remember reading somewhere that doing an rbind of something to itself is not a good idea. Thanks.
#======================================================================
start <- 1
for (filecounter in 1:length(datafilenames)) {
  print(paste("File Counter = ", filecounter))
  datafile <- paste(datadir, "/", datafilenames[filecounter], sep = "")
  aggfxdata <- clnaggcompcurrencyfile(fxfile = datafile,
                                      aggminutes = aggminutes, fillholes = 1)
  logbidask <- log(aggfxdata[, "bidask"])
  aggfxdata <- cbind(aggfxdata, logbidask)
  if (start == 1) {
    fullaggfxdata <- aggfxdata
    start <- 0
  } else {
    fullaggfxdata <- rbind(fullaggfxdata, aggfxdata)
  }
}
#======================================================================
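A concrete sketch of the preallocation idea: build the full matrix and time index once, fill them block by block, and construct the zoo object a single time at the end. The dimensions and the day-reading helper below are illustrative placeholders, not from the original code.

```r
# Sketch: preallocate once, fill by row block, make the zoo object at the end.
# read.one.day() and the dimensions are hypothetical stand-ins.
library(zoo)

nfiles <- 500
nrows  <- 1400                       # minutes per day
ncols  <- 5                          # 4 data columns + log bid/ask
big      <- matrix(NA_real_, nfiles * nrows, ncols)
big.time <- numeric(nfiles * nrows)  # numeric time index, converted if needed

for (i in seq_len(nfiles)) {
  day  <- read.one.day(i)            # hypothetical: returns one day's zoo object
  rows <- ((i - 1) * nrows + 1):(i * nrows)
  big[rows, ]    <- coredata(cbind(day, log(day[, "bidask"])))
  big.time[rows] <- as.numeric(index(day))
}
fullaggfxdata <- zoo(big, order.by = big.time)
```

Filling a preallocated matrix is O(n) overall, whereas repeated rbind copies the whole accumulated object on every iteration, which is why the posted loop bogs down around iteration 250.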
[R] How to solve differential equations with a delay (time lag)?
Hi, I would like to solve a system of coupled ordinary differential equations in which there is a delay (time-lag) term. I would like to use the lsoda function in the odesolve package. However, I am not sure how to specify the delay term using the syntax allowed by odesolve. Here is an example of the kind of problem that I am trying to solve:

library(odesolve)
yprime <- function(t, y, p) {
  yd1 <- p["k1"] * (t <= p["T"]) - p["k2"] * y[2]
  # the next line is not syntactically valid, but it is what I would like to do:
  yd2 <- p["k3"] * y[1](t - p["delay"]) - p["k4"] * y[2]
  list(c(yd1, yd2))
}
times <- seq(0, 30, by = 0.1)
y0 <- c(0, 0)
parms <- c(k1 = 0.7, k2 = 0.5, k3 = 0.2, k4 = 0.8, T = 10, delay = 5)

Is there a way to incorporate delay in odesolve? Any hints would be much appreciated. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
Re: [R] How to solve differential equations with a delay (time lag)?
Hi, I would like to add a piece of information to my previous posting: y[1](t) = 0 for all t in the interval (-delay, 0). This completes the specification of the problem. Thanks, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ravi Varadhan Sent: Wednesday, November 29, 2006 4:45 PM To: r-help@stat.math.ethz.ch Subject: [R] How to solve differential equations with a delay (time lag)? [original message quoted above trimmed]
Re: [R] How to solve differential equations with a delay (time lag)?
Thanks, Woodrow. I also found a DDE solver, dde23, in Matlab, written by L.F. Shampine. I will see if I can use it in Scilab, since I don't have Matlab. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 29, 2006 5:26 PM To: Ravi Varadhan Cc: r-help@stat.math.ethz.ch Subject: Re: [R] How to solve differential equations with a delay (time lag)? lsoda does not solve delay differential equations, which is what you need to be able to do. A quick search on netlib (netlib.org) turned up one match for delay differential equations: ode/ddverk.f, which might help (though not in R). R. Woodrow Setzer, Ph.D. National Center for Computational Toxicology US Environmental Protection Agency Mail Drop B205-01/US EPA/RTP, NC 27711 Ph: (919) 541-0128 Fax: (919) 541-1194 From: Ravi Varadhan Sent: 11/29/2006 04:44 PM Subject: [R] How to solve differential equations with a delay (time lag)? [original message quoted above trimmed]
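Since lsoda cannot handle the lag, the posted system can at least be explored with a deliberately crude fixed-step Euler loop that keeps the whole solution history, so the lagged value can be looked up. This is only an illustration of the method-of-steps idea, not a production DDE solver; step size and storage layout are my own choices.

```r
# y1'(t) = k1 * (t <= T) - k2 * y2(t)
# y2'(t) = k3 * y1(t - delay) - k4 * y2(t),  with y1(t) = 0 for t < 0
p <- c(k1 = 0.7, k2 = 0.5, k3 = 0.2, k4 = 0.8, T = 10, delay = 5)
h <- 0.01
times <- seq(0, 30, by = h)
y <- matrix(0, length(times), 2)      # row i holds (y1, y2) at times[i]

lag.y1 <- function(i) {               # y1 at times[i] - delay; 0 before t = 0
  j <- i - round(p[["delay"]] / h)
  if (j < 1) 0 else y[j, 1]
}

for (i in 1:(length(times) - 1)) {
  t <- times[i]
  yd1 <- p[["k1"]] * (t <= p[["T"]]) - p[["k2"]] * y[i, 2]
  yd2 <- p[["k3"]] * lag.y1(i) - p[["k4"]] * y[i, 2]
  y[i + 1, ] <- y[i, ] + h * c(yd1, yd2)
}
```

Because the delay is an exact multiple of the step size here, the lagged value falls on a grid point; for a general delay one would interpolate between stored points.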
[R] Matrix-vector multiplication without loops
Hi, I am trying to do the following computation:

p <- rep(0, n)
coef <- runif(K + 1)
U <- matrix(runif(n * (2 * K + 1)), n, 2 * K + 1)
for (i in 0:K) {
  for (j in 0:K) {
    p <- p + coef[i + 1] * coef[j + 1] * U[, i + j + 1]
  }
}

I would appreciate any suggestions on how to perform this computation efficiently, without the for loops. Thank you, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
Re: [R] Matrix-vector multiplication without loops
In my case I have n >> K, so the loop solution is faster than the solutions proposed by Christos and Dimitris. In any case, I would like to thank Christos and Dimitris for their solutions. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: Christos Hatzis [mailto:[EMAIL PROTECTED] Sent: Tuesday, November 14, 2006 2:49 PM To: 'Dimitris Rizopoulos'; 'Ravi Varadhan' Cc: r-help@stat.math.ethz.ch Subject: RE: [R] Matrix-vector multiplication without loops Ravi, Here is another version, somewhat similar to what Dimitris proposed, but without using 'rep'. For large n and K, 'rep' tries to allocate two vectors, each of length n*(K+1)^2, which can be a problem. In this version 'outer' buys you some efficiency and compactness, but your looping version is faster.

n <- 1000
K <- 1000
p <- rep(0, n)
cf <- runif(K + 1)
U <- matrix(runif(n * (2 * K + 1)), n, 2 * K + 1)

# original code
system.time({
  for (i in 0:K)
    for (j in 0:K)
      p <- p + cf[i + 1] * cf[j + 1] * U[, i + j + 1]
  p.1 <- p
})

# 'vectorized'
system.time({
  ind <- sapply(1:(K + 1), seq, length = K + 1)
  cc <- outer(cf, cf)
  p.2 <- apply(U, 1, FUN = function(u) sum(cc * u[ind]))
})

all.equal(p.1, p.2)
rm(n, K, p, U, cf, cc, ind, p.1, p.2)

-Christos Christos Hatzis, Ph.D. Nuvera Biosciences, Inc.
400 West Cummings Park Suite 5350 Woburn, MA 01801 Tel: 781-938-3830 www.nuverabio.com -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dimitris Rizopoulos Sent: Tuesday, November 14, 2006 11:20 AM To: Ravi Varadhan Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Matrix-vector multiplication without loops I think you need something along these lines:

ind <- c(sapply(1:(K + 1), seq, length = K + 1))
cf1 <- rep(rep(coef, each = n), K + 1)
cf2 <- rep(rep(coef, each = n), each = K + 1)
rowSums(cf1 * cf2 * U[, ind])

I hope it helps. Best, Dimitris Dimitris Rizopoulos Ph.D. Student Biostatistical Centre School of Public Health Catholic University of Leuven Address: Kapucijnenvoer 35, Leuven, Belgium Tel: +32/(0)16/336899 Fax: +32/(0)16/337015 Web: http://med.kuleuven.be/biostat/ http://www.student.kuleuven.be/~m0390867/dimitris.htm - Original Message - From: Ravi Varadhan [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Sent: Tuesday, November 14, 2006 4:45 PM Subject: [R] Matrix-vector multiplication without loops [original message quoted above trimmed]
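For completeness: the double sum is a discrete self-convolution of coef. The weight on column m+1 of U is the sum of coef[i+1]*coef[j+1] over all i+j = m, so the whole computation collapses to one matrix-vector product, with memory cost only O(K) beyond U itself. A sketch, checked against the looped version:

```r
n <- 200; K <- 30
coef <- runif(K + 1)
U <- matrix(runif(n * (2 * K + 1)), n, 2 * K + 1)

# looped version, for reference
p1 <- rep(0, n)
for (i in 0:K)
  for (j in 0:K)
    p1 <- p1 + coef[i + 1] * coef[j + 1] * U[, i + j + 1]

# collapse the double sum: w[m + 1] = sum over i + j = m of coef[i + 1] * coef[j + 1],
# i.e. the coefficients of the polynomial product of coef with itself
w  <- convolve(coef, rev(coef), type = "open")
p2 <- drop(U %*% w)

all.equal(p1, p2)   # TRUE up to FFT round-off
```

Since `%*%` goes straight to BLAS, this should scale well in both n and K.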
Re: [R] R and Fortran 9x -- advice
Metcalf and Reid - FORTRAN 90/95 Explained Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Christos Hatzis Sent: Monday, November 13, 2006 2:05 PM To: 'Mike Prager'; r-help@stat.math.ethz.ch Subject: Re: [R] R and Fortran 9x -- advice Mike, Can you recommend any good books on Fortran 90/95? I had been an old user of Fortran 77 but haven't followed the developments in the last 15 years or so... Thanks. Christos Hatzis, Ph.D. Nuvera Biosciences, Inc. 400 West Cummings Park Suite 5350 Woburn, MA 01801 Tel: 781-938-3830 www.nuverabio.com -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mike Prager Sent: Monday, November 13, 2006 1:00 PM To: r-help@stat.math.ethz.ch Subject: Re: [R] R and Fortran 9x -- advice Tamas K Papp [EMAIL PROTECTED] wrote: I found some bottlenecks in my R code with Rprof. First I wanted to rewrite them in C, but a colleague keeps suggesting that I learn Fortran, so maybe this is the time to do it... 1) I hear bad things about Fortran. Sure, F77 looks archaic, but F90/95 seems nicer. Is it worth learning, especially when I plan to use it mainly from R? Dusting off my C knowledge would take a bit of time too, and I never liked C that much. I'll answer this from the perspective of someone who uses Fortran 95 regularly. It is a modern language, far more reliable and flexible than Fortran 77 and quite well suited to most scientific problems. I do think it's worth learning, particularly if C is not to your taste. Two free compilers for Fortran 95 are available. It seems that g95 is complete, while gfortran is nearing completion. 
There are also several high-quality commercial compilers, some of which are free under certain operating systems and/or conditions, and others of which (Lahey) are quite inexpensive if one is willing to work from the command line or one's own editor. I can't address questions of R interoperability -- not something I've done. -- Mike Prager, NOAA, Beaufort, NC * Opinions expressed are personal and not represented otherwise. * Any use of trade names does not constitute a NOAA endorsement.
Re: [R] Numerical Integration
Here is a simple example, using bkde (the binned kernel density estimator) from the KernSmooth package, that shows how to use trapezoidal integration:

library(KernSmooth)
data(geyser, package = "MASS")
x <- geyser$duration
est <- bkde(x, gridsize = 101, bandwidth = 0.25)
trap.rule(est$x, est$y)
[1] 0.61

The answer from the simple trapezoidal rule is pretty good, with an error of less than 1.e-04 (the number of grid points used is 101, with equal spacing). You could use higher-order rules (e.g. Simpson's), but you can get reasonable accuracy with the trapezoidal rule just by increasing the number of grid points. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: Xiaofan Cao [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 08, 2006 4:21 PM To: Ravi Varadhan Cc: 'Doran, Harold'; r-help@stat.math.ethz.ch Subject: RE: [R] Numerical Integration Hi Ravi and Harold, Thanks for the input. I'm using the trapezoidal rule and would like to know if there are other alternatives. This f(x) is the kernel density estimator, and thus we can get an estimate of f(x) at any given x in theory. Thanks again, Martha On Wed, 8 Nov 2006, Ravi Varadhan wrote: Can you get an estimate of f(x) at any given x? If so, the Gaussian quadrature methods will work, but not otherwise, since f(x) must be known at all the nodes. A rough approximation to the integral can be obtained using the trapezoidal rule. Here is a simple function to do that:

trap.rule <- function(x, y) sum(diff(x) * (y[-1] + y[-length(y)])) / 2

However, the use of the word knots seems to indicate that some sort of spline is being fit to the data. Martha - can you provide more information about your function f(x)? Ravi. --- Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Doran, Harold Sent: Wednesday, November 08, 2006 1:04 PM To: Xiaofan Cao; r-help@stat.math.ethz.ch Subject: Re: [R] Numerical Integration You might try the statmod package, which provides nodes and weights for Gaussian quadrature. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Xiaofan Cao Sent: Wednesday, November 08, 2006 12:43 PM To: r-help@stat.math.ethz.ch Subject: [R] Numerical Integration Hi everyone, I'm trying to integrate f(x) over x, where f(x) does not have a closed form but only numerical values at certain knots of x. Is there a way that I can use any generic R function (such as integrate) or any package to do so? Thanks! I appreciate your time. Best Regards, Martha Cao
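In the same spirit as trap.rule, a composite Simpson's rule is only a few more lines; it needs an odd number of equally spaced grid points (an even number of intervals). This is a generic sketch, not code from the thread:

```r
# Composite Simpson's rule on an equally spaced grid with an odd number of points.
simp.rule <- function(x, y) {
  n <- length(x)
  stopifnot(n >= 3, n %% 2 == 1)
  h <- x[2] - x[1]
  inner <- if (n > 3) 2 * sum(y[seq(3, n - 2, by = 2)]) else 0
  h / 3 * (y[1] + y[n] + 4 * sum(y[seq(2, n - 1, by = 2)]) + inner)
}

xx <- seq(0, pi, length = 101)
simp.rule(xx, sin(xx))   # very close to the exact value, 2
```

With the same 101-point grid, Simpson's error is O(h^4) versus O(h^2) for the trapezoidal rule, so it buys several extra digits at no real cost when the function values are already on a uniform grid.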
Re: [R] solve computationally singular
For heaven's sake, please stop sending repeat emails and send your R code that can reproduce the error. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of xchen Sent: Tuesday, November 07, 2006 2:46 PM To: r-help@stat.math.ethz.ch Subject: [R] solve computationally singular Hi uRsers, when inverting a 2 by 2 matrix using solve, I encountered an error message:

solve.default(sigma, tol = 1e-07) : system is computationally singular: reciprocal condition number = 1.7671e-017

and then I tested the determinant of this matrix: 6.341393e-06. In my program, I have a condition block that checks whether a matrix is invertible, like this:

if (det(sigma) < 1e-7) return(NULL)

but this does not seem to work to prevent the singularity when inverting a matrix. I am somewhat confused about the relationship between the reciprocal condition number and the determinant. Can anybody give me some idea how to prevent this situation? Thanks a lot! Xiaohui
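On the determinant-versus-condition-number confusion in the question: the determinant scales with the overall magnitude of the entries, so it is not a reliable singularity test, while base R's rcond() estimates the reciprocal condition number that solve() actually checks against its tol. A small illustration:

```r
A <- diag(2) * 1e-4      # tiny determinant, yet perfectly conditioned
det(A)                   # 1e-08 -- "small", but solve(A) is stable
rcond(A)                 # 1

B <- matrix(c(1, 1, 1, 1 + 1e-12), 2, 2)   # nearly singular
det(B)                   # about 1e-12
rcond(B)                 # on the order of 1e-13, which fails a tol = 1e-07 check

# so guard inversions with rcond, not det:
safe.solve <- function(m, tol = 1e-7) if (rcond(m) < tol) NULL else solve(m)
```

Here safe.solve is my own illustrative wrapper mirroring the poster's condition block; with it, B is rejected while A is inverted without complaint, which is the behavior the det() test failed to deliver.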
Re: [R] Ridge for logistic regression
Also check out the glmpath package, which can incorporate both ridge (L2) and lasso (L1) type penalties. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Frank E Harrell Jr Sent: Monday, November 06, 2006 7:41 AM To: Prof Brian Ripley Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Ridge for logistic regression Prof Brian Ripley wrote: On Fri, 3 Nov 2006, Zheng Yuan wrote: Dear all experts, Does anyone know if there is an R function which can perform ridge regression for logistic models? multinom and nnet in the package of that name. You can also use the lrm function in the Design package, which makes it relatively easy to specify penalization patterns by type of model term (e.g., to obtain more penalization for interaction or nonlinear terms). -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
[R] Markov blanket
Hi, Is there an R implementation of the HITON Markov blanket discovery algorithm that is referenced, for example, in the following paper? http://citeseer.ist.psu.edu/cache/papers/cs/27027/http:zSzzSzdiscover1.mc.vanderbilt.eduzSzdiscoverzSzpubliczSzPublicationszSzHITON.pdf/aliferis03hiton.pdf I would also appreciate tips on any related algorithms/methods that are implemented in R. Thanks, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
Re: [R] Hessian matrix
The function hessian in the numDeriv package can be useful. It is based on Richardson extrapolation, and although it involves more computational effort (depending on the order of the extrapolation), it provides highly accurate values. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Prof Brian Ripley Sent: Wednesday, November 01, 2006 3:13 AM To: Arun Kumar Saha Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Hessian matrix ?deriv can do so algebraically for some functions. There are various packages which can do so via finite differences, e.g. fdHess() in nlme and hessian() in numDeriv. On Wed, 1 Nov 2006, Arun Kumar Saha wrote: Dear all R users, Is there any way to calculate the Hessian matrix of a given function at any given point? Regards Please do: no HTML mail, as requested. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK Tel: +44 1865 272861 (self), +44 1865 272866 (PA) Fax: +44 1865 272595
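A minimal sketch of the numDeriv route, checked against an analytic answer. The test function here is my own illustration, not from the thread:

```r
library(numDeriv)

# f(x, y) = x^2 * y + sin(y); the analytic Hessian at (1, 2) is
#   [ 2y   2x      ]   [ 4       2 ]
#   [ 2x  -sin(y)  ] = [ 2  -sin(2) ]
f <- function(p) p[1]^2 * p[2] + sin(p[2])
H <- hessian(f, c(1, 2))
max(abs(H - rbind(c(4, 2), c(2, -sin(2)))))
```

The last line reports the worst-case discrepancy; with the default Richardson settings it should be far smaller than what naive central differences give, at the cost of extra function evaluations.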
Re: [R] Multivariate regression
Thank you Robin (also thanks to Andris and Bernhard). I did get answers to my problem using lm as you had suggested. But my main concern is about getting the appropriate standard errors for \hat{beta}. My response is actually preference data, where each individual ranks a list of k items, assigning them unique ranks from 1 to k. Since the ranks for each item are negatively correlated within an individual, I would like to take this into consideration. Although lm gives me correct parameter estimates, I think that the standard errors are overestimated.

Best, Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619 Fax: (410) 614-9625
Email: [EMAIL PROTECTED]
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Robin Hankin
Sent: Monday, October 30, 2006 4:17 AM
To: Andris Jankevics
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Multivariate regression

Hi

I discovered the other day that lm() does some of the work for you:

library(mvtnorm)
X <- matrix(rnorm(60), ncol=3)
beta <- matrix(1:6, ncol=2)
sig <- matrix(c(1, 0.7, 0.7, 1), 2, 2)
Y <- X %*% beta + rmvnorm(n=20, sigma=sig)
lm(Y ~ X - 1)

Call:
lm(formula = Y ~ X - 1)

Coefficients:
   [,1]  [,2]
X1 1.015 4.065
X2 2.483 5.366
X3 2.762 5.727

This gives an estimate for beta. But I don't know of a ready-made R solution for estimating the covariance of the elements of beta, or the sig matrix for the covariance matrix of the observation errors. Anyone?

On 30 Oct 2006, at 09:01, Andris Jankevics wrote:

> Also you can take a look at Partial Least Squares (PLS) regression.
> http://www.statsoft.com/textbook/stpls.html
> R-package: http://mevik.net/work/software/pls.html
>
> Andris Jankevics
>
> On Saturday, 28
October 2006 06:04, Ritwik Sinha wrote:

> You can use gee (http://finzi.psych.upenn.edu/R/library/geepack/html/00Index.html) or maybe the function gls in nlme.
>
> Ritwik.
>
> On 10/27/06, Ravi Varadhan [EMAIL PROTECTED] wrote:
>
> Hi,
>
> Suppose I have a multivariate response Y (n x k) obtained at a set of predictors X (n x p). I would like to perform a linear regression taking into consideration the covariance structure of Y within each unit - this would be represented by a specified matrix V (k x k), assumed to be the same across units. How do I use lm to do this?
>
> One approach that I was thinking of is as follows: Flatten Y to a vector, say, Yvec (n*k x 1). Create Xvec (n*k x p*k) such that it is made up of block matrices Bij (k x k), where Bij is a diagonal matrix with X_ij as the diagonal (i = 1,...,n and j = 1,...,p). Now I can use lm in a univariate mode to regress Yvec against Xvec, with covariance matrix Vvec (n*k x n*k). Vvec is a block-diagonal matrix with blocks of V along the diagonal.
>
> This seems like a valid approach, but I still don't know how to specify the covariance structure to do weighted least squares. Any help is appreciated.
>
> Best, Ravi.
>
> ---
> Ravi Varadhan, Ph.D.
> Assistant Professor, The Center on Aging and Health
> Division of Geriatric Medicine and Gerontology
> Johns Hopkins University
> Ph: (410) 502-2619 Fax: (410) 614-9625
> Email: [EMAIL PROTECTED]
> Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
>
> [[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
--
Robin Hankin
Uncertainty Analyst
National Oceanography Centre, Southampton
European Way, Southampton SO14 3ZH, UK
tel 023-8059-7743

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
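On Robin's open question about the covariance of the elements of beta: for an "mlm" fit, recent versions of R provide estVar() (an estimate of the error covariance, i.e. of sig in his example) and vcov() (the covariance of the stacked coefficient estimates) in the stats package. A sketch, assuming those methods are available; the seed and the i.i.d. errors are simplifications for illustration:

```r
# Error covariance and coefficient covariance for a multivariate lm fit.
set.seed(1)
X <- matrix(rnorm(60), ncol = 3)
beta <- matrix(1:6, ncol = 2)
Y <- X %*% beta + matrix(rnorm(40), ncol = 2)  # i.i.d. errors for simplicity
fit <- lm(Y ~ X - 1)

Sigma <- estVar(fit)   # estimated error covariance matrix (2 x 2)
V <- vcov(fit)         # covariance of the stacked coefficients (6 x 6)

# The same 6 x 6 matrix by hand, via the standard Kronecker identity
#   cov(vec(beta-hat)) = Sigma (x) (X'X)^{-1}
V_by_hand <- Sigma %x% solve(crossprod(X))
all.equal(unname(V), unname(V_by_hand))
```

The Kronecker identity makes explicit how the per-response error covariance and the design matrix combine, which is useful if vcov() for mlm objects is not available in your version of R.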
[R] Multivariate regression
Hi,

Suppose I have a multivariate response Y (n x k) obtained at a set of predictors X (n x p). I would like to perform a linear regression taking into consideration the covariance structure of Y within each unit - this would be represented by a specified matrix V (k x k), assumed to be the same across units. How do I use lm to do this?

One approach that I was thinking of is as follows: Flatten Y to a vector, say, Yvec (n*k x 1). Create Xvec (n*k x p*k) such that it is made up of block matrices Bij (k x k), where Bij is a diagonal matrix with X_ij as the diagonal (i = 1,...,n and j = 1,...,p). Now I can use lm in a univariate mode to regress Yvec against Xvec, with covariance matrix Vvec (n*k x n*k). Vvec is a block-diagonal matrix with blocks of V along the diagonal.

This seems like a valid approach, but I still don't know how to specify the covariance structure to do weighted least squares. Any help is appreciated.

Best, Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619 Fax: (410) 614-9625
Email: [EMAIL PROTECTED]
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
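The flattening described above can be written compactly with a Kronecker product, after which the generalized least squares estimate (with V taken as known) follows directly from the normal equations. A sketch with made-up dimensions; all object names are illustrative:

```r
# GLS for the flattened model: Yvec = Xvec %*% b + e,  cov(e) = I_n (x) V
set.seed(1)
n <- 20; p <- 3; k <- 2
X <- matrix(rnorm(n * p), n, p)
B <- matrix(1:(p * k), p, k)                 # true coefficients
V <- matrix(c(1, 0.7, 0.7, 1), k, k)         # within-unit covariance
Y <- X %*% B + matrix(rnorm(n * k), n, k)

Yvec <- as.vector(t(Y))     # stack the k responses unit by unit
Xvec <- X %x% diag(k)       # exactly the block structure described above
W    <- diag(n) %x% solve(V)  # Vvec^{-1} = I_n (x) V^{-1}

# GLS estimate of vec(t(B)) and its covariance matrix
b_gls   <- solve(t(Xvec) %*% W %*% Xvec, t(Xvec) %*% W %*% Yvec)
cov_gls <- solve(t(Xvec) %*% W %*% Xvec)

# Because every response shares the same X, the GLS point estimate here
# coincides with the ordinary lm(Y ~ X - 1) coefficients; the weighting
# changes only the standard errors (the diagonal of cov_gls).
b_ols <- as.vector(t(coef(lm(Y ~ X - 1))))
all.equal(as.vector(b_gls), b_ols)
```

The coincidence of the point estimates is the classical result for seemingly unrelated regressions with identical regressors; the payoff of specifying V is in cov_gls, which answers the standard-error concern raised earlier in the thread.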
Re: [R] R gams
Can you be more specific about what you mean by gams? Do you mean generalized additive models (GAM)? If so, R is a good environment for forecasting models and GAMs. However, the link that you provided is NOT for generalized additive modeling; it is for the General Algebraic Modeling System (GAMS), which is a high-level modeling system for mathematical programming and optimization.

Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619 Fax: (410) 614-9625
Email: [EMAIL PROTECTED]
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of vittorio
Sent: Friday, October 27, 2006 3:14 PM
To: r-help@stat.math.ethz.ch
Subject: [R] R gams

At the office I have been introduced by another company to new, complex energy forecasting models using GAMS as the basic software. I have been told by the company offering the models that GAMS specialises in huge, heavy-weight linear and non-linear modelling (see an example at http://www.gams.com/modtype/index.htm), and they say it is almost the only option for doing it. I would like to know your opinion on the subject and, above all, whether R can be an effective alternative and to what extent, if any.

Thanks
Vittorio

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
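In case generalized additive models really were meant: a minimal GAM fit in R with the mgcv package (a recommended package shipped with R). The simulated data and object names here are purely illustrative:

```r
# A minimal generalized additive model with mgcv: y as a smooth
# function of x, with the smoothness chosen by penalized regression.
library(mgcv)
set.seed(1)
x <- seq(0, 1, length = 200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)
fit <- gam(y ~ s(x))
summary(fit)                                   # effective df, significance
predict(fit, newdata = data.frame(x = 0.5))    # prediction at a new point
```

This is, of course, unrelated to what the GAMS software does; for mathematical programming in R one would instead look at optimization packages such as lpSolve or quadprog.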