>>>>> "RV" == RAVI VARADHAN <[EMAIL PROTECTED]> >>>>> on Sat, 07 Apr 2007 18:39:36 -0400 writes: ^^^^^^^^^^^^^^^^
this is a bit a late reply... better late than never RV> Hi Martin, Hi Ravi, thanks for your research. RV> I played a bit with rankMat. I will first state the RV> conclusions of my numerical experiments and then present RV> the results to support them: RV> 1. I don't believe that rankMat, or equivalently RV> Matlab's rank computation, is necessarily better than RV> qr(.)$rank or (qr., LAPACK=TRUE)$rank. In fact, for the RV> notorious Hilbert matrix, rankMat can give poor RV> estimates of rank. RV> 2. There exists no universally optimal (i.e. optimal RV> for all A) tol in qr(A, tol)$rank that would be RV> optimally close to rankMat. The discrepancy in the RV> ranks computed by qr(A)$rank and rankMat(A) obviously RV> depends on the matrix A. This is evident from the tol RV> used in rankMat: RV> tol <- max(d) * .Machine$double.eps * abs(singValA[1]) RV> So, this value of tol in qr will also minimize the discrepancy. RV> 3. The tol parameter is irrelevant in qr(A, tol, RV> LAPACK=TRUE)$rank, i.e. LAPACK doesn't seem to utilize RV> the tol parameter when computing the rank. However, RV> qr(A, tol, LAPACK=FALSE)$rank does depend on tol. Yes, indeed! The help page of qr() actually says so {at least now, don't know about 3 months ago}. RV> Now, here are the results: RV> 1. >> hilbert.rank <- matrix(NA,20,3) >> hilbert <- function(n) { i <- 1:n; 1 / outer(i - 1, i, "+") } >> for (i in 1:20) { RV> + himat <- hilbert(i) RV> + hilbert.rank[i,1] <- rankMat(himat) RV> + hilbert.rank[i,2] <- qr(himat,tol=1.0e-14)$rank RV> + hilbert.rank[i,3] <- qr(himat, tol = .Machine$double.eps, LAPACK = TRUE)$rank RV> + } >> hilbert.rank RV> [,1] [,2] [,3] RV> [1,] 1 1 1 RV> [2,] 2 2 2 RV> [3,] 3 3 3 RV> [4,] 4 4 4 RV> [5,] 5 5 5 RV> [6,] 6 6 6 RV> [7,] 7 7 7 RV> [8,] 8 8 8 RV> [9,] 9 9 9 RV> [10,] 10 10 10 RV> [11,] 10 11 11 RV> [12,] 11 12 12 RV> [13,] 11 12 13 RV> [14,] 11 13 14 RV> [15,] 12 13 15 RV> [16,] 12 15 16 RV> [17,] 12 16 17 RV> [18,] 12 16 18 RV> [19,] 13 17 19 RV> [20,] 13 17 20 RV> We see that rankMat underestimates the rank for n > 10, but qr(himat, tol = .Machine$double.eps, LAPACK=TRUE)$rank gets it right. Yes, indeed; and that's seems a bit against the ``common knowledge'' that svd() is more reliable than qr() Hmm.... I'm still baffled a bit.. Note that with the Hilbert matrices, one might argue that hilbert(20) might really not have a correct "estimated rank" of 20, but at least for hilbert(13) or so, the rank should be == n. BTW, there's a nice plot, slightly related to this problem, using rcond() from the Matrix package : library(Matrix) hilbert <- function(n) { i <- 1:n; 1 / outer(i - 1, i, "+") } rcHilb <- sapply(1:20, function(n) rcond(Matrix(hilbert(n)))) plot(rcHilb, xlab = "n", log = "y", col = 2, type = "b", main = "reciprocal condition numbers of Hilbert(n)") where one sees that the LAPACK code that underlies Matrix::rcond() also gets into "problem" when estimating the condition number for hilbert(n) when n >~ 16 . I think if we wanted to make real progress here, we'd have to consult with numerical analyist specialists. But for me the topic is too remote to be followed up further at the moment. One conclusion for me is that to estimate the rank of a matrix in current versions of R, one should use rankMat <- function(x) qr(x, LAPACK = TRUE)$rank (as was suggested as one possibility in the original thread) Regards, and thank you again, Ravi. Martin Maechler, ETH Zurich RV> 2. RV> Here I first, created a random rectangular matrix, and then added a number of rows to it, where these new rows are the same as some of the original rows except for a tiny amount of noise, which I call eps. So, the degree of rank deficiency is controlled by eps. I let eps take on 3 values: 1.e-07, 1.e-10, and 1.e-14, and show that the optimal tol (in terms of being close to rankMat) depends on the level of precision (eps) in the matrix. >> set.seed(123) >> nrow <- 15 >> ncol <- 20 >> nsim <- 5000 >> ndefic <- 4 # number of "nearly" dependent rows >> eps <- 1.e-07 >> rnk <- matrix(NA, nsim, 5) >> for (j in 1:nsim) { RV> + A <- matrix(rnorm(nrow*ncol),nrow, ncol) RV> + newrows <- matrix(NA, ndefic, ncol) RV> + for (i in 1:ndefic) { RV> + newrows[i,] <- A[nrow-i,] + rnorm(ncol, sd=min(abs(A[nrow-i+1,]))* eps) RV> + } RV> + RV> + B <- rbind(A,newrows) RV> + rnk[j,1] <- rankMat(B) RV> + rnk[j,2] <- qr(B, tol = 1.e-07)$rank RV> + rnk[j,3] <- qr(B, tol = 1.e-11)$rank RV> + rnk[j,4] <- qr(B, tol = 1.0e-14)$rank RV> + rnk[j,5] <- qr(B, tol = .Machine$double.eps, LAPACK = TRUE)$rank RV> + } >> apply(abs(rnk[,1] - rnk[,2:5]),2,sum) RV> [1] 19351 53 0 0 RV> We observe that both qr(B, tol=1.e-14)$rank and qr(B, tol = .Machine$double.eps, LAPACK = TRUE)$rank give exactly the same rank as rankMat. RV> Now, we change eps <- 1.e-10 and the results are: >> apply(abs(rnk[,1] - rnk[,2:5]),2,sum) RV> [1] 19778 14263 166 220 RV> This means that a tolerance of 1.e-14 works best. RV> Now we lower eps further: eps <- 1.e-14 >> apply(abs(rnk[,1] - rnk[,2:5]),2,sum) RV> [1] 0 3 665 20000 RV> Clearly, the smaller tolerances yield rank estimates that are higher than that given by rankMat. That is, rankMat underestimates the rank as in the case of Hilbert matrix. RV> 3. RV> Now we show that qr(., tol, LAPACK=TRUE)$rank is independent of tol: >> exp <- -(7:16) >> tol <- 10^exp >> sapply(tol, A=hilbert(20), function(x,A)qr(A, tol=x, LAPACK=FALSE)$rank) RV> [1] 10 12 14 14 15 16 16 17 18 19 >> sapply(tol, A=hilbert(20), function(x,A)qr(A, tol=x, LAPACK=TRUE)$rank) RV> [1] 20 20 20 20 20 20 20 20 20 20 RV> Looking forward to comments. RV> Best, RV> Ravi. RV> ----- Original Message ----- RV> From: Martin Maechler <[EMAIL PROTECTED]> RV> Date: Saturday, April 7, 2007 10:57 am RV> Subject: Re: [R] Computing the rank of a matrix. RV> To: Ravi Varadhan <[EMAIL PROTECTED]> RV> Cc: [EMAIL PROTECTED], "'José Luis Aznarte M.'" <[EMAIL PROTECTED]>, r-help@stat.math.ethz.ch >> >>>>> "Ravi" == Ravi Varadhan <[EMAIL PROTECTED]> >> >>>>> on Fri, 6 Apr 2007 12:44:33 -0400 writes: >> Ravi> Hi, qr(A)$rank will work, but just be wary of the Ravi> tolerance parameter (default is 1.e-07), since the Ravi> rank computation could be sensitive to the tolerance Ravi> chosen. >> >> Yes, indeed. >> >> The point is that rank(<Matrix>) >> is well defined in pure math (linear algebra), as well as >> a "singular matrix" is. >> >> The same typically no longer makes sense as soon as you enter >> the real world: A matrix "close to singular" may have to be >> treated "as if singular" depending on its "singularity >> closeness" {{ learn about the condition number of a matrix }} >> and the same issues arise with rank(<matrix>). >> >> Of course, the matlab programmers know all this (and much more), >> and indeed, matlab's rank(A) really is >> rank(A, tol = tol.default(A)) >> >> and is based on the SVD instead of QR decomposition since the >> former is said to be more reliable (even though slightly slower). >> >> R's equivalent (with quite a bit of fool-proofing) would be the >> following function (assuming correct online documentation of matlab): >> >> >> rankMat <- function(A, tol = NULL, singValA = svd(A, 0,0)$d) >> { >> ## Purpose: rank of a matrix ``as Matlab'' >> ## ---------------------------------------------------------------------- >> ## Arguments: A: a numerical matrix, maybe non-square >> ## tol: numerical tolerance (compared to singular values) >> ## singValA: vector of non-increasing singular values >> of A >> ## (pass as argument if already known) >> ## ---------------------------------------------------------------------- >> ## Author: Martin Maechler, Date: 7 Apr 2007, 16:16 >> d <- dim(A) >> stopifnot(length(d) == 2, length(singValA) == min(d), >> diff(singValA) < 0) # must be sorted decreasingly >> if(is.null(tol)) >> tol <- max(d) * .Machine$double.eps * abs(singValA[1]) >> else stopifnot(is.numeric(tol), tol >= 0) >> sum(singValA >= tol) >> } >> >> >> A small scale simulation with random matrices, >> i.e., things like >> >> ## ranks of random matrices; here will have 5 all the time: >> table(replicate(1000, rankMat(matrix(rnorm(5*12),5, 12) )))# < 1 sec. >> >> indicates that qr(.)$rank gives the same typically, >> where I assume one should really use >> >> qr(., tol = .Machine$double.eps, LAPACK = TRUE)$rank >> >> to be closer to Matlab's default tolerance. >> >> Ok, who has time to investigate further? >> Research exercise: >> 1> Is there a fixed number, say t0 <- 1e-15 1> for which qr(A, tol = t0, LAPACK=TRUE)$rank is 1> ``optimally close'' to rankMat(A) ? >> 2> how easily do you get cases showing svd(.) to more reliable 2> than qr(., LAPACK=TRUE)? >> >> To solve this in an interesting way, you should probably >> investigate classes of "almost rank-deficient" matrices, >> and I'd also be interested if you "randomly" ever get matrices A >> with rank(A) < min(dim(A)) - 1 >> (unless you construct some columns/rows exactly from earlier >> ones or use all-0 ones) >> >> Martin Maechler, ETH Zurich >> >> >> >> Ravi> Ravi. >> Ravi> ----------------------------------------------------------------- Ravi> Ravi Varadhan, Ph.D. Ravi> Assistant Professor, The Center on Aging and Health >> ....... Ravi> >> >> Ravi> -------------------------------------------------------------------- >> >> >> >> >> How about >> >> >> qr(A)$rank >> >> >> or perhaps >> >> >> qr(A, LAPACK=TRUE)$rank >> >> >> Cheers, >> >> >> Andy >> >> >> >> Hi! Maybe this is a silly question, but I need the >> >> column rank ( >> >> of a matrix and R function 'rank()' only gives me the >> >> ordering of the elements of my matrix. How can I >> >> compute the column rank of a matrix? Is there not an R >> >> equivalent to Matlab's 'rank()'? I've been browsing >> >> for a time now and I can't find anything, so any help >> >> will be greatly appreciated. Best regards! >> >> >> >> -- -- Jose Luis Aznarte M. >> >> Department of Computer >> >> Science and Artificial Intelligence Universidad de >> >> Granada Tel. +34 - 958 - 24 04 67 GRANADA (Spain) Fax: >> >> +34 - 958 - 24 00 79 ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.