Jose Quesada wrote: > Thanks Charles, Martin, > > Substantial improvement with the vectorized solution. Here is a quick > benchmark: > > # The loop-based solution: > nestedCos = function (x) { > if (is(x, "Matrix") ) { > cos = array(NA, c(ncol(x), ncol(x))) > for (i in 2:ncol(x)) { > for (j in 1:(i - 1)) { > cos[i, j] = cosine(x[, i], x[, j]) > } > } > } > return(cos) > } > # Charles C. Berry's vectorized approach > flatCos = function (x) { > res = crossprod( x , x ) > diagnl = Diagonal( ncol(x), 1 / sqrt( diag( res ))) > cos = diagnl %*% res %*% diagnl > return(cos) > } > > Benchmarking: > > >> system.time(for(i in 1:10)nestedCos(x)) >> > (I stopped because it was taking too long) > Timing stopped at: 139.37 3.82 188.76 NA NA > >> system.time(for(i in 1:10)flatCos(x)) >> > [1] 0.43 0.00 0.48 NA NA > > #------------------------------------------------------ > As much as I like to have faster code, I'm still wondering WHY flatCos gets > the same results; i.e., why multiplying the inverse sqrt root of the diagonal > of x BY x, then BY the diagonal again produces the expected result. I checked > the wikipedia page for crossprod and other sources, but it still eludes me. I > can see that scaling by the sqrt of the diagonal once makes sense with 'res > <- crossprod( x , x ) gives your result up to scale factors of > sqrt(res[i,i]*res[j,j])', but I still don't see why you need to postmultiply > by the diagonal again. > > Didn't follow this thread too closely, but the point would seem to be that the inner product of two normalized vectors is the cosine of the angle. So basically, you want
crossprod(X%*%diagnl, X%*%diagnl) == t(diagnl) %*% t(X) %*% X %*% diagnl I think, BTW, that another version not requiring Matrix is Cr <- crossprod(X) D <- sqrt(diag(C)) Cr/outer(D, D) > Maybe trying to attack a simpler problem might help my understanding: e.g., > calculating the cos of a column to all other colums of x (that is, the inner > part of the nested loop). How would that work in a vectorized way? I'm trying > to get some general technique that I can reuse later from this excellent > answer. > > Thanks, > -Jose > > >> I am rusty on 'Matrix', but I see there are crossprod methods for those >> classes. >> >> res <- crossprod( x , x ) >> >> gives your result up to scale factors of sqrt(res[i,i]*res[j,j]), so >> something like >> >> diagnl <- Diagonal( ncol(x), sqrt( diag( res ) ) >> >> > > OOPS! Better make that > > diagnl <- Diagonal( ncol(x), 1 / sqrt( diag( res ) ) > > >> final.res <- diagnl %*% res %*% diagnl >> >> should do it. >> >> > > -- O__ ---- Peter Dalgaard Ă˜ster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.