Re: [R] [fixed] vectorized nested loop: apply a function that takes two rows
Jose Quesada wrote: > Thanks Charles, Martin, > > Substantial improvement with the vectorized solution. Here is a quick > benchmark: > > # The loop-based solution: > nestedCos = function (x) { > if (is(x, "Matrix") ) { > cos = array(NA, c(ncol(x), ncol(x))) > for (i in 2:ncol(x)) { > for (j in 1:(i - 1)) { > cos[i, j] = cosine(x[, i], x[, j]) > } > } > } > return(cos) > } > # Charles C. Berry's vectorized approach > flatCos = function (x) { > res = crossprod( x , x ) > diagnl = Diagonal( ncol(x), 1 / sqrt( diag( res ))) > cos = diagnl %*% res %*% diagnl > return(cos) > } > > Benchmarking: > > >> system.time(for(i in 1:10)nestedCos(x)) >> > (I stopped because it was taking too long) > Timing stopped at: 139.37 3.82 188.76 NA NA > >> system.time(for(i in 1:10)flatCos(x)) >> > [1] 0.43 0.00 0.48 NA NA > > #-- > As much as I like to have faster code, I'm still wondering WHY flatCos gets > the same results; i.e., why multiplying the inverse sqrt root of the diagonal > of x BY x, then BY the diagonal again produces the expected result. I checked > the wikipedia page for crossprod and other sources, but it still eludes me. I > can see that scaling by the sqrt of the diagonal once makes sense with 'res > <- crossprod( x , x ) gives your result up to scale factors of > sqrt(res[i,i]*res[j,j])', but I still don't see why you need to postmultiply > by the diagonal again. > > Didn't follow this thread too closely, but the point would seem to be that the inner product of two normalized vectors is the cosine of the angle. So basically, you want crossprod(X%*%diagnl, X%*%diagnl) == t(diagnl) %*% t(X) %*% X %*% diagnl I think, BTW, that another version not requiring Matrix is Cr <- crossprod(X) D <- sqrt(diag(C)) Cr/outer(D, D) > Maybe trying to attack a simpler problem might help my understanding: e.g., > calculating the cos of a column to all other colums of x (that is, the inner > part of the nested loop). How would that work in a vectorized way? I'm trying > to get some general technique that I can reuse later from this excellent > answer. > > Thanks, > -Jose > > >> I am rusty on 'Matrix', but I see there are crossprod methods for those >> classes. >> >> res <- crossprod( x , x ) >> >> gives your result up to scale factors of sqrt(res[i,i]*res[j,j]), so >> something like >> >> diagnl <- Diagonal( ncol(x), sqrt( diag( res ) ) >> >> > > OOPS! Better make that > > diagnl <- Diagonal( ncol(x), 1 / sqrt( diag( res ) ) > > >> final.res <- diagnl %*% res %*% diagnl >> >> should do it. >> >> > > -- O__ Peter Dalgaard Ă˜ster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [fixed] vectorized nested loop: apply a function that takes two rows
Thanks Charles, Martin, Substantial improvement with the vectorized solution. Here is a quick benchmark: # The loop-based solution: nestedCos = function (x) { if (is(x, "Matrix") ) { cos = array(NA, c(ncol(x), ncol(x))) for (i in 2:ncol(x)) { for (j in 1:(i - 1)) { cos[i, j] = cosine(x[, i], x[, j]) } } } return(cos) } # Charles C. Berry's vectorized approach flatCos = function (x) { res = crossprod( x , x ) diagnl = Diagonal( ncol(x), 1 / sqrt( diag( res ))) cos = diagnl %*% res %*% diagnl return(cos) } Benchmarking: > system.time(for(i in 1:10)nestedCos(x)) (I stopped because it was taking too long) Timing stopped at: 139.37 3.82 188.76 NA NA > system.time(for(i in 1:10)flatCos(x)) [1] 0.43 0.00 0.48 NA NA #-- As much as I like to have faster code, I'm still wondering WHY flatCos gets the same results; i.e., why multiplying the inverse sqrt root of the diagonal of x BY x, then BY the diagonal again produces the expected result. I checked the wikipedia page for crossprod and other sources, but it still eludes me. I can see that scaling by the sqrt of the diagonal once makes sense with 'res <- crossprod( x , x ) gives your result up to scale factors of sqrt(res[i,i]*res[j,j])', but I still don't see why you need to postmultiply by the diagonal again. Maybe trying to attack a simpler problem might help my understanding: e.g., calculating the cos of a column to all other colums of x (that is, the inner part of the nested loop). How would that work in a vectorized way? I'm trying to get some general technique that I can reuse later from this excellent answer. Thanks, -Jose > > > I am rusty on 'Matrix', but I see there are crossprod methods for those > classes. > > res <- crossprod( x , x ) > > gives your result up to scale factors of sqrt(res[i,i]*res[j,j]), so > something like > > diagnl <- Diagonal( ncol(x), sqrt( diag( res ) ) > OOPS! Better make that diagnl <- Diagonal( ncol(x), 1 / sqrt( diag( res ) ) > > final.res <- diagnl %*% res %*% diagnl > > should do it. > -- Cheers, -Jose -- Jose Quesada, PhD Research fellow, Psychology Dept. Sussex University, Brighton, UK http://www.andrew.cmu.edu/~jquesada __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [fixed] vectorized nested loop: apply a function that takes two rows
On Tue, 23 Jan 2007, Charles C. Berry wrote: > > > I am rusty on 'Matrix', but I see there are crossprod methods for those > classes. > > res <- crossprod( x , x ) > > gives your result up to scale factors of sqrt(res[i,i]*res[j,j]), so > something like > > diagnl <- Diagonal( ncol(x), sqrt( diag( res ) ) > OOPS! Better make that diagnl <- Diagonal( ncol(x), 1 / sqrt( diag( res ) ) > > final.res <- diagnl %*% res %*% diagnl > > should do it. > > On Tue, 23 Jan 2007, Jose Quesada wrote: > >> (Extremely sorry, disregard previous email as I hit send before pasting the >> latest version of the example; this one is smaller too) >> Dear R users, >> >> I want to apply a function that takes two vectors as input to all pairs >> (combinations (nrow(X), 2))of matrix rows in a matrix. >> I know that ideally, one should avoid loops in R, but after reading the docs >> for >> do.call, apply, etc, I still don't know how to write the nested loop in a >> vectorized way. >> >> Example data: >> x= matrix(rnorm(100), 10, 10) >> # this is actually a very large sparse matrix, but it doesn't matter for the >> # example >> library(Matrix) >> x = as(x,"CsparseMatrix") >> >> # cosine function >> cosine = function (x, y){ >> if (is.vector(x) && is.vector(y)) { >> return(crossprod(x, y)/sqrt(crossprod(x) * crossprod(y))) >> } else {stop("cosine: argument mismatch. Two vectors needed as input.")} >> } >> >> # The loop-based solution I have is: >> if (is(x, "Matrix") ) { >> cos = array(NA, c(ncol(x), ncol(x))) >> for (i in 2:ncol(x)) { >> for (j in 1:(i - 1)) { >> cos[i, j] = cosine(x[, i], x[, j]) >> } >> } >> } >> >> This solution seems inneficient. Is there an easy way of achieving this with >> a >> clever do.call + apply combination? >> >> Also, I have noticed that getting a row from a Matrix object produces a >> normal >> array (i.e., it does not inherit Matrix class). However, selecting >1 rows, >> does >> produce a same-class matrix. If I convert with as() the output of selecting >> one >> row, am I losing performance? Is there any way to make the resulting vector >> be a >> 1-D Matrix object? >> This solution seems inneficient. Is there an easy way of achieving this with >> a >> clever do.call + apply combination? >> -- >> Thanks in advance, >> -Jose >> >> -- >> Jose Quesada, PhD >> Research fellow, Psychology Dept. >> Sussex University, Brighton, UK >> http://www.andrew.cmu.edu/~jquesada >> >> __ >> R-help@stat.math.ethz.ch mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > Charles C. Berry(858) 534-2098 > Dept of Family/Preventive Medicine > E mailto:[EMAIL PROTECTED] UC San Diego > http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0901 > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:[EMAIL PROTECTED] UC San Diego http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0901 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [fixed] vectorized nested loop: apply a function that takes two rows
I am rusty on 'Matrix', but I see there are crossprod methods for those classes. res <- crossprod( x , x ) gives your result up to scale factors of sqrt(res[i,i]*res[j,j]), so something like diagnl <- Diagonal( ncol(x), sqrt( diag( res ) ) final.res <- diagnl %*% res %*% diagnl should do it. On Tue, 23 Jan 2007, Jose Quesada wrote: > (Extremely sorry, disregard previous email as I hit send before pasting the > latest version of the example; this one is smaller too) > Dear R users, > > I want to apply a function that takes two vectors as input to all pairs > (combinations (nrow(X), 2))of matrix rows in a matrix. > I know that ideally, one should avoid loops in R, but after reading the docs > for > do.call, apply, etc, I still don't know how to write the nested loop in a > vectorized way. > > Example data: > x = matrix(rnorm(100), 10, 10) > # this is actually a very large sparse matrix, but it doesn't matter for the > # example > library(Matrix) > x = as(x,"CsparseMatrix") > > # cosine function > cosine = function (x, y){ > if (is.vector(x) && is.vector(y)) { > return(crossprod(x, y)/sqrt(crossprod(x) * crossprod(y))) > } else {stop("cosine: argument mismatch. Two vectors needed as input.")} > } > > # The loop-based solution I have is: > if (is(x, "Matrix") ) { > cos = array(NA, c(ncol(x), ncol(x))) > for (i in 2:ncol(x)) { > for (j in 1:(i - 1)) { > cos[i, j] = cosine(x[, i], x[, j]) > } > } > } > > This solution seems inneficient. Is there an easy way of achieving this with a > clever do.call + apply combination? > > Also, I have noticed that getting a row from a Matrix object produces a normal > array (i.e., it does not inherit Matrix class). However, selecting >1 rows, > does > produce a same-class matrix. If I convert with as() the output of selecting > one > row, am I losing performance? Is there any way to make the resulting vector > be a > 1-D Matrix object? > This solution seems inneficient. Is there an easy way of achieving this with a > clever do.call + apply combination? > -- > Thanks in advance, > -Jose > > -- > Jose Quesada, PhD > Research fellow, Psychology Dept. > Sussex University, Brighton, UK > http://www.andrew.cmu.edu/~jquesada > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:[EMAIL PROTECTED] UC San Diego http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0901 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.