Jose Quesada wrote:
> Thanks Charles, Martin,
>
> Substantial improvement with the vectorized solution. Here is a quick 
> benchmark:
>
> # The loop-based solution:
> nestedCos = function (x) {
>       if (is(x, "Matrix") ) {
>               cos     = array(NA, c(ncol(x), ncol(x)))
>               for (i in 2:ncol(x)) {
>                       for (j in 1:(i - 1)) {
>                               cos[i, j] = cosine(x[, i], x[, j])
>                       }
>               }
>       }
>       return(cos)
> }
> # Charles C. Berry's vectorized approach
> flatCos = function (x) {
>       res             = crossprod( x , x )
>       diagnl          = Diagonal( ncol(x), 1 / sqrt( diag( res )))
>       cos                     = diagnl %*% res %*% diagnl
>       return(cos)
> }
>
> Benchmarking:
>
>   
>> system.time(for(i in 1:10)nestedCos(x))
>>     
> (I stopped because it was taking too long)
> Timing stopped at: 139.37 3.82 188.76 NA NA
>   
>> system.time(for(i in 1:10)flatCos(x))
>>     
> [1] 0.43 0.00 0.48   NA   NA
>
> #------------------------------------------------------
> As much as I like to have faster code, I'm still wondering WHY flatCos gets 
> the same results; i.e., why multiplying the inverse sqrt root of the diagonal 
> of x BY x, then BY the diagonal again produces the expected result. I checked 
> the wikipedia page for crossprod and other sources, but it still eludes me. I 
> can see that scaling by the sqrt of the diagonal once makes sense with 'res 
> <- crossprod( x , x ) gives your result up to scale factors of 
> sqrt(res[i,i]*res[j,j])', but I still don't see why you need to postmultiply 
> by the diagonal again.
>
>   
Didn't follow this thread too closely, but the point would seem to be
that the inner product of two normalized vectors is the cosine of the
angle. So basically, you want

crossprod(X%*%diagnl, X%*%diagnl) == t(diagnl) %*% t(X) %*% X %*% diagnl

I think, BTW, that another version not requiring Matrix is

Cr <- crossprod(X)
D <- sqrt(diag(C))
Cr/outer(D, D)

> Maybe trying to attack a simpler problem might help my understanding: e.g., 
> calculating the cos of a column to all other colums of x (that is, the inner 
> part of the nested loop). How would that work in a vectorized way? I'm trying 
> to get some general technique that I can reuse later from this excellent 
> answer.
>
> Thanks,
> -Jose
>
>   
>> I am rusty on 'Matrix', but I see there are crossprod methods for those
>> classes.
>>
>>      res <- crossprod( x , x )
>>
>> gives your result up to scale factors of sqrt(res[i,i]*res[j,j]), so
>> something like
>>
>>      diagnl <- Diagonal( ncol(x), sqrt( diag( res ) )
>>
>>     
>
> OOPS! Better make that
>
>       diagnl <- Diagonal( ncol(x), 1 / sqrt( diag( res ) )
>
>   
>>      final.res <- diagnl %*% res %*% diagnl
>>
>> should do it.
>>
>>     
>
>   


-- 
   O__  ---- Peter Dalgaard             Ă˜ster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])                  FAX: (+45) 35327907

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to