Re: [R] [fixed] vectorized nested loop: apply a function that takes two rows

2007-01-24 Thread Peter Dalgaard
Jose Quesada wrote:
> Thanks Charles, Martin,
>
> Substantial improvement with the vectorized solution. Here is a quick 
> benchmark:
>
> # The loop-based solution:
> nestedCos = function (x) {
>   if (is(x, "Matrix") ) {
>   cos = array(NA, c(ncol(x), ncol(x)))
>   for (i in 2:ncol(x)) {
>   for (j in 1:(i - 1)) {
>   cos[i, j] = cosine(x[, i], x[, j])
>   }
>   }
>   }
>   return(cos)
> }
> # Charles C. Berry's vectorized approach
> flatCos = function (x) {
>   res = crossprod( x , x )
>   diagnl  = Diagonal( ncol(x), 1 / sqrt( diag( res )))
>   cos = diagnl %*% res %*% diagnl
>   return(cos)
> }
>
> Benchmarking:
>
>   
>> system.time(for(i in 1:10)nestedCos(x))
>> 
> (I stopped because it was taking too long)
> Timing stopped at: 139.37 3.82 188.76 NA NA
>   
>> system.time(for(i in 1:10)flatCos(x))
>> 
> [1] 0.43 0.00 0.48   NA   NA
>
> #--
> As much as I like to have faster code, I'm still wondering WHY flatCos gets 
> the same results; i.e., why multiplying the inverse sqrt root of the diagonal 
> of x BY x, then BY the diagonal again produces the expected result. I checked 
> the wikipedia page for crossprod and other sources, but it still eludes me. I 
> can see that scaling by the sqrt of the diagonal once makes sense with 'res 
> <- crossprod( x , x ) gives your result up to scale factors of 
> sqrt(res[i,i]*res[j,j])', but I still don't see why you need to postmultiply 
> by the diagonal again.
>
>   
Didn't follow this thread too closely, but the point would seem to be
that the inner product of two normalized vectors is the cosine of the
angle. So basically, you want

crossprod(X%*%diagnl, X%*%diagnl) == t(diagnl) %*% t(X) %*% X %*% diagnl

I think, BTW, that another version not requiring Matrix is

Cr <- crossprod(X)
D <- sqrt(diag(C))
Cr/outer(D, D)

> Maybe trying to attack a simpler problem might help my understanding: e.g., 
> calculating the cos of a column to all other colums of x (that is, the inner 
> part of the nested loop). How would that work in a vectorized way? I'm trying 
> to get some general technique that I can reuse later from this excellent 
> answer.
>
> Thanks,
> -Jose
>
>   
>> I am rusty on 'Matrix', but I see there are crossprod methods for those
>> classes.
>>
>>  res <- crossprod( x , x )
>>
>> gives your result up to scale factors of sqrt(res[i,i]*res[j,j]), so
>> something like
>>
>>  diagnl <- Diagonal( ncol(x), sqrt( diag( res ) )
>>
>> 
>
> OOPS! Better make that
>
>   diagnl <- Diagonal( ncol(x), 1 / sqrt( diag( res ) )
>
>   
>>  final.res <- diagnl %*% res %*% diagnl
>>
>> should do it.
>>
>> 
>
>   


-- 
   O__   Peter Dalgaard Ă˜ster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [fixed] vectorized nested loop: apply a function that takes two rows

2007-01-24 Thread Jose Quesada
Thanks Charles, Martin,

Substantial improvement with the vectorized solution. Here is a quick benchmark:

# The loop-based solution:
nestedCos = function (x) {
if (is(x, "Matrix") ) {
cos = array(NA, c(ncol(x), ncol(x)))
for (i in 2:ncol(x)) {
for (j in 1:(i - 1)) {
cos[i, j] = cosine(x[, i], x[, j])
}
}
}
return(cos)
}
# Charles C. Berry's vectorized approach
flatCos = function (x) {
res = crossprod( x , x )
diagnl  = Diagonal( ncol(x), 1 / sqrt( diag( res )))
cos = diagnl %*% res %*% diagnl
return(cos)
}

Benchmarking:

> system.time(for(i in 1:10)nestedCos(x))
(I stopped because it was taking too long)
Timing stopped at: 139.37 3.82 188.76 NA NA
> system.time(for(i in 1:10)flatCos(x))
[1] 0.43 0.00 0.48   NA   NA

#--
As much as I like to have faster code, I'm still wondering WHY flatCos gets the 
same results; i.e., why multiplying the inverse sqrt root of the diagonal of x 
BY x, then BY the diagonal again produces the expected result. I checked the 
wikipedia page for crossprod and other sources, but it still eludes me. I can 
see that scaling by the sqrt of the diagonal once makes sense with 'res <- 
crossprod( x , x ) gives your result up to scale factors of 
sqrt(res[i,i]*res[j,j])', but I still don't see why you need to postmultiply by 
the diagonal again.

Maybe trying to attack a simpler problem might help my understanding: e.g., 
calculating the cos of a column to all other colums of x (that is, the inner 
part of the nested loop). How would that work in a vectorized way? I'm trying 
to get some general technique that I can reuse later from this excellent answer.

Thanks,
-Jose

>
>
> I am rusty on 'Matrix', but I see there are crossprod methods for those
> classes.
>
>   res <- crossprod( x , x )
>
> gives your result up to scale factors of sqrt(res[i,i]*res[j,j]), so
> something like
>
>   diagnl <- Diagonal( ncol(x), sqrt( diag( res ) )
>

OOPS! Better make that

diagnl <- Diagonal( ncol(x), 1 / sqrt( diag( res ) )

>
>   final.res <- diagnl %*% res %*% diagnl
>
> should do it.
>

-- 
Cheers,
-Jose

--
Jose Quesada, PhD
Research fellow, Psychology Dept.
Sussex University, Brighton, UK
http://www.andrew.cmu.edu/~jquesada

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [fixed] vectorized nested loop: apply a function that takes two rows

2007-01-23 Thread Charles C. Berry
On Tue, 23 Jan 2007, Charles C. Berry wrote:

>
>
> I am rusty on 'Matrix', but I see there are crossprod methods for those
> classes.
>
>   res <- crossprod( x , x )
>
> gives your result up to scale factors of sqrt(res[i,i]*res[j,j]), so
> something like
>
>   diagnl <- Diagonal( ncol(x), sqrt( diag( res ) )
>

OOPS! Better make that

diagnl <- Diagonal( ncol(x), 1 / sqrt( diag( res ) )


>
>   final.res <- diagnl %*% res %*% diagnl
>
> should do it.
>
> On Tue, 23 Jan 2007, Jose Quesada wrote:
>
>> (Extremely sorry, disregard previous email as I hit send before pasting the 
>> latest version of the example; this one is smaller too)
>> Dear R users,
>>
>> I want to apply a function that takes two vectors as input to all pairs
>> (combinations (nrow(X), 2))of matrix rows in a matrix.
>> I know that ideally, one should avoid loops in R, but after reading the docs 
>> for
>> do.call, apply, etc, I still don't know how to write the nested loop in a
>> vectorized way.
>>
>> Example data:
>> x= matrix(rnorm(100), 10, 10)
>> # this is actually a very large sparse matrix, but it doesn't matter for the
>> # example
>> library(Matrix)
>> x = as(x,"CsparseMatrix")
>>
>> # cosine function
>> cosine = function (x, y){
>>  if (is.vector(x) && is.vector(y)) {
>>  return(crossprod(x, y)/sqrt(crossprod(x) * crossprod(y)))
>>  } else {stop("cosine: argument mismatch. Two vectors needed as input.")}
>> }
>>
>> # The loop-based solution I have is:
>>  if (is(x, "Matrix") ) {
>>  cos = array(NA, c(ncol(x), ncol(x)))
>>  for (i in 2:ncol(x)) {
>>  for (j in 1:(i - 1)) {
>>  cos[i, j] = cosine(x[, i], x[, j])
>>  }
>>  }
>>  }
>>
>> This solution seems inneficient. Is there an easy way of achieving this with 
>> a
>> clever do.call + apply combination?
>>
>> Also, I have noticed that getting a row from a Matrix object produces a 
>> normal
>> array (i.e., it does not inherit Matrix class). However, selecting >1 rows, 
>> does
>> produce a same-class matrix. If I convert with as() the output of selecting 
>> one
>> row, am I losing performance? Is there any way to make the resulting vector 
>> be a
>> 1-D Matrix object?
>> This solution seems inneficient. Is there an easy way of achieving this with 
>> a
>> clever do.call + apply combination?
>> --
>> Thanks in advance,
>> -Jose
>>
>> --
>> Jose Quesada, PhD
>> Research fellow, Psychology Dept.
>> Sussex University, Brighton, UK
>> http://www.andrew.cmu.edu/~jquesada
>>
>> __
>> R-help@stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> Charles C. Berry(858) 534-2098
>  Dept of Family/Preventive Medicine
> E mailto:[EMAIL PROTECTED] UC San Diego
> http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0901
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Charles C. Berry(858) 534-2098
  Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]   UC San Diego
http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0901

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [fixed] vectorized nested loop: apply a function that takes two rows

2007-01-23 Thread Charles C. Berry


I am rusty on 'Matrix', but I see there are crossprod methods for those 
classes.

res <- crossprod( x , x )

gives your result up to scale factors of sqrt(res[i,i]*res[j,j]), so 
something like

diagnl <- Diagonal( ncol(x), sqrt( diag( res ) )

final.res <- diagnl %*% res %*% diagnl

should do it.

On Tue, 23 Jan 2007, Jose Quesada wrote:

> (Extremely sorry, disregard previous email as I hit send before pasting the 
> latest version of the example; this one is smaller too)
> Dear R users,
>
> I want to apply a function that takes two vectors as input to all pairs
> (combinations (nrow(X), 2))of matrix rows in a matrix.
> I know that ideally, one should avoid loops in R, but after reading the docs 
> for
> do.call, apply, etc, I still don't know how to write the nested loop in a
> vectorized way.
>
> Example data:
> x = matrix(rnorm(100), 10, 10)
> # this is actually a very large sparse matrix, but it doesn't matter for the
> # example
> library(Matrix)
> x = as(x,"CsparseMatrix")
>
> # cosine function
> cosine = function (x, y){
>   if (is.vector(x) && is.vector(y)) {
>   return(crossprod(x, y)/sqrt(crossprod(x) * crossprod(y)))
>   } else {stop("cosine: argument mismatch. Two vectors needed as input.")}
> }
>
> # The loop-based solution I have is:
>   if (is(x, "Matrix") ) {
>   cos = array(NA, c(ncol(x), ncol(x)))
>   for (i in 2:ncol(x)) {
>   for (j in 1:(i - 1)) {
>   cos[i, j] = cosine(x[, i], x[, j])
>   }
>   }
>   }
>
> This solution seems inneficient. Is there an easy way of achieving this with a
> clever do.call + apply combination?
>
> Also, I have noticed that getting a row from a Matrix object produces a normal
> array (i.e., it does not inherit Matrix class). However, selecting >1 rows, 
> does
> produce a same-class matrix. If I convert with as() the output of selecting 
> one
> row, am I losing performance? Is there any way to make the resulting vector 
> be a
> 1-D Matrix object?
> This solution seems inneficient. Is there an easy way of achieving this with a
> clever do.call + apply combination?
> -- 
> Thanks in advance,
> -Jose
>
> --
> Jose Quesada, PhD
> Research fellow, Psychology Dept.
> Sussex University, Brighton, UK
> http://www.andrew.cmu.edu/~jquesada
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Charles C. Berry(858) 534-2098
  Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]   UC San Diego
http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0901

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.