Re: [R] [fixed] vectorized nested loop: apply a function that takes two rows

2007-01-24 Thread Jose Quesada
Thanks Charles, Martin,

Substantial improvement with the vectorized solution. Here is a quick benchmark:

# The loop-based solution:
nestedCos = function (x) {
if (is(x, Matrix) ) {
cos = array(NA, c(ncol(x), ncol(x)))
for (i in 2:ncol(x)) {
for (j in 1:(i - 1)) {
cos[i, j] = cosine(x[, i], x[, j])
}
}
}
return(cos)
}
# Charles C. Berry's vectorized approach
flatCos = function (x) {
res = crossprod( x , x )
diagnl  = Diagonal( ncol(x), 1 / sqrt( diag( res )))
cos = diagnl %*% res %*% diagnl
return(cos)
}

Benchmarking:

 system.time(for(i in 1:10)nestedCos(x))
(I stopped because it was taking too long)
Timing stopped at: 139.37 3.82 188.76 NA NA
 system.time(for(i in 1:10)flatCos(x))
[1] 0.43 0.00 0.48   NA   NA

#--
As much as I like to have faster code, I'm still wondering WHY flatCos gets the 
same results; i.e., why multiplying the inverse sqrt root of the diagonal of x 
BY x, then BY the diagonal again produces the expected result. I checked the 
wikipedia page for crossprod and other sources, but it still eludes me. I can 
see that scaling by the sqrt of the diagonal once makes sense with 'res - 
crossprod( x , x ) gives your result up to scale factors of 
sqrt(res[i,i]*res[j,j])', but I still don't see why you need to postmultiply by 
the diagonal again.

Maybe trying to attack a simpler problem might help my understanding: e.g., 
calculating the cos of a column to all other colums of x (that is, the inner 
part of the nested loop). How would that work in a vectorized way? I'm trying 
to get some general technique that I can reuse later from this excellent answer.

Thanks,
-Jose



 I am rusty on 'Matrix', but I see there are crossprod methods for those
 classes.

   res - crossprod( x , x )

 gives your result up to scale factors of sqrt(res[i,i]*res[j,j]), so
 something like

   diagnl - Diagonal( ncol(x), sqrt( diag( res ) )


OOPS! Better make that

diagnl - Diagonal( ncol(x), 1 / sqrt( diag( res ) )


   final.res - diagnl %*% res %*% diagnl

 should do it.


-- 
Cheers,
-Jose

--
Jose Quesada, PhD
Research fellow, Psychology Dept.
Sussex University, Brighton, UK
http://www.andrew.cmu.edu/~jquesada

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [fixed] vectorized nested loop: apply a function that takes two rows

2007-01-24 Thread Peter Dalgaard
Jose Quesada wrote:
 Thanks Charles, Martin,

 Substantial improvement with the vectorized solution. Here is a quick 
 benchmark:

 # The loop-based solution:
 nestedCos = function (x) {
   if (is(x, Matrix) ) {
   cos = array(NA, c(ncol(x), ncol(x)))
   for (i in 2:ncol(x)) {
   for (j in 1:(i - 1)) {
   cos[i, j] = cosine(x[, i], x[, j])
   }
   }
   }
   return(cos)
 }
 # Charles C. Berry's vectorized approach
 flatCos = function (x) {
   res = crossprod( x , x )
   diagnl  = Diagonal( ncol(x), 1 / sqrt( diag( res )))
   cos = diagnl %*% res %*% diagnl
   return(cos)
 }

 Benchmarking:

   
 system.time(for(i in 1:10)nestedCos(x))
 
 (I stopped because it was taking too long)
 Timing stopped at: 139.37 3.82 188.76 NA NA
   
 system.time(for(i in 1:10)flatCos(x))
 
 [1] 0.43 0.00 0.48   NA   NA

 #--
 As much as I like to have faster code, I'm still wondering WHY flatCos gets 
 the same results; i.e., why multiplying the inverse sqrt root of the diagonal 
 of x BY x, then BY the diagonal again produces the expected result. I checked 
 the wikipedia page for crossprod and other sources, but it still eludes me. I 
 can see that scaling by the sqrt of the diagonal once makes sense with 'res 
 - crossprod( x , x ) gives your result up to scale factors of 
 sqrt(res[i,i]*res[j,j])', but I still don't see why you need to postmultiply 
 by the diagonal again.

   
Didn't follow this thread too closely, but the point would seem to be
that the inner product of two normalized vectors is the cosine of the
angle. So basically, you want

crossprod(X%*%diagnl, X%*%diagnl) == t(diagnl) %*% t(X) %*% X %*% diagnl

I think, BTW, that another version not requiring Matrix is

Cr - crossprod(X)
D - sqrt(diag(C))
Cr/outer(D, D)

 Maybe trying to attack a simpler problem might help my understanding: e.g., 
 calculating the cos of a column to all other colums of x (that is, the inner 
 part of the nested loop). How would that work in a vectorized way? I'm trying 
 to get some general technique that I can reuse later from this excellent 
 answer.

 Thanks,
 -Jose

   
 I am rusty on 'Matrix', but I see there are crossprod methods for those
 classes.

  res - crossprod( x , x )

 gives your result up to scale factors of sqrt(res[i,i]*res[j,j]), so
 something like

  diagnl - Diagonal( ncol(x), sqrt( diag( res ) )

 

 OOPS! Better make that

   diagnl - Diagonal( ncol(x), 1 / sqrt( diag( res ) )

   
  final.res - diagnl %*% res %*% diagnl

 should do it.

 

   


-- 
   O__   Peter Dalgaard Ă˜ster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [fixed] vectorized nested loop: apply a function that takes two rows

2007-01-23 Thread Charles C. Berry


I am rusty on 'Matrix', but I see there are crossprod methods for those 
classes.

res - crossprod( x , x )

gives your result up to scale factors of sqrt(res[i,i]*res[j,j]), so 
something like

diagnl - Diagonal( ncol(x), sqrt( diag( res ) )

final.res - diagnl %*% res %*% diagnl

should do it.

On Tue, 23 Jan 2007, Jose Quesada wrote:

 (Extremely sorry, disregard previous email as I hit send before pasting the 
 latest version of the example; this one is smaller too)
 Dear R users,

 I want to apply a function that takes two vectors as input to all pairs
 (combinations (nrow(X), 2))of matrix rows in a matrix.
 I know that ideally, one should avoid loops in R, but after reading the docs 
 for
 do.call, apply, etc, I still don't know how to write the nested loop in a
 vectorized way.

 Example data:
 x = matrix(rnorm(100), 10, 10)
 # this is actually a very large sparse matrix, but it doesn't matter for the
 # example
 library(Matrix)
 x = as(x,CsparseMatrix)

 # cosine function
 cosine = function (x, y){
   if (is.vector(x)  is.vector(y)) {
   return(crossprod(x, y)/sqrt(crossprod(x) * crossprod(y)))
   } else {stop(cosine: argument mismatch. Two vectors needed as input.)}
 }

 # The loop-based solution I have is:
   if (is(x, Matrix) ) {
   cos = array(NA, c(ncol(x), ncol(x)))
   for (i in 2:ncol(x)) {
   for (j in 1:(i - 1)) {
   cos[i, j] = cosine(x[, i], x[, j])
   }
   }
   }

 This solution seems inneficient. Is there an easy way of achieving this with a
 clever do.call + apply combination?

 Also, I have noticed that getting a row from a Matrix object produces a normal
 array (i.e., it does not inherit Matrix class). However, selecting 1 rows, 
 does
 produce a same-class matrix. If I convert with as() the output of selecting 
 one
 row, am I losing performance? Is there any way to make the resulting vector 
 be a
 1-D Matrix object?
 This solution seems inneficient. Is there an easy way of achieving this with a
 clever do.call + apply combination?
 -- 
 Thanks in advance,
 -Jose

 --
 Jose Quesada, PhD
 Research fellow, Psychology Dept.
 Sussex University, Brighton, UK
 http://www.andrew.cmu.edu/~jquesada

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


Charles C. Berry(858) 534-2098
  Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]   UC San Diego
http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0901

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [fixed] vectorized nested loop: apply a function that takes two rows

2007-01-23 Thread Charles C. Berry
On Tue, 23 Jan 2007, Charles C. Berry wrote:



 I am rusty on 'Matrix', but I see there are crossprod methods for those
 classes.

   res - crossprod( x , x )

 gives your result up to scale factors of sqrt(res[i,i]*res[j,j]), so
 something like

   diagnl - Diagonal( ncol(x), sqrt( diag( res ) )


OOPS! Better make that

diagnl - Diagonal( ncol(x), 1 / sqrt( diag( res ) )



   final.res - diagnl %*% res %*% diagnl

 should do it.

 On Tue, 23 Jan 2007, Jose Quesada wrote:

 (Extremely sorry, disregard previous email as I hit send before pasting the 
 latest version of the example; this one is smaller too)
 Dear R users,

 I want to apply a function that takes two vectors as input to all pairs
 (combinations (nrow(X), 2))of matrix rows in a matrix.
 I know that ideally, one should avoid loops in R, but after reading the docs 
 for
 do.call, apply, etc, I still don't know how to write the nested loop in a
 vectorized way.

 Example data:
 x= matrix(rnorm(100), 10, 10)
 # this is actually a very large sparse matrix, but it doesn't matter for the
 # example
 library(Matrix)
 x = as(x,CsparseMatrix)

 # cosine function
 cosine = function (x, y){
  if (is.vector(x)  is.vector(y)) {
  return(crossprod(x, y)/sqrt(crossprod(x) * crossprod(y)))
  } else {stop(cosine: argument mismatch. Two vectors needed as input.)}
 }

 # The loop-based solution I have is:
  if (is(x, Matrix) ) {
  cos = array(NA, c(ncol(x), ncol(x)))
  for (i in 2:ncol(x)) {
  for (j in 1:(i - 1)) {
  cos[i, j] = cosine(x[, i], x[, j])
  }
  }
  }

 This solution seems inneficient. Is there an easy way of achieving this with 
 a
 clever do.call + apply combination?

 Also, I have noticed that getting a row from a Matrix object produces a 
 normal
 array (i.e., it does not inherit Matrix class). However, selecting 1 rows, 
 does
 produce a same-class matrix. If I convert with as() the output of selecting 
 one
 row, am I losing performance? Is there any way to make the resulting vector 
 be a
 1-D Matrix object?
 This solution seems inneficient. Is there an easy way of achieving this with 
 a
 clever do.call + apply combination?
 --
 Thanks in advance,
 -Jose

 --
 Jose Quesada, PhD
 Research fellow, Psychology Dept.
 Sussex University, Brighton, UK
 http://www.andrew.cmu.edu/~jquesada

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 Charles C. Berry(858) 534-2098
  Dept of Family/Preventive Medicine
 E mailto:[EMAIL PROTECTED] UC San Diego
 http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0901

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


Charles C. Berry(858) 534-2098
  Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]   UC San Diego
http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0901

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.