[R] use loop or use apply?

2007-05-17 Thread Prasenjit Kapat
Hi,

I have two matrices, A (a x d) and B (b x d). I want to get another matrix C (a x b) 
such that C[i,j] is the Euclidean distance between the ith row of A and the jth 
row of B. More generally, C[i,j] = some.function(A[i,], B[j,]). 
What is the best method for doing so? (Assume a < b.)
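For reference, the quantity being asked for is the following. The get.f code in this post actually computes the sum under the square root (the squared distance), which gives the same ordering of pairs; take sqrt() of the result if the true distance is needed.

$$
C_{ij} = \lVert A_{i\cdot} - B_{j\cdot} \rVert_2 = \sqrt{\sum_{k=1}^{d} \left(A_{ik} - B_{jk}\right)^2}, \qquad i = 1,\dots,a,\quad j = 1,\dots,b.
$$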

I have been doing some exploration myself. Consider the function get.f below, 
in which 'method=1' is the rudimentary double for loop; 'method=2' avoids one 
loop by constructing a bigger matrix but doesn't use apply(); 'method=3' avoids 
both loops by using apply() and constructing bigger matrices; 'method=4' avoids 
constructing bigger matrices by using apply() twice.
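Methods 3 and 4 need t() because apply(A, MARGIN=1, FUN) stacks each vector that FUN returns as a column, so the raw result is b x a rather than a x b. A tiny sketch showing this on made-up 2 x 2 and 3 x 2 matrices; sq.dist.to.rows is a hypothetical helper (equivalent in effect to FUN1) that uses sweep() to subtract a row of A from every row of B.

```r
# Toy data (hypothetical): A has 2 rows, B has 3 rows, both with d = 2 columns
A <- matrix(1:4, nrow = 2)   # rows (1, 3) and (2, 4)
B <- matrix(1:6, nrow = 3)   # rows (1, 4), (2, 5) and (3, 6)

# For one row 'aa' of A, squared distance to every row of BB
sq.dist.to.rows <- function(aa, BB) rowSums(sweep(BB, 2, aa)^2)

raw <- apply(A, MARGIN = 1, FUN = sq.dist.to.rows, BB = B)
dim(raw)     # 3 x 2: one column per row of A
C <- t(raw)  # 2 x 3: C[i, j] pairs row i of A with row j of B
C
```

Here C is the 2 x 3 matrix with rows (1, 5, 13) and (1, 1, 5).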

get.f <- function (A, B, method=2) {
  # Note: every method computes *squared* Euclidean distances (no sqrt()).
  if (method == 1) {
    a <- nrow(A); b <- nrow(B);
    C <- matrix(NA, nrow=a, ncol=b);
    for (i in 1:a) 
      for (j in 1:b) 
        C[i,j] <- sum((A[i,]-B[j,])^2)
  } else if (method == 2) {
    a <- nrow(A); b <- nrow(B); d <- ncol(A);
    C <- matrix(NA, nrow=a, ncol=b);
    for (i in 1:a) 
      C[i,] <- rowSums((matrix(A[i,], nrow=b, ncol=d, byrow=TRUE) - B) ^ 2)
  } else if (method == 3) {
    C <- t(apply(A, MARGIN=1, FUN=FUN1, BB=B)); # transpose is needed
  } else if (method == 4) {
    C <- t(apply(A, MARGIN=1, FUN=FUN2, BB=B))
  }
  C  # return C explicitly; otherwise methods 1 and 2 return the for loop's NULL
}

FUN1 <- function(aa, BB)
  return(rowSums(
    (matrix(aa, nrow=nrow(BB), ncol=ncol(BB), byrow=TRUE) - BB)^2)
  )

FUN2 <- function(aa, BB)
  return(apply(BB, MARGIN=1, FUN=FUN3, aa=aa))

FUN3 <- function(bb, aa) return(sum((aa-bb)^2))

### With these methods and the following initializations,

a <- 100; b <- 1000; d <- 100; n.loop <- 20;

A <- matrix(rnorm(a*d), ncol=d)
B <- matrix(rnorm(b*d), ncol=d)

all.times <- matrix(0, nrow=5, ncol=4)
rownames(all.times) <- rownames(as.matrix(system.time(NULL)))

for (i in 1:4)  
  for (j in 1:n.loop)
    all.times[,i] <- all.times[,i] + 
      as.matrix(system.time(C <- get.f(A=A, B=B, method=i)))

all.times <- all.times / n.loop
print(all.times)

           [,1]    [,2]    [,3]    [,4]
user.self  4.0554  1.50010 1.50130 4.51285
sys.self   0.0370  0.02420 0.01800 0.04260
elapsed    4.2705  1.58865 1.59475 6.07535
user.child 0.0     0.0     0.0     0.0
sys.child  0.0     0.0     0.0     0.0

'method=2' stands out as the best, and 'method=1' (for loops) beats 'method=4' 
(two apply()s). Is that expected?

Is it possible to improve over 'method=2'?
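One common way to go beyond 'method=2' (not something proposed in this thread) is to expand the squared distance as ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, so the whole a x b matrix comes from two rowSums() calls and a single matrix product, tcrossprod(A, B), with no R-level loop. A minimal sketch, assuming A and B are numeric matrices with the same number of columns; pairwise.sq.dist is a hypothetical helper name. I have not timed it here, but since the heavy lifting is one matrix product handed to BLAS, it would typically be expected to be much faster than the row-by-row methods.

```r
# Squared Euclidean distances between every row of A and every row of B,
# via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * a.b  (hypothetical helper name)
pairwise.sq.dist <- function(A, B) {
  stopifnot(ncol(A) == ncol(B))
  a2 <- rowSums(A^2)                               # squared norms of rows of A
  b2 <- rowSums(B^2)                               # squared norms of rows of B
  C <- outer(a2, b2, "+") - 2 * tcrossprod(A, B)   # tcrossprod(A, B) is A %*% t(B)
  C[C < 0] <- 0                                    # clip tiny negatives from rounding
  C
}

# Usage, with the same sizes as in the timing above
set.seed(1)
A <- matrix(rnorm(100 * 100), ncol = 100)
B <- matrix(rnorm(1000 * 100), ncol = 100)
C <- pairwise.sq.dist(A, B)   # 100 x 1000, same quantity as get.f()
D <- sqrt(C)                  # actual Euclidean distances
```

The trade-off is numerical: when two rows are nearly identical relative to their norms, the subtraction can lose precision, which is why negative round-off is clipped to zero.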

Thanks
PK

PS: The mail text seems fine in my composer; I hope it looks decent in your 
reader.

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] use loop or use apply?

2007-05-17 Thread Adaikalavan Ramasamy
Can you check if the following gives you what you want?

tmp <- rbind( A, B )
dis <- as.matrix( dist( tmp ) )   # a "dist" object cannot be indexed as [i, j]
nA  <- nrow(A)
nB  <- nrow(B)
dis[ 1:nA, nA + 1:nB ] ## output

If it works, this suggestion comes with the caveat that it might be 
computationally inefficient compared with using for() loops for very 
large values of (a,b) or highly discordant values of (a,b), because dist() 
also computes all the within-A and within-B distances, which are thrown 
away. However, I hope the gain from dist() being coded in C will offset it.
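A quick self-contained check of the dist() approach on small made-up matrices. Note that dist() returns actual Euclidean distances, whereas get.f() returns squared ones, so the loop used for comparison here takes sqrt(). as.matrix() is needed because a "dist" object only stores the lower triangle, and check.attributes = FALSE ignores the row/column names that as.matrix() attaches.

```r
# Toy data (hypothetical sizes): a = 3, b = 5, d = 4
set.seed(42)
A <- matrix(rnorm(3 * 4), ncol = 4)
B <- matrix(rnorm(5 * 4), ncol = 4)
nA <- nrow(A); nB <- nrow(B)

D.full <- as.matrix(dist(rbind(A, B)))   # (nA + nB) x (nA + nB), all pairs
C.dist <- D.full[1:nA, nA + 1:nB]        # cross block: rows of A vs rows of B

# Same quantity via an explicit double loop, for comparison
C.loop <- matrix(NA_real_, nA, nB)
for (i in 1:nA) for (j in 1:nB) C.loop[i, j] <- sqrt(sum((A[i, ] - B[j, ])^2))

all.equal(C.dist, C.loop, check.attributes = FALSE)
```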

Try experimenting to find the optimal speed etc. Also have a look at 
mapply() examples to see if they are useful.
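Following the mapply() suggestion, a sketch for the general case C[i,j] = some.function(A[i,], B[j,]), assuming the function returns a single number. pairwise.apply is a hypothetical helper name, not an existing R function. It is still an R-level loop over all a*b pairs, so I would not expect it to beat 'method=1' on speed; its advantage is that any pairwise function can be plugged in.

```r
# Apply f(row of A, row of B) to every pair of rows; returns nrow(A) x nrow(B).
# 'pairwise.apply' is a hypothetical helper, not an existing R function.
pairwise.apply <- function(A, B, f) {
  # All (i, j) index pairs; expand.grid varies i fastest (column-major order)
  idx <- expand.grid(i = seq_len(nrow(A)), j = seq_len(nrow(B)))
  # Call f(A[i, ], B[j, ]) once per pair; f must return a single number
  vals <- mapply(function(i, j) f(A[i, ], B[j, ]), idx$i, idx$j)
  # Fill column by column, so vals[k] lands at C[idx$i[k], idx$j[k]]
  matrix(vals, nrow = nrow(A), ncol = nrow(B))
}

# Usage: squared Euclidean distance (as in get.f) and Manhattan distance
set.seed(2)
A <- matrix(rnorm(4 * 3), ncol = 3)
B <- matrix(rnorm(6 * 3), ncol = 3)
C.sq  <- pairwise.apply(A, B, function(x, y) sum((x - y)^2))
C.man <- pairwise.apply(A, B, function(x, y) sum(abs(x - y)))
```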

Regards, Adai


