Can you check if the following gives you what you want?
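tmp <- rbind( A, B )
dis <- as.matrix( dist( tmp ) )  ## convert the dist object to a full matrix so it can be indexed
nA <- nrow(A)
nB <- nrow(B)
dis[ 1:nA, nA + 1:nB ]  ## output: the A-versus-B block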
If it works, this suggestion comes with a caveat: dist() also computes
all the distances within A and within B, so it might be computationally
inefficient compared with for() loops when a and b are very large or
very different in size. However, I am hoping that the gain from dist()
being coded in C will offset this.
Try experimenting to find the fastest option. Also have a look at the
mapply() examples to see if they are useful.
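As a quick sanity check of the rbind()/dist() idea, here is a minimal sketch on small made-up matrices (the sizes and the names cross and ref are illustrative choices, not from the original post). It assumes the dist object is passed through as.matrix() before it is indexed by row and column. Note that dist() returns Euclidean distances, i.e. the square root of the sums of squares computed in get.f below.

```r
set.seed(1)
A <- matrix(rnorm(3 * 4), ncol = 4)   # 3 x 4, illustrative sizes
B <- matrix(rnorm(5 * 4), ncol = 4)   # 5 x 4

nA <- nrow(A)
nB <- nrow(B)

# All pairwise distances among the stacked rows, as a full square matrix
dis <- as.matrix(dist(rbind(A, B)))

# Keep only the A-versus-B block: rows of A, columns of B
cross <- dis[1:nA, nA + 1:nB, drop = FALSE]

# Reference values from a plain double loop
ref <- matrix(NA_real_, nrow = nA, ncol = nB)
for (i in 1:nA)
  for (j in 1:nB)
    ref[i, j] <- sqrt(sum((A[i, ] - B[j, ])^2))

# Compare after dropping the dimnames that as.matrix() attaches
all.equal(unname(cross), ref)
```

If the two agree, the remaining question is only speed, which is best settled by timing both on realistic sizes.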
Regards, Adai
Prasenjit Kapat wrote:
Hi,
I have two matrices, A (a x d) and B (b x d). I want to get another matrix C
(a x b) such that C[i,j] is the Euclidean distance between the ith row of A
and the jth row of B. In general, C[i,j] = some.function(A[i,], B[j,]).
What is the best method for doing so? (Assume a < b.)
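For the general case C[i,j] = some.function(A[i,], B[j,]), one convenient (though not necessarily fast) formulation is outer() over the row indices combined with Vectorize(). The sketch below is only an illustration: the helper name pairwise and the example function euclid are hypothetical, and the matrices are tiny made-up inputs.

```r
# Hypothetical helper: apply a two-argument function f to every pair
# (row i of A, row j of B) and collect the results in an a x b matrix.
pairwise <- function(A, B, f) {
  # outer() passes whole index vectors, so f must be vectorised over (i, j)
  g <- Vectorize(function(i, j) f(A[i, ], B[j, ]))
  outer(seq_len(nrow(A)), seq_len(nrow(B)), g)
}

A <- matrix(1:6, nrow = 2, byrow = TRUE)      # rows: (1,2,3) and (4,5,6)
B <- matrix(c(1, 2, 3,
              0, 0, 0,
              4, 5, 6), nrow = 3, byrow = TRUE)

euclid <- function(x, y) sqrt(sum((x - y)^2))

C <- pairwise(A, B, euclid)
dim(C)     # 2 rows (from A) by 3 columns (from B)
C[1, 1]    # 0, because row 1 of A equals row 1 of B
```

Vectorize() is built on mapply(), so this still calls f once per pair; it trades speed for generality.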
I have been doing some exploration myself. Consider the following function,
get.f, in which 'method=1' is the rudimentary double for loop; 'method=2'
avoids one loop by constructing a bigger matrix, but doesn't use
apply(); 'method=3' avoids both loops by using apply() and constructing
bigger matrices; and 'method=4' avoids constructing bigger matrices by using
apply() twice.
get.f <- function (A, B, method=2) {
  ## Note: every method computes squared Euclidean distances; use sqrt(C) for distances.
  if (method == 1) {
    a <- nrow(A); b <- nrow(B)
    C <- matrix(NA, nrow=a, ncol=b)
    for (i in 1:a)
      for (j in 1:b)
        C[i,j] <- sum((A[i,]-B[j,])^2)
  } else if (method == 2) {
    a <- nrow(A); b <- nrow(B); d <- ncol(A)
    C <- matrix(NA, nrow=a, ncol=b)
    for (i in 1:a)
      C[i,] <- rowSums((matrix(A[i,], nrow=b, ncol=d, byrow=TRUE) - B)^2)
  } else if (method == 3) {
    C <- t(apply(A, MARGIN=1, FUN=FUN1, BB=B))  # transpose is needed
  } else if (method == 4) {
    C <- t(apply(A, MARGIN=1, FUN=FUN2, BB=B))
  }
  C  # return the result explicitly; a for() loop as the last expression yields NULL
}

## one row aa of A against all rows of BB, via one expanded matrix
FUN1 <- function(aa, BB)
  return(rowSums(
    (matrix(aa, nrow=nrow(BB), ncol=ncol(BB), byrow=TRUE) - BB)^2)
  )

## same result, but with an inner apply() over the rows of BB
FUN2 <- function(aa, BB)
  return(apply(BB, MARGIN=1, FUN=FUN3, aa=aa))

FUN3 <- function(bb, aa) return(sum((aa-bb)^2))
### With these methods and the following initializations,
a <- 100; b <- 1000; d <- 100; n.loop <- 20
A <- matrix(rnorm(a*d), ncol=d)
B <- matrix(rnorm(b*d), ncol=d)
all.times <- matrix(0, nrow=5, ncol=4)
rownames(all.times) <- rownames(as.matrix(system.time(NULL)))
for (i in 1:4)
  for (j in 1:n.loop)
    all.times[,i] <- all.times[,i] +
      as.matrix(system.time(C <- get.f(A=A, B=B, method=i)))
all.times <- all.times / n.loop
print(all.times)
             [,1]    [,2]    [,3]    [,4]
user.self  4.0554 1.50010 1.50130 4.51285
sys.self   0.0370 0.02420 0.01800 0.04260
elapsed    4.2705 1.58865 1.59475 6.07535
user.child 0.0000 0.00000 0.00000 0.00000
sys.child  0.0000 0.00000 0.00000 0.00000
'method=2' comes out as the best (with 'method=3' practically tied), and
'method=1' (for loops) beats 'method=4' (two apply()s)... Is that expected?
Is it possible to improve over 'method=2'?
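One possible direction beyond 'method=2', offered as an untimed sketch rather than a benchmarked answer: squared Euclidean distances satisfy |a - b|^2 = |a|^2 + |b|^2 - 2 a.b, so the whole matrix can be obtained from one matrix product with no loop over rows. The function name sq.dist is made up for this illustration, and the small test matrices are arbitrary. Rounding error can leave tiny negative values where the true distance is zero, so the sketch clamps them with pmax().

```r
# Hypothetical helper: squared Euclidean distances between the rows of A
# and the rows of B, using |a|^2 + |b|^2 - 2 * a.b
sq.dist <- function(A, B) {
  # |a_i|^2 + |b_j|^2 for every pair (i, j)
  norms <- outer(rowSums(A^2), rowSums(B^2), "+")
  # subtract 2 * a_i . b_j; tcrossprod(A, B) is A %*% t(B)
  C <- norms - 2 * tcrossprod(A, B)
  # clamp small negative values caused by floating-point rounding
  pmax(C, 0)
}

# Check against the plain double loop on small random inputs
set.seed(42)
A <- matrix(rnorm(4 * 3), ncol = 3)
B <- matrix(rnorm(6 * 3), ncol = 3)

ref <- matrix(NA_real_, nrow(A), nrow(B))
for (i in seq_len(nrow(A)))
  for (j in seq_len(nrow(B)))
    ref[i, j] <- sum((A[i, ] - B[j, ])^2)

all.equal(sq.dist(A, B), ref)
```

Whether this is actually faster for a = 100, b = 1000, d = 100 would need to be confirmed with the same system.time() loop used above.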
Thanks
PK
PS: The mail text seems fine in my composer, I hope, it looks decent in your
reader.
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.