[R] Proportion of equal entries in dist()?

Jorge I Velez Mon, 19 Jan 2015 05:40:46 -0800

Dear all,

Given vectors "x" and "y", I would like to compute the proportion of
entries that are equal, that is, mean(x == y).
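
As a small illustration of that quantity (the vectors below are made up for the example and are not part of my data):

```r
# Two toy vectors with entries in 0:2
x <- c(0, 1, 2, 2)
y <- c(0, 2, 2, 1)

# Elementwise comparison gives TRUE, FALSE, TRUE, FALSE;
# the mean of a logical vector is the proportion of TRUE values.
p <- mean(x == y)
p  # 0.5
```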


Now, suppose I have the following matrix:

n <- 1e2
m <- 1e4
X <- matrix(sample(0:2, m*n, replace = TRUE), ncol = m)

I am interested in calculating the above proportion for every pairwise
combination of rows.  I came up with the following:

myd <- function(X, p = NROW(X)) {
  D <- matrix(NA, p, p)
  # Fill only the lower triangle (i > j); all other entries stay NA.
  for (i in 1:p) for (j in 1:p) if (i > j) D[i, j] <- mean(X[i, ] == X[j, ])
  D
}

system.time(d <- myd(X))

However, in my application n and m are much larger than in this
example, and the computational time might be an issue.  I would very much
appreciate any suggestions on how to speed up the "myd" function.
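
One direction that might help, sketched here under the assumptions that X has no missing values and only a few distinct values (0, 1, 2 in the example): the number of matching positions between rows i and j equals the sum, over each distinct value v, of the inner product of the 0/1 indicators (X == v) for those rows. That replaces the double loop with one matrix product per value. The name prop_equal and the small test matrix are hypothetical, introduced only for this sketch.

```r
# Hypothetical helper: proportion of equal entries for every pair of rows.
# Assumes X contains no NA and has few distinct values.
prop_equal <- function(X) {
  vals <- unique(as.vector(X))
  counts <- matrix(0, nrow(X), nrow(X))
  for (v in vals) {
    ind <- (X == v) * 1                 # numeric 0/1 indicator matrix
    counts <- counts + tcrossprod(ind)  # matches on value v for each row pair
  }
  counts / ncol(X)                      # full symmetric matrix, diagonal = 1
}

# Small check against the direct definition mean(x == y)
set.seed(1)
X_small <- matrix(sample(0:2, 5 * 20, replace = TRUE), nrow = 5)
P <- prop_equal(X_small)
all.equal(P[2, 1], mean(X_small[2, ] == X_small[1, ]))
```

Unlike "myd", this returns the full symmetric matrix rather than only the lower triangle; P[lower.tri(P)] extracts the same entries.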

Note:  I have experimented with the dist() function and, although it is
much, much faster than "myd", none of its built-in distances fits my
needs.  I would also appreciate any suggestions on how to use my own
distance function with dist().
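
As far as I can tell, stats::dist() only accepts its built-in method names, so a possible workaround (a sketch, not a confirmed solution) is to recode the data so that a built-in distance gives the quantity of interest. With one 0/1 indicator block per distinct value, each mismatching position contributes exactly 2 to the Manhattan distance, so the proportion of mismatches is manhattan / (2 * m). The toy matrix and object names below are assumptions made for illustration, and the data are assumed to contain no missing values.

```r
# Toy data standing in for X (assumption: values in 0:2, no NA)
set.seed(42)
X_toy <- matrix(sample(0:2, 6 * 50, replace = TRUE), nrow = 6)

# Indicator coding: one 0/1 block of columns per distinct value
ind <- do.call(cbind, lapply(0:2, function(v) (X_toy == v) * 1))

# Proportion of mismatching entries, returned as a "dist" object
d_mismatch <- dist(ind, method = "manhattan") / (2 * ncol(X_toy))

# Proportion of equal entries as a full matrix
p_equal <- 1 - as.matrix(d_mismatch)
all.equal(p_equal[3, 1], mean(X_toy[3, ] == X_toy[1, ]))
```

Because d_mismatch keeps the "dist" class, it can be passed directly to functions such as hclust(). More generally, any square matrix of custom dissimilarities can be converted with as.dist().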

Thank you very much for your time.

Best regards,
Jorge Velez.-


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
