Dear all,

I'm trying to understand the intended use case of columnSimilarities(),
implemented in RowMatrix.

As far as I know, this method computes the pairwise (cosine) similarities
between the columns of a given matrix. The DIMSUM paper says it is efficient
for a large number of rows m and a small number of columns n. In that case
the output is an n-by-n matrix.
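To make "similarity between columns" concrete, here is a minimal brute-force sketch in plain Python. It is not Spark's code and does no DIMSUM sampling; the function name column_cosine_similarities is hypothetical. It streams the rows once, accumulating column norms and pairwise dot products, and returns only the upper triangle (i < j) of the n-by-n cosine-similarity matrix, which mirrors the upper-triangular shape documented for RowMatrix.columnSimilarities().

```python
import math


def column_cosine_similarities(rows):
    """Exact cosine similarity between every pair of columns of an m x n matrix.

    rows: list of m rows, each a list of n floats.
    Returns {(i, j): similarity} for i < j (upper triangle of an n x n matrix).
    Pairs whose dot product is zero are omitted, as in a sparse result.
    """
    n = len(rows[0])
    sq_norms = [0.0] * n  # running sum of squares for each column
    dots = {}  # running dot product for each column pair (i, j), i < j
    for row in rows:  # one pass over the rows, however many there are
        for i in range(n):
            a = row[i]
            if a == 0.0:
                continue
            sq_norms[i] += a * a
            for j in range(i + 1, n):
                b = row[j]
                if b != 0.0:
                    dots[(i, j)] = dots.get((i, j), 0.0) + a * b
    # Only pairs with a nonzero dot product appear, so both norms are nonzero here.
    return {
        (i, j): d / (math.sqrt(sq_norms[i]) * math.sqrt(sq_norms[j]))
        for (i, j), d in dots.items()
    }


# Toy input: m = 4 rows, n = 3 columns; column 1 is exactly 2 * column 0.
rows = [
    [1.0, 2.0, 0.0],
    [2.0, 4.0, 0.0],
    [0.0, 0.0, 3.0],
    [3.0, 6.0, 1.0],
]
sims = column_cosine_similarities(rows)
```

However large m grows, the result holds at most n(n-1)/2 entries, which is why the output stays small when n is small.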

Now, suppose I want to compute the similarity between users, with m in the
billions. Each user is described by a high-dimensional feature vector, say
n = 10000. In my dataset, each row represents one user. In that case,
computing the column similarities of my matrix is not the same as computing
the similarities between all users. So what does computing the similarity of
the columns of my matrix mean in this case?
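A small brute-force illustration of the distinction above, assuming a hypothetical toy matrix of 4 users (rows) by 3 features (columns). This is plain Python, not Spark, and the helper names cosine and pairwise are made up for the example. Comparing columns yields feature-by-feature similarities (n-by-n); comparing rows yields user-by-user similarities (m-by-m).

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pairwise(vectors):
    """Cosine similarity for every pair (i, j), i < j, of the given vectors."""
    return {
        (i, j): cosine(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }


# Hypothetical toy data: m = 4 users (rows) x n = 3 features (columns).
users = [
    [1.0, 0.0, 2.0],
    [2.0, 0.0, 4.0],
    [0.0, 3.0, 0.0],
    [1.0, 1.0, 1.0],
]

# Column similarity: compares features with each other (n x n, here 3 pairs).
feature_sims = pairwise([list(col) for col in zip(*users)])
# Row similarity: compares users with each other (m x m, here 6 pairs).
user_sims = pairwise(users)
```

The number of user pairs grows as m(m-1)/2, so with billions of rows the row-wise comparison is a very different computation from the n-by-n column-wise one.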

Best regards,

Jao
