Dear all,

I'm trying to understand the correct use case for columnSimilarities as implemented in RowMatrix.
As far as I know, this function computes the pairwise similarities between the columns of a given matrix. The DIMSUM paper says that it is efficient for large m (rows) and small n (columns). In this case the output will be an n-by-n matrix.

Now, suppose I want to compute the similarity between a large number of users, say m in the billions. Each user is described by a high-dimensional feature vector, say n = 10000. In my dataset, one row represents one user. So in that case, computing the column similarities of my matrix is not the same as computing the similarities between all users.

What, then, does computing the similarity of the columns of my matrix mean in this case?

Best regards,
Jao
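
P.S. To make the shapes I'm asking about concrete, here is a minimal sketch in plain Python (not Spark, and not the DIMSUM sampling scheme). The toy matrix, the sizes m = 4 and n = 3, and the helper names cosine and pairwise are all made up for illustration. It only shows that comparing columns gives an n-by-n result over features, while comparing rows gives an m-by-m result over users.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise(vectors):
    """Full symmetric matrix of cosine similarities between the given vectors."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


# Hypothetical toy data: m = 4 users (rows), n = 3 features (columns).
users = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 0.0],
    [2.0, 0.0, 4.0],
    [0.0, 3.0, 1.0],
]

# Column similarities: features compared with each other, n x n.
features = [list(col) for col in zip(*users)]
feature_sim = pairwise(features)

# Row similarities: users compared with each other, m x m.
user_sim = pairwise(users)

print("feature similarity shape:", len(feature_sim), "x", len(feature_sim[0]))
print("user similarity shape:", len(user_sim), "x", len(user_sim[0]))
```

In this toy case the user-to-user matrix is what I actually want, and it is the one whose size grows with m squared.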