There's no *RowSimilarity* method in the RowMatrix class. You have to transpose
your matrix and call columnSimilarities on the result. However, when the number
of rows is large, this approach is still very slow.
Try approximate nearest neighbor (ANN) methods instead, such as
locality-sensitive hashing (LSH).
There are several implementations of LSH on
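
To make the LSH suggestion concrete, here is a minimal sketch in plain Python (not Spark code) of random-hyperplane LSH for cosine similarity, used as a blocking key: rows with the same signature land in the same block, and exact cosine similarity is computed only inside a block. The function names (make_hyperplanes, signature, nearest_within_blocks) and the num_bits and seed parameters are made up for illustration and do not come from any library. Rows are assumed to be non-zero.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Exact cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def make_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits random Gaussian hyperplanes (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, hyperplanes):
    """One bit per hyperplane: the side of the plane the vector falls on."""
    return tuple(
        1 if sum(h * x for h, x in zip(plane, vec)) >= 0 else 0
        for plane in hyperplanes
    )


def nearest_within_blocks(rows, num_bits=4, seed=0):
    """Block rows by LSH signature, then compare rows only inside each block.

    Returns {row_index: (neighbor_index, cosine)}. A row that is alone in
    its block gets no entry; that is the cost of the approximation.
    """
    planes = make_hyperplanes(len(rows[0]), num_bits, seed)
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[signature(row, planes)].append(i)

    result = {}
    for members in blocks.values():
        for i in members:
            best = None
            for j in members:
                if i == j:
                    continue
                sim = cosine(rows[i], rows[j])
                if best is None or sim > best[1]:
                    best = (j, sim)
            if best is not None:
                result[i] = best
    return result


if __name__ == "__main__":
    rows = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
    print(nearest_within_blocks(rows))
```

More bits give smaller blocks and fewer comparisons, but also a higher chance that true neighbors end up in different blocks. A common mitigation is to repeat the hashing with several independent sets of hyperplanes and merge the candidates.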
Each row of the given matrix is a Vector[Double]. I want to find the
nearest neighbor row of each row using cosine similarity.
The problem here is the complexity: O(10^20) pairwise comparisons.
We need to do *blocking*, and do the row-wise comparison within each block.
Any tips for best practice?
In Spark, we have