Re: What's the best way to find the Nearest Neighbor row of a matrix with 10billion rows x 300 columns?

2016-05-17 Thread nguyen duc tuan
There's no *RowSimilarity *method in RowMatrix class. You have to transpose your matrix to use that method. However, when the number of rows is large, this approach is still very slow. Try to use approximate nearest neighbor (ANN) methods instead such as LSH. There are several implements of LSH on

What's the best way to find the Nearest Neighbor row of a matrix with 10billion rows x 300 columns?

2016-05-17 Thread Rex X
Each row of the given matrix is Vector[Double]. Want to find out the nearest neighbor row to each row using cosine similarity. The problem here is the complexity: O( 10^20 ) We need to do *blocking*, and do the row-wise comparison within each block. Any tips for best practice? In Spark, we have