There's no *RowSimilarity* method in the RowMatrix class. You have to transpose
your matrix and call columnSimilarities on the result. However, when the number
of rows is large, this approach is still very slow.
Try approximate nearest neighbor (ANN) methods such as LSH instead.
There are several implementations of LSH on Spark that you can find on GitHub,
for example: https://github.com/karlhigley/spark-neighbors.
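
To make the LSH idea concrete, here is a minimal single-machine sketch in plain Python. It assumes random-hyperplane (sign) hashing for cosine similarity. It is not the API of spark-neighbors or of any Spark class; the names lsh_nearest_neighbors, num_planes and num_tables are made up for illustration. Rows that hash to the same signature form a block, and exact cosine comparison is done only within each block:

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Exact cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def lsh_nearest_neighbors(rows, num_planes=8, num_tables=4, seed=0):
    """Approximate nearest neighbor of every row under cosine similarity.

    Hypothetical helper: each table draws num_planes random hyperplanes,
    and a row's signature is the tuple of signs of its projections.
    Only rows sharing a signature (one block) are compared exactly.
    """
    rng = random.Random(seed)
    dim = len(rows[0])
    best = {}  # row index -> (neighbor index, cosine similarity)
    for _ in range(num_tables):
        planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(num_planes)]
        buckets = defaultdict(list)
        for i, row in enumerate(rows):
            signature = tuple(
                sum(p * x for p, x in zip(plane, row)) >= 0.0
                for plane in planes
            )
            buckets[signature].append(i)
        # Blocking step: pairwise comparison only inside each bucket.
        for members in buckets.values():
            for i in members:
                for j in members:
                    if i == j:
                        continue
                    sim = cosine(rows[i], rows[j])
                    if i not in best or sim > best[i][1]:
                        best[i] = (j, sim)
    return best


if __name__ == "__main__":
    data = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0],
            [0.0, 1.0, 0.0], [0.0, 3.0, 0.0]]
    for idx, (nbr, sim) in sorted(lsh_nearest_neighbors(data).items()):
        print(idx, nbr, round(sim, 3))
```

More planes give smaller blocks (faster, but more missed neighbors), and more tables recover some of those misses. A row that never shares a bucket with another row gets no entry in the result, which is the price of the approximation.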

Another option is to use ANN libraries on a single machine. There's a
good benchmark of ANN libraries here:
https://github.com/erikbern/ann-benchmarks

2016-05-17 23:24 GMT+07:00 Rex X <dnsr...@gmail.com>:

> Each row of the given matrix is Vector[Double]. Want to find out the
> nearest neighbor row to each row using cosine similarity.
>
> The problem here is the complexity: O( 10^20 )
>
> We need to do *blocking*, and do the row-wise comparison within each
> block. Any tips for best practice?
>
> In Spark, we have RowMatrix.*ColumnSimilarity*, but I didn't find
> *RowSimilarity* method.
>
>
> Thank you.
>
>
> Regards
> Rex
>
>
>
>
