[ https://issues.apache.org/jira/browse/SPARK-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243055#comment-14243055 ]
Sean Owen commented on SPARK-4823:
----------------------------------

I don't think MapReduce matters here. You can compute pairs of similarities with any framework, or try to do it on the fly. It's not different than column similarities, right? I don't think there's anything more to it than applying a similarity metric to all pairs of vectors. I think the JIRA is about exposing a method just for API convenience, not because it's conceptually different.

> rowSimilarities
> ---------------
>
>                 Key: SPARK-4823
>                 URL: https://issues.apache.org/jira/browse/SPARK-4823
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Reza Zadeh
>
> RowMatrix has a columnSimilarities method to find cosine similarities between
> columns.
> A rowSimilarities method would be useful to find similarities between rows.
> This JIRA is to investigate which algorithms are suitable for such a
> method, better than brute-forcing it. Note that when there are many rows (>
> 10^6), it is unlikely that brute-force will be feasible, since the output
> will be of order 10^12.
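
To make the point above concrete, here is a minimal brute-force sketch in plain Python (not Spark, and not the MLlib implementation). It assumes a small dense matrix stored as a list of rows; the names cosine, all_pairs_similarities, row_similarities, column_similarities and transpose are hypothetical helpers introduced only for illustration. It shows that row similarities are just the same all-pairs computation that column similarities use, applied to the transposed matrix.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def all_pairs_similarities(vectors):
    """Brute force: apply the metric to every unordered pair (i, j) with i < j."""
    return {
        (i, j): cosine(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }


def transpose(matrix):
    """Swap rows and columns of a dense list-of-rows matrix."""
    return [list(col) for col in zip(*matrix)]


def row_similarities(matrix):
    """Similarities between rows: the vectors are the rows themselves."""
    return all_pairs_similarities(matrix)


def column_similarities(matrix):
    """Similarities between columns: the vectors are the rows of the transpose."""
    return all_pairs_similarities(transpose(matrix))


if __name__ == "__main__":
    m = [
        [1.0, 0.0, 2.0],
        [0.0, 3.0, 0.0],
        [2.0, 0.0, 4.0],
    ]
    # Row similarities of m equal column similarities of its transpose.
    print(row_similarities(m) == column_similarities(transpose(m)))
```

The loop visits n(n-1)/2 pairs for n vectors, so with 10^6 rows it would produce roughly 5 x 10^11 values, which matches the order-10^12 output that the issue description calls infeasible for brute force.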