[ https://issues.apache.org/jira/browse/SPARK-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648340#comment-14648340 ]
Debasish Das commented on SPARK-4823:
-------------------------------------

We did a more detailed experiment for the July 2015 Spark Meetup to understand the effect of shuffle on runtime. I have attached the data from the experiments to the JIRA. I will update the PR as discussed with Reza. I am targeting one PR for Spark 1.5.

> rowSimilarities
> ---------------
>
>                 Key: SPARK-4823
>                 URL: https://issues.apache.org/jira/browse/SPARK-4823
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Reza Zadeh
>         Attachments: MovieLensSimilarity Comparisons.pdf
>
>
> RowMatrix has a columnSimilarities method to find cosine similarities between
> columns.
> A rowSimilarities method would be useful to find similarities between rows.
> This JIRA is to investigate which algorithms are suitable for such a
> method, better than brute-forcing it. Note that when there are many rows
> (> 10^6), it is unlikely that brute-force will be feasible, since the output
> will be of order 10^12.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
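The brute-force baseline that the issue description argues against can be sketched as follows. This is a minimal single-machine Python illustration, not Spark code and not the algorithm proposed in the PR; the function names `row_similarities_brute_force` and `num_pairs` are hypothetical. It scores every unordered pair of rows with cosine similarity, which is why the work and output grow quadratically: with n = 10^6 rows there are n(n-1)/2, roughly 5 × 10^11 pairs, i.e. on the order of 10^12 entries in a dense n × n result.

```python
import math
from itertools import combinations


def row_similarities_brute_force(rows):
    """Hypothetical helper: return {(i, j): cosine(rows[i], rows[j])} for every pair i < j."""
    # Precompute each row's L2 norm once instead of once per pair.
    norms = [math.sqrt(sum(x * x for x in r)) for r in rows]
    sims = {}
    # Brute force: visit all n * (n - 1) / 2 unordered row pairs.
    for i, j in combinations(range(len(rows)), 2):
        if norms[i] == 0 or norms[j] == 0:
            continue  # cosine is undefined for an all-zero row; skip the pair
        dot = sum(a * b for a, b in zip(rows[i], rows[j]))
        sims[(i, j)] = dot / (norms[i] * norms[j])
    return sims


def num_pairs(n):
    """Hypothetical helper: number of unordered row pairs a brute-force pass must score."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    sims = row_similarities_brute_force([[1, 0, 1], [0, 1, 0], [1, 1, 1]])
    print(sims)
    print(num_pairs(10**6))  # 499999500000
```

The pair count, not the per-pair arithmetic, is the bottleneck here, which is why the issue looks for algorithms that avoid scoring every pair.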