Computing row similarities is not cheap when there are many rows: it amounts to computing AA^T for the matrix A (the product of A with its own transpose), which has one entry for every pair of rows.
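To make the cost concrete, here is a minimal single-machine sketch of the brute-force all-pairs computation, in plain Python rather than Spark. The helper names `row_similarities` and `top_k` are made up for illustration and are not part of any MLlib API. The sketch assumes cosine similarity as the measure.

```python
import math

def row_similarities(rows):
    """Brute-force cosine similarity between every pair of rows.

    Equivalent to normalising each row of A and forming A A^T, so for
    m rows of length n it does O(m^2 * n) work and stores m^2 entries.
    """
    norms = [math.sqrt(sum(x * x for x in row)) for row in rows]
    m = len(rows)
    sims = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):  # the result is symmetric, so fill both halves
            dot = sum(a * b for a, b in zip(rows[i], rows[j]))
            denom = norms[i] * norms[j]
            s = dot / denom if denom else 0.0  # all-zero rows get similarity 0
            sims[i][j] = sims[j][i] = s
    return sims

def top_k(sims, i, k):
    """Indices of the k rows most similar to row i, excluding i itself."""
    others = [j for j in range(len(sims)) if j != i]
    return sorted(others, key=lambda j: sims[i][j], reverse=True)[:k]

if __name__ == "__main__":
    A = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    S = row_similarities(A)
    print(top_k(S, 0, 1))
```

The double loop over row pairs is what becomes expensive: with millions of users or products, both the m^2 work and the m^2 output are the problem, which is why something cheaper than all pairs is wanted for the topK cases.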
There is a JIRA to track handling (1) and (2) more efficiently than computing all pairs:
https://issues.apache.org/jira/browse/SPARK-3066

On Wed, Dec 10, 2014 at 2:44 PM, Debasish Das <debasish.da...@gmail.com> wrote:
> Hi,
>
> It seems there are multiple places where we would like to compute row
> similarity (accurate or approximate similarities)
>
> Basically through RowMatrix columnSimilarities we can compute column
> similarities of a tall skinny matrix
>
> Similarly we should have an API in RowMatrix called rowSimilarities where
> we can compute similar rows in a map-reduce fashion. It will be useful for
> following use-cases:
>
> 1. Generate topK users for each user from matrix factorization model
> 2. Generate topK products for each product from matrix factorization model
> 3. Generate kernel matrix for use in spectral clustering
> 4. Generate kernel matrix for use in kernel regression/classification
>
> I am not sure if there are already good implementation for map-reduce row
> similarity that we can use (ideas like fastfood and kitchen sink felt more
> like for classification use-case but for recommendation also user
> similarities show up which is unsupervised)...
>
> Is there a JIRA tracking it ? If not I can open one and we can discuss
> further on it.
>
> Thanks.
> Deb