DIMSUM and ColumnSimilarity use case ?

2014-12-10 Thread Jaonary Rabarisoa
Dear all, I'm trying to understand the correct use case for ColumnSimilarity as implemented in RowMatrix. As far as I know, this function computes the similarities between the columns of a given matrix. The DIMSUM paper says that it's efficient for large m (rows) and small n (columns). In this case

Re: DIMSUM and ColumnSimilarity use case ?

2014-12-10 Thread Sean Owen
Well, you're computing similarity of your features then. Whether it is meaningful depends a bit on the nature of your features and more on the similarity algorithm. On Wed, Dec 10, 2014 at 2:53 PM, Jaonary Rabarisoa jaon...@gmail.com wrote: Dear all, I'm trying to understand what is the

Re: DIMSUM and ColumnSimilarity use case ?

2014-12-10 Thread Debasish Das
If you have a tall-and-skinny matrix of m users and n products, column similarity will give you an n x n matrix (a product x product matrix)...this is also called the product correlation matrix...it can be cosine, Pearson, or another kind of correlation...Note that if the entry is unobserved (user Jaonary did
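
To make the m x n to n x n shape concrete, here is a minimal sketch in plain Python of exact cosine similarity between columns. It is not the DIMSUM sampling scheme and not Spark's implementation; the function name `column_cosine_similarities` and the tiny `ratings` matrix are made up for illustration, and treating a zero as "no contribution" is an assumption of this sketch.

```python
import math


def column_cosine_similarities(rows):
    """Return the n x n matrix of cosine similarities between the columns of `rows`.

    `rows` is a list of m equal-length lists (m users x n products).
    """
    n = len(rows[0])
    # Accumulate the Gram matrix A^T A one row at a time, so only an
    # n x n accumulator is kept no matter how many rows there are.
    gram = [[0.0] * n for _ in range(n)]
    for row in rows:
        # Assumption: zero entries contribute nothing, so skip them.
        nonzero = [(j, v) for j, v in enumerate(row) if v != 0]
        for j, vj in nonzero:
            for k, vk in nonzero:
                gram[j][k] += vj * vk

    # The diagonal of A^T A holds the squared column norms.
    norms = [math.sqrt(gram[j][j]) for j in range(n)]
    sims = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            if norms[j] > 0 and norms[k] > 0:
                sims[j][k] = gram[j][k] / (norms[j] * norms[k])
    return sims


# Hypothetical data: 4 users (rows) x 3 products (columns).
ratings = [
    [5, 4, 0],
    [4, 5, 0],
    [0, 0, 3],
    [1, 1, 0],
]
sims = column_cosine_similarities(ratings)
```

Here `sims[0][1]` is high because products 0 and 1 are rated alike by the same users, while `sims[0][2]` is zero because no user rated both. The result has n x n entries, which is why the approach suits small n; in Spark the corresponding call on a `RowMatrix` is `columnSimilarities()`.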

Re: DIMSUM and ColumnSimilarity use case ?

2014-12-10 Thread Reza Zadeh
As Sean mentioned, you would be computing similarities between features then. If you want to find similar users, I suggest running k-means with some fixed number of clusters. It's not reasonable to try to compute all pairs of similarities between 1 billion items, so k-means with fixed k is more suitable here.
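
As a rough illustration of the k-means suggestion, the sketch below clusters user rows with a plain-Python Lloyd's iteration. The function name `kmeans`, the sample `users` data, and the naive "first k points" initialisation are assumptions made for this example; a real job on data of this size would use a distributed implementation rather than this loop.

```python
def kmeans(points, k, iterations=10):
    """Cluster `points` (a list of equal-length lists) into k groups.

    Returns (assignments, centroids). Each iteration costs O(m * k * n)
    for m points of dimension n, instead of the O(m^2) pairs that an
    all-pairs similarity computation would need.
    """
    # Assumption: deterministic initialisation from the first k points.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # by squared Euclidean distance.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Hypothetical data: 4 users described by their ratings of 3 products.
users = [
    [5.0, 4.0, 0.0],
    [0.0, 0.0, 3.0],
    [4.0, 5.0, 0.0],
    [0.0, 1.0, 4.0],
]
assignments, centroids = kmeans(users, k=2)
```

Users that land in the same cluster (here users 0 and 2, and users 1 and 3) can then be treated as similar, without ever forming a user x user matrix.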