Dear all,
I'm trying to understand the correct use case for columnSimilarities,
as implemented in RowMatrix.
As far as I know, this function computes the similarities between the
columns of a given matrix. The DIMSUM paper says that it is efficient for
large m (rows) and small n (columns). In this case
Well, you're computing similarity of your features then. Whether it is
meaningful depends a bit on the nature of your features and more on
the similarity algorithm.
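
To make "similarity of your features" concrete, here is a minimal sketch in plain Python (not the Spark API) of what an exact all-pairs column cosine similarity computes. The function name column_similarities and the toy matrix are made up for illustration; rows are taken to be observations and columns features. DIMSUM, as I understand it, approximates this same quantity by sampling, which this sketch does not attempt.

```python
import math

def column_similarities(rows):
    """Exact cosine similarity between every pair of columns.

    `rows` is a row-major matrix (list of equal-length lists).
    Returns a dict mapping (i, j) with i < j to the cosine similarity.
    """
    n = len(rows[0])
    # Accumulate the upper triangle of the Gram matrix A^T A one row at a
    # time, which is the access pattern a tall and skinny layout allows.
    gram = [[0.0] * n for _ in range(n)]
    for row in rows:
        for i in range(n):
            if row[i] == 0:
                continue  # zero entries contribute nothing
            for j in range(i, n):
                gram[i][j] += row[i] * row[j]
    # The diagonal holds the squared column norms.
    norms = [math.sqrt(gram[i][i]) for i in range(n)]
    sims = {}
    for i in range(n):
        for j in range(i + 1, n):
            denom = norms[i] * norms[j]
            sims[(i, j)] = gram[i][j] / denom if denom else 0.0
    return sims

# Toy data: 4 observations (rows) x 3 features (columns).
matrix = [
    [1.0, 2.0, 0.0],
    [2.0, 4.0, 0.0],
    [0.0, 0.0, 3.0],
    [1.0, 2.0, 1.0],
]
sims = column_similarities(matrix)
for pair, value in sorted(sims.items()):
    print(pair, round(value, 4))
```

The result has one entry per pair of columns, so its size depends only on n, not on the number of rows m. That is why the approach suits large m and small n.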
On Wed, Dec 10, 2014 at 2:53 PM, Jaonary Rabarisoa jaon...@gmail.com wrote:
Dear all,
I'm trying to understand what is the
If you have a tall and skinny matrix of m users and n products, column
similarity will give you an n x n matrix (product x product). This is also
called the product correlation matrix. It can be cosine, Pearson, or another
kind of correlation. Note that if the entry is unobserved (user
Joanary did
As Sean mentioned, you would be computing similarities between features then.
If you want to find similar users, I suggest running k-means with some
fixed number of clusters. It's not reasonable to try and compute all pairs
of similarities between 1bn items, so k-means with fixed k is more suitable
here.
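
For a sense of scale: with m = 1bn items there are m(m-1)/2, roughly 5 x 10^17, distinct pairs, whereas one k-means pass computes only m x k distances. The sketch below is a plain-Python version of Lloyd's algorithm, meant only to illustrate the idea of grouping similar users (rows) with a fixed k. It is not Spark's KMeans implementation; the function names, the naive "first k points" initialisation, and the toy data are all assumptions made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=10):
    """Lloyd's algorithm with a fixed number of clusters k.

    Returns (centers, assignment), where assignment[i] is the index of
    the cluster that points[i] belongs to.
    """
    # Naive, deterministic initialisation: use the first k points.
    centers = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: m * k distance computations per pass.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment

# Toy data: 4 users (rows) x 3 products (columns).
users = [
    [5.0, 4.0, 0.0],
    [0.0, 1.0, 5.0],
    [4.0, 5.0, 1.0],
    [1.0, 0.0, 4.0],
]
centers, assignment = kmeans(users, k=2)
print(assignment)
print(centers)
```

Users that land in the same cluster can then be treated as similar, or exact similarities can be computed within each cluster only, which is far cheaper than all pairs.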