Re: Efficient cosine similarity computation

2019-09-23 Thread Chee Yee Lim
I've been trying to achieve the same objective, and came up with approaches similar to your methods 1 and 2. Method 2 is the slowest for me due to the massive amount of data being shuffled around at each matrix operation stage. Method 3 is new to me, so I can't comment much. I ended up using an approach

Efficient cosine similarity computation

2019-09-23 Thread Stevens, Clay
There are several ways I can compute the cosine similarity between a Spark ML vector and each ML vector in a Spark DataFrame column, then sort for the highest results. However, I can't come up with a method that is faster than replacing the `/data/` in a Spark ML Word2Vec model, then using
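
For readers following the thread, here is a minimal sketch of the quantity being discussed: score one query vector against every vector in a collection by cosine similarity, then keep the highest-scoring entries. It is plain Python rather than Spark, and it is not any of the methods from the original message. The names `cosine_similarity` and `top_k_similar`, and the `(key, vector)` row layout, are assumptions made only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return dot(a, b) / (|a| * |b|), or 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: Sequence[float],
    rows: Sequence[Tuple[str, Sequence[float]]],
    k: int,
) -> List[Tuple[str, float]]:
    """Score every (key, vector) row against the query and return the k best."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in rows]
    # Sort by similarity, highest first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy data standing in for a DataFrame column of vectors.
rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
result = top_k_similar([1.0, 0.0], rows, k=2)
```

In a distributed setting the same two steps (score each row, then take the top results) are what the competing methods have to perform; the differences in speed come from how the scoring and sorting are spread across the cluster.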