Hi Satyajit,

Have you tried setting a higher threshold for columnSimilarities to lower
the computation cost?

BTW, can you also comment out most of the other code and run only
columnSimilarities, followed by a simple computation such as counting the
entries of the returned CoordinateMatrix? That way we can make sure the
problem is really in columnSimilarities.

E.g.,

// Passing a threshold uses the sampling-based path, so the result is approximate.
val approx = mat.columnSimilarities(0.5)
val approxCount = approx.entries.count()
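
If it helps, here is a rough, self-contained sketch of that isolation test. It
assumes a local master and a tiny made-up matrix; the object name
ColumnSimilaritiesCheck, the helper countEntries and the threshold values are
only placeholders for illustration, so please swap in your own RowMatrix. It
forces the computation with count() and prints how long each threshold takes,
which should show whether columnSimilarities alone is the slow part.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Hypothetical helper object; not part of Spark or of the original job.
object ColumnSimilaritiesCheck {

  // Run only columnSimilarities and force evaluation by counting the entries.
  def countEntries(mat: RowMatrix, threshold: Double): Long =
    mat.columnSimilarities(threshold).entries.count()

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, just for a quick check outside the cluster.
    val conf = new SparkConf()
      .setAppName("ColumnSimilaritiesCheck")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Toy data standing in for the real document matrix.
      val rows = sc.parallelize(Seq(
        Vectors.dense(1.0, 0.0, 2.0),
        Vectors.dense(0.0, 3.0, 4.0),
        Vectors.dense(5.0, 6.0, 0.0)))
      val mat = new RowMatrix(rows)

      // Example thresholds only; compare the cost as the threshold rises.
      for (threshold <- Seq(0.1, 0.5, 0.9)) {
        val start = System.nanoTime()
        val count = countEntries(mat, threshold)
        val elapsedMs = (System.nanoTime() - start) / 1e6
        println(f"threshold=$threshold%.1f entries=$count elapsed=$elapsedMs%.1f ms")
      }
    } finally {
      sc.stop()
    }
  }
}
```

If the timings here already blow up on your real matrix, the problem is in
columnSimilarities itself; if not, it is probably in one of the steps that
were commented out.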





-----
Liang-Chi Hsieh | @viirya 
Spark Technology Center 
http://www.spark.tc/ 
--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Document-Similarity-Spark-Mllib-tp20196p20219.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
