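Hi Satyajit,

Have you tried setting a higher threshold for columnSimilarities to lower the computation cost? A threshold above 0 makes it use a sampling approach instead of brute force. Similarities above the threshold are then estimated at a lower cost than the exact computation, while a threshold of 0 costs the same as brute force.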
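For intuition about why the exact computation is expensive, here is a minimal plain-Scala sketch (no Spark). It computes the exact, brute-force result: the cosine similarity of every column pair i < j of a small dense matrix. The names `ColumnCosine`, `allPairs` and `pairCount` are hypothetical. This is not how Spark's RowMatrix computes it. It only shows that a dense brute-force pass considers n(n-1)/2 column pairs.

```scala
// Plain-Scala sketch (no Spark, hypothetical names): exact cosine similarity
// between every pair of columns of a small, dense, row-major matrix.
// Not Spark's implementation; only illustrates what the exact result is.
object ColumnCosine {

  /** Returns ((i, j), cosine) for every column pair i < j whose columns are
    * both non-zero. All rows are assumed to have the same length. */
  def allPairs(rows: Seq[Array[Double]]): Seq[((Int, Int), Double)] = {
    val n = if (rows.isEmpty) 0 else rows.head.length
    // L2 norm of each column
    val norms = Array.tabulate(n)(j => math.sqrt(rows.map(r => r(j) * r(j)).sum))
    for {
      i <- 0 until n
      j <- (i + 1) until n
      if norms(i) > 0 && norms(j) > 0 // skip all-zero columns
    } yield {
      val dot = rows.map(r => r(i) * r(j)).sum
      ((i, j), dot / (norms(i) * norms(j)))
    }
  }

  /** Number of column pairs a dense brute-force computation considers. */
  def pairCount(n: Long): Long = n * (n - 1) / 2
}
```

For example, a hypothetical matrix with 100,000 columns gives `pairCount(100000) == 4999950000L` pairs. That count is why the exact computation gets expensive for wide matrices.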
BTW, could you also comment out most of the other code and run only columnSimilarities, followed by a simple action such as counting the entries of the returned CoordinateMatrix? That way we can confirm that the problem really is in columnSimilarities. E.g.,

    // mat is your RowMatrix; threshold 0.5 gives sampled estimates
    val exact = mat.columnSimilarities(0.5)
    // count() is an action, so it forces the similarity computation to run
    val exactCount = exact.entries.count()

-----
Liang-Chi Hsieh | @viirya
Spark Technology Center
http://www.spark.tc/
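To see how long the isolated step takes, a small plain-Scala timing helper like the one below could be pasted into spark-shell. `Timing.timed` is a hypothetical helper, not a Spark API. RDD transformations are lazy, so the action (`count()`) should go inside the timed block too. Otherwise most of the similarity work would run outside the timer.

```scala
// Minimal timing helper (hypothetical, not part of Spark).
object Timing {
  /** Evaluates `body` once and returns its result with elapsed milliseconds. */
  def timed[T](label: String)(body: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = body // by-name parameter: evaluated here, inside the timer
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    println(s"$label took $elapsedMs ms")
    (result, elapsedMs)
  }
}

// Possible use in spark-shell, assuming `mat` is your RowMatrix:
// val (count, ms) = Timing.timed("threshold 0.5") {
//   mat.columnSimilarities(0.5).entries.count()
// }
```

Running it with a few thresholds (for example 0.0, 0.5 and 0.8) would show how the cost changes with the threshold on your data.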