Re: WARN from Similarity Calculation

2015-02-18 Thread Debasish Das
I am still debugging it, but I believe that if m% of users have unusually large columns and the RDD partitioner on RowMatrix is a HashPartitioner, then, because the basic algorithm runs without sampling, some partitions can end up with an unusually large number of keys... If my debugging confirms that, I will add a custom
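
To illustrate the kind of skew suspected above, here is a minimal sketch in plain Python, not Spark code. It assumes that a row with k nonzero entries emits k*(k-1)/2 column pairs when no sampling is applied, and that rows land in partitions by row id modulo the partition count, as a stand-in for hash partitioning. The names `pairs_emitted`, `partition_loads` and `make_rows`, and all the sizes used, are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def pairs_emitted(nnz):
    """Column pairs one row contributes without sampling: nnz choose 2."""
    return nnz * (nnz - 1) // 2


def partition_loads(row_nnz, num_partitions):
    """Total emitted pairs per partition.

    Assumption: row i goes to partition i % num_partitions, which stands in
    for a hash partitioner on integer row ids.
    """
    loads = Counter()
    for row_id, nnz in enumerate(row_nnz):
        loads[row_id % num_partitions] += pairs_emitted(nnz)
    return [loads[p] for p in range(num_partitions)]


def make_rows(num_rows, heavy_fraction, light_nnz, heavy_nnz, seed=0):
    """Synthetic nonzero counts: a small fraction of rows is very dense."""
    rng = random.Random(seed)
    return [
        heavy_nnz if rng.random() < heavy_fraction else light_nnz
        for _ in range(num_rows)
    ]


if __name__ == "__main__":
    # Hypothetical workload: 1% of rows have 2000 nonzeros, the rest have 20.
    rows = make_rows(10_000, 0.01, 20, 2_000)
    loads = partition_loads(rows, 8)
    for p, load in enumerate(loads):
        print(f"partition {p}: {load} emitted pairs")
    print("max/min load ratio:", max(loads) / min(loads))
```

Because the emitted pair count grows quadratically with the number of nonzeros, a partition that happens to receive a few more dense rows than its neighbours carries a disproportionate share of the keys, even when the row counts per partition are equal.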

Re: WARN from Similarity Calculation

2015-02-17 Thread Xiangrui Meng
It may be caused by a GC pause. Did you check the GC time in the Spark UI?

-Xiangrui

On Sun, Feb 15, 2015 at 8:10 PM, Debasish Das debasish.da...@gmail.com wrote:
> Hi,
> I am sometimes getting WARN from running Similarity calculation:
> 15/02/15 23:07:55 WARN BlockManagerMasterActor: Removing