PySpark crashed because "remote RPC client disassociated"

2016-06-29 Thread jw.cmu
I am running my own PySpark application (solving matrix factorization using Gemulla's DSGD algorithm). The program seemed to work fine on the smaller MovieLens dataset but failed on the larger Netflix dataset. It took about 14 hours to complete two iterations and lost an executor (I used 8 executors in total
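"Remote RPC client disassociated" generally means the executor JVM died or stopped heartbeating, commonly from memory pressure or a long GC pause on a big job like this. A hedged starting point for tuning (the flag names are real Spark options; the specific values and the script name `my_dsgd_app.py` are illustrative guesses, not a tested fix from this thread):

```shell
# Illustrative spark-submit settings only; the values are guesses to tune.
# --executor-memory sizes the executor JVM heap; the YARN overhead option
# adds off-heap headroom (named spark.yarn.executor.memoryOverhead on
# Spark 1.x); spark.network.timeout raises how long the driver waits
# before declaring a silent executor lost (e.g. during a long GC pause).
spark-submit \
  --num-executors 8 \
  --executor-memory 8g \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  --conf spark.network.timeout=600s \
  my_dsgd_app.py
```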

MLlib Collaborative Filtering failed to run with rank 1000

2014-10-03 Thread jw.cmu
I was able to run collaborative filtering with low ranks, like 20 to 160, on the Netflix dataset, but it fails with the following error when I set the rank to 1000: 14/10/03 03:27:36 WARN TaskSetManager: Loss was due to java.lang.IllegalArgumentException java.lang.IllegalArgumentException:
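To see why rank 1000 is a different regime from rank 20-160, a back-of-envelope size estimate helps. This is a sketch: the Netflix Prize dimensions (~480,189 users by 17,770 movies) and the 8-byte double encoding are assumptions, not stated in the thread.

```python
# Rough memory footprint of dense ALS factor matrices at various ranks.
# Assumed: Netflix Prize dataset dimensions and 8 bytes per double.
def factor_matrix_bytes(num_rows: int, rank: int, bytes_per_value: int = 8) -> int:
    """Size in bytes of a dense num_rows x rank matrix of doubles."""
    return num_rows * rank * bytes_per_value

USERS, MOVIES = 480_189, 17_770

for rank in (20, 160, 1000):
    user_gb = factor_matrix_bytes(USERS, rank) / 1e9
    movie_gb = factor_matrix_bytes(MOVIES, rank) / 1e9
    print(f"rank {rank:4d}: user factors ~{user_gb:5.2f} GB, "
          f"movie factors ~{movie_gb:5.2f} GB")
```

At rank 1000 the user-factor matrix alone is roughly 3.8 GB, i.e. a single dense structure larger than Integer.MAX_VALUE (~2.1 GB) bytes, which is the scale at which JVM 2 GB array/buffer limits start throwing IllegalArgumentException in Spark of that era (the exact cause here is speculation, since the error message in the excerpt is truncated).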

Re: MLlib Collaborative Filtering failed to run with rank 1000

2014-10-03 Thread jw.cmu
Thanks, Xiangrui. I didn't check the test error yet. I agree that rank 1000 might overfit for this particular dataset. Currently I'm just running some scalability tests: I'm trying to see how large the model can be scaled given a fixed amount of hardware.