Re: MLlib training time question

2015-12-06 Thread Haoyue Wang
Thanks Yanbo! I checked the Spark UI and found that Exp 1) has 52 jobs and 99 stages, while Exp 2) has 105 jobs and 206 stages. Each job takes 3-4s and each stage takes 1-2s. That's why Exp 2) takes about twice as long as Exp 1). I also found that in Exp 2), the
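
A quick back-of-the-envelope check of the figures reported above, as a minimal sketch. It assumes the jobs run one after another and that every job falls in the quoted 3-4s range. The helper name `estimate_total_seconds` is hypothetical and not part of Spark.

```python
# Rough estimate of training time from the job counts seen in the Spark UI.
# Assumption: jobs run sequentially and each takes between 3 and 4 seconds.

def estimate_total_seconds(num_jobs, per_job_seconds):
    """Return a (low, high) estimate of total time for num_jobs jobs."""
    low, high = per_job_seconds
    return num_jobs * low, num_jobs * high

PER_JOB_SECONDS = (3, 4)  # per-job range reported in the message

exp1 = estimate_total_seconds(52, PER_JOB_SECONDS)   # Exp 1): 52 jobs
exp2 = estimate_total_seconds(105, PER_JOB_SECONDS)  # Exp 2): 105 jobs

# With the same per-job cost, the time ratio equals the job-count ratio.
job_ratio = 105 / 52

print("Exp 1 estimate (s):", exp1)
print("Exp 2 estimate (s):", exp2)
print("Job-count ratio: %.2f" % job_ratio)
```

Under these assumptions the job-count ratio alone is consistent with the roughly 2x difference in training time; it does not explain why Exp 2) launches about twice as many jobs.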

Re: MLlib training time question

2015-12-05 Thread Yanbo Liang
Hi Haoyue, could you look up the time spent on each stage of the LinearRegression model training in the Spark UI? That will tell us which stage is the most time-consuming and help us analyze the cause. Yanbo 2015-12-05 15:14 GMT+08:00 Haoyue Wang : > Hi all, > I'm doing some

MLlib training time question

2015-12-04 Thread Haoyue Wang
Hi all, I'm doing some experiments with Spark MLlib (version 1.5.0). I train a LogisticRegressionModel on a 2.06GB dataset (# of samples: 2396130, # of features: 3231961, # of classes: 2, format: LibSVM). I deployed Spark on a 4-node cluster; each node's spec: CPU: Intel(R) Xeon(R) CPU E5-2650 0 @