Thanks Yanbo!
I checked the Spark UI and found that in Exp 1) there are 52 jobs and 99
stages, while in Exp 2) there are 105 jobs and 206 stages. Each job takes
3-4s and each stage 1-2s. That's why Exp 2) takes about 2x as long as
Exp 1).
And I also found that in Exp 2), the
Hi Haoyue,
Could you find the time spent on each stage of the LinearRegression model
training in the Spark UI?
It can tell us which stage is the most time-consuming and help us analyze
the cause.
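Besides eyeballing the Spark UI, the same stage timings can be pulled from the driver's monitoring REST API (`/api/v1/applications/<app-id>/stages`, available since Spark 1.4, usually served on port 4040). The sketch below is illustrative, not from the original thread: the sample records are hypothetical stand-ins for what you would fetch from the driver, and the helper names (`parse_ts`, `slowest_stages`) are mine.

```python
# Sketch: rank completed stages by wall-clock duration using the JSON shape
# returned by Spark's monitoring REST API. In practice you would fetch the
# stage list from http://<driver>:4040/api/v1/applications/<app-id>/stages;
# here a hypothetical two-stage sample stands in for that response.
from datetime import datetime

def parse_ts(ts):
    # Spark reports timestamps like "2015-12-05T07:14:02.123GMT"
    return datetime.strptime(ts.replace("GMT", ""), "%Y-%m-%dT%H:%M:%S.%f")

def slowest_stages(stages, top=5):
    """Return (stageId, seconds) pairs for completed stages, longest first."""
    timed = []
    for s in stages:
        if "submissionTime" in s and "completionTime" in s:
            secs = (parse_ts(s["completionTime"])
                    - parse_ts(s["submissionTime"])).total_seconds()
            timed.append((s["stageId"], secs))
    return sorted(timed, key=lambda p: -p[1])[:top]

# Hypothetical sample of two completed stages:
sample = [
    {"stageId": 0, "submissionTime": "2015-12-05T07:14:00.000GMT",
     "completionTime": "2015-12-05T07:14:02.000GMT"},
    {"stageId": 1, "submissionTime": "2015-12-05T07:14:02.000GMT",
     "completionTime": "2015-12-05T07:14:03.500GMT"},
]
print(slowest_stages(sample))
```

Sorting the stages this way makes it obvious where the extra time goes when the job/stage count doubles.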
Yanbo
2015-12-05 15:14 GMT+08:00 Haoyue Wang :
Hi all,
I'm doing some experiments with Spark MLlib (version 1.5.0). I train a
LogisticRegressionModel on a 2.06GB dataset (# of records: 2,396,130, # of
features: 3,231,961, # of classes: 2, format: LibSVM). I deployed Spark on
a 4-node cluster; each node's spec: CPU: Intel(R) Xeon(R) CPU E5-2650 0 @