On Fri, Mar 26, 2021 at 8:43 AM Williams, David (Risk Value Stream)
<david.willi...@lloydsbanking.com.invalid>
wrote:
Classification: Public
Thanks again Sean.
We did try increasing the partitions but to no avail. Maybe it's because of
the low dataset volumes as you say so the ov
get that working in distributed, will we get
benefits similar to spark ML?
Best Regards,
Dave Williams
From: Sean Owen
Sent: 26 March 2021 13:20
To: Williams, David (Risk Value Stream)
Cc: user@spark.apache.org
Subject: Re: FW: Email to Spark Org please
I'd suspect that you are using just 1 partition for such a small data set,
and get no parallelism from Spark.
You can repartition your input to many more partitions, but it's unlikely to get much
faster than in-core sklearn for this task.
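
For reference, a minimal PySpark sketch of the repartitioning suggestion above. It is not taken from the original poster's code: the helper names, the column names "features" and "label", the maxIter value, and the rows-per-partition threshold are all assumptions made for illustration. The only grounded parts are the standard DataFrame.repartition and GBTClassifier calls and the general guidance of a few tasks per CPU core.

```python
def suggest_num_partitions(num_rows, total_cores,
                           tasks_per_core=3, min_rows_per_partition=1000):
    """Hypothetical heuristic, not a Spark API.

    Aim for a few tasks per core, but never split the data so finely
    that each partition holds fewer than min_rows_per_partition rows.
    """
    by_cores = total_cores * tasks_per_core
    by_rows = max(1, num_rows // min_rows_per_partition)
    return max(1, min(by_cores, by_rows))


def train_gbt_repartitioned(df, num_partitions,
                            features_col="features", label_col="label"):
    """Repartition a Spark DataFrame, then fit a GBT classifier on it.

    Assumes df already has an assembled vector column (features_col)
    and a binary label column (label_col).
    """
    # Imported here so the helper above works without Spark installed.
    from pyspark.ml.classification import GBTClassifier

    # A small input is often read into a single partition.
    print("partitions before:", df.rdd.getNumPartitions())

    # Spread the rows over more partitions and cache them, because
    # boosting makes many passes over the same training data.
    df = df.repartition(num_partitions).cache()
    print("partitions after:", df.rdd.getNumPartitions())

    gbt = GBTClassifier(featuresCol=features_col, labelCol=label_col,
                        maxIter=20)  # maxIter chosen arbitrarily here
    return gbt.fit(df)
```

For a data set this small the suggested partition count stays low, which is consistent with the point above: there is little work to parallelise, so an in-memory sklearn fit may well remain faster.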
On Thu, Mar 25, 2021 at 11:39 AM Williams, David (Risk Value Stream)
Classification: Public
Hi Team,
We are trying to use the ML Gradient Boosting Tree Classification algorithm and
found that its performance is very poor during training.
We would like to see if we can improve the training time, since it is taking
2 days for training for a