RE: FW: Email to Spark Org please

2021-04-01 Thread Williams, David (Risk Value Stream)
On Fri, Mar 26, 2021 at 8:43 AM Williams, David (Risk Value Stream) <david.willi...@lloydsbanking.com.invalid> wrote: Classification: Public Thanks again Sean. We did try increasing the partitions but to no avail. Maybe it's because of the low dataset volumes as you say so the ov

RE: FW: Email to Spark Org please

2021-03-26 Thread Williams, David (Risk Value Stream)
get that working in distributed, will we get benefits similar to spark ML? Best Regards, Dave Williams From: Sean Owen Sent: 26 March 2021 13:20 To: Williams, David (Risk Value Stream) Cc: user@spark.apache.org Subject: Re: FW: Email to Spark Org please

RE: FW: Email to Spark Org please

2021-03-26 Thread Williams, David (Risk Value Stream)
I'd suspect that you are using just 1 partition for such a small data set, and get no parallelism from Spark. Repartition your input to many more partitions, but it's unlikely to get much faster than in-core sklearn for this task. On Thu, Mar 25, 2021 at 11:39 AM Williams, David (Risk Value Stream)
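
The repartitioning advice in the message above could be sketched roughly as follows. This is only an illustration, not code from the thread: the helper name `repartition_for_training`, the target of 64 partitions, and the Parquet path in the usage comment are all hypothetical. It relies on the PySpark calls `DataFrame.rdd.getNumPartitions()` and `DataFrame.repartition(n)`.

```python
def repartition_for_training(df, num_partitions):
    """Spread df over at least num_partitions partitions.

    df is expected to be a pyspark.sql.DataFrame. The helper name and
    its behaviour are illustrative assumptions, not part of Spark.
    """
    current = df.rdd.getNumPartitions()
    if current >= num_partitions:
        # Already split widely enough; avoid a needless shuffle.
        return df
    # repartition() triggers a full shuffle into num_partitions partitions.
    return df.repartition(num_partitions)


# Hypothetical usage (requires pyspark and a running Spark session):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   train = spark.read.parquet("train.parquet")   # hypothetical path
#   train = repartition_for_training(train, 64)   # 64 is an arbitrary choice
```

As the reply notes, more partitions only help if there is enough data per partition to outweigh the scheduling and shuffle overhead, so on a small data set this may not beat single-machine sklearn.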

FW: Email to Spark Org please

2021-03-25 Thread Williams, David (Risk Value Stream)
Classification: Public Hi Team, We are trying to use the ML Gradient Boosting Tree Classification algorithm and have found that its performance is very poor during training. We would like to see if we can improve the performance timings, since it is taking 2 days for training for a