Spark is overkill for this problem; use sklearn.
That said, I suspect you are using just one partition for such a small
dataset, and so getting no parallelism from Spark.
Repartition your input into many more partitions, but it is unlikely to
get much faster than in-core sklearn for this task.
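
To check the partition count and repartition before fitting, something along
these lines may help. This is a minimal sketch, not a tuned recipe: the helper
name ensure_partitions and the target of 32 partitions are assumptions made for
illustration, and the only DataFrame calls it relies on are
df.rdd.getNumPartitions() and df.repartition(n).

```python
def ensure_partitions(df, min_partitions):
    """Return df with at least min_partitions partitions.

    df is expected to be a pyspark.sql.DataFrame. The helper name and
    the threshold are illustrative assumptions, not part of Spark.
    """
    current = df.rdd.getNumPartitions()
    if current >= min_partitions:
        # Already spread across enough partitions; avoid a needless shuffle.
        return df
    # repartition() shuffles the rows into the requested number of partitions.
    return df.repartition(min_partitions)


# Hypothetical usage with the classifier from the original message
# (train_df is an assumed name for the training DataFrame):
#
#   train_df = ensure_partitions(train_df, 32).cache()
#   model = gbt.fit(train_df)
```

A reasonable starting point for the partition count is a small multiple of the
total executor cores, though with only 40000 rows the overhead per iteration
may still dominate.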

On Thu, Mar 25, 2021 at 11:39 AM Williams, David (Risk Value Stream)
<david.willi...@lloydsbanking.com.invalid> wrote:

> Classification: Public
>
>
>
> Hi Team,
>
>
>
> We are trying to use the ML Gradient Boosting Tree Classification
> algorithm and have found that its performance is very poor during
> training.
>
>
>
> We would like to see whether we can improve the training time, since it
> is taking 2 days to train on a small dataset.
>
>
>
> Our dataset size is 40000. Number of features used for training is 564.
>
>
>
> When we train on the same dataset with Sklearn in Python, training
> completes in 3 hours, but with ML Gradient Boosting it takes 2 days.
>
>
>
> We tried increasing the number of executors, executor cores, driver
> memory, etc., but couldn’t see any improvement.
>
>
>
> The following are the parameters used for training.
>
>
>
> gbt = GBTClassifier(featuresCol='features', labelCol='bad_flag',
> predictionCol='prediction', maxDepth=11,  maxIter=10000, stepSize=0.01,
> subsamplingRate=0.5, minInstancesPerNode=110)
>
>
>
> If you could help us with any suggestions to tune this, that would be
> really helpful.
>
>
>
> Many thanks,
>
> Dave Williams
