[ 
https://issues.apache.org/jira/browse/SPARK-23704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated SPARK-23704:
---------------------------------
    Component/s: PySpark

> PySpark access of individual trees in random forest is slow
> -----------------------------------------------------------
>
>                 Key: SPARK-23704
>                 URL: https://issues.apache.org/jira/browse/SPARK-23704
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, PySpark
>    Affects Versions: 2.2.1
>         Environment: PySpark 2.2.1 / Windows 10
>            Reporter: Julian King
>            Priority: Minor
>
> Making predictions from a RandomForestClassifier model in PySpark is much 
> faster than making predictions from an individual tree contained within the 
> model's .trees attribute. 
> In fact, the model.transform call without an action is more than 10x slower 
> for an individual tree than the model.transform call for the random forest 
> model.
> See 
> [https://stackoverflow.com/questions/49297470/slow-individual-tree-access-for-random-forest-in-pyspark]
>  for an example with timings.
> Ideally:
>  * Getting a prediction from a single tree should be comparable to or faster 
> than getting predictions from the whole forest
>  * Getting all the predictions from all the individual trees should be 
> comparable in speed to getting the predictions from the random forest
>  
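
The comparison described above could be reproduced with a small timing harness along the following lines. This is only a sketch, not the reporter's code: the helper names time_call and compare_forest_and_tree are hypothetical, `model` is assumed to be an already fitted random forest model exposing .trees and .transform, and `df` is assumed to be a DataFrame with the features column the model expects. Because transform is lazy, the sketch times only the call itself, with no action, which is the case the description reports as slow.

```python
import time


def time_call(fn, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls to fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def compare_forest_and_tree(model, df, tree_index=0, repeats=3):
    """Time the transform call (no action) for a forest and one of its trees.

    Assumption: `model` is a fitted random forest model with a `.trees`
    list and a `.transform(df)` method; `df` is a compatible DataFrame.
    """
    # Fetch the tree once so that only the transform call is timed.
    tree = model.trees[tree_index]
    forest_s = time_call(lambda: model.transform(df), repeats)
    tree_s = time_call(lambda: tree.transform(df), repeats)
    # Guard against a zero denominator on very coarse clocks.
    ratio = tree_s / forest_s if forest_s > 0 else float("inf")
    return {"forest_s": forest_s, "tree_s": tree_s, "ratio": ratio}
```

A call such as compare_forest_and_tree(model, test_df) returns both timings and their ratio; a ratio well above 1 would correspond to the slowdown reported here. Timing an action (for example a count on the transformed DataFrame) would need a separate measurement.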



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

