[ https://issues.apache.org/jira/browse/SPARK-23704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bryan Cutler updated SPARK-23704:
---------------------------------
    Component/s: PySpark

> PySpark access of individual trees in random forest is slow
> -----------------------------------------------------------
>
>                 Key: SPARK-23704
>                 URL: https://issues.apache.org/jira/browse/SPARK-23704
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, PySpark
>    Affects Versions: 2.2.1
>         Environment: PySpark 2.2.1 / Windows 10
>            Reporter: Julian King
>            Priority: Minor
>
> Making predictions from a RandomForestClassifier model in PySpark is much
> faster than making predictions from an individual tree contained within the
> .trees attribute.
> In fact, the model.transform call without an action is more than 10x slower
> for an individual tree than the model.transform call for the random forest
> model.
> See
> [https://stackoverflow.com/questions/49297470/slow-individual-tree-access-for-random-forest-in-pyspark]
> for an example with timing.
> Ideally:
>  * Getting a prediction from a single tree should be comparable to or faster
> than getting predictions from the whole forest
>  * Getting the predictions from all the individual trees should be
> comparable in speed to getting the predictions from the random forest
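For anyone who wants to reproduce the comparison, here is a minimal timing sketch. It is not the code from the linked Stack Overflow post. The helper names time_call and compare_transform_times are hypothetical, and the sketch assumes only that the fitted model exposes a .trees attribute and that the model and each tree have a transform(df) method, as described in the issue. Like the report, it times the transform call itself without triggering a Spark action.

```python
import time


def time_call(fn, repeats=3):
    """Return the best wall-clock time (seconds) over `repeats` calls of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def compare_transform_times(forest_model, df, repeats=3):
    """Time transform() on the forest and on each tree in forest_model.trees.

    Hypothetical helper: `forest_model` is assumed to be a fitted random
    forest model with a `.trees` attribute, and `df` the DataFrame to score.
    No action is run on the result, so only the transform call is measured.
    """
    forest_time = time_call(lambda: forest_model.transform(df), repeats)
    tree_times = [
        # Bind `tree` as a default argument so each lambda keeps its own tree.
        time_call(lambda t=tree: t.transform(df), repeats)
        for tree in forest_model.trees
    ]
    return {
        "forest": forest_time,         # one transform call on the whole forest
        "trees": tree_times,           # one timing per individual tree
        "all_trees": sum(tree_times),  # total across all individual trees
    }
```

With a fitted model named model and a DataFrame named df (both assumed here), compare_transform_times(model, df) returns the timings needed to check the two "Ideally" points: each entry of "trees" against "forest", and "all_trees" against "forest".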