[ https://issues.apache.org/jira/browse/SPARK-21476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16447545#comment-16447545 ]
Tamilselvan Veeramani commented on SPARK-21476:
-----------------------------------------------

I am facing the same issue: high task deserialization time with an RF model when calling model.transform(dataset).select("probability"). Only predict(features.asInstanceOf[Vector]) is implemented in the transformImpl method of RandomForestClassificationModel; the other methods, predictRaw(features.asInstanceOf[FeaturesType]) and predictProbability(features.asInstanceOf[FeaturesType]), are still not implemented there. As a result, Spark Structured Streaming with an RF model continues to be a big challenge because of the high task deserialization time.

Is there any plan to implement these methods soon? Thanks.

> RandomForest classification model not using broadcast in transform
> ------------------------------------------------------------------
>
>                 Key: SPARK-21476
>                 URL: https://issues.apache.org/jira/browse/SPARK-21476
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Saurabh Agrawal
>            Priority: Minor
>
> I notice significant task deserialization latency while running prediction
> with pipelines using RandomForestClassificationModel. While digging into the
> source, I found that the transform method in RandomForestClassificationModel
> binds to its parent ProbabilisticClassificationModel, and the only concrete
> definition that RandomForestClassificationModel provides and which is
> actually used in transform is that of predictRaw. Broadcasting is not being
> used in predictRaw.
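
For illustration only, here is a minimal sketch of the kind of broadcast the issue asks for, written as user-side code and not as the Spark implementation. It assumes a Spark version in which predictProbability is publicly callable on the model (it is not in 2.2.0, the version this issue affects). The object name, the method name, and the default output column name are hypothetical.

```scala
import org.apache.spark.ml.classification.RandomForestClassificationModel
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical helper, not part of Spark.
object BroadcastProbabilitySketch {

  // Adds a probability column using a model that is broadcast once per
  // executor, so the trees are not serialized into every task closure.
  // Assumption: predictProbability is public in the Spark version in use.
  def withBroadcastProbability(
      model: RandomForestClassificationModel,
      dataset: Dataset[_],
      outputCol: String = "probability_bc"): DataFrame = {
    // Ship the model to the executors as a broadcast variable.
    val bcastModel = dataset.sparkSession.sparkContext.broadcast(model)

    // The UDF closure captures only the lightweight broadcast handle.
    val probabilityUDF = udf { (features: Vector) =>
      bcastModel.value.predictProbability(features)
    }

    dataset.withColumn(outputCol, probabilityUDF(col(model.getFeaturesCol)))
  }
}
```

Under these assumptions, BroadcastProbabilitySketch.withBroadcastProbability(model, dataset).select("probability_bc") would stand in for model.transform(dataset).select("probability"). Whether it actually reduces task deserialization time would need to be measured on the workload in question.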