[ https://issues.apache.org/jira/browse/SPARK-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063040#comment-14063040 ]

Xiangrui Meng commented on SPARK-2361:
--------------------------------------

PR that uses broadcast for both training and prediction: https://github.com/apache/spark/pull/1427

> Decide whether to broadcast or serialize the weights directly in MLlib algorithms
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-2361
>                 URL: https://issues.apache.org/jira/browse/SPARK-2361
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Xiangrui Meng
>
> In the current implementation, MLlib serializes the weights directly into
> the task closure. This is fine for small feature dimensions but inefficient
> for feature dimensions beyond 1M, especially since the default
> akka.frameSize is 10 MB. We should use broadcast when the serialized task
> is going to be large.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
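To make the size argument in the description concrete, here is a minimal sketch in plain Python (not Spark or MLlib code). It measures the serialized size of a dense double-precision weight vector and applies a simple threshold rule for choosing broadcast over closure serialization. Assumptions: Python's pickle stands in for the JVM serializer, the 10 MB frame budget is taken from the description, and `serialized_weight_bytes`, `should_broadcast` and the 0.5 `safety_fraction` are hypothetical names and a hypothetical policy, not MLlib's API.

```python
import array
import pickle

# Frame budget cited in the issue description (10 MB), in bytes.
DEFAULT_FRAME_SIZE_BYTES = 10 * 1024 * 1024


def serialized_weight_bytes(num_features: int) -> int:
    """Return the pickled size of an all-zero float64 weight vector.

    pickle is only a stand-in for the JVM serializer; both are dominated
    by 8 bytes per double, so the order of magnitude carries over.
    """
    weights = array.array("d", bytes(8 * num_features))  # num_features zeros
    return len(pickle.dumps(weights, protocol=pickle.HIGHEST_PROTOCOL))


def should_broadcast(num_features: int,
                     frame_size_bytes: int = DEFAULT_FRAME_SIZE_BYTES,
                     safety_fraction: float = 0.5) -> bool:
    """Hypothetical policy: broadcast once the weights alone would take more
    than `safety_fraction` of the frame budget.

    A real task also carries the function and any other captured state,
    which is why the weights get only part of the budget here.
    """
    return serialized_weight_bytes(num_features) > safety_fraction * frame_size_bytes


if __name__ == "__main__":
    for n in (10_000, 1_000_000):
        print(n, serialized_weight_bytes(n), should_broadcast(n))
```

With these defaults, 10,000 features serialize to roughly 80 KB and stay in the closure, while 1,000,000 features serialize to about 8 MB and cross the 5 MB half-frame threshold, which lines up with the 1M figure in the issue. Leaving headroom below the full frame size is a design choice of this sketch, not something the issue specifies.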