[ https://issues.apache.org/jira/browse/SPARK-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116532#comment-14116532 ]
Apache Spark commented on SPARK-3327:
-------------------------------------

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/2217

> Make broadcasted value mutable for caching useful information
> -------------------------------------------------------------
>
>                 Key: SPARK-3327
>                 URL: https://issues.apache.org/jira/browse/SPARK-3327
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Liang-Chi Hsieh
>
> When implementing some algorithms, it is helpful to be able to cache
> useful information for later use.
> Specifically, we would like to perform an operation "A" on each partition
> of the data, during which some variables are updated. We then want to run
> an operation "B" on the same data, and "B" uses the variables updated by
> "A".
> One example is Liblinear on Spark from Dr. Lin. The authors discuss the
> problem in Section IV.D of the paper "Large-scale Logistic Regression and
> Linear Support Vector Machines Using Spark."
> Currently, broadcast variables only partially meet this need. We can
> broadcast variables to reduce communication costs. However, because
> broadcast variables cannot be modified, they do not solve the problem:
> we may need to collect the updated variables back to the master and
> broadcast them again before running the next data operation.
> I would like to add an interface to broadcast variables that makes them
> mutable, so that later data operations can use the updated values.
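The collect-and-rebroadcast round trip that the issue describes can be sketched in plain Python. This is only an illustrative model, not Spark code and not the change proposed in the pull request: partitions are ordinary lists, and `ReadOnlyBroadcast`, `operation_a`, `operation_b`, and `run_two_phase` are hypothetical names made up for this example. The arithmetic (shifting a single shared value `w` toward the data mean) is likewise an assumed placeholder for whatever the real algorithm updates.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ReadOnlyBroadcast:
    """Hypothetical stand-in for a broadcast variable.

    The dataclass is frozen, so assigning to .value raises an error,
    mirroring the fact that workers cannot modify a broadcast value.
    """
    value: float


def operation_a(partition: List[float], bc: ReadOnlyBroadcast) -> float:
    """Operation "A": compute a per-partition update from the shared value."""
    return sum(x - bc.value for x in partition)


def operation_b(partition: List[float], bc: ReadOnlyBroadcast) -> List[float]:
    """Operation "B": use the value that was updated by operation "A"."""
    return [x - bc.value for x in partition]


def run_two_phase(
    partitions: List[List[float]], initial_w: float
) -> Tuple[float, List[List[float]]]:
    bc = ReadOnlyBroadcast(initial_w)

    # Phase 1: run "A" on every partition and collect the partial
    # results back on the driver side (the extra round trip).
    partials = [operation_a(p, bc) for p in partitions]
    n = sum(len(p) for p in partitions)
    new_w = initial_w + sum(partials) / n

    # The old broadcast cannot be updated in place, so a new one is
    # created and shipped out again before phase 2.
    bc = ReadOnlyBroadcast(new_w)

    # Phase 2: run "B" using the re-broadcast value.
    return new_w, [operation_b(p, bc) for p in partitions]


new_w, shifted = run_two_phase([[1.0, 2.0], [3.0, 6.0]], initial_w=0.0)
```

The point of the sketch is the step in the middle: because the shared value is read-only, every update has to travel back to the driver and out again, which is the cost the issue would like to avoid.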