[ 
https://issues.apache.org/jira/browse/SPARK-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116532#comment-14116532
 ] 

Apache Spark commented on SPARK-3327:
-------------------------------------

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/2217

> Make broadcasted value mutable for caching useful information
> -------------------------------------------------------------
>
>                 Key: SPARK-3327
>                 URL: https://issues.apache.org/jira/browse/SPARK-3327
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Liang-Chi Hsieh
>
> When implementing some algorithms, it is helpful to cache useful 
> information for later use.
> Specifically, we would like to perform operation "A" on each partition of 
> the data, during which some variables are updated. We then want to run 
> operation "B" on the same data, and "B" uses the variables updated by "A".
> One example is Liblinear on Spark from Dr. Lin. The authors discuss the 
> problem in Section IV.D of the paper "Large-scale Logistic Regression and 
> Linear Support Vector Machines Using Spark."
> Currently, broadcast variables only partially satisfy this need. We can 
> broadcast variables to reduce communication costs. However, because 
> broadcast variables cannot be modified, they do not solve the problem: 
> we may need to collect the updated variables back to the master and 
> broadcast them again before running the next data operation.
> I would like to add an interface to broadcast variables that makes them 
> mutable, so that later data operations can use them again.
>  
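
The collect-and-rebroadcast round trip described in the issue can be sketched as follows. This is a plain-Python simulation only, not Spark code: lists stand in for partitions, and the helper names (broadcast, operation_a, operation_b) and the "offset" variable are hypothetical, chosen purely for illustration.

```python
# Plain-Python simulation of the collect-and-rebroadcast workaround.
# No Spark API is used; lists stand in for partitions, and a read-only
# mapping proxy stands in for a broadcast value (an assumption made for
# illustration only).
from types import MappingProxyType


def broadcast(value):
    """Hypothetical stand-in for a broadcast: a read-only snapshot."""
    return MappingProxyType(dict(value))


def operation_a(partition, shared):
    """Operation "A": compute an updated variable from one partition.

    The broadcast value cannot be modified in place, so the update is
    returned to the caller instead.
    """
    return shared["offset"] + sum(partition)


def operation_b(partition, shared):
    """Operation "B": use the variable produced by operation "A"."""
    return [x * shared["offset"] for x in partition]


partitions = [[1, 2], [3, 4], [5]]

# Round 1: broadcast the initial variables and run "A" on each partition.
shared = broadcast({"offset": 0})
partial_updates = [operation_a(p, shared) for p in partitions]

# Collect the updates on the driver, merge them, and broadcast again.
# This extra round trip is the communication cost the issue wants to avoid.
shared = broadcast({"offset": sum(partial_updates)})

# Round 2: run "B" with the re-broadcast variables.
result = [operation_b(p, shared) for p in partitions]
print(result)
```

A mutable broadcast value, as proposed in the issue, would presumably let operation "A" store its update where operation "B" can read it, without the intermediate collect and second broadcast.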



