[ https://issues.apache.org/jira/browse/SPARK-6404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943143#comment-14943143 ]

Kåre Blakstad commented on SPARK-6404:
--------------------------------------

I believe there are issues with this approach. The first is that one must
broadcast at the specified batch interval. I would rather define this interval
myself for each broadcast, since it might involve large database or file reads,
which are not necessary on every micro-batch. Also, if you want to reuse some
data across different broadcasts, e.g. apply some transformations to it before
it is broadcast, that becomes much harder, because the expression is evaluated
local to the RDD transformation.

Today I solve this with a mutable broadcast variable that is updated by an
Akka scheduler after the previous broadcast is unpersisted, but I'm not sure
the Spark internals consider this the best approach.
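A minimal sketch of that workaround pattern, with hypothetical names: a reference that re-runs a load function on its own schedule, decoupled from the batch interval. A plain AtomicReference stands in for the broadcast handle so the pattern can run without a Spark cluster; in a real Spark job, refresh() would unpersist the old Broadcast[T] and call sc.broadcast(load()) instead of a plain set().

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}
import java.util.concurrent.atomic.AtomicReference

// Periodically re-runs `load` and swaps the result in atomically.
// Readers (e.g. code inside each micro-batch) always see the latest snapshot.
final class RefreshingRef[T](load: () => T, periodSeconds: Long) {
  private val ref = new AtomicReference[T](load())
  private val scheduler: ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor()
  scheduler.scheduleAtFixedRate(
    () => refresh(), periodSeconds, periodSeconds, TimeUnit.SECONDS)

  // In Spark this is where the old Broadcast[T] would be unpersisted
  // and a new one created via sc.broadcast(load()).
  def refresh(): Unit = ref.set(load())

  def get: T = ref.get()

  def stop(): Unit = scheduler.shutdown()
}
```

Usage: construct it with the expensive load (a database or file read) and a refresh period of, say, ten minutes, then call `.get` inside each batch; the batch interval and the refresh interval stay independent.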

> Call broadcast() in each interval for spark streaming programs.
> ---------------------------------------------------------------
>
>                 Key: SPARK-6404
>                 URL: https://issues.apache.org/jira/browse/SPARK-6404
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Yifan Wang
>
> If I understand it correctly, Spark's broadcast() function is called only 
> once, at the beginning of the batch. Streaming applications that need to 
> run 24/7 often need to dynamically update variables shared via broadcast(). 
> It would be ideal if broadcast() could be called at the beginning of each 
> interval.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
