[ https://issues.apache.org/jira/browse/SPARK-6404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943143#comment-14943143 ]
Kåre Blakstad commented on SPARK-6404:
--------------------------------------

I do believe there's an issue with this approach. The first is that one must broadcast at the specified batch interval. I would rather define this interval myself for each broadcast, since it may involve big database or file reads, which are not necessary on every micro-batch. Also, if you want to reuse some data for different broadcasts, e.g. apply some transformations over it before it is broadcast, this becomes much harder, because the expression is evaluated local to the RDD transformation. Today I solve this by using a mutable broadcast variable that is updated by an Akka scheduler after the previous broadcast is unpersisted, but I'm not sure that Spark internals approve of this as the best way.

> Call broadcast() in each interval for spark streaming programs.
> ---------------------------------------------------------------
>
>                 Key: SPARK-6404
>                 URL: https://issues.apache.org/jira/browse/SPARK-6404
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Yifan Wang
>
> If I understand it correctly, Spark's broadcast() function will be called
> only once at the beginning of the batch. For streaming applications that need
> to run 24/7, it is often necessary to update variables shared by broadcast()
> dynamically. It would be ideal if broadcast() could be called at the
> beginning of each interval.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
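The mutable-broadcast workaround described in the comment can be sketched roughly as follows. This is a hypothetical, self-contained illustration of the refresh pattern only: an AtomicReference stands in for the re-assignable broadcast handle, and a plain ScheduledExecutorService replaces the Akka scheduler. In real Spark code the scheduled task would first unpersist the old Broadcast and then call sc.broadcast() on the freshly loaded data; none of the names here are Spark API.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Sketch of the periodic "mutable broadcast" refresh pattern.
// The loader does the expensive work (e.g. a database or file read)
// on its own schedule, independent of the micro-batch interval.
public class BroadcastRefresher<T> {
    private final AtomicReference<T> current;
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    public BroadcastRefresher(Supplier<T> loader, long periodMillis) {
        // Load once up front so readers never see an empty value.
        this.current = new AtomicReference<>(loader.get());
        // In Spark, this task would unpersist the previous Broadcast
        // before swapping in sc.broadcast(loader.get()).
        scheduler.scheduleAtFixedRate(
            () -> current.set(loader.get()),
            periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Called from within each micro-batch to get the latest snapshot.
    public T get() { return current.get(); }

    public void stop() { scheduler.shutdownNow(); }
}
```

The key property is that the refresh interval is chosen per dataset rather than tied to the batch interval, which is exactly the decoupling the comment argues for.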