Hello Spark developers,
Can anyone shed some light on the lifecycle of broadcast variables?
Basically, I have a broadcast variable that is defined inside a loop, and
each iteration gives it a different value.
// For example (assumes the SQL implicits are imported so that toDF is available):
for (i <- 1 to 10) {
  val bc = sc.broadcast(i)
  sc.parallelize(Seq(1, 2, 3))
    .map { id => val i = bc.value; (id, i) }
    .toDF("id", "i")
    .write.parquet("/dummy_output")
}
Do I need to actively manage the broadcast variable in this case? I know this
example is not realistic, but please imagine that the broadcast variable holds
an array of 1M Longs.
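To make the question concrete, here is a rough sketch of what I assume
"actively managing" would look like: releasing each broadcast explicitly once
the job that uses it has finished. It assumes Spark 2.x, where
Broadcast.destroy() is public (unpersist() would be the lighter alternative).
The object name, app name, local master, and per-iteration output path are
placeholders of mine. Whether this explicit cleanup is actually required is
exactly what I am asking.

```scala
import org.apache.spark.sql.SparkSession

object BroadcastLoopSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder session settings; assumes Spark 2.x.
    val spark = SparkSession.builder()
      .appName("broadcast-loop-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // needed for toDF on an RDD of tuples
    val sc = spark.sparkContext

    for (i <- 1 to 10) {
      // A new broadcast variable is created on every iteration.
      val bc = sc.broadcast(i)
      try {
        sc.parallelize(Seq(1, 2, 3))
          .map(id => (id, bc.value))
          .toDF("id", "i")
          .write
          .mode("overwrite")
          .parquet(s"/dummy_output/i=$i") // hypothetical per-iteration path
      } finally {
        // Explicitly release the broadcast's data on the driver and executors.
        // After destroy(), bc must not be used again.
        // bc.unpersist() would only drop the executor-side copies.
        bc.destroy()
      }
    }

    spark.stop()
  }
}
```

The try/finally is only there so that the broadcast is released even if the
write fails.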
Regards,
Jerry
On Sun, Aug 21, 2016 at 1:07 PM, Jerry Lam <[email protected]> wrote:
> Hello spark developers,
>
> Can someone explain the lifecycle of a broadcast variable to me?
> When will a broadcast variable be garbage-collected on the driver side and
> on the executor side? Does a Spark application need to actively manage its
> broadcast variables to ensure that it does not run into OOM?
>
> Best Regards,
>
> Jerry
>