Yes, you want to actively unpersist() or destroy() broadcast variables
when they're no longer needed. They are eventually cleaned up once the
reference on the driver is garbage collected, but you usually would
not want to rely on that.
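
As a rough sketch of what that could look like for the loop in your
example (not a definitive pattern): the object name BroadcastCleanup,
the run() helper and its iterations parameter are made up for
illustration, and collect() stands in for the parquet write so the
snippet does not depend on an output path. It assumes Spark 2.x, where
Broadcast.destroy() is public.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, for illustration only.
object BroadcastCleanup {

  // Runs `iterations` small jobs, each with its own broadcast variable,
  // and releases every broadcast as soon as its job has finished.
  def run(spark: SparkSession, iterations: Int): Seq[Seq[(Int, Int)]] = {
    val sc = spark.sparkContext
    (1 to iterations).map { i =>
      val bc = sc.broadcast(i)
      try {
        // collect() is an action, so the job has completed when it returns.
        sc.parallelize(Seq(1, 2, 3)).map(id => (id, bc.value)).collect().toSeq
      } finally {
        // destroy() drops the driver and executor copies; bc cannot be used
        // again afterwards. Use bc.unpersist() instead if bc may still be
        // needed: that only drops the executor copies, and the value is
        // sent again on next use.
        bc.destroy()
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption to keep the sketch runnable on its own.
    val spark = SparkSession.builder()
      .appName("broadcast-cleanup")
      .master("local[*]")
      .getOrCreate()
    try run(spark, 10).foreach(println)
    finally spark.stop()
  }
}
```

The try/finally is there so the broadcast is released even if the job
fails. Only destroy a broadcast once nothing that references it can be
recomputed later, since tasks that touch a destroyed broadcast will fail.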

On Mon, Aug 29, 2016 at 4:30 PM, Jerry Lam <chiling...@gmail.com> wrote:
> Hello spark developers,
>
> Can anyone shed some light on the life cycle of broadcast variables?
> Basically, I have a broadcast variable defined in a loop, and in each
> iteration I give it a different value.
> // For example:
> for (i <- 1 to 10) {
>     val bc = sc.broadcast(i)
>     sc.parallelize(Seq(1, 2, 3)).map { id => val i = bc.value; (id, i) }
>       .toDF("id", "i").write.parquet("/dummy_output")
> }
>
> Do I need to actively manage the broadcast variable in this case? I know
> this example is not realistic, but please imagine that the broadcast
> variable holds an array of 1M Longs.
>
> Regards,
>
> Jerry
>
>
>
> On Sun, Aug 21, 2016 at 1:07 PM, Jerry Lam <chiling...@gmail.com> wrote:
>>
>> Hello spark developers,
>>
>> Can someone explain the lifecycle of a broadcast variable? When is a
>> broadcast variable garbage-collected on the driver side and on the
>> executor side? Does a Spark application need to actively manage its
>> broadcast variables to ensure that it does not run into OOM?
>>
>> Best Regards,
>>
>> Jerry
>
>

