The broadcast table doesn't seem to be reused across multiple actions.
e.g.
val small_df_bc = broadcast(small_df)
big_df1.join(small_df_bc, Seq("id")).write.parquet("/test1")
big_df2.join(small_df_bc, Seq("id")).write.parquet("/test2")
we can tell the small df has been distributed twice in the
Hi Tyson,
The broadcast variable should remain in memory on the executors and be reused
unless you unpersist it, destroy it, or it goes out of scope.
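
To illustrate that lifecycle, here is a minimal sketch using an explicit broadcast variable created with sc.broadcast. The lookup map, the ids, and the object name are made up for the example, and it runs in local mode. Note that this is the broadcast variable API, not the broadcast(small_df) join hint shown earlier in the thread, so it does not by itself show how the join hint behaves across separate actions.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; names and data are illustrative only.
object BroadcastLifecycleSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("broadcast-lifecycle-sketch")
      .master("local[*]") // assumption: local mode, for a self-contained run
      .getOrCreate()
    val sc = spark.sparkContext

    // Create the broadcast variable once on the driver.
    val lookup = sc.broadcast(Map(1 -> "a", 2 -> "b"))
    val ids = sc.parallelize(Seq(1, 2, 3))

    // Two separate actions that both read the same broadcast variable.
    val labels = ids.map(id => lookup.value.getOrElse(id, "?")).collect()
    val known = ids.filter(id => lookup.value.contains(id)).count()
    println(s"labels=${labels.mkString(",")} known=$known")

    // unpersist() drops the cached copies on the executors;
    // the value would be sent again if the variable were used afterwards.
    lookup.unpersist()

    // destroy() releases all data and metadata for the variable;
    // it cannot be used after this call.
    lookup.destroy()

    spark.stop()
  }
}
```

Both actions run before unpersist() or destroy() is called, which is the window in which the variable is expected to stay available on the executors.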
Hope this helps.
Thanks
Ankur
On Wed, Jun 10, 2020 at 5:28 PM wrote:
> We have a case where the data is small enough to be broadcast and joined
>
We have a case where the data is small enough to be broadcast and joined
with multiple tables in a single plan. Looking at the physical plan, I do
not see anything that indicates whether the broadcast is done only once,
i.e., that the BroadcastExchange is being reused, i.e., that data is not