Re: ForEachBatch collecting batch to driver

2020-03-11 Thread Burak Yavuz
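foreachBatch gives you the micro-batch as a DataFrame, which is
distributed. The function you pass is invoked on the driver, but the
DataFrame operations inside it run on the executors. As long as you don't
call collect() on that DataFrame, it shouldn't have any memory
implications on the driver.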
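
To illustrate the point above, here is a minimal PySpark sketch, not taken
from the original thread. It assumes the built-in "rate" test source, and
the names write_batch, main, OUTPUT_PATH and CHECKPOINT_PATH (and the /tmp
paths) are hypothetical. The batch function only describes distributed
work, so no rows are brought back to the driver unless collect() is called.

```python
# Hypothetical locations, chosen only for this illustration.
OUTPUT_PATH = "/tmp/foreach_batch_demo/out"
CHECKPOINT_PATH = "/tmp/foreach_batch_demo/checkpoint"


def write_batch(batch_df, batch_id):
    """Invoked on the driver once per micro-batch.

    batch_df is a distributed DataFrame: the filter and the write below
    are executed by the executors, not in driver memory.
    """
    (batch_df
        .filter("value % 2 = 0")   # "value" is a column of the rate source
        .write
        .mode("append")
        .parquet(OUTPUT_PATH))

    # By contrast, the next line would pull every row of the micro-batch
    # into driver memory, which is where an OOM could come from:
    # rows = batch_df.collect()


def main():
    # Imported here so the module can be loaded without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("foreach-batch-demo").getOrCreate()

    # The rate source emits (timestamp, value) rows for testing.
    stream = (spark.readStream
        .format("rate")
        .option("rowsPerSecond", 10)
        .load())

    query = (stream.writeStream
        .foreachBatch(write_batch)
        .option("checkpointLocation", CHECKPOINT_PATH)
        .start())

    query.awaitTermination(30)  # let it run for up to 30 seconds
    query.stop()
    spark.stop()
```

Calling main() on a machine with PySpark installed runs the stream for
about 30 seconds. If a small summary is needed on the driver, aggregating
first (for example with count()) keeps the amount of data returned small.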

On Tue, Mar 10, 2020 at 3:46 PM Ruijing Li wrote:

> Hi all,
>
> I’m curious about how foreachBatch works in Spark Structured Streaming.
> Since it takes in a micro-batch DataFrame, does that mean the code in
> foreachBatch executes on the Spark driver? Does this mean that for large
> batches, you could potentially hit OOM issues from collecting each
> partition onto the driver?
> --
> Cheers,
> Ruijing Li
>


ForEachBatch collecting batch to driver

2020-03-10 Thread Ruijing Li
Hi all,

I’m curious about how foreachBatch works in Spark Structured Streaming.
Since it takes in a micro-batch DataFrame, does that mean the code in
foreachBatch executes on the Spark driver? Does this mean that for large
batches, you could potentially hit OOM issues from collecting each
partition onto the driver?
-- 
Cheers,
Ruijing Li