foreachBatch gives you the micro-batch as a DataFrame, which is still
distributed. The function you pass to foreachBatch does run on the driver,
but transformations and writes on that DataFrame are executed by the
executors. As long as you don't call collect (or otherwise pull the data
back to the driver, e.g. toPandas), it shouldn't have any memory
implications on the driver.
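
For example, here is a minimal PySpark sketch of a foreachBatch sink. The
rate source and the /tmp output path are just placeholders for illustration,
not anything from your job; the point is that the write happens on the
executors, while a collect would pull the whole batch onto the driver:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("foreach-batch-sketch").getOrCreate()

# Any streaming source works; the built-in rate source is used here
# purely for illustration.
stream = spark.readStream.format("rate").option("rowsPerSecond", 100).load()

def write_batch(batch_df, batch_id):
    # This function runs on the driver, but batch_df is a normal
    # distributed DataFrame: the write below is executed by the
    # executors, so no batch data is pulled into driver memory.
    batch_df.write.mode("append").parquet(
        "/tmp/foreach-batch-output/batch_{}".format(batch_id))  # placeholder path
    # By contrast, batch_df.collect() would bring the whole micro-batch
    # into driver memory and could cause the OOM scenario you describe.

query = stream.writeStream.foreachBatch(write_batch).start()
query.awaitTermination()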

On Tue, Mar 10, 2020 at 3:46 PM Ruijing Li <liruijin...@gmail.com> wrote:

> Hi all,
>
> I’m curious how foreachBatch works in Spark Structured Streaming. Since
> it takes in a micro-batch DataFrame, does that mean the code in
> foreachBatch executes on the Spark driver? Does this mean that for large
> batches, you could potentially have OOM issues from collecting each
> partition into the driver?
> --
> Cheers,
> Ruijing Li
>
