foreachBatch gives you the micro-batch as a DataFrame, which is still distributed across the executors. The function you pass to foreachBatch runs on the driver, but the transformations and writes you apply to that DataFrame execute on the cluster like any other DataFrame operation. Unless you call collect (or a similar action that pulls rows to the driver), it shouldn't have any significant memory implications on the driver.
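To illustrate (a minimal PySpark sketch, not code from this thread — it assumes an existing streaming DataFrame `stream_df` and a running SparkSession):

```python
# Sketch of a foreachBatch sink. The function below is invoked on the driver
# once per micro-batch, but batch_df is an ordinary distributed DataFrame:
# the write below runs on the executors, not the driver.
def write_batch(batch_df, batch_id):
    # Distributed write; no batch data flows through the driver.
    batch_df.write.mode("append").parquet(f"/tmp/out/batch={batch_id}")

    # By contrast, this WOULD pull every row into driver memory and could
    # cause OOM on large batches -- avoid it unless the batch is tiny:
    # rows = batch_df.collect()

query = (
    stream_df.writeStream
    .foreachBatch(write_batch)
    .start()
)
```

So the driver only orchestrates each batch; the data itself stays on the executors unless you explicitly collect it.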
On Tue, Mar 10, 2020 at 3:46 PM Ruijing Li <liruijin...@gmail.com> wrote:

> Hi all,
>
> I’m curious on how foreachBatch works in Spark Structured Streaming. Since
> it is taking in a micro-batch DataFrame, does that mean the code in
> foreachBatch is executing on the Spark driver? Does this mean for large
> batches, you could potentially have OOM issues from collecting each
> partition into the driver?
>
> --
> Cheers,
> Ruijing Li