Re: ForEachBatch collecting batch to driver

2020-03-11 Thread Burak Yavuz
foreachBatch gives you the micro-batch as a DataFrame, which is distributed. If you don't call collect on that DataFrame, it shouldn't have any memory implications on the Driver.
On Tue, Mar 10, 2020 at 3:46 PM Ruijing Li wrote:
> Hi all,
>
> I’m curious on how foreachbatch works in spark
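To make the point in the reply concrete, here is a minimal PySpark sketch, not taken from the thread. It contrasts a foreachBatch function that keeps the work distributed with one that calls collect. The names OUTPUT_PATH, write_batch_distributed, write_batch_collected and run_demo are hypothetical, as are the built-in "rate" test source, the local[2] master and the output path. The function passed to foreachBatch is invoked on the driver, but a call such as batch_df.write only schedules a job there; the rows themselves stay on the executors unless something like collect pulls them back.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # used for type hints only; PySpark is imported lazily in run_demo()
    from pyspark.sql import DataFrame

OUTPUT_PATH = "/tmp/foreach_batch_demo"  # hypothetical output location


def write_batch_distributed(batch_df: DataFrame, batch_id: int) -> None:
    """Called on the driver, but the write itself is carried out by the executors."""
    # No rows are materialized in driver memory here.
    batch_df.write.mode("append").parquet(OUTPUT_PATH)


def write_batch_collected(batch_df: DataFrame, batch_id: int) -> int:
    """Risky for large batches: collect() copies every row into driver memory."""
    rows = batch_df.collect()
    # foreachBatch ignores the return value; it is returned here only for illustration.
    return len(rows)


def run_demo() -> None:
    """Run a short local stream (assumes PySpark is installed)."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[2]")
        .appName("foreachBatch-demo")
        .getOrCreate()
    )
    # The built-in "rate" source generates rows continuously, which is handy for testing.
    stream = spark.readStream.format("rate").option("rowsPerSecond", 10).load()
    query = stream.writeStream.foreachBatch(write_batch_distributed).start()
    query.awaitTermination(15)  # wait up to 15 seconds, then shut down
    query.stop()
    spark.stop()
```

Calling run_demo() from an environment with PySpark installed should leave Parquet files under OUTPUT_PATH. Passing write_batch_collected to foreachBatch instead would also run, but driver memory use would then grow with the size of each micro-batch.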

ForEachBatch collecting batch to driver

2020-03-10 Thread Ruijing Li
Hi all, I’m curious about how foreachBatch works in Spark Structured Streaming. Since it takes in a micro-batch DataFrame, does that mean the code in foreachBatch executes on the Spark driver? Does this mean that for large batches you could potentially have OOM issues from collecting each partition