You can use the persist() or cache() operation on the DataFrame.

On Tue, Dec 26, 2017 at 4:02 PM Shu Li Zheng <nezhazh...@gmail.com> wrote:
> Hi all,
>
> I have a scenario like this:
>
> val df = dataframe.map().filter()
> // agg 1
> val query1 = df.sum.writeStream.start
> // agg 2
> val query2 = df.count.writeStream.start
>
> With Spark Streaming, we can apply persist() on the RDD to reuse the df
> computation result: when we call persist() after filter(), the map().filter()
> operators run only once.
> With Structured Streaming (SS), we can’t apply persist() directly on the
> DataFrame, so query1 and query2 will not reuse the result after filter() and
> map/filter run twice. Is there a way to solve this?
>
> Regards,
>
> Shu li Zheng
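For reference, here is a minimal sketch of one way to let both aggregations share the map/filter work, assuming Spark 2.4 or later (foreachBatch was added in 2.4, after this thread). Inside foreachBatch each micro-batch arrives as an ordinary DataFrame, so persist() can be applied to it before running both aggregations. The rate source, the `value` column, the even-number filter, and the names `SharedMicroBatchAggregations`, `aggregateBatch`, and `processBatch` are hypothetical stand-ins for the poster's `dataframe.map().filter()`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, sum}

object SharedMicroBatchAggregations {

  // Hypothetical helper: runs both aggregations over one micro-batch.
  // persist() caches the batch on the first action, so the second action
  // reads the cached rows instead of recomputing the upstream map/filter.
  def aggregateBatch(batchDF: DataFrame): (Long, Long) = {
    batchDF.persist()
    try {
      val row = batchDF.agg(sum("value")).first()
      // agg 1: sum (Spark returns null for an empty batch, mapped to 0 here)
      val total = if (row.isNullAt(0)) 0L else row.getLong(0)
      // agg 2: count
      val rows = batchDF.count()
      (total, rows)
    } finally {
      batchDF.unpersist()
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("shared-microbatch-aggregations")
      .master("local[2]") // assumption: local run for illustration
      .getOrCreate()

    // Assumption: the built-in "rate" source stands in for `dataframe`;
    // the select/filter stands in for the shared map().filter() work.
    val df = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "10")
      .load()
      .select(col("value"))
      .filter(col("value") % 2 === 0)

    // Declared as a typed Function2 value so the Scala overload of
    // foreachBatch is selected unambiguously.
    val processBatch: (DataFrame, Long) => Unit = (batchDF, batchId) => {
      val (total, rows) = aggregateBatch(batchDF)
      println(s"batch $batchId: sum=$total count=$rows")
    }

    val query = df.writeStream
      .foreachBatch(processBatch)
      .start()

    query.awaitTermination(30000L) // run for about 30 seconds
    query.stop()
    spark.stop()
  }
}
```

One design trade-off to keep in mind: these results are per micro-batch, whereas a streaming aggregation such as `df.sum.writeStream` keeps a running total across batches, so with this approach any running state would have to be carried by your own code. Also, foreachBatch provides only at-least-once write guarantees by default.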