Re: [Structured Streaming] Reuse computation result

2018-02-01 Thread Sandip Mehta
You can use the persist() or cache() operation on a DataFrame.

On Tue, Dec 26, 2017 at 4:02 PM Shu Li Zheng wrote:
> Hi all,
>
> I have a scenario like this:
>
> val df = dataframe.map().filter()
> // agg 1
> val query1 = df.sum.writeStream.start
> // agg 2
> val query2 =

Re: [Structured Streaming] Reuse computation result

2017-12-29 Thread Lalwani, Jayesh
and do count.

From: Shu Li Zheng <nezhazh...@gmail.com>
Date: Tuesday, December 26, 2017 at 5:32 AM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: [Structured Streaming] Reuse computation result

Hi all,

I have a scenario like this:

val df = dataframe.map().filter

[Structured Streaming] Reuse computation result

2017-12-26 Thread Shu Li Zheng
Hi all,

I have a scenario like this:

val df = dataframe.map().filter()
// agg 1
val query1 = df.sum.writeStream.start
// agg 2
val query2 = df.count.writeStream.start

With Spark Streaming, we can apply persist() on an RDD to reuse the df computation result, when we call persist() after filter()
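A sketch of the scenario discussed above. Note that calling persist()/cache() directly on a *streaming* Dataset is not supported by Structured Streaming; a commonly cited workaround (assuming Spark 2.4+, which added foreachBatch) is to persist each micro-batch inside foreachBatch and run both aggregations against the cached batch. Everything here is illustrative and untested: the socket source, host/port, and checkpoint path are assumptions, not details from the thread.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object ReuseComputationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("reuse-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical streaming source; the thread does not say what
    // `dataframe` reads from.
    val dataframe = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    val df = dataframe.as[String].map(_.trim).filter(_.nonEmpty)

    // Each writeStream.start() launches an independent query, so the
    // map()/filter() pipeline above would be recomputed per query.
    // foreachBatch exposes the micro-batch as an ordinary Dataset,
    // which CAN be persisted and reused for several aggregations:
    val query = df.writeStream
      .foreachBatch { (batch: Dataset[String], batchId: Long) =>
        batch.persist()              // map/filter computed once per batch
        val count = batch.count()    // "agg 2" from the thread
        // "agg 1" (the sum) would run here against the same cached batch
        batch.unpersist()
        ()
      }
      .option("checkpointLocation", "/tmp/reuse-sketch-ckpt")
      .start()

    query.awaitTermination()
  }
}
```

This trades the two independent always-on queries for one query that does both aggregations per micro-batch, which is usually what "reuse the computation result" ends up meaning in Structured Streaming.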