You can use the persist() or cache() operation on the DataFrame.
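A minimal plain-Scala sketch of that idea (no Spark here; `transform` is a stand-in for the map()/filter() stage, and the counter just shows how often it runs): materialize the intermediate result once, then run both aggregations over it, which is what persist()/cache() achieves for a DataFrame.

```scala
object CacheSketch {
  var evals = 0  // counts how many times the transform stage runs

  // Stand-in for dataframe.map().filter()
  def transform(xs: Seq[Int]): Seq[Int] =
    xs.map { x => evals += 1; x * 2 }.filter(_ > 2)

  def run(): (Int, Int, Int) = {
    val data = Seq(1, 2, 3, 4)
    // Materialize the transformed data once (analogous to df.persist()/cache()).
    val cached = transform(data)
    val sum    = cached.sum   // first aggregation
    val count  = cached.size  // second aggregation
    (sum, count, evals)       // evals stays at 4: one pass over the input
  }
}
```

Both aggregations read the already-computed `cached` value, so the transform runs only once per input element.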
On Tue, Dec 26, 2017 at 4:02 PM Shu Li Zheng wrote:
> Hi all,
>
> I have a scenario like this:
>
> val df = dataframe.map().filter()
> // agg 1
> val query1 = df.sum.writeStream.start
> // agg 2
> val query2 = df.count.writeStream.start
From: Shu Li Zheng <nezhazh...@gmail.com>
Date: Tuesday, December 26, 2017 at 5:32 AM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: [Structured Streaming] Reuse computation result
Hi all,
I have a scenario like this:
val df = dataframe.map().filter()
// agg 1
val query1 = df.sum.writeStream.start
// agg 2
val query2 = df.count.writeStream.start
With Spark Streaming, we can apply persist() on an RDD to reuse the computation
result of df, by calling persist() after filter().
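To illustrate the recomputation being asked about, here is a plain-Scala analogy (no Spark; a lazy view stands in for the unpersisted DataFrame): each aggregation re-traverses the lazy pipeline, so map()/filter() runs once per query instead of once overall.

```scala
object RecomputeSketch {
  var evals = 0  // counts how many times the map stage runs

  // A lazy pipeline: nothing executes until an aggregation traverses it,
  // and every traversal recomputes it from scratch.
  def pipeline(xs: Seq[Int]): Iterable[Int] =
    xs.view.map { x => evals += 1; x + 1 }.filter(_ % 2 == 0)

  def run(): (Int, Int, Int) = {
    val df    = pipeline(Seq(1, 2, 3, 4))
    val sum   = df.sum   // first traversal: 4 evals
    val count = df.size  // second traversal: 4 more evals
    (sum, count, evals)  // evals ends at 8: the pipeline ran twice
  }
}
```

Without materializing the intermediate result, the two aggregations double the work, which is exactly the cost persist()/cache() avoids.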