You can use the persist() or cache() operation on the DataFrame.
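For reference, here is a minimal sketch of applying persist() inside a Structured Streaming query. It assumes Spark 2.4 or later, where `DataStreamWriter.foreachBatch` is available. Each micro-batch is persisted once and both aggregations read from it, so the map/filter work runs once per batch. The `rate` source, the column expressions and the checkpoint path are illustrative stand-ins and do not come from the question. The aggregations here are computed per micro-batch. They are not running totals across the whole stream.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, sum}

object ReuseFilteredStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("reuse-filtered-stream")
      .master("local[2]")
      .getOrCreate()

    // Hypothetical input: the built-in "rate" test source emits (timestamp, value) rows.
    val source = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "10")
      .load()

    // Illustrative stand-ins for the question's map().filter().
    val df = source
      .select((col("value") * 2).as("doubled"))
      .filter(col("doubled") % 3 === 0)

    // Declared as a Scala function value so foreachBatch resolves to the
    // Scala overload rather than the Java VoidFunction2 one.
    val processBatch: (DataFrame, Long) => Unit = (batchDF, batchId) => {
      batchDF.persist()                                        // materialize map/filter once for this batch
      val total = batchDF.agg(sum("doubled")).first().get(0)   // aggregation 1 (null if the batch is empty)
      val n = batchDF.count()                                  // aggregation 2 reuses the cached batch
      println(s"batch $batchId: sum=$total count=$n")
      batchDF.unpersist()                                      // release the cached batch
    }

    val query = df.writeStream
      .option("checkpointLocation", "/tmp/reuse-filtered-stream-ckpt") // hypothetical path
      .foreachBatch(processBatch)
      .start()

    query.awaitTermination(20000) // let it run for about 20 seconds
    query.stop()
    spark.stop()
  }
}
```

The two aggregations now live in one query instead of two, and that is what lets them share the persisted batch. Two separate `start()` calls would still read and filter the source twice.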

On Tue, Dec 26, 2017 at 4:02 PM Shu Li Zheng <nezhazh...@gmail.com> wrote:

> Hi all,
>
> I have a scenario like this:
>
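> val df = dataframe.map(...).filter(...)
> // agg 1
> val query1 = df.groupBy().sum().writeStream.outputMode("complete").format("console").start()
> // agg 2
> val query2 = df.groupBy().count().writeStream.outputMode("complete").format("console").start()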
>
> With Spark Streaming, we can apply persist() to the RDD to reuse the df
> computation result: when we call persist() after filter(), the
> map().filter() operators run only once.
> With Structured Streaming (SS), we can't apply persist() directly to the
> DataFrame, so query1 and query2 do not reuse the result after filter() and
> map/filter run twice. Is there a way to solve this?
>
> Regards,
>
> Shu li Zheng
>
>
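For comparison, here is a minimal sketch of the DStream pattern described in the question, assuming the classic Spark Streaming API. The filtered DStream is persisted once, and two output operations (a per-batch sum and a per-batch count) reuse it instead of recomputing map/filter. The `queueStream` input and the map/filter expressions are hypothetical stand-ins for a real source and the question's transformations.

```scala
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

object DStreamPersistSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("dstream-persist").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Hypothetical input: a queue of small RDDs stands in for a real receiver.
    val queue = mutable.Queue[RDD[Int]]()
    for (_ <- 1 to 3) queue += ssc.sparkContext.parallelize(1 to 100)
    val input = ssc.queueStream(queue) // one RDD per batch by default

    // map/filter stand-ins; persist() lets both output operations reuse this result.
    val filtered = input.map(_ * 2).filter(_ % 3 == 0)
    filtered.persist(StorageLevel.MEMORY_ONLY)

    // Output operation 1: sum of each batch.
    filtered.foreachRDD { rdd =>
      println(s"sum = ${rdd.map(_.toLong).fold(0L)(_ + _)}")
    }
    // Output operation 2: count of each batch.
    filtered.count().print()

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000) // run a few 1-second batches
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```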
