A cached DataFrame isn't supposed to change, by definition. You can
re-read the table each time you need fresh data, or consider setting up
a streaming source on the table, which produces a result that updates
as new data arrives.
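Roughly, assuming the same Delta path and event_hour column from your
example (the console sink and "complete" output mode below are just
placeholders; untested sketch):

import org.apache.spark.sql.functions.col

// Option 1: skip cache() entirely; every action re-reads the table's
// current state, so results always reflect the latest data.
val liveCounts = spark.read.format("delta").load("/data")
  .groupBy(col("event_hour")).count()

// Option 2: treat the Delta table as a streaming source; the
// aggregation is maintained incrementally as new data arrives.
val query = spark.readStream.format("delta").load("/data")
  .groupBy(col("event_hour")).count()
  .writeStream
  .outputMode("complete")  // emit the full updated aggregate each trigger
  .format("console")       // placeholder sink; use memory, Delta, etc.
  .start()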

On Fri, May 17, 2019 at 1:44 PM Tomas Bartalos <tomas.barta...@gmail.com> wrote:
>
> Hello,
>
> I have a cached DataFrame:
>
> spark.read.format("delta").load("/data").groupBy(col("event_hour")).count.cache
>
> I would like to access the "live" data for this DataFrame without deleting 
> the cache (using unpersist()). Whatever I do, I always get the cached data on 
> subsequent queries. Even adding a new column to the query doesn't help:
>
> spark.read.format("delta").load("/data").groupBy(col("event_hour")).count
>   .withColumn("dummy", lit("dummy"))
>
>
> I'm able to work around this using a cached SQL view, but I couldn't find a 
> pure DataFrame solution.
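>
> For reference, roughly what I do (the view name is illustrative):
>
> val counts = spark.read.format("delta").load("/data")
>   .groupBy(col("event_hour")).count()
> counts.createOrReplaceTempView("hourly_counts")
> spark.sql("CACHE TABLE hourly_counts")
> // later: invalidate the cached entries so the next query recaches fresh data
> spark.sql("REFRESH TABLE hourly_counts")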
>
> Thank you,
> Tomas
