A cached DataFrame isn't supposed to change, by definition. You can re-read the data each time, or consider setting up a streaming source on the table, which yields a result that updates as new data arrives.
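A minimal sketch of the streaming approach (assumes a running SparkSession `spark`, Delta Lake on the classpath, and the `/data` path from your message; Delta tables can be read as a streaming source):

```scala
import org.apache.spark.sql.functions.col

// Read the Delta table as a streaming source; the aggregate
// updates as new data lands in /data.
val liveCounts = spark.readStream
  .format("delta")
  .load("/data")
  .groupBy(col("event_hour"))
  .count()

// Materialize the running result in an in-memory table that
// can be queried at any time for the current counts.
liveCounts.writeStream
  .outputMode("complete")
  .format("memory")
  .queryName("event_hour_counts")
  .start()

// Always reflects the latest processed data:
spark.sql("select * from event_hour_counts").show()
```

This sidesteps the cache-invalidation question entirely: the cached DataFrame stays cached, while the streaming query maintains its own continuously updated result.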
On Fri, May 17, 2019 at 1:44 PM Tomas Bartalos <tomas.barta...@gmail.com> wrote:
>
> Hello,
>
> I have a cached dataframe:
>
> spark.read.format("delta").load("/data").groupBy(col("event_hour")).count.cache
>
> I would like to access the "live" data for this data frame without deleting the cache (using unpersist()). Whatever I do, I always get the cached data on subsequent queries. Even adding a new column to the query doesn't help:
>
> spark.read.format("delta").load("/data").groupBy(col("event_hour")).count.withColumn("dummy", lit("dummy"))
>
> I'm able to work around this using a cached SQL view, but I couldn't find a pure DataFrame solution.
>
> Thank you,
> Tomas