I am currently building a Spark Structured Streaming application that performs a batch-stream join, and the source of the batch data is updated periodically.
So, I am planning to persist/unpersist that batch data periodically. Below is a sample of the code I am using to persist and unpersist the batch data.

Flow:
-> Read the batch data
-> Persist the batch data
-> Every hour, unpersist the data, re-read the batch data, and persist it again

However, I am not seeing the batch data getting refreshed every hour.

Code:

var batchDF = handler.readBatchDF(sparkSession)
batchDF.persist(StorageLevel.MEMORY_AND_DISK)
var refreshedTime: Instant = Instant.now()

if (Duration.between(refreshedTime, Instant.now()).getSeconds > refreshTime) {
  refreshedTime = Instant.now()
  batchDF.unpersist(false)
  batchDF = handler.readBatchDF(sparkSession)
    .persist(StorageLevel.MEMORY_AND_DISK)
}

Is there a better way to achieve this scenario in Spark Structured Streaming jobs?
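For context, here is a rough sketch of one way the intended flow could be arranged so that the staleness check runs on every micro-batch rather than only once at startup. This is an illustrative assumption, not a confirmed fix: the `readBatch` function stands in for `handler.readBatchDF(sparkSession)` from the snippet above, `refreshSeconds` for `refreshTime`, and the class name and wiring are hypothetical.

```scala
import java.time.{Duration, Instant}
import org.apache.spark.sql.DataFrame
import org.apache.spark.storage.StorageLevel

// Sketch: holds the cached batch DataFrame and refreshes it when stale.
// `readBatch` is a stand-in for handler.readBatchDF(sparkSession).
class RefreshableBatch(readBatch: () => DataFrame, refreshSeconds: Long) {
  @volatile private var refreshedAt: Instant = Instant.now()
  @volatile private var batchDF: DataFrame = load()

  private def load(): DataFrame =
    readBatch().persist(StorageLevel.MEMORY_AND_DISK)

  // Called once per micro-batch, so the elapsed-time check is
  // actually re-evaluated, unlike a one-shot check at startup.
  def get(): DataFrame = synchronized {
    if (Duration.between(refreshedAt, Instant.now()).getSeconds > refreshSeconds) {
      batchDF.unpersist(blocking = false)
      batchDF = load()
      refreshedAt = Instant.now()
    }
    batchDF
  }
}
```

It could then be wired into the stream via `foreachBatch`, e.g. `streamDF.writeStream.foreachBatch { (micro, _) => micro.join(batch.get(), "key") ... }`, so `get()` is invoked each trigger. Again, only a sketch under the assumptions above.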