Hi All,

We have a static DataFrame as follows:

--------------
|id|time_stamp|
--------------
|1 |1540527851|
|2 |1540525602|
|3 |1530529187|
|4 |1520529185|
|5 |1510529182|
|6 |1578945709|
--------------
We also have a live stream of events, a Streaming DataFrame that contains id and an updated time_stamp. The first batch of events is as follows:

--------------
|id|time_stamp|
--------------
|1 |1540527888|
|2 |1540525999|
|3 |1530529784|
--------------

After every batch, we want to update the static DataFrame with the updated values from the Streaming DataFrame, like so:

Static DF after first batch:

--------------
|id|time_stamp|
--------------
|1 |1540527888|
|2 |1540525999|
|3 |1530529784|
|4 |1520529185|
|5 |1510529182|
|6 |1578945709|
--------------

Below is a snippet of our PySpark code (Spark 2.4.4):

static_df = spark.read.schema(schemaName).json(fileName)
streaming_df = spark.readStream(....)
new_reference_data = update_reference_df(streaming_df, static_df)

def update_reference_df(df, static_df):
    query: StreamingQuery = df \
        .writeStream \
        .outputMode("append") \
        .foreachBatch(lambda batch_df, batch_id: update_static_df(batch_df, static_df)) \
        .start()
    return query

def update_static_df(batch_df, static_df):
    # Take the batch's rows, plus the static rows whose ids are not in the batch.
    static_df: DataFrame = batch_df.union(
        static_df.join(batch_df, batch_df.id == static_df.id, "left_anti"))
    return static_df

Question: I want to know how static_df will get refreshed with the new values from the data processed via foreachBatch. As far as I know, foreachBatch returns nothing (void). I need to use the refreshed values from static_df in further processing.

Appreciate your help. Thanks in advance!

Debu
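For reference, here is a minimal plain-Python sketch (no Spark) of the merge semantics we are after per batch: rows from the micro-batch replace static rows with the same id, and static rows whose id is absent from the batch are kept unchanged. The row representation as (id, time_stamp) tuples is just an illustration, not our actual schema handling.

```python
def refresh_static(static_rows, batch_rows):
    """Merge one micro-batch into the static reference data.

    static_rows, batch_rows: lists of (id, time_stamp) tuples.
    Returns a new list; the inputs are not mutated.
    """
    batch_ids = {row_id for row_id, _ in batch_rows}
    # "left_anti" part: static rows whose id does not appear in the batch
    kept = [row for row in static_rows if row[0] not in batch_ids]
    # "union" part: batch rows plus the surviving static rows
    return sorted(batch_rows + kept)

static = [(1, 1540527851), (2, 1540525602), (3, 1530529187),
          (4, 1520529185), (5, 1510529182), (6, 1578945709)]
batch1 = [(1, 1540527888), (2, 1540525999), (3, 1530529784)]

# ids 1-3 take the batch timestamps, ids 4-6 keep the original ones
refreshed = refresh_static(static, batch1)
```

This produces exactly the "Static DF after first batch" table above; the Spark equivalent is the union/left-anti join in update_static_df.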