Re: [structured streaming] How to remove outdated data when use Window Operations

2016-12-01 Thread Michael Armbrust
Yes! On Thu, Dec 1, 2016 at 12:57 PM, ayan guha wrote: > Thanks TD. Will it be available in pyspark too? > On 1 Dec 2016 19:55, "Tathagata Das" wrote: > >> In

Re: [structured streaming] How to remove outdated data when use Window Operations

2016-12-01 Thread ayan guha
Thanks TD. Will it be available in pyspark too? On 1 Dec 2016 19:55, "Tathagata Das" wrote: > In the meantime, if you are interested, you can read the design doc in the > corresponding JIRA - https://issues.apache.org/jira/browse/SPARK-18124 > > On Thu, Dec 1, 2016

Re: [structured streaming] How to remove outdated data when use Window Operations

2016-12-01 Thread Tathagata Das
In the meantime, if you are interested, you can read the design doc in the corresponding JIRA - https://issues.apache.org/jira/browse/SPARK-18124 On Thu, Dec 1, 2016 at 12:53 AM, Tathagata Das wrote: > That feature is coming in 2.1.0. We have added watermarking,

Re: [structured streaming] How to remove outdated data when use Window Operations

2016-12-01 Thread Tathagata Das
That feature is coming in 2.1.0. We have added watermarking, which tracks the event time of the data and accordingly closes old windows, outputs their corresponding aggregates, and then drops their corresponding state. But in that case, you will have to use append mode, and aggregated data of a
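
To make the watermarking behavior described above concrete, here is a minimal pure-Python sketch of the idea. It is a toy model, not Spark code and not Spark's implementation: the class name WatermarkedCounter, the 10-second window, and the 5-second allowed delay are all assumptions chosen for illustration. It tracks the maximum event time seen so far, treats a window as closed once its end falls at or below (max event time - delay), emits the closed window's count once (as append mode would), and then drops that window's state.

```python
class WatermarkedCounter:
    """Toy model of event-time windowed counting with a watermark.

    Hypothetical helper for illustration only; not part of Spark.
    """

    def __init__(self, window=10, delay=5):
        self.window = window                  # window length (seconds, assumed)
        self.delay = delay                    # allowed lateness (seconds, assumed)
        self.max_event_time = float("-inf")   # largest event time seen so far
        self.watermark = float("-inf")        # windows ending at or before this are closed
        self.state = {}                       # window start -> running count

    def process_batch(self, event_times):
        """Consume one batch of event times; return the windows finalized by it."""
        for t in event_times:
            start = (t // self.window) * self.window
            if start + self.window <= self.watermark:
                continue  # too late: this window was already emitted and dropped
            self.state[start] = self.state.get(start, 0) + 1
            self.max_event_time = max(self.max_event_time, t)

        # Advance the watermark only after the whole batch has been processed.
        self.watermark = self.max_event_time - self.delay

        # Emit each closed window exactly once, then drop its state.
        closed = sorted(s for s in self.state if s + self.window <= self.watermark)
        return [(s, s + self.window, self.state.pop(s)) for s in closed]


counter = WatermarkedCounter(window=10, delay=5)
print(counter.process_batch([1, 3, 12]))  # watermark becomes 7, nothing closed yet
print(counter.process_batch([16, 4]))     # watermark becomes 11, window [0, 10) is emitted
print(counter.process_batch([2, 27]))     # event 2 is too late and is ignored
print(counter.state)                      # only windows that are still open remain
```

The point of the sketch is that the state dictionary stays bounded: once a window is emitted, its entry is removed, so the kept state no longer grows with the total amount of data seen.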

[structured streaming] How to remove outdated data when use Window Operations

2016-11-29 Thread Xinyu Zhang
Hi, I want to use window operations. However, if I don't remove any data, the "complete" table will grow larger and larger as time goes on. So I want to remove outdated data from the complete table that I will never use. Is there any way to do this? Thanks!