Re: [structured streaming] How to remove outdated data when use Window Operations

2016-12-01 Thread Michael Armbrust
Yes!

On Thu, Dec 1, 2016 at 12:57 PM, ayan guha  wrote:

> Thanks TD. Will it be available in pyspark too?
> On 1 Dec 2016 19:55, "Tathagata Das"  wrote:
>
>> In the meantime, if you are interested, you can read the design doc in
>> the corresponding JIRA - https://issues.apache.org/jira/browse/SPARK-18124
>>
>> On Thu, Dec 1, 2016 at 12:53 AM, Tathagata Das <
>> tathagata.das1...@gmail.com> wrote:
>>
>>> That feature is coming in 2.1.0. We have added watermarking, which will
>>> track the event time of the data and accordingly close old windows, output
>>> their corresponding aggregates, and then drop their corresponding state. But
>>> in that case, you will have to use append mode, and the aggregated data of a
>>> particular window will be evicted only when the window is closed. You will
>>> be able to control the threshold on how long to wait for late, out-of-order
>>> data before closing a window.
>>>
>>> We will be updating the docs soon to explain this.
>>>
>>> On Tue, Nov 29, 2016 at 8:30 PM, Xinyu Zhang  wrote:
>>>
>>>> Hi
>>>>
>>>> I want to use window operations. However, if I don't remove any data,
>>>> the "complete" table will become larger and larger as time goes on. So I
>>>> want to remove some outdated data in the complete table that I would never
>>>> use.
>>>> Is there any method to meet my requirement?
>>>>
>>>> Thanks!
>>>>
>>>
>>>
>>


Re: [structured streaming] How to remove outdated data when use Window Operations

2016-12-01 Thread ayan guha
Thanks TD. Will it be available in pyspark too?
On 1 Dec 2016 19:55, "Tathagata Das"  wrote:

> In the meantime, if you are interested, you can read the design doc in the
> corresponding JIRA - https://issues.apache.org/jira/browse/SPARK-18124
>
> On Thu, Dec 1, 2016 at 12:53 AM, Tathagata Das <
> tathagata.das1...@gmail.com> wrote:
>
>> That feature is coming in 2.1.0. We have added watermarking, which will
>> track the event time of the data and accordingly close old windows, output
>> their corresponding aggregates, and then drop their corresponding state. But
>> in that case, you will have to use append mode, and the aggregated data of a
>> particular window will be evicted only when the window is closed. You will
>> be able to control the threshold on how long to wait for late, out-of-order
>> data before closing a window.
>>
>> We will be updating the docs soon to explain this.
>>
>> On Tue, Nov 29, 2016 at 8:30 PM, Xinyu Zhang  wrote:
>>
>>> Hi
>>>
>>> I want to use window operations. However, if I don't remove any data,
>>> the "complete" table will become larger and larger as time goes on. So I
>>> want to remove some outdated data in the complete table that I would never
>>> use.
>>> Is there any method to meet my requirement?
>>>
>>> Thanks!
>>>
>>>
>>>
>>>
>>>
>>
>>
>


Re: [structured streaming] How to remove outdated data when use Window Operations

2016-12-01 Thread Tathagata Das
In the meantime, if you are interested, you can read the design doc in the
corresponding JIRA - https://issues.apache.org/jira/browse/SPARK-18124

On Thu, Dec 1, 2016 at 12:53 AM, Tathagata Das wrote:

> That feature is coming in 2.1.0. We have added watermarking, which will
> track the event time of the data and accordingly close old windows, output
> their corresponding aggregates, and then drop their corresponding state. But
> in that case, you will have to use append mode, and the aggregated data of a
> particular window will be evicted only when the window is closed. You will
> be able to control the threshold on how long to wait for late, out-of-order
> data before closing a window.
>
> We will be updating the docs soon to explain this.
>
> On Tue, Nov 29, 2016 at 8:30 PM, Xinyu Zhang  wrote:
>
>> Hi
>>
>> I want to use window operations. However, if I don't remove any data, the
>> "complete" table will become larger and larger as time goes on. So I want
>> to remove some outdated data in the complete table that I would never use.
>> Is there any method to meet my requirement?
>>
>> Thanks!
>>
>>
>>
>>
>>
>
>


Re: [structured streaming] How to remove outdated data when use Window Operations

2016-12-01 Thread Tathagata Das
That feature is coming in 2.1.0. We have added watermarking, which will
track the event time of the data and accordingly close old windows, output
their corresponding aggregates, and then drop their corresponding state. But
in that case, you will have to use append mode, and the aggregated data of a
particular window will be evicted only when the window is closed. You will
be able to control the threshold on how long to wait for late, out-of-order
data before closing a window.

We will be updating the docs soon to explain this.
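
To make the behaviour described above a bit more concrete, here is a rough
plain-Python sketch of the idea. It is only a toy model and not Spark code:
the class name WatermarkedWindowCounter, its parameters, and the use of
integer event times with tumbling windows are all assumptions made for
illustration. It tracks the maximum event time seen, derives a watermark by
subtracting a lateness threshold, and once a window ends at or before the
watermark it emits that window's count once (append-style) and drops its state.

```python
from collections import defaultdict


class WatermarkedWindowCounter:
    """Toy model of tumbling-window counts with a watermark (not Spark code)."""

    def __init__(self, window_size, lateness_threshold):
        self.window_size = window_size
        self.lateness_threshold = lateness_threshold
        self.max_event_time = None      # latest event time seen so far
        self.watermark = None           # unset until some data has arrived
        self.state = defaultdict(int)   # window start -> running count

    def process_batch(self, event_times):
        """Fold one batch into state and return the windows it closes."""
        for t in event_times:
            window_start = (t // self.window_size) * self.window_size
            window_end = window_start + self.window_size
            # Data for a window the watermark already passed is too late: drop it.
            if self.watermark is not None and window_end <= self.watermark:
                continue
            self.state[window_start] += 1
            if self.max_event_time is None or t > self.max_event_time:
                self.max_event_time = t

        # Advance the watermark after the batch; it never moves backwards.
        if self.max_event_time is not None:
            candidate = self.max_event_time - self.lateness_threshold
            if self.watermark is None or candidate > self.watermark:
                self.watermark = candidate
        if self.watermark is None:
            return []

        # Close windows ending at or before the watermark: emit each final
        # count once and evict its state so memory does not grow forever.
        closed = []
        for window_start in sorted(self.state):
            if window_start + self.window_size <= self.watermark:
                closed.append((window_start, self.state.pop(window_start)))
        return closed


counter = WatermarkedWindowCounter(window_size=10, lateness_threshold=5)
out1 = counter.process_batch([1, 3, 12])  # watermark becomes 7, nothing closes
out2 = counter.process_batch([4, 16])     # late event 4 still counts; window [0, 10) closes
out3 = counter.process_batch([2, 27])     # event 2 is too late and is dropped
print(out1, out2, out3)                   # [] [(0, 3)] [(10, 2)]
print(dict(counter.state))                # {20: 1}
```

The point of the sketch is that state stays bounded: only windows that may
still receive late data are kept, which is exactly what the "complete" table
cannot do. In the 2.1.0 API the threshold is set with withWatermark on the
streaming DataFrame, which is also available from Python.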

On Tue, Nov 29, 2016 at 8:30 PM, Xinyu Zhang  wrote:

> Hi
>
> I want to use window operations. However, if I don't remove any data, the
> "complete" table will become larger and larger as time goes on. So I want
> to remove some outdated data in the complete table that I would never use.
> Is there any method to meet my requirement?
>
> Thanks!
>
>
>
>
>