Re: Will idle state retention trigger retract in dynamic table?

2018-08-21 Thread Fabian Hueske
Hi,

No, I don't think this behavior is weird.

If we retracted results whenever idle state is discarded, the result would no
longer correspond to the query.
We would then produce incorrect results even if the removed state were never
needed again.

If you want consistent, exact results, you need to either provide the
resources necessary to hold the complete state or configure idle state
retention so that deleted state is never needed again.
Another solution, which is not supported yet, would be to change the semantics
of your query so that it removes records older than a certain threshold.
In that case, the query would only operate on the tail of the stream, e.g., the
last day or week.
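
To illustrate the difference, here is a rough Python sketch of such
"tail of the stream" semantics. It is not Flink code, and the function name,
the "+"/"-" message format, and the parameters are made up for this example.
Because aging out old records is part of the query itself, the result is
retracted when a key no longer qualifies.

```python
from collections import deque


def tail_query(events, window, threshold):
    """Count only records newer than `window`; emit "+"/"-" as the result changes.

    events: list of (timestamp, key) sorted by timestamp.
    Time only advances when a record arrives (a simplification of this sketch).
    """
    recent = {}        # key -> deque of timestamps still inside the window
    in_result = set()  # keys currently satisfying count >= threshold
    changes = []

    for ts, key in events:
        recent.setdefault(key, deque()).append(ts)
        # Age out old records for every key. This is part of the query
        # semantics, so the result is updated (retracted) accordingly.
        for k, stamps in recent.items():
            while stamps and ts - stamps[0] >= window:
                stamps.popleft()
            qualifies = len(stamps) >= threshold
            if qualifies and k not in in_result:
                in_result.add(k)
                changes.append(("+", k))
            elif not qualifies and k in in_result:
                in_result.remove(k)
                changes.append(("-", k))
    return changes


events = [(0, "a"), (1, "a"), (2, "a"), (50, "b")]
changes = tail_query(events, window=10, threshold=3)
```

Here key "a" enters the result after its third record and is retracted once
its records have fallen out of the window, so the sink stays consistent with
the query.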

Best, Fabian





Re: Will idle state retention trigger retract in dynamic table?

2018-08-21 Thread Xingcan Cui
Hi Henry,

Idle state retention is simply a trade-off between accuracy and storage
consumption. It can meet some, but not all, of the computation requirements in
a streaming environment. For instance, in your use case, if each article has a
TTL, its praise state can be safely removed after that period of time.
Otherwise, inconsistencies are unavoidable.
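
As a rough sanity check (plain Python, not a Flink API; the helper name and
the sample data are invented for illustration), one could look at historical
records and measure the longest idle gap per key. A retention time longer
than that gap would not have dropped any state that was needed again, at
least for the observed data.

```python
def max_idle_gap(events):
    """Return the largest gap between consecutive records of the same key.

    events: iterable of (timestamp, key) sorted by timestamp.
    """
    last_seen = {}
    max_gap = 0
    for ts, key in events:
        if key in last_seen:
            max_gap = max(max_gap, ts - last_seen[key])
        last_seen[key] = ts
    return max_gap


# Hypothetical history: key "a" is idle for 25 time units between records.
history = [(0, "a"), (5, "a"), (6, "b"), (30, "a"), (31, "b")]
gap = max_idle_gap(history)
```

This says nothing about future data, so it is only a hint for choosing the
retention time, not a guarantee.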

We agree that there should be other state retention mechanisms that can be
applied in different scenarios. For now, however, setting a larger retention
time or simply omitting this config seem to be the only options.

Best,
Xingcan




Re: Will idle state retention trigger retract in dynamic table?

2018-08-21 Thread 徐涛
Hi Fabian,
Isn't this behavior a bit weird? It leads to data inconsistency.

Best,
Henry




Re: Will idle state retention trigger retract in dynamic table?

2018-08-21 Thread Fabian Hueske
Hi,

In the given example, article_id 123 will always remain in the external
storage. The state is removed and hence it cannot be retracted anymore.
Once the state has been removed and the count reaches 10 again, a second
record for article_id 123 will be emitted to the data store.

As soon as you enable state retention and state that was removed is needed
again, the query result can become inconsistent.
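
A small Python simulation of this scenario (a sketch only, not how Flink is
implemented; the function, the ("+", id) message format, and the numbers are
assumptions chosen for the example):

```python
def run_query(events, retention, threshold=10):
    """Simulate GROUP BY article_id HAVING count(1) >= threshold.

    events: list of (timestamp, article_id) sorted by timestamp.
    Returns the messages sent to the sink as ("+", article_id) tuples.
    """
    state = {}      # article_id -> (count, last_access)
    emitted = []
    for ts, article in events:
        # Expire idle state silently: no ("-", article_id) message is produced.
        for key in [k for k, (_, last) in state.items() if ts - last >= retention]:
            del state[key]
        count = state.get(article, (0, ts))[0] + 1
        state[article] = (count, ts)
        if count == threshold:          # key newly satisfies the HAVING clause
            emitted.append(("+", article))
    return emitted


# 100 praises for article 123, a long idle gap, then 10 more praises.
events = [(t, 123) for t in range(100)] + [(1000 + t, 123) for t in range(10)]
messages = run_query(events, retention=500)
unbounded = run_query(events, retention=float("inf"))
```

With a retention of 500 the sink receives article 123 twice and never sees a
retraction. With unbounded state it receives it exactly once.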

Best, Fabian



Re: Will idle state retention trigger retract in dynamic table?

2018-08-21 Thread 徐涛
Hi Fabian,
SELECT article_id FROM praise GROUP BY article_id HAVING count(1) >= 10
Suppose article_id 123 has 100 praises and its state is kept in the dynamic
table. After some time its state is removed, and afterwards article_id 123
never reaches 10 praises again.
How can other programs know that the state has been removed? The sink still
has the praise count stored as 100, so it is no longer consistent with the
dynamic table.

Best, 
Henry





Re: Will idle state retention trigger retract in dynamic table?

2018-08-21 Thread Fabian Hueske
Hi,

No, it won't. It will simply remove state that has not been accessed for the
configured time, but it will not change the result.
For example, if you have a GROUP BY aggregation and the state for a
grouping key is removed, the operator will start a new aggregation if a
record with the removed grouping key arrives.
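
Roughly, the behavior looks like the following Python sketch. This is a toy
model and not the actual operator; the class name and the single `retention`
parameter are simplifications made up for the example.

```python
class CountWithIdleRetention:
    """Toy GROUP BY count whose per-key state expires after being idle."""

    def __init__(self, retention):
        self.retention = retention      # idle time after which state is dropped
        self.state = {}                 # key -> (count, last_access_time)

    def cleanup(self, now):
        # Drop state that has not been accessed for `retention` time units.
        # Nothing is emitted here: removing state does not change the result.
        idle = [k for k, (_, last) in self.state.items()
                if now - last >= self.retention]
        for k in idle:
            del self.state[k]

    def process(self, key, now):
        self.cleanup(now)
        count, _ = self.state.get(key, (0, now))  # a removed key starts from 0
        count += 1
        self.state[key] = (count, now)
        return count


op = CountWithIdleRetention(retention=10)
counts = [op.process("a", t) for t in (0, 1, 2)]   # key stays active
restarted = op.process("a", 20)                    # idle for 18 time units
```

The first three records count up to 3. The record at time 20 arrives after
the state for "a" was dropped, so the count starts again at 1.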

Idle state retention is not meant to affect the semantics of a query.
The semantics of updating the result should be defined in the query, e.g.,
with a WHERE clause that removes all records that are older than 1 day
(note, this is not supported yet).

Best, Fabian



Will idle state retention trigger retract in dynamic table?

2018-08-21 Thread 徐涛
Hi All,
Will idle state retention trigger a retraction in the dynamic table?

Best,
Henry