Re: State migration for sql job

2021-06-09 Thread Yuval Itzchakov
As my company is also a heavy user of Flink SQL, the state migration story
is very important to us.

I also believe that newly added fields should start accumulating state
from the point in time of the change onward.

Is anyone actively working on this? Is there any way to get involved?


Re: State migration for sql job

2021-06-08 Thread aitozi
Thanks to JING and Kurt for the replies. We prefer option (a), which
does not take the historical data into account.

IMO, if we wanted to process all the historical data, we would have to store the
original data, which could be a big overhead for the backend. If we only
start aggregating once the new function is added, we may just need a data
format conversion. Besides, the business logic we have run into only needs the
new aggregate function to accumulate over new data.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: State migration for sql job

2021-06-08 Thread Kurt Young
What behavior do you expect after you add the "max(a)" aggregation?

a. Keep summing a and start calculating max(a) from the point it is added. In
other words, max(a) won't take the historical data into account.
b. First process all the historical data to get a result for max(a), and
then compute sum(a) and max(a) together for the real-time data.
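
A minimal sketch of the difference between the two options, assuming a plain in-memory model of the group aggregation. This is not Flink code: `run_option_a`, `run_option_b`, and the sample rows are hypothetical names and data introduced only for illustration.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, int]  # (uid, a)


def run_option_a(history: List[Event], live: List[Event]) -> Dict[str, Tuple[int, Optional[int]]]:
    """Option (a): sum(a) continues from the old state; max(a) only sees new rows."""
    sums: Dict[str, int] = {}
    maxes: Dict[str, int] = {}
    for uid, a in history:  # old query: only sum(a) is maintained
        sums[uid] = sums.get(uid, 0) + a
    for uid, a in live:  # new query: sum continues, max starts from nothing
        sums[uid] = sums.get(uid, 0) + a
        maxes[uid] = a if uid not in maxes else max(maxes[uid], a)
    return {uid: (sums[uid], maxes.get(uid)) for uid in sums}


def run_option_b(history: List[Event], live: List[Event]) -> Dict[str, Tuple[int, int]]:
    """Option (b): both aggregates are computed over history plus live data."""
    sums: Dict[str, int] = {}
    maxes: Dict[str, int] = {}
    for uid, a in list(history) + list(live):
        sums[uid] = sums.get(uid, 0) + a
        maxes[uid] = a if uid not in maxes else max(maxes[uid], a)
    return {uid: (sums[uid], maxes[uid]) for uid in sums}


history = [("u1", 10), ("u1", 3), ("u2", 7)]  # rows seen before max(a) was added
live = [("u1", 4), ("u2", 1)]                 # rows seen after the change

result_a = run_option_a(history, live)
result_b = run_option_b(history, live)
```

With this sample data, sum(a) is the same under both options, while max(a) differs: option (a) reports only the maximum seen since the change, and option (b) needs the historical rows to still be available for replay.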

Best,
Kurt


Re: State migration for sql job

2021-06-07 Thread JING ZHANG
Hi aitozi,
This is a popular request that has come up on the user mailing list
several times.
Unfortunately, it is not supported by Flink SQL yet; it may be addressed
in the future. BTW, a few companies have tried to solve the problem for some
specific use cases in their internal Flink versions [2].
For now, you could try the `State Processor API` [1] as a temporary solution:
1. Take a savepoint.
2. Generate an updated savepoint using the State Processor API.
3. Restore from the new savepoint.
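
The three steps above can be sketched conceptually as follows. This is a plain-Python simulation under stated assumptions, not the State Processor API itself: the savepoint is modeled as a dictionary keyed by `uid`, and `migrate` and `apply_row` are hypothetical helpers. It corresponds to the case where max(a) starts empty after the change.

```python
from typing import Dict, Optional, Tuple

OldState = Dict[str, int]                        # uid -> sum(a)
NewState = Dict[str, Tuple[int, Optional[int]]]  # uid -> (sum(a), max(a))


def migrate(old: OldState) -> NewState:
    """Rewrite each accumulator into the new layout; max(a) starts empty."""
    return {uid: (total, None) for uid, total in old.items()}


def apply_row(state: NewState, uid: str, a: int) -> None:
    """Update both aggregates for one incoming row, as the new query would."""
    total, current_max = state.get(uid, (0, None))
    new_max = a if current_max is None else max(current_max, a)
    state[uid] = (total + a, new_max)


old_snapshot: OldState = {"u1": 13, "u2": 7}  # step 1: state captured from the old job
new_snapshot = migrate(old_snapshot)          # step 2: state rewritten offline
state = dict(new_snapshot)                    # step 3: new job restores from it
apply_row(state, "u1", 4)                     # first row processed after the restore
```

In a real job, step 2 would read and write the savepoint with the State Processor API, and the exact state layout used by the SQL operator would have to be matched; that layout is not described in this thread.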

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/libs/state_processor_api/
[2] https://developer.aliyun.com/article/781455

Best regards,
JING ZHANG


State migration for sql job

2021-06-07 Thread aitozi
When using Flink SQL, we have run into a big problem with SQL state
compatibility. Suppose we have a group aggregation query like
```sql
select sum(`a`) from source_t group by `uid`
```
But if I want to add a new aggregate column, changing the query to
```sql
select sum(`a`), max(`a`) from source_t group by `uid`
```
then the SQL state is no longer compatible. Is there any ongoing work or
thinking on how to improve this situation?
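
A toy model of why the two queries above do not share state, assuming the per-key state holds one accumulator slot per aggregate call in SELECT order. `accumulator_layout` and `can_restore` are hypothetical helpers; Flink's actual restore checks are not described in this thread and are not reproduced here.

```python
from typing import List, Tuple


def accumulator_layout(aggregates: List[str]) -> Tuple[str, ...]:
    """Hypothetical: one accumulator slot per aggregate call, in SELECT order."""
    return tuple(aggregates)


def can_restore(stored: Tuple[str, ...], required: Tuple[str, ...]) -> bool:
    """State is reusable only if the stored layout matches what the query needs."""
    return stored == required


old_layout = accumulator_layout(["sum(a)"])            # layout written by the old query
new_layout = accumulator_layout(["sum(a)", "max(a)"])  # layout the new query expects
compatible = can_restore(old_layout, new_layout)
```

Under this assumption, adding max(a) changes the layout from one slot to two, so the state written by the old query cannot be read back as-is.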


