Thanks @Timo for the detailed reply.

Let’s continue the technical discussion on the former FLIP discussion 
thread[1]:  [DISCUSS] FLIP-162: Consistent Flink SQL time function behavior. 

We can share our decision to user and user-zh on this thread once we have a 
final proposal.

Best,
Leonard

[1] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-162-Consistent-Flink-SQL-time-function-behavior-tc48116.html

> 在 2021年1月22日,00:53,Timo Walther <twal...@apache.org> 写道:
> 
> Hi Leonard,
> 
> thanks for working on this topic. I agree that time handling is not easy in 
> Flink at the moment. We added new time data types (and some are still not 
> supported which even further complicates things like TIME(9)). We should 
> definitely improve this situation for users.
> 
> This is a pretty opinionated topic and it seems that the SQL standard is not 
> really deciding this but is at least supporting. So let me express my opinion 
> for the most important functions:
> 
> LOCALDATE / LOCALTIME / LOCALTIMESTAMP
> 
> --> uses session time zone, returns DATE/TIME/TIMESTAMP
> 
> I think those are the most obvious ones because the LOCAL indicates that the 
> locality should be materialized into the result and any time zone information 
> (coming from session config or data) is not important afterwards.
> 
> CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP
> 
> --> uses session time zone, returns DATE/TIME/TIMESTAMP
> 
> I'm very sceptical about this behavior. Almost all mature systems (Oracle, 
> Postgres) and new high quality systems (Presto, Snowflake) use a data type 
> with some degree of time zone information encoded. In a globalized world with 
> businesses spanning different regions, I think we should do this as well. 
> There should be a difference between CURRENT_TIMESTAMP and LOCALTIMESTAMP. 
> And users should be able to choose which behavior they prefer for their 
> pipeline.
> 
> If we would design this from scatch, I would suggest the following:
> 
> - drop CURRENT_DATE / CURRENT_TIME and let users pick LOCALDATE / LOCALTIME 
> for materialized timestamp parts
> 
> - CURRENT_TIMESTAMP should return a TIMESTAMP WITH TIME ZONE to materialize 
> all session time information into every record. It it the most generic data 
> type and allows to cast to all other timestamp data types. This generic 
> ability can be used for filter predicates as well either through implicit or 
> explicit casting.
> 
> PROCTIME/ROWTIME should be time functions based on a long value. Both 
> System.currentMillis() and our watermark system work on long values. Those 
> should return TIMESTAMP WITH LOCAL TIME ZONE because the main calculation 
> should always happen based on UTC. We discussed it in a different thread, but 
> we should allow PROCTIME globally. People need a way to create instances of 
> TIMESTAMP WITH LOCAL TIME ZONE. This is not considered in the current design 
> doc. Many pipelines contain UTC timestamps and thus it should be easy to 
> create one. Also, both CURRENT_TIMESTAMP and LOCALTIMESTAMP can work with 
> this type because we should remember that TIMESTAMP WITH LOCAL TIME ZONE 
> accepts all timestamp data types as casting target [1]. We could allow 
> TIMESTAMP WITH TIME ZONE in the future for ROWTIME.
> 
> In any case, windows should simply adapt their behavior to the passed 
> timestamp type. And with TIMESTAMP WITH LOCAL TIME ZONE a day is defined by 
> considering the current session time zone.
> 
> If we would like to design this with less effort required, we could think 
> about returning TIMESTAMP WITH LOCAL TIME ZONE also for CURRENT_TIMESTAMP.
> 
> 
> I will try to involve more people into this discussion.
> 
> Thanks,
> Timo
> 
> [1] 
> https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/Data-Types.html#GUID-E7CA339A-2093-4FE4-A36E-1D09593591D3
>  
> <https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/Data-Types.html#GUID-E7CA339A-2093-4FE4-A36E-1D09593591D3>
> 
> On 21.01.21 15:32, Leonard Xu wrote:
>>> Before the changes, as I am writing this reply, the local time here is 
>>> 2021-01-21 12:03:35 (Beijing time, UTC+8).
>>> And I tried these 5 functions in sql client, and got:
>>> 
>>> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, 
>>> CURRENT_TIME;
>>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>>> |                  EXPR$0 |                  EXPR$1 |       
>>> CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
>>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>>> | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 | 
>>> 2021-01-21T04:03:35.228 |   2021-01-21 | 04:03:35.228 |
>>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>>> After the changes, the expected behavior will change to:
>>> 
>>> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, 
>>> CURRENT_TIME;
>>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>>> |                  EXPR$0 |                  EXPR$1 |       
>>> CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
>>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>>> | 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 | 
>>> 2021-01-21T12:03:35.228 |   2021-01-21 | 12:03:35.228 |
>>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>>> The return type of now(), proctime() and CURRENT_TIMESTAMP still be 
>>> TIMESTAMP;
>> To Kurt, thanks  for the intuitive case, it really clear, you’re wright that 
>> I want to propose to change the return value of these functions. It’s the 
>> most important part of the topic from user's perspective.
>>> I think this definitely deserves a FLIP.
>> To Jark,  nice suggestion, I prepared a FLIP for this topic, and will start 
>> the FLIP discussion soon.
>>>> If use the default Flink SQL,&nbsp; the window time range of the
>>>> statistics is incorrect, then the statistical results will naturally be
>>>> incorrect.
>> To zhisheng, sorry to hear that this problem influenced your production 
>> jobs,  Could you share your SQL pattern?  we can have more inputs and try to 
>> resolve them.
>> Best,
>> Leonard
>>> On Tue, Jan 19, 2021 at 6:42 PM Leonard Xu <xbjt...@gmail.com 
>>> <mailto:xbjt...@gmail.com> <mailto:xbjt...@gmail.com 
>>> <mailto:xbjt...@gmail.com>>> wrote:
>>> I found above example format may mess up in different mail client, I post a 
>>> picture here[1].
>>> 
>>> Best,
>>> Leonard
>>> 
>>> [1] 
>>> https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png
>>>  
>>> <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png>
>>>  
>>> <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png
>>>  
>>> <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png>>
>>>  
>>> <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png
>>>  
>>> <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png>
>>>  
>>> <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png
>>>  
>>> <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png>>>
>>> 
>>>> 在 2021年1月19日,16:22,Leonard Xu <xbjt...@gmail.com 
>>>> <mailto:xbjt...@gmail.com> <mailto:xbjt...@gmail.com 
>>>> <mailto:xbjt...@gmail.com>>> 写道:
>>>> 
>>>> Hi, all
>>>> 
>>>> I want to start the discussion about correcting time-related function 
>>>> behavior in Flink SQL, this is a tricky topic but I think it’s time to 
>>>> address it.
>>>> 
>>>> Currently some temporal function behaviors are wired to users.
>>>> 1.  When users use a PROCTIME() in SQL, the value of PROCTIME() has a 
>>>> timezone offset with the wall-clock time in users' local time zone, users 
>>>> need to add their local time zone offset manually to get expected local 
>>>> timestamp(e.g: Users in Germany need to +1h to get expected local 
>>>> timestamp).
>>>> 
>>>> 2. Users can not use CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP  to get 
>>>> wall-clock timestamp in local time zone, and thus they need write UDF in 
>>>> their SQL just for implementing a simple filter like WHERE date_col =  
>>>> CURRENT_DATE.
>>>> 
>>>> 3. Another common case  is the time window  with day interval based on 
>>>> PROCTIME(), user plan to put all data from one day into the same window, 
>>>> but the window is assigned using timestamp in UTC+0 timezone rather than 
>>>> the session timezone which leads to the window starts with an offset(e.g: 
>>>> Users in China need to add -8h in their business sql start and then +8h 
>>>> when output the result, the conversion like a magic for users).
>>>> 
>>>> These problems come from that lots of time-related functions like 
>>>> PROCTIME(), NOW(), CURRENT_DATE, CURRENT_TIME and CURRENT_TIMESTAMP are 
>>>> returning time values based on UTC+0 time zone.
>>>> 
>>>> This topic will lead to a comparison of the three types, i.e. 
>>>> TIMESTAMP/TIMESTAMP WITHOUT TIME ZONE, TIMESTAMP WITH LOCAL TIME ZONE and 
>>>> TIMESTAMP WITH TIME ZONE. In order to better understand the three types, I 
>>>> wrote a document[1] to help understand them better. You can also know the 
>>>> tree timestamp types behavior in Hadoop ecosystem from the reference link 
>>>> int the doc.
>>>> 
>>>> 
>>>> I Invested all Flink time-related functions current behavior and compared 
>>>> with other DB vendors like Pg,Presto, Hive, Spark, Snowflake,  I made an 
>>>> excel [2] to organize them well, we can use it for the next discussion. 
>>>> Please let me know if I missed something.
>>>> From my investigation, I think we need to correct the behavior of function 
>>>> NOW()/PROCTIME()/CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP, to correct 
>>>> them, we can change the function return type or function return value or 
>>>> change return type and return value both. All of those way are valid 
>>>> because SQL:2011 does not specify the function return type and every SQL 
>>>> engine vendor has its own implementation. For example the 
>>>> CURRENT_TIMESTAMP function,
>>>> 
>>>> FLINK current behavior        existed problem other vendors' behavior 
>>>> proposed change
>>>> CURRENT_TIMESTAMP     CURRENT_TIMESTAMP
>>>> TIMESTAMP(0) NOT NULL
>>>> 
>>>> #session timezone: UTC
>>>> 2020-12-28T23:52:52
>>>> 
>>>> #session timezone: UTC+8
>>>> 2020-12-28T23:52:52
>>>> 
>>>> wall clock:
>>>> UTC+8: 2020-12-29 07:52:52    Wrong value:returns UTC timestamp, but user 
>>>> expects current timestamp in session time zone      In MySQL, Spark, the 
>>>> function NOW() and CURRENT_TIMESTAMP return current timestamp value in 
>>>> session time zone,the return type is TIMESTAMP
>>>> 
>>>> In Pg, Presto, the function NOW() and LOCALTIMESTAMP return current 
>>>> timestamp in session time zone,the return type is TIMESTAMP WITH TIME ZONE
>>>> 
>>>> In Snowflake, the function CURRENT_TIMESTAMP / LOCALTIMESTAMP return 
>>>> current timestamp in session time zone,the return type is TIMESTAMP WITH 
>>>> LOCAL TIME ZONE Flink should return current timestamp in session time 
>>>> zone, the return type should be TIMESTAMP
>>>> 
>>>> 
>>>> I tend to only change the return value for these problematic functions and 
>>>> introduce an option for compatibility consideration, what do you think?
>>>> 
>>>> 
>>>> Looking forward to your feedback.
>>>> 
>>>> Best,
>>>> Leonard
>>>> 
>>>> [1] 
>>>> https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing
>>>>  
>>>> <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing><https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing
>>>>  
>>>> <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing>>
>>>>  
>>>> <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing
>>>>  
>>>> <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing><https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing
>>>>  
>>>> <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing>>>
>>>> [2] 
>>>> https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing
>>>>  
>>>> <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing>
>>>>  
>>>> <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing
>>>>  
>>>> <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing>>
>>>>  
>>>> <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing
>>>>  
>>>> <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing><https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing
>>>>  
>>>> <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing>>>

Reply via email to