Thanks @Timo for the detailed reply. Let’s continue the technical discussion on the former FLIP discussion thread[1]: [DISCUSS] FLIP-162: Consistent Flink SQL time function behavior.
We can share our decision to user and user-zh on this thread once we have a final proposal. Best, Leonard [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-162-Consistent-Flink-SQL-time-function-behavior-tc48116.html > 在 2021年1月22日,00:53,Timo Walther <twal...@apache.org> 写道: > > Hi Leonard, > > thanks for working on this topic. I agree that time handling is not easy in > Flink at the moment. We added new time data types (and some are still not > supported which even further complicates things like TIME(9)). We should > definitely improve this situation for users. > > This is a pretty opinionated topic and it seems that the SQL standard is not > really deciding this but is at least supporting. So let me express my opinion > for the most important functions: > > LOCALDATE / LOCALTIME / LOCALTIMESTAMP > > --> uses session time zone, returns DATE/TIME/TIMESTAMP > > I think those are the most obvious ones because the LOCAL indicates that the > locality should be materialized into the result and any time zone information > (coming from session config or data) is not important afterwards. > > CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP > > --> uses session time zone, returns DATE/TIME/TIMESTAMP > > I'm very sceptical about this behavior. Almost all mature systems (Oracle, > Postgres) and new high quality systems (Presto, Snowflake) use a data type > with some degree of time zone information encoded. In a globalized world with > businesses spanning different regions, I think we should do this as well. > There should be a difference between CURRENT_TIMESTAMP and LOCALTIMESTAMP. > And users should be able to choose which behavior they prefer for their > pipeline. > > If we would design this from scatch, I would suggest the following: > > - drop CURRENT_DATE / CURRENT_TIME and let users pick LOCALDATE / LOCALTIME > for materialized timestamp parts > > - CURRENT_TIMESTAMP should return a TIMESTAMP WITH TIME ZONE to materialize > all session time information into every record. It it the most generic data > type and allows to cast to all other timestamp data types. This generic > ability can be used for filter predicates as well either through implicit or > explicit casting. > > PROCTIME/ROWTIME should be time functions based on a long value. Both > System.currentMillis() and our watermark system work on long values. Those > should return TIMESTAMP WITH LOCAL TIME ZONE because the main calculation > should always happen based on UTC. We discussed it in a different thread, but > we should allow PROCTIME globally. People need a way to create instances of > TIMESTAMP WITH LOCAL TIME ZONE. This is not considered in the current design > doc. Many pipelines contain UTC timestamps and thus it should be easy to > create one. Also, both CURRENT_TIMESTAMP and LOCALTIMESTAMP can work with > this type because we should remember that TIMESTAMP WITH LOCAL TIME ZONE > accepts all timestamp data types as casting target [1]. We could allow > TIMESTAMP WITH TIME ZONE in the future for ROWTIME. > > In any case, windows should simply adapt their behavior to the passed > timestamp type. And with TIMESTAMP WITH LOCAL TIME ZONE a day is defined by > considering the current session time zone. > > If we would like to design this with less effort required, we could think > about returning TIMESTAMP WITH LOCAL TIME ZONE also for CURRENT_TIMESTAMP. > > > I will try to involve more people into this discussion. > > Thanks, > Timo > > [1] > https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/Data-Types.html#GUID-E7CA339A-2093-4FE4-A36E-1D09593591D3 > > <https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/Data-Types.html#GUID-E7CA339A-2093-4FE4-A36E-1D09593591D3> > > On 21.01.21 15:32, Leonard Xu wrote: >>> Before the changes, as I am writing this reply, the local time here is >>> 2021-01-21 12:03:35 (Beijing time, UTC+8). >>> And I tried these 5 functions in sql client, and got: >>> >>> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, >>> CURRENT_TIME; >>> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>> | EXPR$0 | EXPR$1 | >>> CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME | >>> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>> | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 | >>> 2021-01-21T04:03:35.228 | 2021-01-21 | 04:03:35.228 | >>> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>> After the changes, the expected behavior will change to: >>> >>> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, >>> CURRENT_TIME; >>> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>> | EXPR$0 | EXPR$1 | >>> CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME | >>> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>> | 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 | >>> 2021-01-21T12:03:35.228 | 2021-01-21 | 12:03:35.228 | >>> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>> The return type of now(), proctime() and CURRENT_TIMESTAMP still be >>> TIMESTAMP; >> To Kurt, thanks for the intuitive case, it really clear, you’re wright that >> I want to propose to change the return value of these functions. It’s the >> most important part of the topic from user's perspective. >>> I think this definitely deserves a FLIP. >> To Jark, nice suggestion, I prepared a FLIP for this topic, and will start >> the FLIP discussion soon. >>>> If use the default Flink SQL, the window time range of the >>>> statistics is incorrect, then the statistical results will naturally be >>>> incorrect. >> To zhisheng, sorry to hear that this problem influenced your production >> jobs, Could you share your SQL pattern? we can have more inputs and try to >> resolve them. >> Best, >> Leonard >>> On Tue, Jan 19, 2021 at 6:42 PM Leonard Xu <xbjt...@gmail.com >>> <mailto:xbjt...@gmail.com> <mailto:xbjt...@gmail.com >>> <mailto:xbjt...@gmail.com>>> wrote: >>> I found above example format may mess up in different mail client, I post a >>> picture here[1]. >>> >>> Best, >>> Leonard >>> >>> [1] >>> https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png >>> >>> <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png> >>> >>> <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png >>> >>> <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png>> >>> >>> <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png >>> >>> <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png> >>> >>> <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png >>> >>> <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png>>> >>> >>>> 在 2021年1月19日,16:22,Leonard Xu <xbjt...@gmail.com >>>> <mailto:xbjt...@gmail.com> <mailto:xbjt...@gmail.com >>>> <mailto:xbjt...@gmail.com>>> 写道: >>>> >>>> Hi, all >>>> >>>> I want to start the discussion about correcting time-related function >>>> behavior in Flink SQL, this is a tricky topic but I think it’s time to >>>> address it. >>>> >>>> Currently some temporal function behaviors are wired to users. >>>> 1. When users use a PROCTIME() in SQL, the value of PROCTIME() has a >>>> timezone offset with the wall-clock time in users' local time zone, users >>>> need to add their local time zone offset manually to get expected local >>>> timestamp(e.g: Users in Germany need to +1h to get expected local >>>> timestamp). >>>> >>>> 2. Users can not use CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP to get >>>> wall-clock timestamp in local time zone, and thus they need write UDF in >>>> their SQL just for implementing a simple filter like WHERE date_col = >>>> CURRENT_DATE. >>>> >>>> 3. Another common case is the time window with day interval based on >>>> PROCTIME(), user plan to put all data from one day into the same window, >>>> but the window is assigned using timestamp in UTC+0 timezone rather than >>>> the session timezone which leads to the window starts with an offset(e.g: >>>> Users in China need to add -8h in their business sql start and then +8h >>>> when output the result, the conversion like a magic for users). >>>> >>>> These problems come from that lots of time-related functions like >>>> PROCTIME(), NOW(), CURRENT_DATE, CURRENT_TIME and CURRENT_TIMESTAMP are >>>> returning time values based on UTC+0 time zone. >>>> >>>> This topic will lead to a comparison of the three types, i.e. >>>> TIMESTAMP/TIMESTAMP WITHOUT TIME ZONE, TIMESTAMP WITH LOCAL TIME ZONE and >>>> TIMESTAMP WITH TIME ZONE. In order to better understand the three types, I >>>> wrote a document[1] to help understand them better. You can also know the >>>> tree timestamp types behavior in Hadoop ecosystem from the reference link >>>> int the doc. >>>> >>>> >>>> I Invested all Flink time-related functions current behavior and compared >>>> with other DB vendors like Pg,Presto, Hive, Spark, Snowflake, I made an >>>> excel [2] to organize them well, we can use it for the next discussion. >>>> Please let me know if I missed something. >>>> From my investigation, I think we need to correct the behavior of function >>>> NOW()/PROCTIME()/CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP, to correct >>>> them, we can change the function return type or function return value or >>>> change return type and return value both. All of those way are valid >>>> because SQL:2011 does not specify the function return type and every SQL >>>> engine vendor has its own implementation. For example the >>>> CURRENT_TIMESTAMP function, >>>> >>>> FLINK current behavior existed problem other vendors' behavior >>>> proposed change >>>> CURRENT_TIMESTAMP CURRENT_TIMESTAMP >>>> TIMESTAMP(0) NOT NULL >>>> >>>> #session timezone: UTC >>>> 2020-12-28T23:52:52 >>>> >>>> #session timezone: UTC+8 >>>> 2020-12-28T23:52:52 >>>> >>>> wall clock: >>>> UTC+8: 2020-12-29 07:52:52 Wrong value:returns UTC timestamp, but user >>>> expects current timestamp in session time zone In MySQL, Spark, the >>>> function NOW() and CURRENT_TIMESTAMP return current timestamp value in >>>> session time zone,the return type is TIMESTAMP >>>> >>>> In Pg, Presto, the function NOW() and LOCALTIMESTAMP return current >>>> timestamp in session time zone,the return type is TIMESTAMP WITH TIME ZONE >>>> >>>> In Snowflake, the function CURRENT_TIMESTAMP / LOCALTIMESTAMP return >>>> current timestamp in session time zone,the return type is TIMESTAMP WITH >>>> LOCAL TIME ZONE Flink should return current timestamp in session time >>>> zone, the return type should be TIMESTAMP >>>> >>>> >>>> I tend to only change the return value for these problematic functions and >>>> introduce an option for compatibility consideration, what do you think? >>>> >>>> >>>> Looking forward to your feedback. >>>> >>>> Best, >>>> Leonard >>>> >>>> [1] >>>> https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing >>>> >>>> <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing><https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing >>>> >>>> <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing>> >>>> >>>> <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing >>>> >>>> <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing><https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing >>>> >>>> <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing>>> >>>> [2] >>>> https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing >>>> >>>> <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing> >>>> >>>> <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing >>>> >>>> <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing>> >>>> >>>> <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing >>>> >>>> <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing><https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing >>>> >>>> <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing>>>