Hi, all

I want to start the discussion about correcting time-related function behavior 
in Flink SQL, this is a tricky topic but I think it’s time to address it. 

Currently some temporal function behaviors are wired to users.
1.  When users use a PROCTIME() in SQL, the value of PROCTIME() has a timezone 
offset with the wall-clock time in users' local time zone, users need to add 
their local time zone offset manually to get expected local timestamp(e.g: 
Users in Germany need to +1h to get expected local timestamp). 

2. Users can not use CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP  to get 
wall-clock timestamp in local time zone, and thus they need write UDF in their 
SQL just for implementing a simple filter like WHERE date_col =  CURRENT_DATE. 

3. Another common case  is the time window  with day interval based on 
PROCTIME(), user plan to put all data from one day into the same window, but 
the window is assigned using timestamp in UTC+0 timezone rather than the 
session timezone which leads to the window starts with an offset(e.g: Users in 
China need to add -8h in their business sql start and then +8h when output the 
result, the conversion like a magic for users). 

These problems come from that lots of time-related functions like PROCTIME(), 
NOW(), CURRENT_DATE, CURRENT_TIME and CURRENT_TIMESTAMP are returning time 
values based on UTC+0 time zone.

This topic will lead to a comparison of the three types, i.e. 
TIMESTAMP/TIMESTAMP WITHOUT TIME ZONE, TIMESTAMP WITH LOCAL TIME ZONE and 
TIMESTAMP WITH TIME ZONE. In order to better understand the three types, I 
wrote a document[1] to help understand them better. You can also know the tree 
timestamp types behavior in Hadoop ecosystem from the reference link int the 
doc.


I Invested all Flink time-related functions current behavior and compared with 
other DB vendors like Pg,Presto, Hive, Spark, Snowflake,  I made an excel [2] 
to organize them well, we can use it for the next discussion. Please let me 
know if I missed something.
From my investigation, I think we need to correct the behavior of function 
NOW()/PROCTIME()/CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP, to correct them, 
we can change the function return type or function return value or change 
return type and return value both. All of those way are valid because SQL:2011 
does not specify the function return type and every SQL engine vendor has its 
own implementation. For example the CURRENT_TIMESTAMP function,

FLINK   current behavior        existed problem other vendors' behavior 
proposed change
CURRENT_TIMESTAMP       CURRENT_TIMESTAMP
TIMESTAMP(0) NOT NULL

#session timezone: UTC
2020-12-28T23:52:52

#session timezone: UTC+8
2020-12-28T23:52:52

wall clock:
UTC+8: 2020-12-29 07:52:52      Wrong value:returns UTC timestamp, but user 
expects current timestamp in session time zone      In MySQL, Spark, the 
function NOW() and CURRENT_TIMESTAMP return current timestamp value in session 
time zone,the return type is TIMESTAMP

In Pg, Presto, the function NOW() and LOCALTIMESTAMP return current timestamp 
in session time zone,the return type is TIMESTAMP WITH TIME ZONE

In Snowflake, the function CURRENT_TIMESTAMP / LOCALTIMESTAMP return current 
timestamp in session time zone,the return type is TIMESTAMP WITH LOCAL TIME 
ZONE   Flink should return current timestamp in session time zone, the return 
type should be TIMESTAMP


I tend to only change the return value for these problematic functions and 
introduce an option for compatibility consideration, what do you think?


Looking forward to your feedback.

Best,
Leonard

[1] 
https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing
 
<https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing>
 
[2] 
https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing
 
<https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing>
 

Reply via email to