Thanks to Leonard Xu for discussing this tricky topic. At present, there are
many Flink jobs in our production environment that are used to count day-level
reports (eg: count PV/UV ).
If use the default Flink SQL, the window time range of the statistics is
incorrect, then the statistical results will naturally be incorrect.
The user needs to deal with the time zone manually in order to solve the
problem.
If Flink itself can solve these time zone issues, then I think it will be
user-friendly.
Thank you
Best!
zhisheng
------------------ ???????? ------------------
??????:
"dev"
<[email protected]>;
????????: 2021??1??19??(??????) ????6:35
??????: "dev"<[email protected]>;
????: Re: [DISCUSS] Correct time-related function behavior in Flink SQL
I found above example format may mess up in different mail client, I post a
picture here[1].
Best,
Leonard
[1]
https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png
<https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png>
> ?? 2021??1??19????16:22??Leonard Xu <[email protected]> ??????
>
> Hi, all
>
> I want to start the discussion about correcting time-related function
behavior in Flink SQL, this is a tricky topic but I think it??s time to address
it.
>
> Currently some temporal function behaviors are wired to users.
> 1. When users use a PROCTIME() in SQL, the value of PROCTIME() has a
timezone offset with the wall-clock time in users' local time zone, users need
to add their local time zone offset manually to get expected local
timestamp(e.g: Users in Germany need to +1h to get expected local timestamp).
>
> 2. Users can not use CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP to
get wall-clock timestamp in local time zone, and thus they need write UDF in
their SQL just for implementing a simple filter like WHERE date_col =
CURRENT_DATE.
>
> 3. Another common case is the time window with day interval
based on PROCTIME(), user plan to put all data from one day into the same
window, but the window is assigned using timestamp in UTC+0 timezone rather
than the session timezone which leads to the window starts with an offset(e.g:
Users in China need to add -8h in their business sql start and then +8h when
output the result, the conversion like a magic for users).
>
> These problems come from that lots of time-related functions like
PROCTIME(), NOW(), CURRENT_DATE, CURRENT_TIME and CURRENT_TIMESTAMP are
returning time values based on UTC+0 time zone.
>
> This topic will lead to a comparison of the three types, i.e.
TIMESTAMP/TIMESTAMP WITHOUT TIME ZONE, TIMESTAMP WITH LOCAL TIME ZONE and
TIMESTAMP WITH TIME ZONE. In order to better understand the three types, I
wrote a document[1] to help understand them better. You can also know the tree
timestamp types behavior in Hadoop ecosystem from the reference link int the
doc.
>
>
> I Invested all Flink time-related functions current behavior and compared
with other DB vendors like Pg,Presto, Hive, Spark, Snowflake, I made an
excel [2] to organize them well, we can use it for the next discussion. Please
let me know if I missed something.
> From my investigation, I think we need to correct the behavior of function
NOW()/PROCTIME()/CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP, to correct them,
we can change the function return type or function return value or change
return type and return value both. All of those way are valid because SQL:2011
does not specify the function return type and every SQL engine vendor has its
own implementation. For example the CURRENT_TIMESTAMP function,
>
> FLINK current behavior existed problem other vendors' behavior
proposed change
> CURRENT_TIMESTAMP CURRENT_TIMESTAMP
> TIMESTAMP(0) NOT NULL
>
> #session timezone: UTC
> 2020-12-28T23:52:52
>
> #session timezone: UTC+8
> 2020-12-28T23:52:52
>
> wall clock:
> UTC+8: 2020-12-29 07:52:52 Wrong value??returns UTC timestamp, but user
expects current timestamp in session time zone In MySQL, Spark, the
function NOW() and CURRENT_TIMESTAMP return current timestamp value in session
time zone??the return type is TIMESTAMP
>
> In Pg, Presto, the function NOW() and LOCALTIMESTAMP return current
timestamp in session time zone??the return type is TIMESTAMP WITH TIME ZONE
>
> In Snowflake, the function CURRENT_TIMESTAMP / LOCALTIMESTAMP return
current timestamp in session time zone??the return type is TIMESTAMP WITH LOCAL
TIME ZONE Flink should return current timestamp in session time zone, the
return type should be TIMESTAMP
>
>
> I tend to only change the return value for these problematic functions and
introduce an option for compatibility consideration, what do you think?
>
>
> Looking forward to your feedback.
>
> Best,
> Leonard
>
> [1]
https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing
<https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing>
> [2]
https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing
<https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing>