Forgot one more thing. Continue with displaying in UTC. As a user, if Flink want to display the timestamp in UTC, why don't we offer something like UTC_TIMESTAMP?
Best, Kurt On Fri, Jan 22, 2021 at 4:33 PM Kurt Young <ykt...@gmail.com> wrote: > Before jumping into technique details, let's take a step back to discuss > user experience. > > The first important question is what kind of date and time will Flink > display when users call > CURRENT_TIMESTAMP and maybe also PROCTIME (if we think they are similar). > > Should it always display the date and time in UTC or in the user's time > zone? I think this part is the > reason that surprised lots of users. If we forget about the type and > internal representation of these > two methods, as a user, my instinct tells me that these two methods should > display my wall clock time. > > Display time in UTC? I'm not sure, why I should care about UTC time? I > want to get my current timestamp. > For those users who have never gone abroad, they might not even be able to > realize that this is affected > by the time zone. > > Best, > Kurt > > > On Fri, Jan 22, 2021 at 12:25 PM Leonard Xu <xbjt...@gmail.com> wrote: > >> Thanks @Timo for the detailed reply, let's go on this topic on this >> discussion, I've merged all mails to this discussion. >> >> > LOCALDATE / LOCALTIME / LOCALTIMESTAMP >> > >> > --> uses session time zone, returns DATE/TIME/TIMESTAMP >> >> > >> > CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP >> > >> > --> uses session time zone, returns DATE/TIME/TIMESTAMP >> > >> > I'm very sceptical about this behavior. Almost all mature systems >> (Oracle, Postgres) and new high quality systems (Presto, Snowflake) use a >> data type with some degree of time zone information encoded. In a >> globalized world with businesses spanning different regions, I think we >> should do this as well. There should be a difference between >> CURRENT_TIMESTAMP and LOCALTIMESTAMP. And users should be able to choose >> which behavior they prefer for their pipeline. >> >> >> I know that the two series should be different at first glance, but >> different SQL engines can have their own explanations,for example, >> CURRENT_TIMESTAMP and LOCALTIMESTAMP are synonyms in Snowflake[1] and has >> no difference, and Spark only supports the later one and doesn’t support >> LOCALTIME/LOCALTIMESTAMP[2]. >> >> >> > If we would design this from scatch, I would suggest the following: >> > >> > - drop CURRENT_DATE / CURRENT_TIME and let users pick LOCALDATE / >> LOCALTIME for materialized timestamp parts >> >> The function CURRENT_DATE/CURRENT_TIME is supporting in SQL standard, but >> LOCALDATE not, I don’t think it’s a good idea that dropping functions which >> SQL standard supported and introducing a replacement which SQL standard not >> reminded. >> >> >> > - CURRENT_TIMESTAMP should return a TIMESTAMP WITH TIME ZONE to >> materialize all session time information into every record. It it the most >> generic data type and allows to cast to all other timestamp data types. >> This generic ability can be used for filter predicates as well either >> through implicit or explicit casting. >> >> TIMESTAMP WITH TIME ZONE indeed contains more information to describe a >> time point, but the type TIMESTAMP can cast to all other timestamp data >> types combining with session time zone as well, and it also can be used for >> filter predicates. For type casting between BIGINT and TIMESTAMP, I think >> the function way using TO_TIMEMTAMP()/FROM_UNIXTIMESTAMP() is more clear. >> >> > PROCTIME/ROWTIME should be time functions based on a long value. Both >> System.currentMillis() and our watermark system work on long values. Those >> should return TIMESTAMP WITH LOCAL TIME ZONE because the main calculation >> should always happen based on UTC. >> > We discussed it in a different thread, but we should allow PROCTIME >> globally. People need a way to create instances of TIMESTAMP WITH LOCAL >> TIME ZONE. This is not considered in the current design doc. >> > Many pipelines contain UTC timestamps and thus it should be easy to >> create one. >> > Also, both CURRENT_TIMESTAMP and LOCALTIMESTAMP can work with this type >> because we should remember that TIMESTAMP WITH LOCAL TIME ZONE accepts all >> timestamp data types as casting target [1]. We could allow TIMESTAMP WITH >> TIME ZONE in the future for ROWTIME. >> > In any case, windows should simply adapt their behavior to the passed >> timestamp type. And with TIMESTAMP WITH LOCAL TIME ZONE a day is defined by >> considering the current session time zone. >> >> I also agree returning TIMESTAMP WITH LOCAL TIME ZONE for PROCTIME has >> more clear semantics, but I realized that user didn’t care the type but >> more about the expressed value they saw, and change the type from TIMESTAMP >> to TIMESTAMP WITH LOCAL TIME ZONE brings huge refactor that we need >> consider all places where the TIMESTAMP type used, and many builtin >> functions and UDFs doest not support TIMESTAMP WITH LOCAL TIME ZONE type. >> That means both user and Flink devs need to refactor the code(UDF, builtin >> functions, sql pipeline), to be honest, I didn’t see strong motivation that >> we have to do the pretty big refactor from user’s perspective and >> developer’s perspective. >> >> In one word, both your suggestion and my proposal can resolve almost all >> user problems,the divergence is whether we need to spend pretty energy just >> to get a bit more accurate semantics? I think we need a tradeoff. >> >> >> Best, >> Leonard >> [1] >> https://trino.io/docs/current/functions/datetime.html#current_timestamp < >> https://trino.io/docs/current/functions/datetime.html#current_timestamp> >> [2] https://issues.apache.org/jira/browse/SPARK-30374 < >> https://issues.apache.org/jira/browse/SPARK-30374> >> >> >> >> >> >> >> > 2021-01-22,00:53,Timo Walther <twal...@apache.org> : >> > >> > Hi Leonard, >> > >> > thanks for working on this topic. I agree that time handling is not >> easy in Flink at the moment. We added new time data types (and some are >> still not supported which even further complicates things like TIME(9)). We >> should definitely improve this situation for users. >> > >> > This is a pretty opinionated topic and it seems that the SQL standard >> is not really deciding this but is at least supporting. So let me express >> my opinion for the most important functions: >> > >> > LOCALDATE / LOCALTIME / LOCALTIMESTAMP >> > >> > --> uses session time zone, returns DATE/TIME/TIMESTAMP >> > >> > I think those are the most obvious ones because the LOCAL indicates >> that the locality should be materialized into the result and any time zone >> information (coming from session config or data) is not important >> afterwards. >> > >> > CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP >> > >> > --> uses session time zone, returns DATE/TIME/TIMESTAMP >> > >> > I'm very sceptical about this behavior. Almost all mature systems >> (Oracle, Postgres) and new high quality systems (Presto, Snowflake) use a >> data type with some degree of time zone information encoded. In a >> globalized world with businesses spanning different regions, I think we >> should do this as well. There should be a difference between >> CURRENT_TIMESTAMP and LOCALTIMESTAMP. And users should be able to choose >> which behavior they prefer for their pipeline. >> > >> > If we would design this from scatch, I would suggest the following: >> > >> > - drop CURRENT_DATE / CURRENT_TIME and let users pick LOCALDATE / >> LOCALTIME for materialized timestamp parts >> > >> > - CURRENT_TIMESTAMP should return a TIMESTAMP WITH TIME ZONE to >> materialize all session time information into every record. It it the most >> generic data type and allows to cast to all other timestamp data types. >> This generic ability can be used for filter predicates as well either >> through implicit or explicit casting. >> > >> > PROCTIME/ROWTIME should be time functions based on a long value. Both >> System.currentMillis() and our watermark system work on long values. Those >> should return TIMESTAMP WITH LOCAL TIME ZONE because the main calculation >> should always happen based on UTC. We discussed it in a different thread, >> but we should allow PROCTIME globally. People need a way to create >> instances of TIMESTAMP WITH LOCAL TIME ZONE. This is not considered in the >> current design doc. Many pipelines contain UTC timestamps and thus it >> should be easy to create one. Also, both CURRENT_TIMESTAMP and >> LOCALTIMESTAMP can work with this type because we should remember that >> TIMESTAMP WITH LOCAL TIME ZONE accepts all timestamp data types as casting >> target [1]. We could allow TIMESTAMP WITH TIME ZONE in the future for >> ROWTIME. >> > >> > In any case, windows should simply adapt their behavior to the passed >> timestamp type. And with TIMESTAMP WITH LOCAL TIME ZONE a day is defined by >> considering the current session time zone. >> > >> > If we would like to design this with less effort required, we could >> think about returning TIMESTAMP WITH LOCAL TIME ZONE also for >> CURRENT_TIMESTAMP. >> > >> > >> > I will try to involve more people into this discussion. >> > >> > Thanks, >> > Timo >> > >> > [1] >> https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/Data-Types.html#GUID-E7CA339A-2093-4FE4-A36E-1D09593591D3 >> < >> https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/Data-Types.html#GUID-E7CA339A-2093-4FE4-A36E-1D09593591D3 >> > >> >> >> >> > 2021-01-21,22:32,Leonard Xu <xbjt...@gmail.com> : >> >> Before the changes, as I am writing this reply, the local time here is >> 2021-01-21 12:03:35 (Beijing time, UTC+8). >> >> And I tried these 5 functions in sql client, and got: >> >> >> >> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, >> CURRENT_TIME; >> >> >> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >> >> | EXPR$0 | EXPR$1 | >> CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME | >> >> >> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >> >> | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 | >> 2021-01-21T04:03:35.228 | 2021-01-21 | 04:03:35.228 | >> >> >> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >> >> After the changes, the expected behavior will change to: >> >> >> >> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, >> CURRENT_TIME; >> >> >> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >> >> | EXPR$0 | EXPR$1 | >> CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME | >> >> >> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >> >> | 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 | >> 2021-01-21T12:03:35.228 | 2021-01-21 | 12:03:35.228 | >> >> >> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >> >> The return type of now(), proctime() and CURRENT_TIMESTAMP still be >> TIMESTAMP; >> > >> > To Kurt, thanks for the intuitive case, it really clear, you’re wright >> that I want to propose to change the return value of these functions. It’s >> the most important part of the topic from user's perspective. >> > >> >> I think this definitely deserves a FLIP. >> > To Jark, nice suggestion, I prepared a FLIP for this topic, and will >> start the FLIP discussion soon. >> > >> >>> If use the default Flink SQL, the window time range of the >> >>> statistics is incorrect, then the statistical results will naturally >> be >> >>> incorrect. >> > To zhisheng, sorry to hear that this problem influenced your production >> jobs, Could you share your SQL pattern? we can have more inputs and try >> to resolve them. >> > >> > >> > Best, >> > Leonard >> >> >> >> > 2021-01-21,14:19,Jark Wu <imj...@gmail.com> : >> > >> > Great examples to understand the problem and the proposed changes, >> @Kurt! >> > >> > Thanks Leonard for investigating this problem. >> > The time-zone problems around time functions and windows have bothered a >> > lot of users. It's time to fix them! >> > >> > The return value changes sound reasonable to me, and keeping the return >> > type unchanged will minimize the surprise to the users. >> > Besides that, I think it would be better to mention how this affects the >> > window behaviors, and the interoperability with DataStream. >> > >> > I think this definitely deserves a FLIP. >> > >> > ==================================================== >> > >> > Hi zhisheng, >> > >> > Do you have examples to illustrate which case will get the wrong window >> > boundaries? >> > That will help to verify whether the proposed changes can solve your >> > problem. >> > >> > Best, >> > Jark >> >> >> >> >> > 2021-01-21,12:54,zhisheng <173855...@qq.com> : >> > >> > Thanks to Leonard Xu for discussing this tricky topic. At present, >> there are many Flink jobs in our production environment that are used to >> count day-level reports (eg: count PV/UV ). >> > >> > If use the default Flink SQL, the window time range of the >> statistics is incorrect, then the statistical results will naturally be >> incorrect. >> > >> > The user needs to deal with the time zone manually in order to solve >> the problem. >> > >> > If Flink itself can solve these time zone issues, then I think it will >> be user-friendly. >> > >> > Thank you >> > >> > Best!; >> > zhisheng >> >> >> >> >> > 2021-01-21,12:11,Kurt Young <ykt...@gmail.com> : >> > >> > cc this to user & user-zh mailing list because this will affect lots of >> users, and also quite a lot of users >> > were asking questions around this topic. >> > >> > Let me try to understand this from user's perspective. >> > >> > Your proposal will affect five functions, which are: >> > PROCTIME() >> > NOW() >> > CURRENT_DATE >> > CURRENT_TIME >> > CURRENT_TIMESTAMP >> > Before the changes, as I am writing this reply, the local time here is >> 2021-01-21 12:03:35 (Beijing time, UTC+8). >> > And I tried these 5 functions in sql client, and got: >> > >> > Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, >> CURRENT_TIME; >> > >> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >> > | EXPR$0 | EXPR$1 | >> CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME | >> > >> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >> > | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 | >> 2021-01-21T04:03:35.228 | 2021-01-21 | 04:03:35.228 | >> > >> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >> > After the changes, the expected behavior will change to: >> > >> > Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, >> CURRENT_TIME; >> > >> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >> > | EXPR$0 | EXPR$1 | >> CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME | >> > >> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >> > | 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 | >> 2021-01-21T12:03:35.228 | 2021-01-21 | 12:03:35.228 | >> > >> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >> > The return type of now(), proctime() and CURRENT_TIMESTAMP still be >> TIMESTAMP; >> > >> > Best, >> > Kurt >> >> >> >> >> >> >>