Forgot one more thing. Continue with displaying in UTC. As a user, if Flink
want to display the timestamp
in UTC, why don't we offer something like UTC_TIMESTAMP?

Best,
Kurt


On Fri, Jan 22, 2021 at 4:33 PM Kurt Young <ykt...@gmail.com> wrote:

> Before jumping into technique details, let's take a step back to discuss
> user experience.
>
> The first important question is what kind of date and time will Flink
> display when users call
>  CURRENT_TIMESTAMP and maybe also PROCTIME (if we think they are similar).
>
> Should it always display the date and time in UTC or in the user's time
> zone? I think this part is the
> reason that surprised lots of users. If we forget about the type and
> internal representation of these
> two methods, as a user, my instinct tells me that these two methods should
> display my wall clock time.
>
> Display time in UTC? I'm not sure, why I should care about UTC time? I
> want to get my current timestamp.
> For those users who have never gone abroad, they might not even be able to
> realize that this is affected
> by the time zone.
>
> Best,
> Kurt
>
>
> On Fri, Jan 22, 2021 at 12:25 PM Leonard Xu <xbjt...@gmail.com> wrote:
>
>> Thanks @Timo for the detailed reply, let's go on this topic on this
>> discussion,  I've merged all mails to this discussion.
>>
>> > LOCALDATE / LOCALTIME / LOCALTIMESTAMP
>> >
>> > --> uses session time zone, returns DATE/TIME/TIMESTAMP
>>
>> >
>> > CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP
>> >
>> > --> uses session time zone, returns DATE/TIME/TIMESTAMP
>> >
>> > I'm very sceptical about this behavior. Almost all mature systems
>> (Oracle, Postgres) and new high quality systems (Presto, Snowflake) use a
>> data type with some degree of time zone information encoded. In a
>> globalized world with businesses spanning different regions, I think we
>> should do this as well. There should be a difference between
>> CURRENT_TIMESTAMP and LOCALTIMESTAMP. And users should be able to choose
>> which behavior they prefer for their pipeline.
>>
>>
>> I know that the two series should be different at first glance, but
>> different SQL engines can have their own explanations,for example,
>> CURRENT_TIMESTAMP and LOCALTIMESTAMP are synonyms in Snowflake[1] and has
>> no difference, and Spark only supports the later one and doesn’t support
>> LOCALTIME/LOCALTIMESTAMP[2].
>>
>>
>> > If we would design this from scatch, I would suggest the following:
>> >
>> > - drop CURRENT_DATE / CURRENT_TIME and let users pick LOCALDATE /
>> LOCALTIME for materialized timestamp parts
>>
>> The function CURRENT_DATE/CURRENT_TIME is supporting in SQL standard, but
>> LOCALDATE not, I don’t think it’s a good idea that dropping functions which
>> SQL standard supported and introducing a replacement which SQL standard not
>> reminded.
>>
>>
>> > - CURRENT_TIMESTAMP should return a TIMESTAMP WITH TIME ZONE to
>> materialize all session time information into every record. It it the most
>> generic data type and allows to cast to all other timestamp data types.
>> This generic ability can be used for filter predicates as well either
>> through implicit or explicit casting.
>>
>> TIMESTAMP WITH TIME ZONE indeed contains more information to describe a
>> time point, but the type TIMESTAMP  can cast to all other timestamp data
>> types combining with session time zone as well, and it also can be used for
>> filter predicates. For type casting between BIGINT and TIMESTAMP, I think
>> the function way using TO_TIMEMTAMP()/FROM_UNIXTIMESTAMP() is more clear.
>>
>> > PROCTIME/ROWTIME should be time functions based on a long value. Both
>> System.currentMillis() and our watermark system work on long values. Those
>> should return TIMESTAMP WITH LOCAL TIME ZONE because the main calculation
>> should always happen based on UTC.
>> > We discussed it in a different thread, but we should allow PROCTIME
>> globally. People need a way to create instances of TIMESTAMP WITH LOCAL
>> TIME ZONE. This is not considered in the current design doc.
>> > Many pipelines contain UTC timestamps and thus it should be easy to
>> create one.
>> > Also, both CURRENT_TIMESTAMP and LOCALTIMESTAMP can work with this type
>> because we should remember that TIMESTAMP WITH LOCAL TIME ZONE accepts all
>> timestamp data types as casting target [1]. We could allow TIMESTAMP WITH
>> TIME ZONE in the future for ROWTIME.
>> > In any case, windows should simply adapt their behavior to the passed
>> timestamp type. And with TIMESTAMP WITH LOCAL TIME ZONE a day is defined by
>> considering the current session time zone.
>>
>> I also agree returning  TIMESTAMP WITH LOCAL TIME ZONE for PROCTIME has
>> more clear semantics, but I realized that user didn’t care the type but
>> more about the expressed value they saw, and change the type from TIMESTAMP
>> to TIMESTAMP WITH LOCAL TIME ZONE brings huge refactor that we need
>> consider all places where the TIMESTAMP type used, and many builtin
>> functions and UDFs doest not support  TIMESTAMP WITH LOCAL TIME ZONE type.
>> That means both user and Flink devs need to refactor the code(UDF, builtin
>> functions, sql pipeline), to be honest, I didn’t see strong motivation that
>> we have to do the pretty big refactor from user’s perspective and
>> developer’s perspective.
>>
>> In one word, both your suggestion and my proposal can resolve almost all
>> user problems,the divergence is whether we need to spend pretty energy just
>> to get a bit more accurate semantics?   I think we need a tradeoff.
>>
>>
>> Best,
>> Leonard
>> [1]
>> https://trino.io/docs/current/functions/datetime.html#current_timestamp <
>> https://trino.io/docs/current/functions/datetime.html#current_timestamp>
>> [2] https://issues.apache.org/jira/browse/SPARK-30374 <
>> https://issues.apache.org/jira/browse/SPARK-30374>
>>
>>
>>
>>
>>
>>
>> >  2021-01-22,00:53,Timo Walther <twal...@apache.org> :
>> >
>> > Hi Leonard,
>> >
>> > thanks for working on this topic. I agree that time handling is not
>> easy in Flink at the moment. We added new time data types (and some are
>> still not supported which even further complicates things like TIME(9)). We
>> should definitely improve this situation for users.
>> >
>> > This is a pretty opinionated topic and it seems that the SQL standard
>> is not really deciding this but is at least supporting. So let me express
>> my opinion for the most important functions:
>> >
>> > LOCALDATE / LOCALTIME / LOCALTIMESTAMP
>> >
>> > --> uses session time zone, returns DATE/TIME/TIMESTAMP
>> >
>> > I think those are the most obvious ones because the LOCAL indicates
>> that the locality should be materialized into the result and any time zone
>> information (coming from session config or data) is not important
>> afterwards.
>> >
>> > CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP
>> >
>> > --> uses session time zone, returns DATE/TIME/TIMESTAMP
>> >
>> > I'm very sceptical about this behavior. Almost all mature systems
>> (Oracle, Postgres) and new high quality systems (Presto, Snowflake) use a
>> data type with some degree of time zone information encoded. In a
>> globalized world with businesses spanning different regions, I think we
>> should do this as well. There should be a difference between
>> CURRENT_TIMESTAMP and LOCALTIMESTAMP. And users should be able to choose
>> which behavior they prefer for their pipeline.
>> >
>> > If we would design this from scatch, I would suggest the following:
>> >
>> > - drop CURRENT_DATE / CURRENT_TIME and let users pick LOCALDATE /
>> LOCALTIME for materialized timestamp parts
>> >
>> > - CURRENT_TIMESTAMP should return a TIMESTAMP WITH TIME ZONE to
>> materialize all session time information into every record. It it the most
>> generic data type and allows to cast to all other timestamp data types.
>> This generic ability can be used for filter predicates as well either
>> through implicit or explicit casting.
>> >
>> > PROCTIME/ROWTIME should be time functions based on a long value. Both
>> System.currentMillis() and our watermark system work on long values. Those
>> should return TIMESTAMP WITH LOCAL TIME ZONE because the main calculation
>> should always happen based on UTC. We discussed it in a different thread,
>> but we should allow PROCTIME globally. People need a way to create
>> instances of TIMESTAMP WITH LOCAL TIME ZONE. This is not considered in the
>> current design doc. Many pipelines contain UTC timestamps and thus it
>> should be easy to create one. Also, both CURRENT_TIMESTAMP and
>> LOCALTIMESTAMP can work with this type because we should remember that
>> TIMESTAMP WITH LOCAL TIME ZONE accepts all timestamp data types as casting
>> target [1]. We could allow TIMESTAMP WITH TIME ZONE in the future for
>> ROWTIME.
>> >
>> > In any case, windows should simply adapt their behavior to the passed
>> timestamp type. And with TIMESTAMP WITH LOCAL TIME ZONE a day is defined by
>> considering the current session time zone.
>> >
>> > If we would like to design this with less effort required, we could
>> think about returning TIMESTAMP WITH LOCAL TIME ZONE also for
>> CURRENT_TIMESTAMP.
>> >
>> >
>> > I will try to involve more people into this discussion.
>> >
>> > Thanks,
>> > Timo
>> >
>> > [1]
>> https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/Data-Types.html#GUID-E7CA339A-2093-4FE4-A36E-1D09593591D3
>> <
>> https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/Data-Types.html#GUID-E7CA339A-2093-4FE4-A36E-1D09593591D3
>> >
>>
>>
>>
>> >  2021-01-21,22:32,Leonard Xu <xbjt...@gmail.com> :
>> >> Before the changes, as I am writing this reply, the local time here is
>> 2021-01-21 12:03:35 (Beijing time, UTC+8).
>> >> And I tried these 5 functions in sql client, and got:
>> >>
>> >> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE,
>> CURRENT_TIME;
>> >>
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>> >> |                  EXPR$0 |                  EXPR$1 |
>>  CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
>> >>
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>> >> | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 |
>> 2021-01-21T04:03:35.228 |   2021-01-21 | 04:03:35.228 |
>> >>
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>> >> After the changes, the expected behavior will change to:
>> >>
>> >> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE,
>> CURRENT_TIME;
>> >>
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>> >> |                  EXPR$0 |                  EXPR$1 |
>>  CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
>> >>
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>> >> | 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 |
>> 2021-01-21T12:03:35.228 |   2021-01-21 | 12:03:35.228 |
>> >>
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>> >> The return type of now(), proctime() and CURRENT_TIMESTAMP still be
>> TIMESTAMP;
>> >
>> > To Kurt, thanks  for the intuitive case, it really clear, you’re wright
>> that I want to propose to change the return value of these functions. It’s
>> the most important part of the topic from user's perspective.
>> >
>> >> I think this definitely deserves a FLIP.
>> > To Jark,  nice suggestion, I prepared a FLIP for this topic, and will
>> start the FLIP discussion soon.
>> >
>> >>> If use the default Flink SQL,&nbsp; the window time range of the
>> >>> statistics is incorrect, then the statistical results will naturally
>> be
>> >>> incorrect.
>> > To zhisheng, sorry to hear that this problem influenced your production
>> jobs,  Could you share your SQL pattern?  we can have more inputs and try
>> to resolve them.
>> >
>> >
>> > Best,
>> > Leonard
>>
>>
>>
>> >  2021-01-21,14:19,Jark Wu <imj...@gmail.com> :
>> >
>> > Great examples to understand the problem and the proposed changes,
>> @Kurt!
>> >
>> > Thanks Leonard for investigating this problem.
>> > The time-zone problems around time functions and windows have bothered a
>> > lot of users. It's time to fix them!
>> >
>> > The return value changes sound reasonable to me, and keeping the return
>> > type unchanged will minimize the surprise to the users.
>> > Besides that, I think it would be better to mention how this affects the
>> > window behaviors, and the interoperability with DataStream.
>> >
>> > I think this definitely deserves a FLIP.
>> >
>> > ====================================================
>> >
>> > Hi zhisheng,
>> >
>> > Do you have examples to illustrate which case will get the wrong window
>> > boundaries?
>> > That will help to verify whether the proposed changes can solve your
>> > problem.
>> >
>> > Best,
>> > Jark
>>
>>
>>
>>
>> > 2021-01-21,12:54,zhisheng <173855...@qq.com> :
>> >
>> > Thanks to Leonard Xu for discussing this tricky topic. At present,
>> there are many Flink jobs in our production environment that are used to
>> count day-level reports (eg: count PV/UV ).&nbsp;
>> >
>> > If use the default Flink SQL,&nbsp; the window time range of the
>> statistics is incorrect, then the statistical results will naturally be
>> incorrect.&nbsp;
>> >
>> > The user needs to deal with the time zone manually in order to solve
>> the problem.&nbsp;
>> >
>> > If Flink itself can solve these time zone issues, then I think it will
>> be user-friendly.
>> >
>> > Thank you
>> >
>> > Best!;
>> > zhisheng
>>
>>
>>
>>
>> >  2021-01-21,12:11,Kurt Young <ykt...@gmail.com> :
>> >
>> > cc this to user & user-zh mailing list because this will affect lots of
>> users, and also quite a lot of users
>> > were asking questions around this topic.
>> >
>> > Let me try to understand this from user's perspective.
>> >
>> > Your proposal will affect five functions, which are:
>> > PROCTIME()
>> > NOW()
>> > CURRENT_DATE
>> > CURRENT_TIME
>> > CURRENT_TIMESTAMP
>> > Before the changes, as I am writing this reply, the local time here is
>> 2021-01-21 12:03:35 (Beijing time, UTC+8).
>> > And I tried these 5 functions in sql client, and got:
>> >
>> > Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE,
>> CURRENT_TIME;
>> >
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>> > |                  EXPR$0 |                  EXPR$1 |
>>  CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
>> >
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>> > | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 |
>> 2021-01-21T04:03:35.228 |   2021-01-21 | 04:03:35.228 |
>> >
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>> > After the changes, the expected behavior will change to:
>> >
>> > Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE,
>> CURRENT_TIME;
>> >
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>> > |                  EXPR$0 |                  EXPR$1 |
>>  CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
>> >
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>> > | 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 |
>> 2021-01-21T12:03:35.228 |   2021-01-21 | 12:03:35.228 |
>> >
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>> > The return type of now(), proctime() and CURRENT_TIMESTAMP still be
>> TIMESTAMP;
>> >
>> > Best,
>> > Kurt
>>
>>
>>
>>
>>
>>
>>

Reply via email to