I also prefer to not expand this FLIP further, but we could open a discussion thread right after this FLIP being accepted and start coding & reviewing. Make technique discussion and coding more pipelined will improve efficiency.
Best, Kurt On Sat, Jan 30, 2021 at 3:47 PM Leonard Xu <xbjt...@gmail.com> wrote: > Hi, Timo > > > I do think that this topic must be part of the FLIP as well. Esp. if the > FLIP has the title "time function behavior" and this is clearly a > behavioral aspect. We are performing a heavy refactoring of the SQL query > semantics in Flink here which will affect a lot of users. We cannot rework > the time functions a third time after this. > > I checked a couple of other vendors. It seems that they all lock the > timestamp when the query is started. And as you said, in this case both > mature (Oracle) and less mature systems (Hive, MySQL) have the same > behavior. > > FLIP-162> “These problems come from the fact that lots of time-related > functions like PROCTIME(), NOW(), CURRENT_DATE, CURRENT_TIME and > CURRENT_TIMESTAMP are returning time values based on UTC+0 time zone." > The motivation of FLIP-162 is to correct the wrong time-related function > value which caused by timezone. And after our discussed before, we found > it's related to the function return type compared to SQL standard and other > vendors and thus we proposed make the function return type also consistent. > This is the exact meaning of the FLIP title and that the FLIP plans to do. > > But for the function materialization mechanism, we didn't consider yet as > a part of our plan because we need to fix the timezone and function type > issues no matter we modify the function materialization mechanism in the > future or not. > So I think it's not belong to this FLIP scope. > > It will have been a great work if we can fix current FLIP's 7 proposals > well, we don't want to expand the scope again Eps it's not part of our > plan. > > What do you think? @Timo > > And what’s others' thoughts? @Jark @Kurt > > Best, > Leonard > > > > > > Flink should not differ. I fear that we have to adopt this behavior as > well to call us standard compliant. Otherwise it will also not be possible > to have Hive compatibility with proper semantics. It could lead to > unintended behavior. > > > > I see two options for this topic: > > > > 1) Clearly distinguish between query-start and processing time > > > > MySQL offers NOW() and SYSDATE() to distinguish the two semantics. We > could run all the previously discussed functions that have a meaning in > other systems in query-start time and use a different name for processing > time. `SYS_TIMESTAMP`, `SYS_DATE`, `SYS_TIME`, `SYS_LOCALTIMESTAMP`, > `SYS_LOCALDATE`, `SYS_LOCALTIME`? > > > > 2) Introduce a config option > > > > We are non-compliant by default and allow typical batch behavior if > needed via a config option. But batch/stream unification should not mean > that we disable certain unification aspects by default. > > > > What do you think? > > > > Regards, > > Timo > > > > On 28.01.21 16:51, Leonard Xu wrote: > >> Hi, Timo > >>> I'm sorry that I need to open another discussion thread befoe voting > but I think we should also discuss this in this FLIP before it pops up at a > later stage. > >>> > >>> How do we want our time functions to behave in long running queries? > >> It’s okay to open this thread. Although I don’t want to consider the > function value materialization in this FLIP scope, I could try explain > something. > >>> See also: > >>> > https://stackoverflow.com/questions/5522656/sql-now-in-long-running-query > >>> > >>> I think this was never discussed thoroughly. Actually > CURRENT_TIMESTAMP/NOW/LOCALTIMESTAMP should have slightly different > semantics than PROCTIME(). What it is our current behavior? Are we > materializing those time values during planning? > >> Currently CURRENT_TIMESTAMP/NOW/LOCALTIMESTAMP keeps same behavior in > both Batch and Stream world, the function value is materialized for per > record not the query start(plan phase). > >> For PROCTIME(), it also keeps same behavior in both Batch and Stream > world, in fact we just supported PROCTIME() in Batch last week[1]. > >> In one word, we keep same semantics/behavior for Batch and Stream. > >>> Esp. long running batch queries might suffer from inconsistencies > here. When a timestamp is produced by one operator using CURRENT_TIMESTAMP > and a different one might filter relating to CURRENT_TIMESTAMP. > >> It’s a good question, and I've found some users have asked simillar > questions in user/user-zh mail-list, given a fact that many Batch systems > like Hive/Presto using the value of query start, but it’s not suitable for > Stream engine, for example user will use CURRENT_TIMESTAMP to define event > time. > >> As a unified Batch/Stream SQL engine, keep same semantics/behavior is > important, and I agree the Batch user case should also be considered. > >> But I think this should be discussed in another topic like 'the > unification of Batch/Stream' which is beyond the scope of this FLIP. > >> This FLIP aims to correct the wrong return type/return value of current > time functions. > >> Best, > >> Leonard > >> [1] https://issues.apache.org/jira/browse/FLINK-17868 < > https://issues.apache.org/jira/browse/FLINK-17868> < > https://issues.apache.org/jira/browse/FLINK-17868 < > https://issues.apache.org/jira/browse/FLINK-17868>> > >>> Regards, > >>> Timo > >>> > >>> > >>> On 28.01.21 13:46, Leonard Xu wrote: > >>>> Hi, Jark > >>>>> I have a minor suggestion: > >>>>> I think we will still suggest users use TIMESTAMP even if we have > TIMESTAMP_NTZ. Then it seems > >>>>> introducing TIMESTAMP_NTZ doesn't help much for users, but > introduces more learning costs. > >>>> I think your suggestion makes sense, we should suggest users use > TIMESTAMP for TIMESTAMP WITHOUT TIME ZONE as we did now, updated as > following: > >>>> original type name : > shortcut type name : > >>>> TIMESTAMP / TIMESTAMP WITHOUT TIME ZONE <=> TIMESTAMP > >>>> TIMESTAMP WITH LOCAL TIME ZONE <=> > TIMESTAMP_LTZ > >>>> TIMESTAMP WITH TIME ZONE <=> > TIMESTAMP_TZ (supports them in the future) > >>>> Best, > >>>> Leonard > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> On Thu, 28 Jan 2021 at 18:52, Leonard Xu <xbjt...@gmail.com <mailto: > xbjt...@gmail.com> <mailto:xbjt...@gmail.com <mailto:xbjt...@gmail.com>>> > wrote: > >>>>> > >>>>>> Thanks all for sharing your opinions. > >>>>>> > >>>>>> Looks like we’ve reached a consensus about the topic. > >>>>>> > >>>>>> @Timo: > >>>>>>> 1) Are we on the same page that LOCALTIMESTAMP returns TIMESTAMP > and not > >>>>>> TIMESTAMP_LTZ? Maybe we should quickly list also > LOCALTIME/LOCALDATE and > >>>>>> LOCALTIMESTAMP for completeness. > >>>>>> Yes, LOCALTIMESTAMP returns TIMESTAMP, LOCALTIME returns TIME, the > >>>>>> behavior of them is clear so I just listed them in the excel[1] of > this > >>>>>> FLIP references. > >>>>>> > >>>>>>> 2) Shall we add aliases for the timestamp types as part of this > FLIP? I > >>>>>> see Snowflake supports TIMESTAMP_LTZ , TIMESTAMP_NTZ , TIMESTAMP_TZ > [1]. I > >>>>>> think the discussion was quite cumbersome with the full string of > >>>>>> `TIMESTAMP WITH LOCAL TIME ZONE`. With this FLIP we are making this > type > >>>>>> even more prominent. And important concepts should have a short name > >>>>>> because they are used frequently. According to the FLIP, we are > introducing > >>>>>> the abbriviation already in function names like `TO_TIMESTAMP_LTZ`. > >>>>>> `TIMESTAMP_LTZ` could be treated similar to `STRING` for > >>>>>> `VARCHAR(MAX_INT)`, the serializable string representation would > not change. > >>>>>> > >>>>>> @Timo @Jark > >>>>>> Nice idea, I also suffered from the long name during the > discussions, the > >>>>>> abbreviation will not only help us, but also makes it more > convenient for > >>>>>> users. I list the abbreviation name mapping to support: > >>>>>> TIMESTAMP WITHOUT TIME ZONE <=> TIMESTAMP_NTZ (which > synonyms > >>>>>> TIMESTAMP) > >>>>>> TIMESTAMP WITH LOCAL TIME ZONE <=> TIMESTAMP_LTZ > >>>>>> TIMESTAMP WITH TIME ZONE <=> TIMESTAMP_TZ > (supports > >>>>>> them in the future) > >>>>>>> 3) I'm fine with supporting all conversion classes like > >>>>>> java.time.LocalDateTime, java.sql.Timestamp that TimestampType > supported > >>>>>> for LocalZonedTimestampType. But we agree that Instant stays the > default > >>>>>> conversion class right? The default extraction defined in [2] will > not > >>>>>> change, correct? > >>>>>> Yes, Instant stays the default conversion class. The default > >>>>>> > >>>>>>> 4) I would remove the comment "Flink supports TIME-related types > with > >>>>>> precision well", because unfortunately this is still not correct. > We still > >>>>>> have issues with TIME(9), it would be great if someone can finally > fix that > >>>>>> though. Maybe the implementation of this FLIP would be a good time > to fix > >>>>>> this issue. > >>>>>> You’re right, TIME(9) is not supported yet, I'll take account of > TIME(9) > >>>>>> to the scope of this FLIP. > >>>>>> > >>>>>> > >>>>>> I’ve updated this FLIP[2] according your suggestions @Jark @Timo > >>>>>> I’ll start the vote soon if there’re no objections. > >>>>>> > >>>>>> Best, > >>>>>> Leonard > >>>>>> > >>>>>> [1] > >>>>>> > https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing > >>>>>> < > >>>>>> > https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing > < > https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing > > > >>>>>>> > >>>>>> [2] > >>>>>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-162%3A+Consistent+Flink+SQL+time+function+behavior > < > https://cwiki.apache.org/confluence/display/FLINK/FLIP-162%3A+Consistent+Flink+SQL+time+function+behavior > > > >>>>>> < > >>>>>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-162:+Consistent+Flink+SQL+time+function+behavior > < > https://cwiki.apache.org/confluence/display/FLINK/FLIP-162:+Consistent+Flink+SQL+time+function+behavior > >> > >>>>>> > >>>>>> > >>>>>>> > >>>>>>> On 28.01.21 03:18, Jark Wu wrote: > >>>>>>>> Thanks Leonard for the further investigation. > >>>>>>>> I think we all agree we should correct the return value of > >>>>>>>> CURRENT_TIMESTAMP. > >>>>>>>> Regarding the return type of CURRENT_TIMESTAMP, I also agree > >>>>>> TIMESTAMP_LTZ > >>>>>>>> would be more worldwide useful. This may need more effort, but if > this > >>>>>> is > >>>>>>>> the right direction, we should do it. > >>>>>>>> Regarding the CURRENT_TIME, if CURRENT_TIMESTAMP returns > >>>>>>>> TIMESTAMP_LTZ, then I think CURRENT_TIME shouldn't return TIME_TZ. > >>>>>>>> Otherwise, CURRENT_TIME will be quite special and strange. > >>>>>>>> Thus I think it has to return TIME type. Given that we already > have > >>>>>>>> CURRENT_DATE which returns > >>>>>>>> DATE WITHOUT TIME ZONE, I think it's fine to return TIME WITHOUT > TIME > >>>>>> ZONE > >>>>>>>> for CURRENT_TIME. > >>>>>>>> In a word, the updated FLIP looks good to me. I especially like > the > >>>>>>>> proposed new function TO_TIMESTAMP_LTZ(numeric, [,scale]). > >>>>>>>> This will be very convenient to define rowtime on a long value > which is > >>>>>> a > >>>>>>>> very common case and has been complained a lot in mailing list. > >>>>>>>> Best, > >>>>>>>> Jark > >>>>>>>> On Mon, 25 Jan 2021 at 21:12, Kurt Young <ykt...@gmail.com> > wrote: > >>>>>>>>> Thanks Leonard for the detailed response and also the bad case > about > >>>>>> option > >>>>>>>>> 1, these all > >>>>>>>>> make sense to me. > >>>>>>>>> > >>>>>>>>> Also nice catch about conversion support of > LocalZonedTimestampType, I > >>>>>>>>> think it actually > >>>>>>>>> makes sense to support java.sql.Timestamp as well as > >>>>>>>>> java.time.LocalDateTime. It also has > >>>>>>>>> a slight benefit that we might have a chance to run the udf > which took > >>>>>> them > >>>>>>>>> as input parameter > >>>>>>>>> after we change the return type. > >>>>>>>>> > >>>>>>>>> Regarding to the return type of CURRENT_TIME, I also think > timezone > >>>>>>>>> information is not useful. > >>>>>>>>> To not expand this FLIP further, I'm lean to keep it as it is. > >>>>>>>>> > >>>>>>>>> Best, > >>>>>>>>> Kurt > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On Mon, Jan 25, 2021 at 8:50 PM Leonard Xu <xbjt...@gmail.com> > wrote: > >>>>>>>>> > >>>>>>>>>> Hi, All > >>>>>>>>>> > >>>>>>>>>> Thanks for your comments. I think all of the thread have agreed > that: > >>>>>>>>>> (1) The return values of > >>>>>> CURRENT_TIME/CURRENT_TIMESTAMP/NOW()/PROCTIME() > >>>>>>>>>> are wrong. > >>>>>>>>>> (2) The LOCALTIME/LOCALTIMESTAMP and > CURRENT_TIME/CURRENT_TIMESTAMP > >>>>>>>>> should > >>>>>>>>>> be different whether from SQL standard’s perspective or mature > >>>>>> systems. > >>>>>>>>>> (3) The semantics of three TIMESTAMP types in Flink SQL follows > the > >>>>>> SQL > >>>>>>>>>> standard and also keeps the same with other 'good' vendors. > >>>>>>>>>> TIMESTAMP => A literal in > >>>>>>>>>> ‘yyyy-MM-dd HH:mm:ss’ format to describe a time, does not > contain > >>>>>>>>> timezone > >>>>>>>>>> info, can not represent an absolute time point. > >>>>>>>>>> TIMESTAMP WITH LOCAL ZONE => Records the elapsed time from > >>>>>> absolute > >>>>>>>>>> time point origin, can represent an absolute time point, > requires > >>>>>> local > >>>>>>>>>> time zone when expressed with ‘yyyy-MM-dd HH:mm:ss’ format. > >>>>>>>>>> TIMESTAMP WITH TIME ZONE => Consists of time zone info > and a > >>>>>>>>>> literal in ‘yyyy-MM-dd HH:mm:ss’ format to describe time, can > >>>>>> represent > >>>>>>>>> an > >>>>>>>>>> absolute time point. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Currently we've two ways to correct > >>>>>>>>>> CURRENT_TIME/CURRENT_TIMESTAMP/NOW()/PROCTIME(). > >>>>>>>>>> > >>>>>>>>>> option (1): As the FLIP proposed, change the return value from > UTC > >>>>>>>>>> timezone to local timezone. > >>>>>>>>>> Pros: (1) The change looks smaller to users and > developers > >>>>>> (2) > >>>>>>>>>> There're many SQL engines adopted this way > >>>>>>>>>> Cons: (1) connector devs may confuse the underlying > value of > >>>>>>>>>> TimestampData which needs to change according to data type (2) > I > >>>>>> thought > >>>>>>>>>> about this weekend. Unfortunately I found a bad case: > >>>>>>>>>> > >>>>>>>>>> The proposal is fine if we only use it in FLINK SQL world, but > we > >>>>>> need to > >>>>>>>>>> consider the conversion between Table/DataStream, assume a > record > >>>>>>>>> produced > >>>>>>>>>> in UTC+0 timezone with TIMESTAMP '1970-01-01 08:00:44' and the > Flink > >>>>>> SQL > >>>>>>>>>> processes the data with session time zone 'UTC+8', if the sql > program > >>>>>>>>> need > >>>>>>>>>> to convert the Table to DataStream, then we need to calculate > the > >>>>>>>>> timestamp > >>>>>>>>>> in StreamRecord with session time zone (UTC+8), then we will > get 44 in > >>>>>>>>>> DataStream program, but it is wrong because the expected value > should > >>>>>> be > >>>>>>>>> (8 > >>>>>>>>>> * 60 * 60 + 44). The corner case tell us that the > ROWTIME/PROCTIME in > >>>>>>>>> Flink > >>>>>>>>>> are based on UTC+0, when correct the PROCTIME() function, the > better > >>>>>> way > >>>>>>>>> is > >>>>>>>>>> to use TIMESTAMP WITH LOCAL TIME ZONE which keeps same long > value with > >>>>>>>>> time > >>>>>>>>>> based on UTC+0 and can be expressed with local timezone. > >>>>>>>>>> > >>>>>>>>>> option (2) : As we considered in the FLIP as well as @Timo > suggested, > >>>>>>>>>> change the return type to TIMESTAMP WITH LOCAL TIME ZONE, the > >>>>>> expressed > >>>>>>>>>> value depends on the local time zone. > >>>>>>>>>> Pros: (1) Make Flink SQL more close to SQL standard (2) > Can > >>>>>> deal > >>>>>>>>>> the conversion between Table/DataStream well > >>>>>>>>>> Cons: (1) We need to discuss the return value/type of > >>>>>>>>> CURRENT_TIME > >>>>>>>>>> function (2) The change is bigger to users, we need to support > >>>>>> TIMESTAMP > >>>>>>>>>> WITH LOCAL TIME ZONE in connectors/formats as well as custom > >>>>>> connectors. > >>>>>>>>>> (3)The TIMESTAMP WITH LOCAL TIME ZONE support > is > >>>>>> weak > >>>>>>>>>> in Flink, thus we need some improvement,but the workload does > not > >>>>>> matter > >>>>>>>>>> as long as we are doing the right thing ^_^ > >>>>>>>>>> > >>>>>>>>>> Due to the above bad case for option (1). I think option 2 > should be > >>>>>>>>>> adopted, > >>>>>>>>>> But we also need to consider some problems: > >>>>>>>>>> (1) More conversion classes like LocalDateTime, sql.Timestamp > should > >>>>>> be > >>>>>>>>>> supported for LocalZonedTimestampType to resolve the UDF > compatibility > >>>>>>>>> issue > >>>>>>>>>> (2) The timezone offset for window size of one day should still > be > >>>>>>>>>> considered > >>>>>>>>>> (3) All connectors/formats should supports TIMESTAMP WITH LOCAL > TIME > >>>>>> ZONE > >>>>>>>>>> well and we also should record in document > >>>>>>>>>> I’ll update these sections of FLIP-162. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> We also need to discuss the CURRENT_TIME function. I know the > standard > >>>>>>>>> way > >>>>>>>>>> is using TIME WITH TIME ZONE(there's no TIME WITH LOCAL TIME > ZONE), > >>>>>> but > >>>>>>>>> we > >>>>>>>>>> don't support this type yet and I don't see strong motivation to > >>>>>> support > >>>>>>>>> it > >>>>>>>>>> so far. > >>>>>>>>>> Compared to CURRENT_TIMESTAMP, the CURRENT_TIME can not > represent an > >>>>>>>>>> absolute time point which should be considered as a string > consisting > >>>>>> of > >>>>>>>>> a > >>>>>>>>>> time with 'HH:mm:ss' format and time zone info. We have several > >>>>>> options > >>>>>>>>>> for this: > >>>>>>>>>> (1) We can forbid CURRENT_TIME as @Timo proposed to make all > Flink SQL > >>>>>>>>>> functions follow the standard well, in this way, we need to > offer > >>>>>> some > >>>>>>>>>> guidance for user upgrading Flink versions. > >>>>>>>>>> (2) We can also support it from a user's perspective who has > used > >>>>>>>>>> CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP, btw,Snowflake also > >>>>>> returns > >>>>>>>>>> TIME type. > >>>>>>>>>> (3) Returns TIMESTAMP WITH LOCAL TIME ZONE to make it equal to > >>>>>>>>>> CURRENT_TIMESTAMP as Calcite did. > >>>>>>>>>> > >>>>>>>>>> I can image (1) which we don't want to left a bad smell in > Flink SQL, > >>>>>>>>> and > >>>>>>>>>> I also accept (2) because I think users do not consider time > zone > >>>>>> issues > >>>>>>>>>> when they use CURRENT_DATE/CURRENT_TIME, and the timezone info > in > >>>>>> time is > >>>>>>>>>> not very useful. > >>>>>>>>>> > >>>>>>>>>> I don’t have a strong opinion for them. What do others think? > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> I hope I've addressed your concerns. @Timo @Kurt > >>>>>>>>>> > >>>>>>>>>> Best, > >>>>>>>>>> Leonard > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> Most of the mature systems have a clear difference between > >>>>>>>>>> CURRENT_TIMESTAMP and LOCALTIMESTAMP. I wouldn't take Spark or > Hive > >>>>>> as a > >>>>>>>>>> good example. Snowflake decided for TIMESTAMP WITH LOCAL TIME > ZONE. > >>>>>> As I > >>>>>>>>>> mentioned in the last comment, I could also imagine this > behavior for > >>>>>>>>>> Flink. But in any case, there should be some time zone > information > >>>>>>>>>> considered in order to cast to all other types. > >>>>>>>>>>> > >>>>>>>>>>>>>> The function CURRENT_DATE/CURRENT_TIME is supporting in SQL > >>>>>>>>>> standard, but > >>>>>>>>>>>>>> LOCALDATE not, I don’t think it’s a good idea that dropping > >>>>>>>>>> functions which > >>>>>>>>>>>>>> SQL standard supported and introducing a replacement which > SQL > >>>>>>>>>> standard not > >>>>>>>>>>>>>> reminded. > >>>>>>>>>>> > >>>>>>>>>>> We can still add those functions in the future. But since we > don't > >>>>>>>>> offer > >>>>>>>>>> a TIME WITH TIME ZONE, it is better to not support this > function at > >>>>>> all > >>>>>>>>> for > >>>>>>>>>> now. And by the way, this is exactly the behavior that also > Microsoft > >>>>>> SQL > >>>>>>>>>> Server does: it also just supports CURRENT_TIMESTAMP (but it > returns > >>>>>>>>>> TIMESTAMP without a zone which completes the confusion). > >>>>>>>>>>> > >>>>>>>>>>>>>> I also agree returning TIMESTAMP WITH LOCAL TIME ZONE for > >>>>>> PROCTIME > >>>>>>>>>> has > >>>>>>>>>>>>>> more clear semantics, but I realized that user didn’t care > the > >>>>>> type > >>>>>>>>>> but > >>>>>>>>>>>>>> more about the expressed value they saw, and change the > type from > >>>>>>>>>> TIMESTAMP > >>>>>>>>>>>>>> to TIMESTAMP WITH LOCAL TIME ZONE brings huge refactor that > we > >>>>>> need > >>>>>>>>>>>>>> consider all places where the TIMESTAMP type used > >>>>>>>>>>> > >>>>>>>>>>> From a UDF perspective, I think nothing will change. The new > type > >>>>>>>>> system > >>>>>>>>>> and type inference were designed to support all these cases. > There is > >>>>>> a > >>>>>>>>>> reason why Java has adopted Joda time, because it is hard to > come up > >>>>>>>>> with a > >>>>>>>>>> good time library. That's why also we and the other Hadoop > ecosystem > >>>>>>>>> folks > >>>>>>>>>> have decided for 3 different kinds of LocalDateTime, > ZonedDateTime, > >>>>>> and > >>>>>>>>>> Instance. It makes the library more complex, but time is a > complex > >>>>>> topic. > >>>>>>>>>>> > >>>>>>>>>>> I also doubt that many users work with only one time zone. > Take the > >>>>>> US > >>>>>>>>>> as an example, a country with 3 different timezones. Somebody > working > >>>>>>>>> with > >>>>>>>>>> US data cannot properly see the data points with just LOCAL > TIME ZONE. > >>>>>>>>> But > >>>>>>>>>> on the other hand, a lot of event data is stored using a UTC > >>>>>> timestamp. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>>> Before jumping into technique details, let's take a step > back to > >>>>>>>>>> discuss > >>>>>>>>>>>>> user experience. > >>>>>>>>>>>>> > >>>>>>>>>>>>> The first important question is what kind of date and time > will > >>>>>>>>> Flink > >>>>>>>>>>>>> display when users call > >>>>>>>>>>>>> CURRENT_TIMESTAMP and maybe also PROCTIME (if we think they > are > >>>>>>>>>> similar). > >>>>>>>>>>>>> > >>>>>>>>>>>>> Should it always display the date and time in UTC or in the > user's > >>>>>>>>>> time > >>>>>>>>>>>>> zone? > >>>>>>>>>>> > >>>>>>>>>>> @Kurt: I think we all agree that the current behavior with just > >>>>>> showing > >>>>>>>>>> UTC is wrong. Also, we all agree that when calling > CURRENT_TIMESTAMP > >>>>>> or > >>>>>>>>>> PROCTIME a user would like to see the time in it's current time > zone. > >>>>>>>>>>> > >>>>>>>>>>> As you said, "my wall clock time". > >>>>>>>>>>> > >>>>>>>>>>> However, the question is what is the data type of what you > "see". If > >>>>>>>>> you > >>>>>>>>>> pass this record on to a different system, operator, or > different > >>>>>>>>> cluster, > >>>>>>>>>> should the "my" get lost or materialized into the record? > >>>>>>>>>>> > >>>>>>>>>>> TIMESTAMP -> completely lost and could cause confusion in a > different > >>>>>>>>>> system > >>>>>>>>>>> > >>>>>>>>>>> TIMESTAMP WITH LOCAL TIME ZONE -> at least the UTC is correct, > so you > >>>>>>>>>> can provide a new local time zone > >>>>>>>>>>> > >>>>>>>>>>> TIMESTAMP WITH TIME ZONE -> also "your" location is persisted > >>>>>>>>>>> > >>>>>>>>>>> Regards, > >>>>>>>>>>> Timo > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On 22.01.21 09:38, Kurt Young wrote: > >>>>>>>>>>>> Forgot one more thing. Continue with displaying in UTC. As a > user, > >>>>>> if > >>>>>>>>>> Flink > >>>>>>>>>>>> want to display the timestamp > >>>>>>>>>>>> in UTC, why don't we offer something like UTC_TIMESTAMP? > >>>>>>>>>>>> Best, > >>>>>>>>>>>> Kurt > >>>>>>>>>>>> On Fri, Jan 22, 2021 at 4:33 PM Kurt Young <ykt...@gmail.com> > >>>>>> wrote: > >>>>>>>>>>>>> Before jumping into technique details, let's take a step > back to > >>>>>>>>>> discuss > >>>>>>>>>>>>> user experience. > >>>>>>>>>>>>> > >>>>>>>>>>>>> The first important question is what kind of date and time > will > >>>>>> Flink > >>>>>>>>>>>>> display when users call > >>>>>>>>>>>>> CURRENT_TIMESTAMP and maybe also PROCTIME (if we think they > are > >>>>>>>>>> similar). > >>>>>>>>>>>>> > >>>>>>>>>>>>> Should it always display the date and time in UTC or in the > user's > >>>>>>>>> time > >>>>>>>>>>>>> zone? I think this part is the > >>>>>>>>>>>>> reason that surprised lots of users. If we forget about the > type > >>>>>> and > >>>>>>>>>>>>> internal representation of these > >>>>>>>>>>>>> two methods, as a user, my instinct tells me that these two > methods > >>>>>>>>>> should > >>>>>>>>>>>>> display my wall clock time. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Display time in UTC? I'm not sure, why I should care about > UTC > >>>>>> time? > >>>>>>>>> I > >>>>>>>>>>>>> want to get my current timestamp. > >>>>>>>>>>>>> For those users who have never gone abroad, they might not > even be > >>>>>>>>>> able to > >>>>>>>>>>>>> realize that this is affected > >>>>>>>>>>>>> by the time zone. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Best, > >>>>>>>>>>>>> Kurt > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Fri, Jan 22, 2021 at 12:25 PM Leonard Xu < > xbjt...@gmail.com> > >>>>>>>>> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Thanks @Timo for the detailed reply, let's go on this topic > on > >>>>>> this > >>>>>>>>>>>>>> discussion, I've merged all mails to this discussion. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> LOCALDATE / LOCALTIME / LOCALTIMESTAMP > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> --> uses session time zone, returns DATE/TIME/TIMESTAMP > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> --> uses session time zone, returns DATE/TIME/TIMESTAMP > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I'm very sceptical about this behavior. Almost all mature > systems > >>>>>>>>>>>>>> (Oracle, Postgres) and new high quality systems (Presto, > >>>>>> Snowflake) > >>>>>>>>>> use a > >>>>>>>>>>>>>> data type with some degree of time zone information > encoded. In a > >>>>>>>>>>>>>> globalized world with businesses spanning different > regions, I > >>>>>> think > >>>>>>>>>> we > >>>>>>>>>>>>>> should do this as well. There should be a difference between > >>>>>>>>>>>>>> CURRENT_TIMESTAMP and LOCALTIMESTAMP. And users should be > able to > >>>>>>>>>> choose > >>>>>>>>>>>>>> which behavior they prefer for their pipeline. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I know that the two series should be different at first > glance, > >>>>>> but > >>>>>>>>>>>>>> different SQL engines can have their own explanations,for > example, > >>>>>>>>>>>>>> CURRENT_TIMESTAMP and LOCALTIMESTAMP are synonyms in > Snowflake[1] > >>>>>>>>> and > >>>>>>>>>> has > >>>>>>>>>>>>>> no difference, and Spark only supports the later one and > doesn’t > >>>>>>>>>> support > >>>>>>>>>>>>>> LOCALTIME/LOCALTIMESTAMP[2]. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> If we would design this from scatch, I would suggest the > >>>>>> following: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> - drop CURRENT_DATE / CURRENT_TIME and let users pick > LOCALDATE / > >>>>>>>>>>>>>> LOCALTIME for materialized timestamp parts > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> The function CURRENT_DATE/CURRENT_TIME is supporting in SQL > >>>>>>>>> standard, > >>>>>>>>>> but > >>>>>>>>>>>>>> LOCALDATE not, I don’t think it’s a good idea that dropping > >>>>>>>>> functions > >>>>>>>>>> which > >>>>>>>>>>>>>> SQL standard supported and introducing a replacement which > SQL > >>>>>>>>>> standard not > >>>>>>>>>>>>>> reminded. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> - CURRENT_TIMESTAMP should return a TIMESTAMP WITH TIME > ZONE to > >>>>>>>>>>>>>> materialize all session time information into every record. > It it > >>>>>>>>> the > >>>>>>>>>> most > >>>>>>>>>>>>>> generic data type and allows to cast to all other timestamp > data > >>>>>>>>>> types. > >>>>>>>>>>>>>> This generic ability can be used for filter predicates as > well > >>>>>>>>> either > >>>>>>>>>>>>>> through implicit or explicit casting. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> TIMESTAMP WITH TIME ZONE indeed contains more information to > >>>>>>>>> describe > >>>>>>>>>> a > >>>>>>>>>>>>>> time point, but the type TIMESTAMP can cast to all other > >>>>>> timestamp > >>>>>>>>>> data > >>>>>>>>>>>>>> types combining with session time zone as well, and it also > can be > >>>>>>>>>> used for > >>>>>>>>>>>>>> filter predicates. For type casting between BIGINT and > TIMESTAMP, > >>>>>> I > >>>>>>>>>> think > >>>>>>>>>>>>>> the function way using TO_TIMEMTAMP()/FROM_UNIXTIMESTAMP() > is more > >>>>>>>>>> clear. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> PROCTIME/ROWTIME should be time functions based on a long > value. > >>>>>>>>> Both > >>>>>>>>>>>>>> System.currentMillis() and our watermark system work on long > >>>>>> values. > >>>>>>>>>> Those > >>>>>>>>>>>>>> should return TIMESTAMP WITH LOCAL TIME ZONE because the > main > >>>>>>>>>> calculation > >>>>>>>>>>>>>> should always happen based on UTC. > >>>>>>>>>>>>>>> We discussed it in a different thread, but we should allow > >>>>>> PROCTIME > >>>>>>>>>>>>>> globally. People need a way to create instances of > TIMESTAMP WITH > >>>>>>>>>> LOCAL > >>>>>>>>>>>>>> TIME ZONE. This is not considered in the current design doc. > >>>>>>>>>>>>>>> Many pipelines contain UTC timestamps and thus it should > be easy > >>>>>> to > >>>>>>>>>>>>>> create one. > >>>>>>>>>>>>>>> Also, both CURRENT_TIMESTAMP and LOCALTIMESTAMP can work > with > >>>>>> this > >>>>>>>>>> type > >>>>>>>>>>>>>> because we should remember that TIMESTAMP WITH LOCAL TIME > ZONE > >>>>>>>>>> accepts all > >>>>>>>>>>>>>> timestamp data types as casting target [1]. We could allow > >>>>>> TIMESTAMP > >>>>>>>>>> WITH > >>>>>>>>>>>>>> TIME ZONE in the future for ROWTIME. > >>>>>>>>>>>>>>> In any case, windows should simply adapt their behavior to > the > >>>>>>>>> passed > >>>>>>>>>>>>>> timestamp type. And with TIMESTAMP WITH LOCAL TIME ZONE a > day is > >>>>>>>>>> defined by > >>>>>>>>>>>>>> considering the current session time zone. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I also agree returning TIMESTAMP WITH LOCAL TIME ZONE for > >>>>>> PROCTIME > >>>>>>>>>> has > >>>>>>>>>>>>>> more clear semantics, but I realized that user didn’t care > the > >>>>>> type > >>>>>>>>>> but > >>>>>>>>>>>>>> more about the expressed value they saw, and change the > type from > >>>>>>>>>> TIMESTAMP > >>>>>>>>>>>>>> to TIMESTAMP WITH LOCAL TIME ZONE brings huge refactor that > we > >>>>>> need > >>>>>>>>>>>>>> consider all places where the TIMESTAMP type used, and many > >>>>>> builtin > >>>>>>>>>>>>>> functions and UDFs doest not support TIMESTAMP WITH LOCAL > TIME > >>>>>> ZONE > >>>>>>>>>> type. > >>>>>>>>>>>>>> That means both user and Flink devs need to refactor the > code(UDF, > >>>>>>>>>> builtin > >>>>>>>>>>>>>> functions, sql pipeline), to be honest, I didn’t see strong > >>>>>>>>>> motivation that > >>>>>>>>>>>>>> we have to do the pretty big refactor from user’s > perspective and > >>>>>>>>>>>>>> developer’s perspective. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> In one word, both your suggestion and my proposal can > resolve > >>>>>> almost > >>>>>>>>>> all > >>>>>>>>>>>>>> user problems,the divergence is whether we need to spend > pretty > >>>>>>>>>> energy just > >>>>>>>>>>>>>> to get a bit more accurate semantics? I think we need a > >>>>>> tradeoff. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>> Leonard > >>>>>>>>>>>>>> [1] > >>>>>>>>>>>>>> > >>>>>>>>>> > >>>>>> > https://trino.io/docs/current/functions/datetime.html#current_timestamp > >>>>>>>>> < > >>>>>>>>>>>>>> > >>>>>>>>>> > >>>>>> > https://trino.io/docs/current/functions/datetime.html#current_timestamp> > >>>>>>>>>>>>>> [2] https://issues.apache.org/jira/browse/SPARK-30374 < > >>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-30374> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 2021-01-22,00:53,Timo Walther <twal...@apache.org> : > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hi Leonard, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> thanks for working on this topic. I agree that time > handling is > >>>>>> not > >>>>>>>>>>>>>> easy in Flink at the moment. We added new time data types > (and > >>>>>> some > >>>>>>>>>> are > >>>>>>>>>>>>>> still not supported which even further complicates things > like > >>>>>>>>>> TIME(9)). We > >>>>>>>>>>>>>> should definitely improve this situation for users. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> This is a pretty opinionated topic and it seems that the > SQL > >>>>>>>>> standard > >>>>>>>>>>>>>> is not really deciding this but is at least supporting. So > let me > >>>>>>>>>> express > >>>>>>>>>>>>>> my opinion for the most important functions: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> LOCALDATE / LOCALTIME / LOCALTIMESTAMP > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> --> uses session time zone, returns DATE/TIME/TIMESTAMP > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I think those are the most obvious ones because the LOCAL > >>>>>> indicates > >>>>>>>>>>>>>> that the locality should be materialized into the result > and any > >>>>>>>>> time > >>>>>>>>>> zone > >>>>>>>>>>>>>> information (coming from session config or data) is not > important > >>>>>>>>>>>>>> afterwards. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> --> uses session time zone, returns DATE/TIME/TIMESTAMP > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I'm very sceptical about this behavior. Almost all mature > systems > >>>>>>>>>>>>>> (Oracle, Postgres) and new high quality systems (Presto, > >>>>>> Snowflake) > >>>>>>>>>> use a > >>>>>>>>>>>>>> data type with some degree of time zone information > encoded. In a > >>>>>>>>>>>>>> globalized world with businesses spanning different > regions, I > >>>>>> think > >>>>>>>>>> we > >>>>>>>>>>>>>> should do this as well. There should be a difference between > >>>>>>>>>>>>>> CURRENT_TIMESTAMP and LOCALTIMESTAMP. And users should be > able to > >>>>>>>>>> choose > >>>>>>>>>>>>>> which behavior they prefer for their pipeline. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> If we would design this from scatch, I would suggest the > >>>>>> following: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> - drop CURRENT_DATE / CURRENT_TIME and let users pick > LOCALDATE / > >>>>>>>>>>>>>> LOCALTIME for materialized timestamp parts > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> - CURRENT_TIMESTAMP should return a TIMESTAMP WITH TIME > ZONE to > >>>>>>>>>>>>>> materialize all session time information into every record. > It it > >>>>>>>>> the > >>>>>>>>>> most > >>>>>>>>>>>>>> generic data type and allows to cast to all other timestamp > data > >>>>>>>>>> types. > >>>>>>>>>>>>>> This generic ability can be used for filter predicates as > well > >>>>>>>>> either > >>>>>>>>>>>>>> through implicit or explicit casting. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> PROCTIME/ROWTIME should be time functions based on a long > value. > >>>>>>>>> Both > >>>>>>>>>>>>>> System.currentMillis() and our watermark system work on long > >>>>>> values. > >>>>>>>>>> Those > >>>>>>>>>>>>>> should return TIMESTAMP WITH LOCAL TIME ZONE because the > main > >>>>>>>>>> calculation > >>>>>>>>>>>>>> should always happen based on UTC. We discussed it in a > different > >>>>>>>>>> thread, > >>>>>>>>>>>>>> but we should allow PROCTIME globally. People need a way to > create > >>>>>>>>>>>>>> instances of TIMESTAMP WITH LOCAL TIME ZONE. This is not > >>>>>> considered > >>>>>>>>>> in the > >>>>>>>>>>>>>> current design doc. Many pipelines contain UTC timestamps > and thus > >>>>>>>>> it > >>>>>>>>>>>>>> should be easy to create one. Also, both CURRENT_TIMESTAMP > and > >>>>>>>>>>>>>> LOCALTIMESTAMP can work with this type because we should > remember > >>>>>>>>> that > >>>>>>>>>>>>>> TIMESTAMP WITH LOCAL TIME ZONE accepts all timestamp data > types as > >>>>>>>>>> casting > >>>>>>>>>>>>>> target [1]. We could allow TIMESTAMP WITH TIME ZONE in the > future > >>>>>>>>> for > >>>>>>>>>>>>>> ROWTIME. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> In any case, windows should simply adapt their behavior to > the > >>>>>>>>> passed > >>>>>>>>>>>>>> timestamp type. And with TIMESTAMP WITH LOCAL TIME ZONE a > day is > >>>>>>>>>> defined by > >>>>>>>>>>>>>> considering the current session time zone. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> If we would like to design this with less effort required, > we > >>>>>> could > >>>>>>>>>>>>>> think about returning TIMESTAMP WITH LOCAL TIME ZONE also > for > >>>>>>>>>>>>>> CURRENT_TIMESTAMP. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I will try to involve more people into this discussion. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>> Timo > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> [1] > >>>>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>> > https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/Data-Types.html#GUID-E7CA339A-2093-4FE4-A36E-1D09593591D3 > >>>>>>>>>>>>>> < > >>>>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>> > https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/Data-Types.html#GUID-E7CA339A-2093-4FE4-A36E-1D09593591D3 > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 2021-01-21,22:32,Leonard Xu <xbjt...@gmail.com> : > >>>>>>>>>>>>>>>> Before the changes, as I am writing this reply, the local > time > >>>>>>>>> here > >>>>>>>>>> is > >>>>>>>>>>>>>> 2021-01-21 12:03:35 (Beijing time, UTC+8). > >>>>>>>>>>>>>>>> And I tried these 5 functions in sql client, and got: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, > >>>>>>>>>> CURRENT_DATE, > >>>>>>>>>>>>>> CURRENT_TIME; > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>> > +-------------------------+-------------------------+-------------------------+--------------+--------------+ > >>>>>>>>>>>>>>>> | EXPR$0 | EXPR$1 | > >>>>>>>>>>>>>> CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME | > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>> > +-------------------------+-------------------------+-------------------------+--------------+--------------+ > >>>>>>>>>>>>>>>> | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 | > >>>>>>>>>>>>>> 2021-01-21T04:03:35.228 | 2021-01-21 | 04:03:35.228 | > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>> > +-------------------------+-------------------------+-------------------------+--------------+--------------+ > >>>>>>>>>>>>>>>> After the changes, the expected behavior will change to: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, > >>>>>>>>>> CURRENT_DATE, > >>>>>>>>>>>>>> CURRENT_TIME; > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>> > +-------------------------+-------------------------+-------------------------+--------------+--------------+ > >>>>>>>>>>>>>>>> | EXPR$0 | EXPR$1 | > >>>>>>>>>>>>>> CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME | > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>> > +-------------------------+-------------------------+-------------------------+--------------+--------------+ > >>>>>>>>>>>>>>>> | 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 | > >>>>>>>>>>>>>> 2021-01-21T12:03:35.228 | 2021-01-21 | 12:03:35.228 | > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>> > +-------------------------+-------------------------+-------------------------+--------------+--------------+ > >>>>>>>>>>>>>>>> The return type of now(), proctime() and > CURRENT_TIMESTAMP still > >>>>>>>>> be > >>>>>>>>>>>>>> TIMESTAMP; > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> To Kurt, thanks for the intuitive case, it really clear, > you’re > >>>>>>>>>> wright > >>>>>>>>>>>>>> that I want to propose to change the return value of these > >>>>>>>>> functions. > >>>>>>>>>> It’s > >>>>>>>>>>>>>> the most important part of the topic from user's > perspective. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> I think this definitely deserves a FLIP. > >>>>>>>>>>>>>>> To Jark, nice suggestion, I prepared a FLIP for this > topic, and > >>>>>>>>> will > >>>>>>>>>>>>>> start the FLIP discussion soon. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> If use the default Flink SQL, the window time > range of > >>>>>> the > >>>>>>>>>>>>>>>>> statistics is incorrect, then the statistical results > will > >>>>>>>>>> naturally > >>>>>>>>>>>>>> be > >>>>>>>>>>>>>>>>> incorrect. > >>>>>>>>>>>>>>> To zhisheng, sorry to hear that this problem influenced > your > >>>>>>>>>> production > >>>>>>>>>>>>>> jobs, Could you share your SQL pattern? we can have more > inputs > >>>>>>>>> and > >>>>>>>>>> try > >>>>>>>>>>>>>> to resolve them. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>> Leonard > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 2021-01-21,14:19,Jark Wu <imj...@gmail.com> : > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Great examples to understand the problem and the proposed > >>>>>> changes, > >>>>>>>>>>>>>> @Kurt! > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thanks Leonard for investigating this problem. > >>>>>>>>>>>>>>> The time-zone problems around time functions and windows > have > >>>>>>>>>> bothered a > >>>>>>>>>>>>>>> lot of users. It's time to fix them! > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> The return value changes sound reasonable to me, and > keeping the > >>>>>>>>>> return > >>>>>>>>>>>>>>> type unchanged will minimize the surprise to the users. > >>>>>>>>>>>>>>> Besides that, I think it would be better to mention how > this > >>>>>>>>> affects > >>>>>>>>>> the > >>>>>>>>>>>>>>> window behaviors, and the interoperability with DataStream. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I think this definitely deserves a FLIP. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> ==================================================== > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hi zhisheng, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Do you have examples to illustrate which case will get the > wrong > >>>>>>>>>> window > >>>>>>>>>>>>>>> boundaries? > >>>>>>>>>>>>>>> That will help to verify whether the proposed changes can > solve > >>>>>>>>> your > >>>>>>>>>>>>>>> problem. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>> Jark > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 2021-01-21,12:54,zhisheng <173855...@qq.com> : > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thanks to Leonard Xu for discussing this tricky topic. At > >>>>>> present, > >>>>>>>>>>>>>> there are many Flink jobs in our production environment > that are > >>>>>>>>> used > >>>>>>>>>> to > >>>>>>>>>>>>>> count day-level reports (eg: count PV/UV ). > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> If use the default Flink SQL, the window time range > of the > >>>>>>>>>>>>>> statistics is incorrect, then the statistical results will > >>>>>> naturally > >>>>>>>>>> be > >>>>>>>>>>>>>> incorrect. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> The user needs to deal with the time zone manually in > order to > >>>>>>>>> solve > >>>>>>>>>>>>>> the problem. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> If Flink itself can solve these time zone issues, then I > think it > >>>>>>>>>> will > >>>>>>>>>>>>>> be user-friendly. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thank you > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Best!; > >>>>>>>>>>>>>>> zhisheng > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 2021-01-21,12:11,Kurt Young <ykt...@gmail.com> : > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> cc this to user & user-zh mailing list because this will > affect > >>>>>>>>> lots > >>>>>>>>>> of > >>>>>>>>>>>>>> users, and also quite a lot of users > >>>>>>>>>>>>>>> were asking questions around this topic. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Let me try to understand this from user's perspective. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Your proposal will affect five functions, which are: > >>>>>>>>>>>>>>> PROCTIME() > >>>>>>>>>>>>>>> NOW() > >>>>>>>>>>>>>>> CURRENT_DATE > >>>>>>>>>>>>>>> CURRENT_TIME > >>>>>>>>>>>>>>> CURRENT_TIMESTAMP > >>>>>>>>>>>>>>> Before the changes, as I am writing this reply, the local > time > >>>>>> here > >>>>>>>>>> is > >>>>>>>>>>>>>> 2021-01-21 12:03:35 (Beijing time, UTC+8). > >>>>>>>>>>>>>>> And I tried these 5 functions in sql client, and got: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, > >>>>>>>>> CURRENT_DATE, > >>>>>>>>>>>>>> CURRENT_TIME; > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>> > +-------------------------+-------------------------+-------------------------+--------------+--------------+ > >>>>>>>>>>>>>>> | EXPR$0 | EXPR$1 | > >>>>>>>>>>>>>> CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME | > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>> > +-------------------------+-------------------------+-------------------------+--------------+--------------+ > >>>>>>>>>>>>>>> | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 | > >>>>>>>>>>>>>> 2021-01-21T04:03:35.228 | 2021-01-21 | 04:03:35.228 | > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>> > +-------------------------+-------------------------+-------------------------+--------------+--------------+ > >>>>>>>>>>>>>>> After the changes, the expected behavior will change to: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, > >>>>>>>>> CURRENT_DATE, > >>>>>>>>>>>>>> CURRENT_TIME; > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>> > +-------------------------+-------------------------+-------------------------+--------------+--------------+ > >>>>>>>>>>>>>>> | EXPR$0 | EXPR$1 | > >>>>>>>>>>>>>> CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME | > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>> > +-------------------------+-------------------------+-------------------------+--------------+--------------+ > >>>>>>>>>>>>>>> | 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 | > >>>>>>>>>>>>>> 2021-01-21T12:03:35.228 | 2021-01-21 | 12:03:35.228 | > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>> > +-------------------------+-------------------------+-------------------------+--------------+--------------+ > >>>>>>>>>>>>>>> The return type of now(), proctime() and CURRENT_TIMESTAMP > still > >>>>>> be > >>>>>>>>>>>>>> TIMESTAMP; > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>> Kurt > >