Hey Leonard,

Thanks for summarizing the document. I have one quick question. I understand that a temporal table without version means each row in the table has only one version. But can we still track different views of such a table through time, as rows are added to or deleted from it? For example, suppose I have an append-only table source with an event-time attribute and a PK. Will I be allowed to do an event-time temporal join with this table?
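
To make the question concrete, here is a rough sketch of the scenario I have in mind. The table names, column names, watermark intervals and connector options are all made up for illustration, and I am assuming the `FOR SYSTEM_TIME AS OF` join syntax discussed in this thread applies here. Whether the last statement would be accepted is exactly what I am asking.

```sql
-- Hypothetical append-only source with an event-time attribute and a PK.
CREATE TABLE rates (
  currency    STRING,
  rate        DECIMAL(10, 4),
  update_time TIMESTAMP(3),
  WATERMARK FOR update_time AS update_time - INTERVAL '5' SECOND,
  PRIMARY KEY (currency) NOT ENFORCED
) WITH (
  'connector' = '...'  -- placeholder: any append-only source
);

-- Hypothetical fact table with its own event-time attribute.
CREATE TABLE orders (
  order_id   STRING,
  currency   STRING,
  amount     DECIMAL(10, 2),
  order_time TIMESTAMP(3),
  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
) WITH (
  'connector' = '...'  -- placeholder
);

-- The event-time temporal join in question: each order is matched with
-- the rate that was valid at the order's event time.
SELECT o.order_id, o.amount * r.rate AS converted_amount
FROM orders AS o
JOIN rates FOR SYSTEM_TIME AS OF o.order_time AS r
  ON o.currency = r.currency;
```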

On Wed, Aug 12, 2020 at 3:31 PM Leonard Xu <xbjt...@gmail.com> wrote:

> Hi, all
>
> After a detailed offline discussion about the temporal table related
> concepts and behavior, we arrived at a reliable solution and rejected
> several alternatives.
> Compared to the rejected alternatives, the proposed approach tells a more
> unified story and is friendlier to users and to the current Flink framework.
> I updated the FLIP[1] with the proposed approach and reorganized the
> document to make it clearer.
>
> Please let me know if you have any concerns. I'm looking forward to your
> comments.
>
> Best
> Leonard
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>
> > On Aug 4, 2020, at 21:25, Leonard Xu <xbjt...@gmail.com> wrote:
> >
> > Hi, all
> >
> > I've updated the FLIP[1] with the terminology `ChangelogTime`.
> >
> > Best
> > Leonard
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
> >
> >> On Aug 4, 2020, at 20:58, Leonard Xu <xbjt...@gmail.com> wrote:
> >>
> >> Hi, Timo
> >>
> >> Thanks for your response.
> >>
> >>> 1) Naming: Is operation time a good term for this concept? If I read
> >>> "The operation time is the time when the changes happened in system." or
> >>> "The system time of DML execution in database", why don't we call it
> >>> `ChangelogTime` or `SystemTime`? Introducing another terminology of time
> >>> in Flink should be thought through.
> >>
> >> I agree that we should think this through. I considered the names
> >> `ChangelogTime` and `SystemTime` too, and I don't have a strong opinion
> >> on the name.
> >>
> >> I proposed `operationTime` because most changelogs come from databases,
> >> where an action is usually called an `operation` rather than a `change`.
> >> Operation time is easier for database users to understand, but it is
> >> more of a database term.
> >>
> >> For `SystemTime`, users may be confused about which system it refers to:
> >> Flink, the database, or the CDC tool. It's probably not a good name.
> >>
> >> `ChangelogTime` is a good choice that is consistent with the existing
> >> terms `Changelog` and `ChangelogMode`, so I'll use `ChangelogTime` and
> >> update the FLIP.
> >>
> >>> 2) Exposing it through `org.apache.flink.types.Row`: Shall we also
> >>> expose the concept of time through the user-level `Row` type? The FLIP
> >>> does not mention this explicitly. I think we can keep it as an internal
> >>> concept but I just wanted to ask for clarification.
> >>
> >> Yes, I want to keep it as an internal concept. We have discussed that
> >> changelog time should be the third time concept (the other two are
> >> event-time and processing-time). It's not easy for normal users to
> >> understand the three concepts accurately (or for us to help them do so),
> >> and I have not found a big enough scenario where users need to touch the
> >> changelog time for now, so I tend not to expose the concept to users.
> >>
> >> Best,
> >> Leonard
> >>
> >>> On 04.08.20 04:58, Leonard Xu wrote:
> >>>> Thanks Konstantin,
> >>>> Regarding your questions, I hope my comments have addressed them; I
> >>>> also added a few explanations to the FLIP.
> >>>> Thank you all for the feedback,
> >>>> It seems everyone involved in this thread has reached a consensus.
> >>>> I will start a vote thread later.
> >>>> Best,
> >>>> Leonard
> >>>>> On Aug 3, 2020, at 19:35, godfrey he <godfre...@gmail.com> wrote:
> >>>>>
> >>>>> Thanks Leonard for driving this FLIP.
> >>>>> Looks good to me.
> >>>>>
> >>>>> Best,
> >>>>> Godfrey
> >>>>>
> >>>>> On Mon, Aug 3, 2020 at 12:04 PM, Jark Wu <imj...@gmail.com> wrote:
> >>>>>
> >>>>>> Thanks Leonard for the great FLIP. I think it is in very good shape.
> >>>>>> +1 to start a vote.
> >>>>>>
> >>>>>> Best,
> >>>>>> Jark
> >>>>>>
> >>>>>> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fhue...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi Leonard,
> >>>>>>>
> >>>>>>> Thanks for this FLIP!
> >>>>>>> Looks good from my side.
> >>>>>>>
> >>>>>>> Cheers, Fabian
> >>>>>>>
> >>>>>>> On Thu, Jul 30, 2020 at 22:15, Seth Wiesman <sjwies...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Hi Leonard,
> >>>>>>>>
> >>>>>>>> Thank you for pushing this, I think the updated syntax looks really
> >>>>>>>> good and the semantics make sense to me.
> >>>>>>>>
> >>>>>>>> +1
> >>>>>>>>
> >>>>>>>> Seth
> >>>>>>>>
> >>>>>>>> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <xbjt...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Hi, Konstantin
> >>>>>>>>>
> >>>>>>>>>> 1) A "Versioned Temporal Table DDL on source" can only be joined
> >>>>>>>>>> on the PRIMARY KEY attribute, correct?
> >>>>>>>>>
> >>>>>>>>> Yes, the PRIMARY KEY would be the join key.
> >>>>>>>>>
> >>>>>>>>>> 2) Isn't it the time attribute in the ORDER BY clause of the VIEW
> >>>>>>>>>> definition that defines whether a event-time or processing time
> >>>>>>>>>> temporal table join is used?
> >>>>>>>>>
> >>>>>>>>> I think whether a temporal table join is event-time or
> >>>>>>>>> processing-time depends on the fact table's time attribute used in
> >>>>>>>>> the join, not on the temporal table side. The event-time or
> >>>>>>>>> processing-time attribute in the temporal table is only used to
> >>>>>>>>> split the validity periods of the versioned snapshots of the
> >>>>>>>>> temporal table. A processing-time attribute is not necessary for a
> >>>>>>>>> temporal table without version; only the primary key is required.
> >>>>>>>>> The following VIEW is also valid as a temporal table without
> >>>>>>>>> version.
> >>>>>>>>>
> >>>>>>>>> CREATE VIEW latest_rates AS
> >>>>>>>>> SELECT currency, LAST_VALUE(rate)  -- only keep the latest version
> >>>>>>>>> FROM rates
> >>>>>>>>> GROUP BY currency;                 -- inferred primary key
> >>>>>>>>>
> >>>>>>>>>> 3) A "Versioned Temporal Table DDL on source" is always versioned
> >>>>>>>>>> on operation_time regardless of the lookup table attribute
> >>>>>>>>>> (event-time or processing time attribute), correct?
> >>>>>>>>>
> >>>>>>>>> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is to use the
> >>>>>>>>> o.time value to look up the version of the temporal table.
> >>>>>>>>> If the fact table has a processing-time attribute, only the latest
> >>>>>>>>> version of the temporal table is looked up, and we can do some
> >>>>>>>>> optimization in the implementation, such as keeping only the latest
> >>>>>>>>> version.
> >>>>>>>>>
> >>>>>>>>> Best
> >>>>>>>>> Leonard

--
Best regards!
Rui Li