Thanks Kurt, Yes, you are right. The `PERIOD FOR SYSTEM_TIME` that you linked before corresponds to the VERSION clause that I used and would explicitly define the versioning of a table. I didn't know that the `PERIOD FOR SYSTEM_TIME` cause is already defined by the SQL standard. I think we would need a slightly different syntax though because (so far) the validity of a row is determined by its own timestamp and the timestamp of the next row.
Adding a clause later solves the ambiguity issue for tables with multiple event-time attributes. However, I'd feel more comfortable having such a cause and an explicit definition of the temporal property from the beginning. I guess this is a matter of personal preference so I'll go with the majority if we decide that every table that has a primary key and an event-time attribute should be usable in an event-time temporal table join. Thanks, Fabian Am Di., 23. Juni 2020 um 16:58 Uhr schrieb Kurt Young <ykt...@gmail.com>: > Hi Fabian, > > I agree with you that implicitly letting event time to be the version of > the table will > work in most cases, but not for all. That's the reason I mentioned `PERIOD > FOR` [1] > syntax in my first email, which is already in sql standard to represent the > validity of > each row in the table. > > If the event time can't be used, or multiple event time are defined, we > could still add > this syntax in the future. > > What do you think? > > [1] > > https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15 > Best, > Kurt > > > On Tue, Jun 23, 2020 at 9:12 PM Fabian Hueske <fhue...@gmail.com> wrote: > > > Hi everyone, > > > > Every table with a primary key and an event-time attribute provides what > is > > needed for an event-time temporal table join. > > I agree that, from a technical point of view, the TEMPORAL keyword is not > > required. > > > > I'm more sceptical about implicitly deriving the versioning information > of > > a (temporal) table as the table's only event-time attribute. > > In the query > > > > SELECT * > > FROM orders o, rates r FOR SYSTEM_TIME AS OF o.ordertime > > WHERE o.currency = r.currency > > > > the syntax of the temporal table join does not explicitly reference the > > version of the temporal rates table. > > Hence, the system needs a way to derive the version of temporal table. > > > > Implicitly using the (only) event-time attribute of a temporal table > (rates > > in the example above) to identify the right version works in most cases, > > but probably not in all. > > * What if a table has more than one event-time attribute? (TableSchema is > > designed to support multiple watermarks; queries with interval joins > > produce tables with multiple event-time attributes, ...) > > * What if the table does not have an event-time attribute in its schema > but > > the version should only be provided as meta data? > > > > We could add a clause to define the version of a table, such as: > > > > CREATE TABLE rates ( > > currency CHAR(3) NOT NULL PRIMARY KEY, > > rate DOUBLE, > > rowtime TIMESTAMP, > > WATERMARK FOR rowtime AS rowtime - INTERVAL '5' MINUTE), > > VERSION (rowtime) > > WITH (...); > > > > The presence of a the VERSION clause (or whatever syntax) would > explicitly > > define the version of a (temporal) table. > > It would also render the need for the TEMPORAL keyword superfluous > because > > there would be another indicator that a table can be used in a temporal > > table join. > > > > I'm OK with not adding the TEMPORAL keyword, but I recommend that we > think > > again about the proposed implicit definition of a table's version and how > > it might limit use in the future. > > > > Cheers, > > Fabian > > > > Am Mo., 22. Juni 2020 um 16:14 Uhr schrieb Jark Wu <imj...@gmail.com>: > > > > > I'm also +1 for not adding the TEMPORAL keyword. > > > > > > +1 to make the PRIMARY KEY semantic clear for sources. > > > From my point of view: > > > > > > 1) PRIMARY KEY on changelog souruce: > > > It means that when the changelogs (INSERT/UPDATE/DELETE) are > > materialized, > > > the materialized table should be unique on the primary key columns. > > > Flink assumes messages are in order on the primary key. Flink doesn't > > > validate/enforces the key integrity, but simply trust it (thus NOT > > > ENFORCED). > > > Flink will use the PRIMARY KEY for some optimization, e.g. use the > > PRIMARY > > > KEY to update the materilized state by key in temporal join operator. > > > > > > 2) PRIMARY KEY on insert-only source: > > > I prefer to have the same semantic to the batch source and changelog > > > source, that it implies that records are not duplicate on the primary > > key. > > > Flink just simply trust the primary key constraint, and doesn't valid > it. > > > If there is duplicate primary keys with INSERT changeflag, then result > of > > > Flink query might be wrong. > > > > > > If this is a TEMPORAL TABLE FUNCTION scenario, that source emits > > duplicate > > > primary keys with INSERT changeflag, when we migrate this case to > > temporal > > > table DDL, > > > I think this source should emit INSERT/UPDATE (UPSERT) messages instead > > of > > > INSERT-only messages, e.g. a Kafka compacted topic source? > > > > > > Best, > > > Jark > > > > > > > > > On Mon, 22 Jun 2020 at 17:04, Konstantin Knauf <kna...@apache.org> > > wrote: > > > > > > > Hi everyone, > > > > > > > > I also agree with Leonard/Kurt's proposal for CREATE TEMPORAL TABLE. > > > > > > > > Best, > > > > > > > > Konstantin > > > > > > > > On Mon, Jun 22, 2020 at 10:53 AM Kurt Young <ykt...@gmail.com> > wrote: > > > > > > > > > I agree with Timo, semantic about primary key needs more thought > and > > > > > discussion, especially after FLIP-95 and FLIP-105. > > > > > > > > > > Best, > > > > > Kurt > > > > > > > > > > > > > > > On Mon, Jun 22, 2020 at 4:45 PM Timo Walther <twal...@apache.org> > > > wrote: > > > > > > > > > > > Hi Leonard, > > > > > > > > > > > > thanks for the summary. > > > > > > > > > > > > After reading all of the previous arguments and working on > > FLIP-95. I > > > > > > would also lean towards the conclusion of not adding the TEMPORAL > > > > > keyword. > > > > > > > > > > > > After FLIP-95, what we considered as a CREATE TEMPORAL TABLE can > be > > > > > > represented as a CREATE TABLE with PRIMARY KEY and WATERMARK. The > > FOR > > > > > > SYSTEM_TIME AS OF t would trigger the internal materialization > and > > > > > > "temporal" logic. > > > > > > > > > > > > However, we should discuss the meaning of PRIMARY KEY again in > this > > > > > > case. In a TEMPORAL TABLE scenario, the source would emit > duplicate > > > > > > primary keys with INSERT changeflag but at different point in > time. > > > > > > Currently, we require a PRIMARY KEY NOT ENFORCED declaration. The > > > > > > changelog semantics of FLIP-95 and FLIP-105 don't work well with > a > > > > > > primary key declaration. > > > > > > > > > > > > Regards, > > > > > > Timo > > > > > > > > > > > > > > > > > > On 20.06.20 17:08, Leonard Xu wrote: > > > > > > > Hi everyone, > > > > > > > > > > > > > > Thanks for the nice discussion. I’d like to move forward the > > work, > > > > > > please let me simply summarize the main opinion and current > > > > divergences. > > > > > > > > > > > > > > 1. The agreements have been achieved: > > > > > > > > > > > > > > 1.1 The motivation we're discussing temporal table DDL is just > > for > > > > > > creating temporal table in pure SQL to replace pre-process > temporal > > > > table > > > > > > in YAML/Table API for usability. > > > > > > > 1.2 The reason we use "TEMPORAL" keyword rather than “PERIOD > FOR > > > > > > SYSTEM_TIME” is to make user understand easily. > > > > > > > 1.3 For append-only table, it can convert to changelog table > > which > > > > has > > > > > > been discussed in FLIP-105, we assume the following temporal > table > > is > > > > > comes > > > > > > from changelog (Jark, fabian, Timo). > > > > > > > 1.4 For temporal join syntax, using "FOR SYSTEM_TIME AS OF x" > > > instead > > > > > of > > > > > > the current `LATERAL TABLE(rates(x))` has come to an > > > agreement(Fabian, > > > > > > Timo, Seth, Konstantin, Kurt). > > > > > > > > > > > > > > 2. The small divergence : > > > > > > > > > > > > > > About the definition syntax of the temporal table, > > > > > > > > > > > > > > CREATE [TEMPORAL] TABLE rates ( > > > > > > > currency CHAR(3) NOT NULL PRIMARY KEY, > > > > > > > rate DOUBLE, > > > > > > > rowtime TIMESTAMP, > > > > > > > WATERMARK FOR rowtime AS rowtime - INTERVAL '5' MINUTE) > > > > > > > WITH (...); > > > > > > > > > > > > > > there is small divergence whether add "TEMPORAL" keyword or > not. > > > > > > > > > > > > > > 2.1 one opinion is using "CREATE TEMPORAL TABLE" (Timo, > Fabian, > > > > Seth), > > > > > > the main advantages are: > > > > > > > (1)"TEMPORAL" keyword is intuitive to indicate the history > > tracking > > > > > > semantics. > > > > > > > (2)"TEMPORAL" keyword illustrates that queries can visit the > > > previous > > > > > > versions of a table like other DBMS use "PERIOD FOR SYSTEM_TIME" > > > > keyword. > > > > > > > > > > > > > > 2.2 the other is using "CREATE TABLE"(Kurt), the main > advantages > > > are: > > > > > > > (1)Just primary key and time attribute can track previous > > versions > > > > of a > > > > > > table well. > > > > > > > (2)The temporal behavior is triggered by temporal join syntax > > > rather > > > > > > than in DDL, all Flink DDL table are dynamic table logically > > > including > > > > > > temporal table. If we decide to use "TEMPORAL" keyword and treats > > > > > changelog > > > > > > as temporal table, other tables backed queue like Kafka should > also > > > use > > > > > > "TEMPORAL" keyword. > > > > > > > > > > > > > > > > > > > > > IMO, the statement “CREATE TEMPORARY TEMPORAL TABLE...” follows > > > with > > > > > 2.1 > > > > > > may confuse users much. If we take a second to think about, for > > > > > source/sink > > > > > > table which may backed queue (like kafka) or DB (like MySQL), we > > did > > > > not > > > > > > add any keyword in DDL to specify they are source or sinks, it > > works > > > > > well. > > > > > > > I think temporal table is the third one, kafka data source and > > DB > > > > data > > > > > > source can play as a source/sink/temporal table depends on the > > > > > > position/syntax that user put them in the query. The above rates > > > table > > > > > > > - can be a source table if user put it at `SELECT * FROM > > > rates;` > > > > > > > - can be a temporal table if user put it at `SELECT * FROM > > > > orders > > > > > > JOIN rates FOR SYSTEM_TIME AS OF orders.proctime > > > > > > > ON orders.currency = rates.currency;` > > > > > > > - can be sink table if user put is at `INSERT INTO rates > > > SELECT > > > > * > > > > > > FROM …; ` > > > > > > > From these cases, we found all tables defined in Flink should > be > > > > > > dynamic table logically, the source/sink/temporal role depends on > > the > > > > > > position/syntax in user’s query. > > > > > > > In fact we have used similar syntax for current lookup > > > table, > > > > we > > > > > > didn’t add “LOOKUP" or “TEMPORAL" keyword for lookup table and > > > trigger > > > > > the > > > > > > temporal join from the position/syntax(“FOR SYSTEM_TIME AS OF x") > > in > > > > > query. > > > > > > > > > > > > > > So, I prefer to resolve the small divergence with “CREATE > TABLE” > > > > which > > > > > > > (1) is more unified with our source/sink/temporal dynamic table > > > > > > conceptually, > > > > > > > (2) is aligned with current lookup table, > > > > > > > (3) also make users learn less keyword. > > > > > > > > > > > > > > WDYT? > > > > > > > > > > > > > > Best, > > > > > > > Leonard Xu > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > Konstantin Knauf > > > > > > > > https://twitter.com/snntrable > > > > > > > > https://github.com/knaufk > > > > > > > > > >