I agree with Timo, semantic about primary key needs more thought and discussion, especially after FLIP-95 and FLIP-105.
Best, Kurt On Mon, Jun 22, 2020 at 4:45 PM Timo Walther <twal...@apache.org> wrote: > Hi Leonard, > > thanks for the summary. > > After reading all of the previous arguments and working on FLIP-95. I > would also lean towards the conclusion of not adding the TEMPORAL keyword. > > After FLIP-95, what we considered as a CREATE TEMPORAL TABLE can be > represented as a CREATE TABLE with PRIMARY KEY and WATERMARK. The FOR > SYSTEM_TIME AS OF t would trigger the internal materialization and > "temporal" logic. > > However, we should discuss the meaning of PRIMARY KEY again in this > case. In a TEMPORAL TABLE scenario, the source would emit duplicate > primary keys with INSERT changeflag but at different point in time. > Currently, we require a PRIMARY KEY NOT ENFORCED declaration. The > changelog semantics of FLIP-95 and FLIP-105 don't work well with a > primary key declaration. > > Regards, > Timo > > > On 20.06.20 17:08, Leonard Xu wrote: > > Hi everyone, > > > > Thanks for the nice discussion. I’d like to move forward the work, > please let me simply summarize the main opinion and current divergences. > > > > 1. The agreements have been achieved: > > > > 1.1 The motivation we're discussing temporal table DDL is just for > creating temporal table in pure SQL to replace pre-process temporal table > in YAML/Table API for usability. > > 1.2 The reason we use "TEMPORAL" keyword rather than “PERIOD FOR > SYSTEM_TIME” is to make user understand easily. > > 1.3 For append-only table, it can convert to changelog table which has > been discussed in FLIP-105, we assume the following temporal table is comes > from changelog (Jark, fabian, Timo). > > 1.4 For temporal join syntax, using "FOR SYSTEM_TIME AS OF x" instead of > the current `LATERAL TABLE(rates(x))` has come to an agreement(Fabian, > Timo, Seth, Konstantin, Kurt). > > > > 2. The small divergence : > > > > About the definition syntax of the temporal table, > > > > CREATE [TEMPORAL] TABLE rates ( > > currency CHAR(3) NOT NULL PRIMARY KEY, > > rate DOUBLE, > > rowtime TIMESTAMP, > > WATERMARK FOR rowtime AS rowtime - INTERVAL '5' MINUTE) > > WITH (...); > > > > there is small divergence whether add "TEMPORAL" keyword or not. > > > > 2.1 one opinion is using "CREATE TEMPORAL TABLE" (Timo, Fabian, Seth), > the main advantages are: > > (1)"TEMPORAL" keyword is intuitive to indicate the history tracking > semantics. > > (2)"TEMPORAL" keyword illustrates that queries can visit the previous > versions of a table like other DBMS use "PERIOD FOR SYSTEM_TIME" keyword. > > > > 2.2 the other is using "CREATE TABLE"(Kurt), the main advantages are: > > (1)Just primary key and time attribute can track previous versions of a > table well. > > (2)The temporal behavior is triggered by temporal join syntax rather > than in DDL, all Flink DDL table are dynamic table logically including > temporal table. If we decide to use "TEMPORAL" keyword and treats changelog > as temporal table, other tables backed queue like Kafka should also use > "TEMPORAL" keyword. > > > > > > IMO, the statement “CREATE TEMPORARY TEMPORAL TABLE...” follows with 2.1 > may confuse users much. If we take a second to think about, for source/sink > table which may backed queue (like kafka) or DB (like MySQL), we did not > add any keyword in DDL to specify they are source or sinks, it works well. > > I think temporal table is the third one, kafka data source and DB data > source can play as a source/sink/temporal table depends on the > position/syntax that user put them in the query. The above rates table > > - can be a source table if user put it at `SELECT * FROM rates;` > > - can be a temporal table if user put it at `SELECT * FROM orders > JOIN rates FOR SYSTEM_TIME AS OF orders.proctime > > ON orders.currency = rates.currency;` > > - can be sink table if user put is at `INSERT INTO rates SELECT * > FROM …; ` > > From these cases, we found all tables defined in Flink should be > dynamic table logically, the source/sink/temporal role depends on the > position/syntax in user’s query. > > In fact we have used similar syntax for current lookup table, we > didn’t add “LOOKUP" or “TEMPORAL" keyword for lookup table and trigger the > temporal join from the position/syntax(“FOR SYSTEM_TIME AS OF x") in query. > > > > So, I prefer to resolve the small divergence with “CREATE TABLE” which > > (1) is more unified with our source/sink/temporal dynamic table > conceptually, > > (2) is aligned with current lookup table, > > (3) also make users learn less keyword. > > > > WDYT? > > > > Best, > > Leonard Xu > > > > > >