Hi Leonard,

thanks for the summary.

After reading all of the previous arguments and working on FLIP-95. I would also lean towards the conclusion of not adding the TEMPORAL keyword.

After FLIP-95, what we considered as a CREATE TEMPORAL TABLE can be represented as a CREATE TABLE with PRIMARY KEY and WATERMARK. The FOR SYSTEM_TIME AS OF t would trigger the internal materialization and "temporal" logic.

However, we should discuss the meaning of PRIMARY KEY again in this case. In a TEMPORAL TABLE scenario, the source would emit duplicate primary keys with INSERT changeflag but at different point in time. Currently, we require a PRIMARY KEY NOT ENFORCED declaration. The changelog semantics of FLIP-95 and FLIP-105 don't work well with a primary key declaration.

Regards,
Timo


On 20.06.20 17:08, Leonard Xu wrote:
Hi everyone,

Thanks for the nice discussion. I’d like to move forward the work, please let 
me simply summarize the main opinion and current divergences.

1. The agreements have been achieved:

1.1 The motivation we're discussing temporal table DDL is just for creating 
temporal table in pure SQL to replace pre-process temporal table in YAML/Table 
API for usability.
1.2 The reason we use "TEMPORAL" keyword rather than “PERIOD FOR SYSTEM_TIME” 
is to make user understand easily.
1.3 For append-only table, it can convert to changelog table which has been 
discussed in FLIP-105, we assume the following temporal table is comes from 
changelog (Jark, fabian, Timo).
1.4 For temporal join syntax, using "FOR SYSTEM_TIME AS OF x" instead of the 
current `LATERAL TABLE(rates(x))`  has come to an agreement(Fabian, Timo, Seth, 
Konstantin, Kurt).

2. The small divergence :

About the definition syntax of the temporal table,

CREATE [TEMPORAL] TABLE rates (
    currency CHAR(3) NOT NULL PRIMARY KEY,
    rate DOUBLE,
    rowtime TIMESTAMP,
    WATERMARK FOR rowtime AS rowtime - INTERVAL '5' MINUTE)
WITH (...);

there is small divergence whether add "TEMPORAL" keyword or not.

2.1  one opinion is using "CREATE TEMPORAL TABLE" (Timo, Fabian, Seth), the 
main advantages are:
(1)"TEMPORAL" keyword is intuitive to indicate the history tracking semantics.
(2)"TEMPORAL" keyword illustrates that queries can visit the previous versions of a table 
like other DBMS use "PERIOD FOR SYSTEM_TIME" keyword.

2.2 the other is using "CREATE TABLE"(Kurt), the main advantages are:
(1)Just primary key and time attribute can track previous versions of a table 
well.
(2)The temporal behavior is triggered by temporal join syntax rather than in DDL, all Flink DDL 
table are dynamic table logically including temporal table. If we decide to use 
"TEMPORAL" keyword and treats changelog as temporal table, other tables backed queue like 
Kafka should also use "TEMPORAL" keyword.


IMO, the statement “CREATE TEMPORARY TEMPORAL TABLE...” follows with 2.1 may 
confuse users much. If we take a second to think about, for source/sink table 
which may backed queue (like kafka) or DB (like MySQL), we did not add any 
keyword in DDL to specify they are source or sinks, it works well.
I think temporal table is the third one,  kafka data source and DB data source 
can play as a source/sink/temporal table depends on the position/syntax that 
user put them in the query. The above rates table
     - can be a source table if user put it at `SELECT * FROM rates;`
     - can be a temporal table if user put it at `SELECT * FROM orders JOIN 
rates FOR SYSTEM_TIME AS OF orders.proctime
              ON orders.currency = rates.currency;`
     - can be sink table if user put is at `INSERT INTO rates SELECT * FROM …; `
 From these cases, we found all tables defined in Flink should be dynamic table 
logically, the source/sink/temporal role depends on the position/syntax in 
user’s query.
       In fact we have used similar syntax for current lookup table, we didn’t add 
“LOOKUP" or “TEMPORAL" keyword for lookup table and trigger the temporal join from 
the position/syntax(“FOR SYSTEM_TIME AS OF x") in query.

So, I prefer to resolve the small divergence with “CREATE TABLE” which
(1) is more unified with our source/sink/temporal dynamic table conceptually,
(2) is aligned with current lookup table,
(3) also make users learn less keyword.

WDYT?

Best,
Leonard Xu



Reply via email to