Hi Jark, Thank you for your comments.
Ad. 1 My intention was to throw an exception if a user tries to use the ENFORCE mode. SQL standard states this is the default mode, so if we ever want to support the ENFORCE/NOT ENFORCE mode we need to make sure users now always provide the NOT ENFORCE flag. Does it make sense to you? If it does, I will clarify the FLIP on this topic. Ad. 2 This is the behavior of majority of RDBM systems. I think this is quite useful from the usability perspective. A column is nullable by default. It is also a good practice to always define PRIMARY KEY for a table. Therefore we would force users to add additional annotations. E.g. instead of: CREATE TABLE tableName(id INT PRIMARY KEY); user would always have to type CREATE TABLE tableName(id INT NOT NULL PRIMARY KEY); I am not totally against it, if this is the preferred way. Ad. 3 The only reason why I didn't want to add a full UNIQUE support is that we would have to decide how do we handle null for the UNIQUE constraint. Some databases allow multiple rows to have a null in a column with unique constraint, whereas other allow only one. That means if you have a table CREATE TABLE tableName (id INTEGER UNIQUE); In some databases it is legal to have rows (null; null; null; null; ....) while in others you can have at most (null). Do you think we should still discuss this as well? Ad. 4 I don't have a strong opinion on that. Actually SQL standard describes a Table descriptor. Some excerpts from the standard: — The name of the base table. — An indication of whether the table is a regular persistent base table, a system-versioned table, a global temporary table, a created local temporary table, or a declared local temporary table. ... — An indication of whether the table is insertable-into or not. — The column descriptor of each column in the table. — The descriptor of each table constraint specified for the table. In this context I thought primary key belongs to the Table rather than schema, where schema is as far as I understand just a list of Table column descriptors. Tbh the TableSchema does not fit very well into that description of the descriptor. If you have a good reasons why you think it would be better to have it in the TableSchema, I would be fine with it. We should though have a clear separation what belongs to the Table schema and what does not (e.g. partitions are in the Catalog table now). Best, Dawid On 20/11/2019 04:32, Jark Wu wrote: > Hi Dawid, > > Thanks for driving this. Primary key is a very important and useful feature > to Flink Table API / SQL. > > I'm +1 to the general design. > > And have some thoughts as following: > > 1. > The design says Flink only support NOT ENFORCED mode. But the DDL > and KeyConstraint#primaryKey(..) can pass in ENFORCED mode. > What will we do when user specify an ENFORCED mode? Shall we forbid > it? > > 2. > "When creating a table, creating a primary key constraint will alter > the columns nullability." > I think we should force the columns to be declared NOT NULL, instead of > change the columns nullability implicitly. > Otherwise, it may be confused what is the nullbility of the primary key > columns. > > 3. > "Support for Unique key is not part of the FLIP. It is just mentioned > to show how can we extend the primary key concept with more constraints in > the future." > I think we can include Unique Key as part of the FLIP, as it is already > stated clearly. We can put the implemenation task of unique key in a lower > priority. > > 4. > Method for retrieving primary key constraint is in CatalogBaseTable. > In SQL standard, primary key is declared in line or out of line. If it > is out of line, it is still in the schema part, i.e. besides column > definitions and in the parentheses. > So I think maybe primary key should belong to `TableSchema` becuase it > is a part of schema. > > CREATE TABLE xxx ( > a int, > b varchar, > c bigint, > primary key (a, b) > ) with (...) > > > Best, > Jark > > > On Tue, 19 Nov 2019 at 23:18, Dawid Wysakowicz <[email protected]> > wrote: > >> Hi, >> >> I wanted to bring up the topic of primary key constraints that we could >> leverage for runtime optimizations. Please have a look at the proposal >> and I would especially draw your attention to the topic of nullability >> of columns that are part of a primary key. Some time in the future we >> will also be able to leverage it for upserting streaming tables. >> >> Here is the proposal: >> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP+87%3A+Primary+key+constraints+in+Table+API >> >> Best, >> >> Dawid >> >> >>
signature.asc
Description: OpenPGP digital signature
