Thank you for your comments! I will start the vote then.
Best, Dawid On 22/11/2019 07:30, Jark Wu wrote: > Hi Dawid, > > I don't have other concerns now. Thanks for the explanation. > > Best, > Jark > > On Thu, 21 Nov 2019 at 18:13, Kurt Young <[email protected]> wrote: > >> Hi all, >> >> After some offline discussion with Dawid, I changed my mind. We still agree >> on that >> we should only support reading from tables with primary key, forbid writing >> to such table. >> >> But the enforce flag should be introduced with primary key, and probably >> should be >> `NOR ENFORCE` be default. The problem with making `ENFORCE` by default is >> when >> creating a table with primary key constraint. If we follow the SQL standard >> here, we should >> do the validity check during the creating operation, and this will >> introduce a really big >> burden here and might make supporting this feature not feasible. >> >> Best, >> Kurt >> >> >> On Thu, Nov 21, 2019 at 10:25 AM Kurt Young <[email protected]> wrote: >> >>> Hi Dawid, >>> >>> Actually I'm not 100% sure about both choices: always enforce primary key >>> by Flink and not enforce at all. >>> >>> As you said, always enforce primary key is hard to achieve since Flink >>> doesn't >>> own data most of the time. But with some certain cases, it's actually >>> doable, like >>> writing a new table in a batch job, or if we have upsert flag supported >> in >>> message >>> queue. So it's hard to say Flink has no chance to enforce primary key >>> here. >>> >>> On the other hand, not enforce primary key indeed save us a lot of work, >>> but also >>> would cause confusing sometimes. Especially when the table is written and >>> read >>> by Flink only. For example, some user created a table with primary key >> not >>> enforced, >>> and they will use it as source and sink table. This situation is quite >>> confusing, because >>> Flink didn't enforce primary key when writing, so it's hard to say we can >>> rely on such >>> information during read. It looks we are contradictory with ourself. >>> >>> AFAIK, most databases will only do validity check only when writing to >>> table, and >>> simply trust such information during read. I think we could also follow >>> this way. Since >>> we are not sure yet what kind of behavior when writing to a table with >>> primary key, I would >>> suggest to defer this util we start to talk about the source and sink >>> behavior with primary >>> key, maybe as well as introducing update, delete flag. >>> >>> To summarize, my proposal would be: >>> 1. Add primary key but not introducing enforce flag now >>> 2. Table with primary key can't be used as sink, and we will simply trust >>> pk info during read. >>> >>> Best, >>> Kurt >>> >>> >>> On Wed, Nov 20, 2019 at 9:26 PM Dawid Wysakowicz <[email protected] >>> >>> wrote: >>> >>>> Hi Kurt, >>>> >>>> You are right that this is included in the SQL standard. >>>> >>>> The problem is we are not a database and we do not own the data. There >>>> is no way for us to truly enforce the uniqueness constraint. We are >>>> always writing to some external systems that, moreover we are not an >>>> exclusive produce/consumer for that data. We cannot ensure that external >>>> systems will not corrupt the data. >>>> >>>> The way RDBM systems ensure uniqueness constraint is that they keep the >>>> index of all keys for a table that they check against for every write. >>>> This works only if the system that maintains the index is the only >>>> writer to that table. >>>> >>>> Moreover some of the systems also allow loosening this restriction e.g. >>>> >>>> DB2: >>>> >>>> >> https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.wn.doc/doc/c0061153.html >>>> HIVE(with a non standard syntax, they do it for the same reasons as >>>> listed above): >>>> >>>> >> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Constraints >>>> >>>> What do you think? >>>> >>>> Best, >>>> >>>> Dawid >>>> >>>> >>>> On 20/11/2019 13:51, Kurt Young wrote: >>>>> Hi all, >>>>> >>>>> Thanks Dawid for bringing this up, this is very important property of >>>> SQL >>>>> and >>>>> I'm big +1 to have this. >>>>> >>>>> But before the discussion going deeply, I want to first mention that >>>>> according to >>>>> SQL standard, any unique key constraint which including primary key, >>>> should >>>>> be >>>>> always be enforced. >>>>> >>>>> Please see: >>>>> >>>>> 4.18.3.1 Introduction to table constraints >>>>> >>>>> "NOTE 50 — A unique constraint is always enforced. A table check >>>> constraint >>>>> or >>>>> a referential constraint can either be enforced or not enforced." >>>>> >>>>> Best, >>>>> Kurt >>>>> >>>>> >>>>> On Wed, Nov 20, 2019 at 5:43 PM Dawid Wysakowicz < >>>> [email protected]> >>>>> wrote: >>>>> >>>>>> Hey, >>>>>> >>>>>> After an offline discussion I updated the FLIP slightly. In the >> initial >>>>>> FLIP I put the OUT_OF_LINE definition at a wrong position. I moved it >>>>>> into the statement paranthesis. I also moved the primary key to the >>>>>> TableSchema class. >>>>>> >>>>>> I also extended sections about UNIQUE key support and how do the >>>> primary >>>>>> keys affect sources and sinks. >>>>>> >>>>>> Please have another look. If there are no further concerns, I would >>>> like >>>>>> to start the vote soon. >>>>>> >>>>>> Best, >>>>>> >>>>>> Dawid >>>>>> >>>>>> On 20/11/2019 04:32, Jark Wu wrote: >>>>>>> Hi Dawid, >>>>>>> >>>>>>> Thanks for driving this. Primary key is a very important and useful >>>>>> feature >>>>>>> to Flink Table API / SQL. >>>>>>> >>>>>>> I'm +1 to the general design. >>>>>>> >>>>>>> And have some thoughts as following: >>>>>>> >>>>>>> 1. > The design says Flink only support NOT ENFORCED mode. But the >> DDL >>>>>>> and KeyConstraint#primaryKey(..) can pass in ENFORCED mode. >>>>>>> What will we do when user specify an ENFORCED mode? Shall we >>>> forbid >>>>>>> it? >>>>>>> >>>>>>> 2. > "When creating a table, creating a primary key constraint will >>>> alter >>>>>>> the columns nullability." >>>>>>> I think we should force the columns to be declared NOT NULL, >>>> instead >>>>>> of >>>>>>> change the columns nullability implicitly. >>>>>>> Otherwise, it may be confused what is the nullbility of the >>>> primary >>>>>> key >>>>>>> columns. >>>>>>> >>>>>>> 3. > "Support for Unique key is not part of the FLIP. It is just >>>>>> mentioned >>>>>>> to show how can we extend the primary key concept with more >>>> constraints >>>>>> in >>>>>>> the future." >>>>>>> I think we can include Unique Key as part of the FLIP, as it is >>>>>> already >>>>>>> stated clearly. We can put the implemenation task of unique key in a >>>>>> lower >>>>>>> priority. >>>>>>> >>>>>>> 4. > Method for retrieving primary key constraint is in >>>> CatalogBaseTable. >>>>>>> In SQL standard, primary key is declared in line or out of line. >>>> If >>>>>> it >>>>>>> is out of line, it is still in the schema part, i.e. besides column >>>>>>> definitions and in the parentheses. >>>>>>> So I think maybe primary key should belong to `TableSchema` >>>> becuase >>>>>> it >>>>>>> is a part of schema. >>>>>>> >>>>>>> CREATE TABLE xxx ( >>>>>>> a int, >>>>>>> b varchar, >>>>>>> c bigint, >>>>>>> primary key (a, b) >>>>>>> ) with (...) >>>>>>> >>>>>>> >>>>>>> Best, >>>>>>> Jark >>>>>>> >>>>>>> >>>>>>> On Tue, 19 Nov 2019 at 23:18, Dawid Wysakowicz < >>>> [email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I wanted to bring up the topic of primary key constraints that we >>>> could >>>>>>>> leverage for runtime optimizations. Please have a look at the >>>> proposal >>>>>>>> and I would especially draw your attention to the topic of >>>> nullability >>>>>>>> of columns that are part of a primary key. Some time in the future >> we >>>>>>>> will also be able to leverage it for upserting streaming tables. >>>>>>>> >>>>>>>> Here is the proposal: >>>>>>>> >>>>>>>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP+87%3A+Primary+key+constraints+in+Table+API >>>>>>>> Best, >>>>>>>> >>>>>>>> Dawid >>>>>>>> >>>>>>>> >>>>>>>> >>>>
signature.asc
Description: OpenPGP digital signature
