Thank you for your comments!

I will start the vote then.

Best,

Dawid

On 22/11/2019 07:30, Jark Wu wrote:
> Hi Dawid,
>
> I don't have other concerns now. Thanks for the explanation.
>
> Best,
> Jark
>
> On Thu, 21 Nov 2019 at 18:13, Kurt Young <[email protected]> wrote:
>
>> Hi all,
>>
>> After some offline discussion with Dawid, I changed my mind. We still agree
>> on that
>> we should only support reading from tables with primary key, forbid writing
>> to such table.
>>
>> But the enforce flag should be introduced with primary key, and probably
>> should be
>> `NOR ENFORCE` be default. The problem with making `ENFORCE` by default is
>> when
>> creating a table with primary key constraint. If we follow the SQL standard
>> here, we should
>> do the validity check during the creating operation, and this will
>> introduce a really big
>> burden here and might make supporting this feature not feasible.
>>
>> Best,
>> Kurt
>>
>>
>> On Thu, Nov 21, 2019 at 10:25 AM Kurt Young <[email protected]> wrote:
>>
>>> Hi Dawid,
>>>
>>> Actually I'm not 100% sure about both choices: always enforce primary key
>>> by Flink and not enforce at all.
>>>
>>> As you said, always enforce primary key is hard to achieve since Flink
>>> doesn't
>>> own data most of the time. But with some certain cases, it's actually
>>> doable, like
>>> writing a new table in a batch job, or if we have upsert flag supported
>> in
>>> message
>>> queue. So it's hard to say Flink has no chance to enforce primary key
>>> here.
>>>
>>> On the other hand, not enforce primary key indeed save us a lot of work,
>>> but also
>>> would cause confusing sometimes. Especially when the table is written and
>>> read
>>> by Flink only. For example, some user created a table with primary key
>> not
>>> enforced,
>>> and they will use it as source and sink table. This situation is quite
>>> confusing, because
>>> Flink didn't enforce primary key when writing, so it's hard to say we can
>>> rely on such
>>> information during read. It looks we are contradictory with ourself.
>>>
>>> AFAIK, most databases will only do validity check only when writing to
>>> table, and
>>> simply trust such information during read. I think we could also follow
>>> this way. Since
>>> we are not sure yet what kind of behavior when writing to a table with
>>> primary key, I would
>>> suggest to defer this util we start to talk about the source and sink
>>> behavior with primary
>>> key, maybe as well as introducing update, delete flag.
>>>
>>> To summarize, my proposal would be:
>>> 1. Add primary key but not introducing enforce flag now
>>> 2. Table with primary key can't be used as sink, and we will simply trust
>>> pk info during read.
>>>
>>> Best,
>>> Kurt
>>>
>>>
>>> On Wed, Nov 20, 2019 at 9:26 PM Dawid Wysakowicz <[email protected]
>>>
>>> wrote:
>>>
>>>> Hi Kurt,
>>>>
>>>> You are right that this is included in the SQL standard.
>>>>
>>>> The problem is we are not a database and we do not own the data. There
>>>> is no way for us to truly enforce the uniqueness constraint. We are
>>>> always writing to some external systems that, moreover we are not an
>>>> exclusive produce/consumer for that data. We cannot ensure that external
>>>> systems will not corrupt the data.
>>>>
>>>> The way RDBM systems ensure uniqueness constraint is that they keep the
>>>> index of all keys for a table that they check against for every write.
>>>> This works only if the system that maintains the index is the only
>>>> writer to that table.
>>>>
>>>> Moreover some of the systems also allow loosening this restriction e.g.
>>>>
>>>> DB2:
>>>>
>>>>
>> https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.wn.doc/doc/c0061153.html
>>>> HIVE(with a non standard syntax, they do it for the same reasons as
>>>> listed above):
>>>>
>>>>
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Constraints
>>>>
>>>> What do you think?
>>>>
>>>> Best,
>>>>
>>>> Dawid
>>>>
>>>>
>>>> On 20/11/2019 13:51, Kurt Young wrote:
>>>>> Hi all,
>>>>>
>>>>> Thanks Dawid for bringing this up, this is very important property of
>>>> SQL
>>>>> and
>>>>> I'm big +1 to have this.
>>>>>
>>>>> But before the discussion going deeply, I want to first mention that
>>>>> according to
>>>>> SQL standard, any unique key constraint which including primary key,
>>>> should
>>>>> be
>>>>> always be enforced.
>>>>>
>>>>> Please see:
>>>>>
>>>>> 4.18.3.1 Introduction to table constraints
>>>>>
>>>>> "NOTE 50 — A unique constraint is always enforced. A table check
>>>> constraint
>>>>> or
>>>>> a referential constraint can either be enforced or not enforced."
>>>>>
>>>>> Best,
>>>>> Kurt
>>>>>
>>>>>
>>>>> On Wed, Nov 20, 2019 at 5:43 PM Dawid Wysakowicz <
>>>> [email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> After an offline discussion I updated the FLIP slightly. In the
>> initial
>>>>>> FLIP I put the OUT_OF_LINE definition at a wrong position. I moved it
>>>>>> into the statement paranthesis. I also moved the primary key to the
>>>>>> TableSchema class.
>>>>>>
>>>>>> I also extended sections about UNIQUE key support and how do the
>>>> primary
>>>>>> keys affect sources and sinks.
>>>>>>
>>>>>> Please have another look. If there are no further concerns, I would
>>>> like
>>>>>> to start the vote soon.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Dawid
>>>>>>
>>>>>> On 20/11/2019 04:32, Jark Wu wrote:
>>>>>>> Hi Dawid,
>>>>>>>
>>>>>>> Thanks for driving this. Primary key is a very important and useful
>>>>>> feature
>>>>>>> to Flink Table API / SQL.
>>>>>>>
>>>>>>> I'm +1 to the general design.
>>>>>>>
>>>>>>> And have some thoughts as following:
>>>>>>>
>>>>>>> 1. > The design says Flink only support NOT ENFORCED mode. But the
>> DDL
>>>>>>> and KeyConstraint#primaryKey(..) can pass in ENFORCED mode.
>>>>>>>     What will we do when user specify an ENFORCED mode?  Shall we
>>>> forbid
>>>>>>> it?
>>>>>>>
>>>>>>> 2. > "When creating a table, creating a primary key constraint will
>>>> alter
>>>>>>> the columns nullability."
>>>>>>>     I think we should force the columns to be declared NOT NULL,
>>>> instead
>>>>>> of
>>>>>>> change the columns nullability implicitly.
>>>>>>>     Otherwise, it may be confused what is the nullbility of the
>>>> primary
>>>>>> key
>>>>>>> columns.
>>>>>>>
>>>>>>> 3. > "Support for Unique key is not part of the FLIP. It is just
>>>>>> mentioned
>>>>>>> to show how can we extend the primary key concept with more
>>>> constraints
>>>>>> in
>>>>>>> the future."
>>>>>>>     I think we can include Unique Key as part of the FLIP, as it is
>>>>>> already
>>>>>>> stated clearly. We can put the implemenation task of unique key in a
>>>>>> lower
>>>>>>> priority.
>>>>>>>
>>>>>>> 4. > Method for retrieving primary key constraint is in
>>>> CatalogBaseTable.
>>>>>>>     In SQL standard, primary key is declared in line or out of line.
>>>> If
>>>>>> it
>>>>>>> is out of line, it is still in the schema part, i.e. besides column
>>>>>>> definitions and in the parentheses.
>>>>>>>     So I think maybe primary key should belong to `TableSchema`
>>>> becuase
>>>>>> it
>>>>>>> is a part of schema.
>>>>>>>
>>>>>>>     CREATE TABLE xxx (
>>>>>>>        a int,
>>>>>>>        b varchar,
>>>>>>>        c bigint,
>>>>>>>        primary key (a, b)
>>>>>>>     ) with (...)
>>>>>>>
>>>>>>>
>>>>>>> Best,
>>>>>>> Jark
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 19 Nov 2019 at 23:18, Dawid Wysakowicz <
>>>> [email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I wanted to bring up the topic of primary key constraints that we
>>>> could
>>>>>>>> leverage for runtime optimizations. Please have a look at the
>>>> proposal
>>>>>>>> and I would especially draw your attention to the topic of
>>>> nullability
>>>>>>>> of columns that are part of a primary key. Some time in the future
>> we
>>>>>>>> will also be able to leverage it for upserting streaming tables.
>>>>>>>>
>>>>>>>> Here is the proposal:
>>>>>>>>
>>>>>>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP+87%3A+Primary+key+constraints+in+Table+API
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Dawid
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to