Hi Dawid,

Actually I'm not 100% sure about both choices: always enforce primary key
by Flink and not enforce at all.

As you said, always enforce primary key is hard to achieve since Flink
doesn't
own data most of the time. But with some certain cases, it's actually
doable, like
writing a new table in a batch job, or if we have upsert flag supported in
message
queue. So it's hard to say Flink has no chance to enforce primary key here.

On the other hand, not enforce primary key indeed save us a lot of work,
but also
would cause confusing sometimes. Especially when the table is written and
read
by Flink only. For example, some user created a table with primary key not
enforced,
and they will use it as source and sink table. This situation is quite
confusing, because
Flink didn't enforce primary key when writing, so it's hard to say we can
rely on such
information during read. It looks we are contradictory with ourself.

AFAIK, most databases will only do validity check only when writing to
table, and
simply trust such information during read. I think we could also follow
this way. Since
we are not sure yet what kind of behavior when writing to a table with
primary key, I would
suggest to defer this util we start to talk about the source and sink
behavior with primary
key, maybe as well as introducing update, delete flag.

To summarize, my proposal would be:
1. Add primary key but not introducing enforce flag now
2. Table with primary key can't be used as sink, and we will simply trust
pk info during read.

Best,
Kurt


On Wed, Nov 20, 2019 at 9:26 PM Dawid Wysakowicz <[email protected]>
wrote:

> Hi Kurt,
>
> You are right that this is included in the SQL standard.
>
> The problem is we are not a database and we do not own the data. There
> is no way for us to truly enforce the uniqueness constraint. We are
> always writing to some external systems that, moreover we are not an
> exclusive produce/consumer for that data. We cannot ensure that external
> systems will not corrupt the data.
>
> The way RDBM systems ensure uniqueness constraint is that they keep the
> index of all keys for a table that they check against for every write.
> This works only if the system that maintains the index is the only
> writer to that table.
>
> Moreover some of the systems also allow loosening this restriction e.g.
>
> DB2:
>
> https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.wn.doc/doc/c0061153.html
>
> HIVE(with a non standard syntax, they do it for the same reasons as
> listed above):
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Constraints
>
>
> What do you think?
>
> Best,
>
> Dawid
>
>
> On 20/11/2019 13:51, Kurt Young wrote:
> > Hi all,
> >
> > Thanks Dawid for bringing this up, this is very important property of SQL
> > and
> > I'm big +1 to have this.
> >
> > But before the discussion going deeply, I want to first mention that
> > according to
> > SQL standard, any unique key constraint which including primary key,
> should
> > be
> > always be enforced.
> >
> > Please see:
> >
> > 4.18.3.1 Introduction to table constraints
> >
> > "NOTE 50 — A unique constraint is always enforced. A table check
> constraint
> > or
> > a referential constraint can either be enforced or not enforced."
> >
> > Best,
> > Kurt
> >
> >
> > On Wed, Nov 20, 2019 at 5:43 PM Dawid Wysakowicz <[email protected]
> >
> > wrote:
> >
> >> Hey,
> >>
> >> After an offline discussion I updated the FLIP slightly. In the initial
> >> FLIP I put the OUT_OF_LINE definition at a wrong position. I moved it
> >> into the statement paranthesis. I also moved the primary key to the
> >> TableSchema class.
> >>
> >> I also extended sections about UNIQUE key support and how do the primary
> >> keys affect sources and sinks.
> >>
> >> Please have another look. If there are no further concerns, I would like
> >> to start the vote soon.
> >>
> >> Best,
> >>
> >> Dawid
> >>
> >> On 20/11/2019 04:32, Jark Wu wrote:
> >>> Hi Dawid,
> >>>
> >>> Thanks for driving this. Primary key is a very important and useful
> >> feature
> >>> to Flink Table API / SQL.
> >>>
> >>> I'm +1 to the general design.
> >>>
> >>> And have some thoughts as following:
> >>>
> >>> 1. > The design says Flink only support NOT ENFORCED mode. But the DDL
> >>> and KeyConstraint#primaryKey(..) can pass in ENFORCED mode.
> >>>     What will we do when user specify an ENFORCED mode?  Shall we
> forbid
> >>> it?
> >>>
> >>> 2. > "When creating a table, creating a primary key constraint will
> alter
> >>> the columns nullability."
> >>>     I think we should force the columns to be declared NOT NULL,
> instead
> >> of
> >>> change the columns nullability implicitly.
> >>>     Otherwise, it may be confused what is the nullbility of the primary
> >> key
> >>> columns.
> >>>
> >>> 3. > "Support for Unique key is not part of the FLIP. It is just
> >> mentioned
> >>> to show how can we extend the primary key concept with more constraints
> >> in
> >>> the future."
> >>>     I think we can include Unique Key as part of the FLIP, as it is
> >> already
> >>> stated clearly. We can put the implemenation task of unique key in a
> >> lower
> >>> priority.
> >>>
> >>> 4. > Method for retrieving primary key constraint is in
> CatalogBaseTable.
> >>>     In SQL standard, primary key is declared in line or out of line. If
> >> it
> >>> is out of line, it is still in the schema part, i.e. besides column
> >>> definitions and in the parentheses.
> >>>     So I think maybe primary key should belong to `TableSchema` becuase
> >> it
> >>> is a part of schema.
> >>>
> >>>     CREATE TABLE xxx (
> >>>        a int,
> >>>        b varchar,
> >>>        c bigint,
> >>>        primary key (a, b)
> >>>     ) with (...)
> >>>
> >>>
> >>> Best,
> >>> Jark
> >>>
> >>>
> >>> On Tue, 19 Nov 2019 at 23:18, Dawid Wysakowicz <[email protected]
> >
> >>> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I wanted to bring up the topic of primary key constraints that we
> could
> >>>> leverage for runtime optimizations. Please have a look at the proposal
> >>>> and I would especially draw your attention to the topic of nullability
> >>>> of columns that are part of a primary key. Some time in the future we
> >>>> will also be able to leverage it for upserting streaming tables.
> >>>>
> >>>> Here is the proposal:
> >>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+87%3A+Primary+key+constraints+in+Table+API
> >>>> Best,
> >>>>
> >>>> Dawid
> >>>>
> >>>>
> >>>>
> >>
>
>

Reply via email to