[ 
https://issues.apache.org/jira/browse/FLINK-24050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536499#comment-17536499
 ] 

Hang Ruan commented on FLINK-24050:
-----------------------------------

I am interested in this issue and would like to help with it, but some details 
still need to be discussed.

I think there should be no restriction on the source table. But when the table 
is used as a sink table, we need to decide how to handle a primary key defined 
on metadata columns, which may be virtual or writable.
 * Virtual metadata: Virtual metadata cannot be persisted to the target 
storage. As a result, the value read back from the table may differ from the 
value we had when writing the row.
 * Writable metadata (not virtual): There should be no restriction on writable 
metadata.

IMO, the strategy for metadata primary keys should be as follows:
 * For a source table, there should be no restriction on using metadata 
columns as primary keys.
 * For a sink table:
 ** Using virtual metadata as a primary key is meaningless. If virtual metadata 
is used this way, we need to either ignore the primary key and warn the user, 
or reject it:
 *** If the primary key contains only virtual metadata, just ignore the 
primary key.
 *** If the primary key contains virtual metadata together with other columns, 
throw a validation exception.
 ** Using writable metadata as a primary key is allowed. How the metadata is 
written to the target storage depends on the connector type.
 *** Take upsert-kafka tables as an example. An upsert-kafka table writes its 
primary key to the key of the Kafka record. If the upsert-kafka connector 
supports metadata as primary keys, whether the metadata is written to the key 
depends on the connector's implementation.
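
To make the proposed rules easier to discuss, here is a minimal Java sketch of 
the decision logic. It is only an illustration of the strategy above, not 
existing Flink code: the class MetadataPrimaryKeyCheck, the enums ColumnKind 
and Decision, and the method validate are all hypothetical names, and the real 
planner would work on its own schema types.

```java
import java.util.List;

/** Hypothetical sketch of the proposed rule; none of these names are Flink APIs. */
final class MetadataPrimaryKeyCheck {

    /** Kind of a column that takes part in the declared primary key. */
    enum ColumnKind { PHYSICAL, WRITABLE_METADATA, VIRTUAL_METADATA }

    /** What the validation would do with the declared primary key. */
    enum Decision { ACCEPT, IGNORE_WITH_WARNING, REJECT }

    static Decision validate(boolean usedAsSink, List<ColumnKind> primaryKeyColumns) {
        // Source tables: no restriction on metadata primary keys.
        if (!usedAsSink) {
            return Decision.ACCEPT;
        }
        long virtualCount = primaryKeyColumns.stream()
                .filter(kind -> kind == ColumnKind.VIRTUAL_METADATA)
                .count();
        // Only physical and/or writable metadata columns: allowed. How the key
        // is persisted is left to the connector.
        if (virtualCount == 0) {
            return Decision.ACCEPT;
        }
        // Key made up solely of virtual metadata: ignore it and warn the user.
        if (virtualCount == primaryKeyColumns.size()) {
            return Decision.IGNORE_WITH_WARNING;
        }
        // Virtual metadata mixed with other columns: validation error.
        return Decision.REJECT;
    }

    public static void main(String[] args) {
        System.out.println(validate(false, List.of(ColumnKind.VIRTUAL_METADATA)));
        System.out.println(validate(true, List.of(ColumnKind.VIRTUAL_METADATA)));
        System.out.println(validate(true,
                List.of(ColumnKind.VIRTUAL_METADATA, ColumnKind.PHYSICAL)));
        System.out.println(validate(true, List.of(ColumnKind.WRITABLE_METADATA)));
    }
}
```

The sketch returns a decision instead of throwing, so that the "ignore and 
warn" case and the "validation exception" case can be told apart by the caller.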

> Support primary keys on metadata columns
> ----------------------------------------
>
>                 Key: FLINK-24050
>                 URL: https://issues.apache.org/jira/browse/FLINK-24050
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / API
>            Reporter: Ingo Bürk
>            Priority: Major
>
> Currently, primary keys are required to consist solely of physical columns. 
> However, there might be scenarios where the actual payload/records do not 
> contain a suitable primary key, but a unique identifier is available through 
> metadata. In this case it would make sense to define the primary key on such 
> a metadata column:
> {code:java}
> CREATE TABLE T (
>   uid STRING METADATA,
>   content STRING,
>   PRIMARY KEY (uid) NOT ENFORCED
> ) WITH (…)
> {code}
> A simple example for this would be IMAP: there is nothing unique about any 
> single email as a record, but each email in a specific folder on an IMAP 
> server has a unique UID (I'm excluding some irrelevant technical details 
> here).
> See FLINK-24512 for another (probably better) use case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
