Hi Mayank,
Thanks for updating the FLIP and clearly documenting our discussion.
+1 for moving forward with the vote, unless there are objections from
others.
Cheers,
Timo
On 22.07.25 02:14, Mayank Juneja wrote:
Hi Ryan and Austin,
Thanks for your suggestions. I've updated the FLIP with the following
additional info:
1. *table.secret-store.kind* key to register the SecretStore in a YAML file
2. *updateSecret* method in the WritableSecretStore interface
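For reference, the registration could then mirror the existing CatalogStore
options in the Flink configuration. The keys below are a sketch based on this
thread's *table.secret-store.kind* proposal; everything after the kind key is
a hypothetical example, not final naming:

```yaml
# Hypothetical config sketch; only table.secret-store.kind is from the
# thread, the store-specific keys below are invented for illustration.
table.secret-store.kind: azure-key-vault
table.secret-store.azure-key-vault.vault-url: https://myvault.vault.azure.net/
```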
Thanks,
Mayank
On Thu, Jul 17, 2025 at 5:42 PM Austin Cawley-Edwards <
austin.caw...@gmail.com> wrote:
Hey all,
Thanks for the nice FLIP, all! I'm just reading through and had one question
on the ALTER CONNECTION implementation flow. Would it make sense for the
WritableSecretStore to expose a method for updating a secret by ID, so it
can be done atomically? Otherwise, would we need to call delete and create
again, potentially introducing concurrent resolution errors?
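To make the concern concrete, here is a minimal sketch of what such an atomic
update could look like. The interface and method names (WritableSecretStore,
updateSecret) follow this thread, but the exact signatures and the in-memory
implementation are assumptions for illustration, not the FLIP's final API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical write-side interface; names follow the thread's discussion.
interface WritableSecretStore {
    void createSecret(String secretId, String value);

    // Atomic in-place update, avoiding the delete + create window in which
    // a concurrent secret resolution could fail.
    void updateSecret(String secretId, String newValue);

    void deleteSecret(String secretId);

    String getSecret(String secretId);
}

// Toy in-memory implementation for illustration only.
class InMemorySecretStore implements WritableSecretStore {
    private final Map<String, String> secrets = new ConcurrentHashMap<>();

    public void createSecret(String secretId, String value) {
        secrets.putIfAbsent(secretId, value);
    }

    public void updateSecret(String secretId, String newValue) {
        // ConcurrentHashMap.replace() is atomic: the secret is never
        // absent mid-update, unlike deleteSecret() + createSecret().
        secrets.replace(secretId, newValue);
    }

    public void deleteSecret(String secretId) {
        secrets.remove(secretId);
    }

    public String getSecret(String secretId) {
        return secrets.get(secretId);
    }
}
```

A real backend (e.g. Azure Key Vault) would map updateSecret to the vault's
own versioned set-secret call rather than a delete/create pair.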
Best,
Austin
On Thu, Jul 17, 2025 at 13:07 Ryan van Huuksloot
<ryan.vanhuuksl...@shopify.com.invalid> wrote:
Hi Mayank,
Thanks for updating the FLIP. Overall it looks good to me.
One question I had relates to how someone could choose the SecretStore they
want to use if they use something like the SQL Gateway as the entrypoint on
top of a remote Session cluster. I don't see an explicit way to set the
SecretStore in the FLIP.
I assume we'll do it similarly to the CatalogStore, but I wanted to call
this out:
table.catalog-store.kind: file
table.catalog-store.file.path: file:///path/to/catalog/store/
Ryan van Huuksloot
Staff Engineer, Infrastructure | Streaming Platform
On Wed, Jul 16, 2025 at 2:22 PM Mayank Juneja <mayankjunej...@gmail.com>
wrote:
Hi everyone,
Thanks for your valuable inputs. I have updated the FLIP with the ideas
proposed earlier in the thread. Looking forward to your feedback.
https://cwiki.apache.org/confluence/x/cYroF
Best,
Mayank
On Fri, Jun 27, 2025 at 2:59 AM Leonard Xu <xbjt...@gmail.com> wrote:
Quick response, thanks Mayank, Hao and Timo for the effort. The new
proposal looks good, +1 from my side.
Could you draft (update) the current FLIP doc so that we can have some
more specific discussions later?
Best,
Leonard
On 26.06.25 15:06, Timo Walther <twal...@apache.org> wrote:
Hi everyone,
sorry for the late reply, the feature freeze kept me busy. Mayank, Hao and I
synced offline and came up with an improved proposal. Before we update the
FLIP, let me summarize the most important key facts that hopefully address
most concerns:
1) SecretStore
- Similar to the CatalogStore, we introduce a SecretStore as the highest
level in TableEnvironment.
- The SecretStore is initialized with options and potentially environment
variables, including EnvironmentSettings.withSecretStore(SecretStore).
- The SecretStore is pluggable and discovered using the regular
factory approach.
- For example, it could implement Azure Key Vault or other cloud provider
secret stores.
- Goal: Flink and Flink catalogs do not have to deal with sensitive data.
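Sketching the read side and its discovery, under assumptions: only
EnvironmentSettings.withSecretStore(SecretStore) and the factory-based
discovery are from this proposal; the method names and the toy
environment-variable store below are invented for illustration:

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical read-side interface; the FLIP may define it differently.
interface SecretStore {
    // Resolve a secret by its identifier, e.g. a key in Azure Key Vault.
    Optional<String> getSecret(String secretId);
}

// Hypothetical discovery factory, following Flink's usual factory approach
// (ServiceLoader-based discovery behind the scenes).
interface SecretStoreFactory {
    String identifier(); // e.g. "azure-key-vault"

    SecretStore create(Map<String, String> options);
}

// Toy store backed by environment variables, for illustration only.
class EnvSecretStore implements SecretStore {
    public Optional<String> getSecret(String secretId) {
        return Optional.ofNullable(System.getenv(secretId));
    }
}
```

This keeps Flink itself secret-free: the store resolves identifiers at
runtime, and only the identifiers travel through catalogs and plans.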
2) Connections
- Connections are catalog objects identified with 3-part identifiers.
3-part identifiers are crucial for the manageability of larger projects
and align with existing catalog objects.
- They contain connection details, e.g. URL, query parameters, and other
configuration.
- They do not contain secrets, but only pointers to secrets in the
SecretStore.
3) Connection DDL
CREATE [TEMPORARY] CONNECTION mycat.mydb.OpenAPI WITH (
  'type' = 'basic' | 'bearer' | 'jwt' | 'oauth' | ...,
  ...
)
- The connection type is pluggable and discovered using the regular
factory approach.
- The factory extracts secrets and puts them into the SecretStore.
- The factory leaves only non-confidential options that can be stored
in a catalog.
When executing:
CREATE [TEMPORARY] CONNECTION mycat.mydb.OpenAPI WITH (
  'type' = 'basic',
  'url' = 'api.example.com',
  'username' = 'bob',
  'password' = 'xyz'
)
the catalog will receive something similar to:
CREATE [TEMPORARY] CONNECTION mycat.mydb.OpenAPI WITH (
  'type' = 'basic',
  'url' = 'api.example.com',
  'secret.store' = 'azure-key-vault',
  'secret.id' = 'secretId'
)
- However, the exact property design is up to the connection factory.
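The extraction step above could look roughly like this. This is a sketch
under assumptions: the class name, the plain Map standing in for a real
SecretStore, and the secret.store / secret.id keys follow the example in
this mail, but nothing here is the final API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical connection-type factory for 'type' = 'basic': it strips
// secrets from the user's WITH options, stores them in the secret store,
// and returns only non-confidential options for the catalog.
class BasicAuthConnectionFactory {
    private final Map<String, String> secretStore; // stand-in for a real SecretStore

    BasicAuthConnectionFactory(Map<String, String> secretStore) {
        this.secretStore = secretStore;
    }

    Map<String, String> extractSecrets(Map<String, String> userOptions) {
        Map<String, String> catalogOptions = new HashMap<>(userOptions);
        // The password never reaches the catalog.
        String password = catalogOptions.remove("password");
        String secretId = UUID.randomUUID().toString();
        secretStore.put(secretId, password);
        // Leave only a pointer to the secret behind.
        catalogOptions.put("secret.store", "in-memory");
        catalogOptions.put("secret.id", secretId);
        return catalogOptions;
    }
}
```

At table/model resolution time, the runtime would then look up secret.id in
the configured store instead of reading credentials from the catalog.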
4) Connection Usage
CREATE TABLE t (...) USING CONNECTION mycat.mydb.OpenAPI;
- MODEL, FUNCTION, and TABLE DDL will support the USING CONNECTION keyword,
similar to BigQuery.
- The connection will be provided in a table/model provider/function
definition factory.
5) CatalogStore / Catalog Initialization
The catalog store or catalog can make use of the SecretStore to retrieve
initial credentials for bootstrapping. All objects below the catalog
store/catalog can then use connections. If you think we still need
system-level connections, we can support CREATE SYSTEM CONNECTION
GlobalName WITH (..), similar to SYSTEM functions, stored directly in a
ConnectionManager in TableEnvironment. But for now I would suggest starting
simple with per-catalog connections and evolving the design later.
Dealing with secrets is a very sensitive topic and I'm clearly not an
expert on it. This is why we should try to push the problem to existing
solutions and not start storing secrets in Flink in any way. Thus, the
interfaces will be defined very generically.
Looking forward to your feedback.
Cheers,
Timo
On 09.06.25 04:01, Leonard Xu wrote:
Thanks Timo for joining this thread.
I agree that this feature is needed by the community; the current
disagreement is only about the implementation method or solution.
Your thoughts look generally good to me, looking forward to your
proposal.
Best,
Leonard
On 06.06.25 22:46, Timo Walther <twal...@apache.org> wrote:
Hi everyone,
thanks for this healthy discussion. Looking at the high number of
participants, it looks like we definitely want this feature. We just need
to figure out the "how".
This reminds me very much of the discussion we had for CREATE FUNCTION.
There, we discussed whether functions should be named globally or
catalog-specific. In the end, we decided for both `CREATE SYSTEM FUNCTION`
and `CREATE FUNCTION`, satisfying both the data platform team of an
organization (which might provide system functions) and individual data
teams or use cases (scoped by catalog/database).
Looking at other modern vendors like Snowflake, there is SECRET (scoped to
schema) [1] and API INTEGRATION [2] (scoped to account). So other vendors
also offer global and per-team / per-use-case connection details.
In general, I think fitting connections into the existing concepts for
catalog objects (with three-part identifiers) makes managing them easier.
But I also see the need for global defaults.
Btw, keep in mind that a catalog implementation should only store
metadata. Similar to how a CatalogTable doesn't store the actual data, a
CatalogConnection should not store the credentials. It should only offer a
factory that allows for storing and retrieving them. In real-world
scenarios a factory is most likely backed by a product like Azure Key
Vault.
So, code-wise, having a ConnectionManager that behaves similarly to the
FunctionManager sounds reasonable.
+1 for having special syntax instead of using properties. This allows
accessing connections in tables, models, and functions. And catalogs, if we
agree to have global ones as well.
What do you think?
Let me give this some more thought and come back with a concrete
proposal by early next week.
Cheers,
Timo
[1]
https://docs.snowflake.com/en/sql-reference/sql/create-secret
[2]
https://docs.snowflake.com/en/sql-reference/sql/create-api-integration
On 04.06.25 10:47, Leonard Xu wrote:
Hey, Mayank
Please see my feedback below:
1. One of the motivations of this FLIP is to improve security.
However, the current design stores all connection information in the
catalog, and each Flink SQL job reads from the catalog during compilation.
The connection information is passed between the SQL Gateway and the
catalog in plaintext, which actually introduces new security risks.
2. The name "Connection" should be changed to something like
ConnectionSpec to clearly indicate that it is an object containing only
static properties without a lifecycle. Putting aside the naming issue, I
think the current model and hierarchy design is somewhat strange. Storing
various kinds of connections (e.g., Kafka, MySQL) in the same Catalog
with hierarchical identifiers like catalog-name.db-name.connection-name
raises the following questions:
(1) What is the purpose of this hierarchical structure of the Connection
object?
(2) If we can use a Connection to create a MySQL table, why can't we
use a Connection to create a MySQL Catalog?
3. Regarding the connector usage examples given in this FLIP:
```sql
 1 -- Example 2: Using connection for jdbc tables
 2 CREATE OR REPLACE CONNECTION mysql_customer_db
 3 WITH (
 4   'type' = 'jdbc',
 5   'jdbc.url' = 'jdbc:mysql://customer-db.example.com:3306/customerdb',
 6   'jdbc.connection.ssl.enabled' = 'true'
 7 );
 8
 9 CREATE TABLE customers (
10   customer_id INT,
11   PRIMARY KEY (customer_id) NOT ENFORCED
12 ) WITH (
13   'connector' = 'jdbc',
14   'jdbc.connection' = 'mysql_customer_db',
15   'jdbc.connection.ssl.enabled' = 'true',
16   'jdbc.connection.max-retry-timeout' = '60s',
17   'jdbc.table-name' = 'customers',
18   'jdbc.lookup.cache' = 'PARTIAL'
19 );
```
I see three issues from SQL-semantics and connector-compatibility
perspectives:
(1) Look at line 14: `mysql_customer_db` is an object identifier of a
CONNECTION defined in SQL. However, this identifier is referenced via a
string value inside the table's WITH clause, which feels hacky to me.
(2) Look at lines 14–16: the use of the specific prefix
`jdbc.connection` will confuse users, because `connection.xx` may already
be used as a prefix for existing configuration items.
(3) Look at lines 14–18: Why do all existing configuration options
need to be prefixed with `jdbc`, even if they're not related to Connection
properties? This completely changes user habits — is it backward
compatible?
In my opinion, Connection should be a model independent of both
Catalog and Table, and should be referenceable by all
catalog/table/udf/model objects. It should be managed by a component such
as a ConnectionManager to enable reuse. For security purposes,
authentication mechanisms could be supported within the ConnectionManager.
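The ConnectionManager idea above could be sketched as follows. Everything
here (class names, method signatures, the registry shape) is an assumption
for illustration, not a proposed API:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical spec object: static properties only, no lifecycle.
class ConnectionSpec {
    final Map<String, String> properties;

    ConnectionSpec(Map<String, String> properties) {
        this.properties = properties;
    }
}

// Hypothetical manager, independent of Catalog and Table, that
// catalog/table/udf/model objects could resolve connections through.
class ConnectionManager {
    private final Map<String, ConnectionSpec> specs = new ConcurrentHashMap<>();

    void register(String name, ConnectionSpec spec) {
        specs.put(name, spec);
    }

    Optional<ConnectionSpec> resolve(String name) {
        return Optional.ofNullable(specs.get(name));
    }
}
```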
Best,
Leonard
On 04.06.25 02:04, Martijn Visser <martijnvis...@apache.org> wrote:
Hi all,
First of all, I think having a Connection resource is something that will
be beneficial for Apache Flink. I could see it being extended in the
future to allow for easier secret handling [1].
In my mind, I'm comparing this proposal against SQL/MED from the ISO
standard [2]. I do think that SQL/MED isn't a very user-friendly syntax
though, looking at Postgres for example [3].
I think it's a valid question whether Connection should be considered
with a catalog- or database-level scope. @Ryan, can you share something
more, since you've mentioned "Note: I much prefer catalogs for this case.
Which is what we use internally to manage connection properties"? It looks
like there isn't a strongly favoured approach looking at other vendors
(e.g., Databricks scopes it to a Unity catalog, Snowflake to a database
level).
Also looking forward to Leonard's input.
Best regards,
Martijn
[1] https://issues.apache.org/jira/browse/FLINK-36818
[2] https://www.iso.org/standard/84804.html
[3]
https://www.postgresql.org/docs/current/sql-createserver.html
On Fri, May 30, 2025 at 5:07 AM Leonard Xu <xbjt...@gmail.com>
wrote:
Hey Mayank,
Thanks for the FLIP. I went through it quickly and found some issues
which I think we need to discuss in depth later. As we're on a short
Dragon Boat Festival holiday, could you kindly hold on this thread?
We will come back to continue the FLIP discussion.
Best,
Leonard
On 29.04.25 23:07, Mayank Juneja <mayankjunej...@gmail.com> wrote:
Hi all,
I would like to open up for discussion a new FLIP-529 [1].
Motivation:
Currently, Flink SQL handles external connectivity by defining endpoints
and credentials in table configuration. This approach prevents the
reusability of these connections and makes table definitions less secure
by exposing sensitive information.
We propose the introduction of a new "connection" resource in Flink. This
will be a pluggable resource configured with a remote endpoint and an
associated access key. Once defined, connections can be reused across
table definitions, and eventually for model definitions (as discussed in
FLIP-437) for inference, enabling seamless and secure integration with
external systems.
The connection resource will provide a new, optional way to manage
external connectivity in Flink. Existing methods for table definitions
will remain unchanged.
[1] https://cwiki.apache.org/confluence/x/cYroF
Best Regards,
Mayank Juneja
--
*Mayank Juneja*
Product Manager | Data Streaming and AI