Hi everyone,

Thanks for your valuable inputs. I have updated the FLIP with the ideas
proposed earlier in the thread. Looking forward to your feedback.
https://cwiki.apache.org/confluence/x/cYroF

Best,
Mayank

On Fri, Jun 27, 2025 at 2:59 AM Leonard Xu <xbjt...@gmail.com> wrote:

> Quick response, thanks Mayank, Hao and Timo for the effort. The new
> proposal looks good, +1 from my side.
>
> Could you draft (update) the current FLIP doc so we can have some specific
> discussions later?
>
>
> Best,
> Leonard
>
>
> > On Jun 26, 2025, at 15:06, Timo Walther <twal...@apache.org> wrote:
> >
> > Hi everyone,
> >
> > sorry for the late reply, feature freeze kept me busy. Mayank, Hao and I
> synced offline and came up with an improved proposal. Before we update the
> FLIP, let me summarize the most important key facts that hopefully address
> most concerns:
> >
> > 1) SecretStore
> > - Similar to CatalogStore, we introduce a SecretStore as the highest
> level in TableEnvironment.
> > - SecretStore is initialized with options and potentially environment
> variables, and can also be provided via EnvironmentSettings.withSecretStore(SecretStore).
> > - The SecretStore is pluggable and discovered using the regular
> factory-approach.
> > - For example, it could implement Azure Key Vault or other cloud
> providers' secret stores.
> > - Goal: Flink and Flink catalogs do not have to deal with sensitive data.
> >
> > 2) Connections
> > - Connections are catalog objects identified with 3-part identifiers.
> 3-part identifiers are crucial for manageability of larger projects and
> align with existing catalog objects.
> > - They contain connection details, e.g. URL, query parameters, and other
> configuration.
> > - They do not contain secrets, but only pointers to secrets in the
> SecretStore.
> >
> > 3) Connection DDL
> >
> > CREATE [TEMPORARY] CONNECTION mycat.mydb.OpenAPI WITH (
> >  'type' = 'basic' | 'bearer' | 'jwt' | 'oauth' | ...,
> >  ...
> > )
> >
> > - Connection type is pluggable and discovered using the regular
> factory-approach.
> > - The factory extracts secrets and puts them into SecretStore.
> > - The factory leaves only non-confidential options that can be
> stored in a catalog.
> >
> > When executing:
> > CREATE [TEMPORARY] CONNECTION mycat.mydb.OpenAPI WITH (
> >  'type' = 'basic',
> >  'url' = 'api.example.com',
> >  'username' = 'bob',
> >  'password' = 'xyz'
> > )
> >
> > The catalog will receive something similar to:
> > CREATE [TEMPORARY] CONNECTION mycat.mydb.OpenAPI WITH (
> >  'type' = 'basic',
> >  'url' = 'api.example.com',
> >  'secret.store' = 'azure-key-vault',
> >  'secret.id' = 'secretId'
> > )
> >
> > - However, the exact property design is up to the connection factory.
> >
> > 4) Connection Usage
> >
> > CREATE TABLE t (...) USING CONNECTION mycat.mydb.OpenAPI;
> >
> > - MODEL, FUNCTION, and TABLE DDL will support the USING CONNECTION keyword,
> similar to BigQuery (see the sketch below).
> > - The connection will be provided to the table/model provider/function
> definition factory.
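> >
> > As a rough illustration (the model/table options, connector name and
> > exact clause ordering below are made up for the example, not part of
> > the proposal), usage could look like:
> >
> > CREATE MODEL mycat.mydb.Sentiment
> > USING CONNECTION mycat.mydb.OpenAPI
> > WITH (
> >  -- INPUT/OUTPUT clauses omitted for brevity
> >  -- non-confidential options only; credentials come from the connection
> >  'provider' = 'openai',
> >  'endpoint' = '/v1/chat/completions'
> > );
> >
> > CREATE TABLE mycat.mydb.Events (id INT, payload STRING)
> > USING CONNECTION mycat.mydb.OpenAPI
> > WITH (
> >  -- hypothetical connector name, just for the sketch
> >  'connector' = 'http',
> >  'path' = '/v1/events'
> > );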
> >
> > 5) CatalogStore / Catalog Initialization
> >
> > The catalog store or a catalog can make use of the SecretStore to retrieve
> initial credentials for bootstrapping. All objects lower than the catalog
> store/catalog can then use connections. If you think we still need system-level
> connections, we can support CREATE SYSTEM CONNECTION GlobalName WITH (..),
> similar to SYSTEM functions, stored directly in a ConnectionManager in
> TableEnvironment. But for now I would suggest starting simple with
> per-catalog connections and evolving the design later.
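> >
> > Just to illustrate the optional system-level variant (purely a sketch,
> > not part of the current proposal; option names are made up):
> >
> > -- global connection, kept in a ConnectionManager of the TableEnvironment
> > CREATE SYSTEM CONNECTION GlobalOpenAPI WITH (
> >  'type' = 'bearer',
> >  'url' = 'api.example.com',
> >  'token' = '...'
> > );
> >
> > -- referenced by a simple one-part name, independent of any catalog
> > CREATE TABLE t (...) USING CONNECTION GlobalOpenAPI;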
> >
> > Dealing with secrets is a very sensitive topic and I'm clearly not an
> expert on it. This is why we should try to push the problem to existing
> solutions and not start storing secrets in Flink in any way. Thus, the
> interfaces will be defined very generically.
> >
> > Looking forward to your feedback.
> >
> > Cheers,
> > Timo
> >
> >
> >
> >
> >
> > On 09.06.25 04:01, Leonard Xu wrote:
> >> Thanks Timo for joining this thread.
> >> I agree that this feature is needed by the community; the current
> disagreement is only about the implementation method or solution.
> >> Your thoughts look generally good to me, looking forward to your
> proposal.
> >> Best,
> >> Leonard
> >>> On Jun 6, 2025, at 22:46, Timo Walther <twal...@apache.org> wrote:
> >>>
> >>> Hi everyone,
> >>>
> >>> thanks for this healthy discussion. Looking at the high number of
> participants, it looks like we definitely want this feature. We just need
> to figure out the "how".
> >>>
> >>> This reminds me very much of the discussion we had for CREATE
> FUNCTION. There, we discussed whether functions should be named globally or
> catalog-specific. In the end, we decided for both `CREATE SYSTEM FUNCTION`
> and `CREATE FUNCTION`, satisfying both the data platform team of an
> organization (which might provide system functions) and individual data
> teams or use cases (scoped by catalog/database).
> >>>
> >>> Looking at other modern vendors like Snowflake, there is SECRET (scoped
> to schema) [1] and API INTEGRATION [2] (scoped to account). So other
> vendors also offer global and per-team / per-use-case connection details.
> >>>
> >>> In general, I think fitting connections into the existing concepts for
> catalog objects (with three-part identifier) makes managing them easier.
> But I also see the need for global defaults.
> >>>
> >>> Btw keep in mind that a catalog implementation should only store
> metadata. Similar to how a CatalogTable doesn't store the actual data, a
> CatalogConnection should not store the credentials. It should only offer a
> factory that allows for storing and retrieving them. In real-world
> scenarios a factory is most likely backed by a product like Azure Key Vault.
> >>>
> >>> So code-wise having a ConnectionManager that behaves similar to
> FunctionManager sounds reasonable.
> >>>
> >>> +1 for having special syntax instead of using properties. This allows
> accessing connections in tables, models, and functions, as well as in
> catalogs, if we agree to have global ones.
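> >>>
> >>> As a rough illustration (keywords purely hypothetical at this point),
> >>> instead of referencing a connection through a string property such as
> >>> 'connection' = 'mysql_customer_db', dedicated syntax could look like:
> >>>
> >>> CREATE TABLE customers (...) USING CONNECTION mysql_customer_db
> >>> WITH ('connector' = 'jdbc', 'table-name' = 'customers');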
> >>>
> >>> What do you think?
> >>>
> >>> Let me spend some more thoughts on this and come back with a concrete
> proposal by early next week.
> >>>
> >>> Cheers,
> >>> Timo
> >>>
> >>> [1] https://docs.snowflake.com/en/sql-reference/sql/create-secret
> >>> [2]
> https://docs.snowflake.com/en/sql-reference/sql/create-api-integration
> >>>
> >>> On 04.06.25 10:47, Leonard Xu wrote:
> >>>> Hey, Mayank
> >>>> Please see my feedback below:
> >>>> 1. One of the motivations of this FLIP is to improve security.
> However, the current design stores all connection information in the
> catalog,
> >>>> and each Flink SQL job reads from the catalog during compilation. The
> connection information is passed between SQL Gateway and the
> >>>> catalog in plaintext, which actually introduces new security risks.
> >>>> 2. The name "Connection" should be changed to something like
> ConnectionSpec to clearly indicate that it is an object containing only
> static
> >>>> properties without a lifecycle. Putting aside the naming issue, I
> think the current model and hierarchy design is somewhat strange. Storing
> >>>> various kinds of connections (e.g., Kafka, MySQL) in the same Catalog
> with hierarchical identifiers like catalog-name.db-name.connection-name
> >>>> raises the following questions:
> >>>> (1) What is the purpose of this hierarchical structure of the Connection
> object?
> >>>> (2) If we can use a Connection to create a MySQL table, why can't we
> use a Connection to create a MySQL Catalog?
> >>>> 3. Regarding the connector usage examples given in this FLIP:
> >>>> ```sql
> >>>> 1  -- Example 2: Using connection for jdbc tables
> >>>> 2  CREATE OR REPLACE CONNECTION mysql_customer_db
> >>>> 3  WITH (
> >>>> 4    'type' = 'jdbc',
> >>>> 5    'jdbc.url' = 'jdbc:mysql://customer-db.example.com:3306/customerdb',
> >>>> 6    'jdbc.connection.ssl.enabled' = 'true'
> >>>> 7  );
> >>>> 8
> >>>> 9  CREATE TABLE customers (
> >>>> 10   customer_id INT,
> >>>> 11   PRIMARY KEY (customer_id) NOT ENFORCED
> >>>> 12 ) WITH (
> >>>> 13   'connector' = 'jdbc',
> >>>> 14   'jdbc.connection' = 'mysql_customer_db',
> >>>> 15   'jdbc.connection.ssl.enabled' = 'true',
> >>>> 16   'jdbc.connection.max-retry-timeout' = '60s',
> >>>> 17   'jdbc.table-name' = 'customers',
> >>>> 18   'jdbc.lookup.cache' = 'PARTIAL'
> >>>> 19 );
> >>>> ```
> >>>> I see three issues from SQL semantics and Connector compatibility
> perspectives:
> >>>> (1) Look at line 14: `mysql_customer_db` is an object identifier of a
> CONNECTION defined in SQL. However, this identifier is referenced
> >>>>     via a string value inside the table’s WITH clause, which feels
> hacky to me.
> >>>> (2) Look at lines 14–16: the use of the specific prefix
> `jdbc.connection` will confuse users because `connection.xx` may already be
> used as
> >>>>  a prefix for existing configuration items.
> >>>> (3) Look at lines 14–18: Why do all existing configuration options
> need to be prefixed with `jdbc`, even if they’re not related to Connection
> properties?
> >>>> This completely changes user habits — is it backward compatible?
> >>>>  In my opinion, Connection should be a model independent of both
> Catalog and Table, and should be referenceable by all catalog/table/udf/model
> objects.
> >>>> It should be managed by a Component such as a ConnectionManager to
> enable reuse. For security purposes, authentication mechanisms could
> >>>> be supported within the ConnectionManager.
> >>>> Best,
> >>>> Leonard
> >>>>> On Jun 4, 2025, at 02:04, Martijn Visser <martijnvis...@apache.org> wrote:
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> First of all, I think having a Connection resource is something that
> will
> >>>>> be beneficial for Apache Flink. I could see that being extended in
> the
> >>>>> future to allow for easier secret handling [1].
> >>>>> In my mind, I'm comparing this proposal against SQL/MED from
> the ISO
> >>>>> standard [2]. I do think that SQL/MED isn't a very user-friendly
> syntax
> >>>>> though, looking at Postgres for example [3].
> >>>>>
> >>>>> I think it's a valid question whether Connection should have a
> >>>>> catalog- or database-level scope. @Ryan can you share something more,
> since
> >>>>> you've mentioned "Note: I much prefer catalogs for this case. Which
> is what
> >>>>> we use internally to manage connection properties". It looks like
> there
> >>>>> isn't a strongly favoured approach looking at other vendors (e.g.,
> >>>>> Databricks scopes it to a Unity catalog, Snowflake to a database
> >>>>> level).
> >>>>>
> >>>>> Also looking forward to Leonard's input.
> >>>>>
> >>>>> Best regards,
> >>>>>
> >>>>> Martijn
> >>>>>
> >>>>> [1] https://issues.apache.org/jira/browse/FLINK-36818
> >>>>> [2] https://www.iso.org/standard/84804.html
> >>>>> [3] https://www.postgresql.org/docs/current/sql-createserver.html
> >>>>>
> >>>>> On Fri, May 30, 2025 at 5:07 AM Leonard Xu <xbjt...@gmail.com>
> wrote:
> >>>>>
> >>>>>> Hey Mayank.
> >>>>>>
> >>>>>> Thanks for the FLIP, I went through it quickly and found some
> >>>>>> issues which I think we
> >>>>>> need to discuss in depth later. As we’re on a short Dragon Boat
> >>>>>> Festival break,
> >>>>>> could you kindly hold
> >>>>>> this thread? We will come back to continue the FLIP discussion.
> >>>>>>
> >>>>>> Best,
> >>>>>> Leonard
> >>>>>>
> >>>>>>
> >>>>>>> On Apr 29, 2025, at 23:07, Mayank Juneja <mayankjunej...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I would like to open up for discussion a new FLIP-529 [1].
> >>>>>>>
> >>>>>>> Motivation:
> >>>>>>> Currently, Flink SQL handles external connectivity by defining
> endpoints
> >>>>>>> and credentials in table configuration. This approach prevents
> >>>>>> reusability
> >>>>>>> of these connections and makes table definitions less secure by
> exposing
> >>>>>>> sensitive information.
> >>>>>>> We propose the introduction of a new "connection" resource in
> Flink. This
> >>>>>>> will be a pluggable resource configured with a remote endpoint and
> >>>>>>> associated access key. Once defined, connections can be reused
> across
> >>>>>> table
> >>>>>>> definitions, and eventually for model definition (as discussed in
> >>>>>> FLIP-437)
> >>>>>>> for inference, enabling seamless and secure integration with
> external
> >>>>>>> systems.
> >>>>>>> The connection resource will provide a new, optional way to manage
> >>>>>> external
> >>>>>>> connectivity in Flink. Existing methods for table definitions will
> remain
> >>>>>>> unchanged.
> >>>>>>>
> >>>>>>> [1] https://cwiki.apache.org/confluence/x/cYroF
> >>>>>>>
> >>>>>>> Best Regards,
> >>>>>>> Mayank Juneja
> >>>>>>
> >>>>>>
> >>>
> >
>
>

-- 
*Mayank Juneja*
Product Manager | Data Streaming and AI
