Re: [DISCUSS] assign SQL Table properties from environment variables

2022-12-16 Thread Martijn Visser
Hi Ferenc,

I think this is indeed an important topic to have a discussion about. It
probably also makes sense to create a FLIP and have a vote on this.

One potential option that we could have is to have a generic option to
refer to external data when specifying a connector option,
like ${file:path:key}. If you have an example like this:

CREATE TABLE mytable (
...
)
WITH (
'connector' = 'kafka',
'properties.bootstrap.servers' = '...:9092',
'topic' = '...',
'properties.ssl.keystore.password' = ,
'properties.ssl.keystore.location' = ...,
'properties.ssl.truststore.password' = ,
'properties.ssl.truststore.location' = ...,
);

You would end up with something like

CREATE TABLE mytable (
...
)
WITH (
'connector' = 'kafka',
'properties.bootstrap.servers' =
'${file:/connect/kafka.properties:BOOTSTRAP_SERVERS',
'topic' = '...',
'properties.ssl.keystore.password' =
'${file:/credentials/credential.properties:KEYSTORE_PASSWORD',
'properties.ssl.keystore.location' = '${file:/credentials/keystore.jks',
'properties.ssl.truststore.password' =
'${file:/credentials/credential.properties:TRUSTSTORE_PASSWORD',
'properties.ssl.truststore.location' = '${file:/credentials/truststore.jks'
);

With the bootstrap.servers defined in a kafka.properties file and
credentials in a credentials.properties file. You could also directly refer
to things like a keystore file.

Would be interesting to understand how your internal solution looks. It
would probably also be a good idea to have a look at how others have solved
this problem. And last but not least, we also need to think how potentially
auto-rotated secrets would be made available for SQL users.

Best regards,

Martijn

On Fri, Dec 2, 2022 at 1:37 AM Ferenc Csaky 
wrote:

> Hello devs,
>
> I'd like to revive this discussion. There is also a ticket about this
> effort for some time [1] and this thing also affects us as well. Right now
> we have a custom solution that is similar to "environment variables", but
> it only can be used in parts of our downstream product. The main thing for
> us to achieve would be to be able to use variables in DDLs (not necessarily
> for hiding sensitive props). I think it would be really handy to have the
> ability to reuse values in multiple tables.
>
> With that said, comes the temptation to hit two birds with one stone,
> although a sensitive property requires much more care than a regular one,
> so I think these two things should be handled separately. At least in the
> beginning. The tricky part of the "environment variables" are their scope,
> and if they are not coming from an external system, it will probably be
> necessary to persist them. Or keep them in memory, but that may be
> insufficient according to what is the scope of the "environment variables".
>
> Considering the sensitive props, I think a small step forward could be to
> hide the values in case of a "SHOW CREATE TABLE" op.
>
> For a varible to be used in a DDL I'd imagine it could apply for a whole
> catalog as starters. As long as the catalog is present, those variables
> would be valid.
>
> I did not check implementation details yet, so it is possible I'm missing
> something important or wrong in some places, but I wanted to get some
> feedback about the idea.
>
> WDYT?
>
> [1] https://issues.apache.org/jira/browse/FLINK-28028
>
> Best,
> F
>
>
> --- Original Message ---
> On Monday, April 4th, 2022 at 09:53, Timo Walther 
> wrote:
>
>
> >
> >
> > Hi Fred,
> >
> > thanks for starting this discussion. I totally agree that this an issue
> > that the community should solve. It popped up before and is still
> > unsolved today. Great that you offer your help here. So let's clarify
> > the implementation details.
> >
> > 1) Global vs. Local solution
> >
> > Is this a DDL-only problem? If yes, it would be easier to solve it in
> > the `FactoryUtil` that all Flink connectors and formats use.
> >
> > 2) Configruation vs. enviornment variables
> >
> > I agree with Qingsheng that environment variable are not always
> > straightforward to identify if you have a "pre-flight phase" and a
> > "cluster phase".
> > In the DynamicTableFactory, one has access to Flink configuration and
> > could resolve `${...}` variables.
> >
> >
> > What do you think?
> >
> > Regards,
> > Timo
> >
> >
> > Am 01.04.22 um 12:26 schrieb Qingsheng Ren:
> >
> > > Hi Fred,
> > >
> > > Thanks for raising the discussion! I think the definition of
> “environment variable” varies under different context. Under Flink on K8s
> it means the environment variable for a container, and if you are a SQL
> client user it could refer to environment variable of SQL client, or even
> the system properties on JVM. So using “environment variable” is a bit
> vague under different environments.
> > >
> > > A more generic solution in my mind is that we can take advantage of
> configurations in Flink, to pass table options dynamically by adding
> configs to TableConfig or even flink-conf.yaml. For example option
> 

Re: [DISCUSS] assign SQL Table properties from environment variables

2022-12-01 Thread Ferenc Csaky
Hello devs,

I'd like to revive this discussion. There is also a ticket about this effort 
for some time [1] and this thing also affects us as well. Right now we have a 
custom solution that is similar to "environment variables", but it only can be 
used in parts of our downstream product. The main thing for us to achieve would 
be to be able to use variables in DDLs (not necessarily for hiding sensitive 
props). I think it would be really handy to have the ability to reuse values in 
multiple tables.

With that said, comes the temptation to hit two birds with one stone, although 
a sensitive property requires much more care than a regular one, so I think 
these two things should be handled separately. At least in the beginning. The 
tricky part of the "environment variables" are their scope, and if they are not 
coming from an external system, it will probably be necessary to persist them. 
Or keep them in memory, but that may be insufficient according to what is the 
scope of the "environment variables".

Considering the sensitive props, I think a small step forward could be to hide 
the values in case of a "SHOW CREATE TABLE" op.

For a varible to be used in a DDL I'd imagine it could apply for a whole 
catalog as starters. As long as the catalog is present, those variables would 
be valid.

I did not check implementation details yet, so it is possible I'm missing 
something important or wrong in some places, but I wanted to get some feedback 
about the idea.

WDYT?

[1] https://issues.apache.org/jira/browse/FLINK-28028

Best,
F


--- Original Message ---
On Monday, April 4th, 2022 at 09:53, Timo Walther  wrote:


> 
> 
> Hi Fred,
> 
> thanks for starting this discussion. I totally agree that this an issue
> that the community should solve. It popped up before and is still
> unsolved today. Great that you offer your help here. So let's clarify
> the implementation details.
> 
> 1) Global vs. Local solution
> 
> Is this a DDL-only problem? If yes, it would be easier to solve it in
> the `FactoryUtil` that all Flink connectors and formats use.
> 
> 2) Configruation vs. enviornment variables
> 
> I agree with Qingsheng that environment variable are not always
> straightforward to identify if you have a "pre-flight phase" and a
> "cluster phase".
> In the DynamicTableFactory, one has access to Flink configuration and
> could resolve `${...}` variables.
> 
> 
> What do you think?
> 
> Regards,
> Timo
> 
> 
> Am 01.04.22 um 12:26 schrieb Qingsheng Ren:
> 
> > Hi Fred,
> > 
> > Thanks for raising the discussion! I think the definition of “environment 
> > variable” varies under different context. Under Flink on K8s it means the 
> > environment variable for a container, and if you are a SQL client user it 
> > could refer to environment variable of SQL client, or even the system 
> > properties on JVM. So using “environment variable” is a bit vague under 
> > different environments.
> > 
> > A more generic solution in my mind is that we can take advantage of 
> > configurations in Flink, to pass table options dynamically by adding 
> > configs to TableConfig or even flink-conf.yaml. For example option 
> > “table.dynamic.options.my_catalog.my_db_.my_table.accessId = foo” means 
> > adding table option “accessId = foo” to table “my_catalog.my_db.my_table”. 
> > By this way we could de-couple DDL statement with table options containing 
> > secret credentials. What do you think?
> > 
> > Best regards,
> > 
> > Qingsheng
> > 
> > > On Mar 30, 2022, at 16:25, Teunissen, F.G.J. (Fred) 
> > > fred.teunis...@ing.com.INVALID wrote:
> > > 
> > > Hi devs,
> > > 
> > > Some SQL Table properties contain sensitive data, like passwords that we 
> > > do not want to expose in the VVP ui to other users. Also, having them 
> > > clear text in a SQL statement is not secure. For example,
> > > 
> > > CREATE TABLE Orders (
> > > `user` BIGINT,
> > > product STRING,
> > > order_time TIMESTAMP(3)
> > > ) WITH (
> > > 'connector' = 'kafka',
> > > 
> > > 'properties.bootstrap.servers' = 'kafka-host-1:9093,kafka-host-2:9093',
> > > 'properties.security.protocol' = 'SSL',
> > > 'properties.ssl.key.password' = 'should-be-a-secret',
> > > 'properties.ssl.keystore.location' = '/tmp/secrets/my-keystore.jks',
> > > 'properties.ssl.keystore.password' = 'should-also-be-a-secret',
> > > 'properties.ssl.truststore.location' = '/tmp/secrets/my-truststore.jks',
> > > 'properties.ssl.truststore.password' = 'should-again-be-a-secret',
> > > 'scan.startup.mode' = 'earliest-offset'
> > > );
> > > 
> > > I would like to bring up for a discussion a proposal to provide these 
> > > secrets values via environment variables since these can be populated 
> > > from a K8s configMap or secrets.
> > > 
> > > For implementing the SQL Table properties, the ConfigOption class is 
> > > used in connectors and formatters. This class could be extended that it 
> > > checks whether the config-value contains certain tokens, like 
> > > ‘${env-var-name}’. If 

Re: [DISCUSS] assign SQL Table properties from environment variables

2022-04-04 Thread Timo Walther

Hi Fred,

thanks for starting this discussion. I totally agree that this an issue 
that the community should solve. It popped up before and is still 
unsolved today. Great that you offer your help here. So let's clarify 
the implementation details.


1) Global vs. Local solution

Is this a DDL-only problem? If yes, it would be easier to solve it in 
the `FactoryUtil` that all Flink connectors and formats use.


2) Configruation vs. enviornment variables

I agree with Qingsheng that environment variable are not always 
straightforward to identify if you have a "pre-flight phase" and a 
"cluster phase".
In the DynamicTableFactory, one has access to Flink configuration and 
could resolve `${...}` variables.



What do you think?

Regards,
Timo


Am 01.04.22 um 12:26 schrieb Qingsheng Ren:

Hi Fred,

Thanks for raising the discussion! I think the definition of “environment 
variable” varies under different context. Under Flink on K8s it means the 
environment variable for a container, and if you are a SQL client user it could 
refer to environment variable of SQL client, or even the system properties on 
JVM. So using “environment variable” is a bit vague under different 
environments.

A more generic solution in my mind is that we can take advantage of 
configurations in Flink, to pass table options dynamically by adding configs to 
TableConfig or even flink-conf.yaml. For example option 
“table.dynamic.options.my_catalog.my_db_.my_table.accessId = foo” means adding 
table option “accessId = foo” to table “my_catalog.my_db.my_table”. By this way 
we could de-couple DDL statement with table options containing secret 
credentials. What do you think?

Best regards,

Qingsheng


On Mar 30, 2022, at 16:25, Teunissen, F.G.J. (Fred) 
 wrote:

Hi devs,

Some SQL Table properties contain sensitive data, like passwords that we do not 
want to expose in the VVP ui to other users. Also, having them clear text in a 
SQL statement is not secure. For example,

CREATE TABLE Orders (
`user` BIGINT,
product STRING,
order_time TIMESTAMP(3)
) WITH (
'connector' = 'kafka',

'properties.bootstrap.servers' = 'kafka-host-1:9093,kafka-host-2:9093',
'properties.security.protocol' = 'SSL',
'properties.ssl.key.password' = 'should-be-a-secret',
'properties.ssl.keystore.location' = '/tmp/secrets/my-keystore.jks',
'properties.ssl.keystore.password' = 'should-also-be-a-secret',
'properties.ssl.truststore.location' = '/tmp/secrets/my-truststore.jks',
'properties.ssl.truststore.password' = 'should-again-be-a-secret',
'scan.startup.mode' = 'earliest-offset'
);

I would like to bring up for a discussion a proposal to provide these secrets 
values via environment variables since these can be populated from a K8s 
configMap or secrets.

For implementing the SQL Table properties, the ConfigOption class is used in 
connectors and formatters. This class could be extended that it checks whether the 
config-value contains certain tokens, like ‘${env-var-name}’. If it does, it could 
fetch the value from the environment variable and use that to replace that token in 
the config-value.

The above SQL statement would then look like,

CREATE TABLE Orders (
`user` BIGINT,
product STRING,
order_time TIMESTAMP(3)
) WITH (
'connector' = 'kafka',

'properties.bootstrap.servers' = 'kafka-host-1:9093,kafka-host-2:9093',
'properties.security.protocol' = 'SSL',
'properties.ssl.key.password' = '${secret_kafka_ssl_key_password}',
'properties.ssl.keystore.location' = '/tmp/secrets/my-keystore.jks',
'properties.ssl.keystore.password' = 
'${secret_kafka_ssl_keystore_password}',
'properties.ssl.truststore.location' = '/tmp/secrets/my-truststore.jks',
'properties.ssl.truststore.password' = 
'${secret_kafka_ssl_truststore_password}',
'scan.startup.mode' = 'earliest-offset'
);

For the purpose of secrets I don’t think you need any complex processing of 
tokens but perhaps there are other usages as well. For instance,

'properties.bootstrap.servers' = 
'kafka-${otap_env}-1:9093,kafka-${otap_env}-2:9093',

Because it is possible that (but I think unlikely) someone wants a property 
value like ‘${not-an-env-var}’ you need to be able to escape this ’$’ token 
like ‘$${not-an-env-var}’. This also means that in theory it would break 
compatibility.

Looking forward for your feedback!

Best,
Fred Teunissen

-
ATTENTION:
The information in this e-mail is confidential and only meant for the intended 
recipient. If you are not the intended recipient, don't use or disclose it in 
any way. Please let the sender know and delete the message immediately.
-





Re: [DISCUSS] assign SQL Table properties from environment variables

2022-04-01 Thread Qingsheng Ren
Hi Fred, 

Thanks for raising the discussion! I think the definition of “environment 
variable” varies under different context. Under Flink on K8s it means the 
environment variable for a container, and if you are a SQL client user it could 
refer to environment variable of SQL client, or even the system properties on 
JVM. So using “environment variable” is a bit vague under different 
environments. 

A more generic solution in my mind is that we can take advantage of 
configurations in Flink, to pass table options dynamically by adding configs to 
TableConfig or even flink-conf.yaml. For example option 
“table.dynamic.options.my_catalog.my_db_.my_table.accessId = foo” means adding 
table option “accessId = foo” to table “my_catalog.my_db.my_table”. By this way 
we could de-couple DDL statement with table options containing secret 
credentials. What do you think?

Best regards, 

Qingsheng

> On Mar 30, 2022, at 16:25, Teunissen, F.G.J. (Fred) 
>  wrote:
> 
> Hi devs,
> 
> Some SQL Table properties contain sensitive data, like passwords that we do 
> not want to expose in the VVP ui to other users. Also, having them clear text 
> in a SQL statement is not secure. For example,
> 
> CREATE TABLE Orders (
>`user` BIGINT,
>product STRING,
>order_time TIMESTAMP(3)
> ) WITH (
>'connector' = 'kafka',
> 
>'properties.bootstrap.servers' = 'kafka-host-1:9093,kafka-host-2:9093',
>'properties.security.protocol' = 'SSL',
>'properties.ssl.key.password' = 'should-be-a-secret',
>'properties.ssl.keystore.location' = '/tmp/secrets/my-keystore.jks',
>'properties.ssl.keystore.password' = 'should-also-be-a-secret',
>'properties.ssl.truststore.location' = '/tmp/secrets/my-truststore.jks',
>'properties.ssl.truststore.password' = 'should-again-be-a-secret',
>'scan.startup.mode' = 'earliest-offset'
> );
> 
> I would like to bring up for a discussion a proposal to provide these secrets 
> values via environment variables since these can be populated from a K8s 
> configMap or secrets.
> 
> For implementing the SQL Table properties, the ConfigOption class is used 
> in connectors and formatters. This class could be extended that it checks 
> whether the config-value contains certain tokens, like ‘${env-var-name}’. If 
> it does, it could fetch the value from the environment variable and use that 
> to replace that token in the config-value.
> 
> The above SQL statement would then look like,
> 
> CREATE TABLE Orders (
>`user` BIGINT,
>product STRING,
>order_time TIMESTAMP(3)
> ) WITH (
>'connector' = 'kafka',
> 
>'properties.bootstrap.servers' = 'kafka-host-1:9093,kafka-host-2:9093',
>'properties.security.protocol' = 'SSL',
>'properties.ssl.key.password' = '${secret_kafka_ssl_key_password}',
>'properties.ssl.keystore.location' = '/tmp/secrets/my-keystore.jks',
>'properties.ssl.keystore.password' = 
> '${secret_kafka_ssl_keystore_password}',
>'properties.ssl.truststore.location' = '/tmp/secrets/my-truststore.jks',
>'properties.ssl.truststore.password' = 
> '${secret_kafka_ssl_truststore_password}',
>'scan.startup.mode' = 'earliest-offset'
> );
> 
> For the purpose of secrets I don’t think you need any complex processing of 
> tokens but perhaps there are other usages as well. For instance,
> 
>'properties.bootstrap.servers' = 
> 'kafka-${otap_env}-1:9093,kafka-${otap_env}-2:9093',
> 
> Because it is possible that (but I think unlikely) someone wants a property 
> value like ‘${not-an-env-var}’ you need to be able to escape this ’$’ token 
> like ‘$${not-an-env-var}’. This also means that in theory it would break 
> compatibility.
> 
> Looking forward for your feedback!
> 
> Best,
> Fred Teunissen
> 
> -
> ATTENTION:
> The information in this e-mail is confidential and only meant for the 
> intended recipient. If you are not the intended recipient, don't use or 
> disclose it in any way. Please let the sender know and delete the message 
> immediately.
> -



Re: [DISCUSS] assign SQL Table properties from environment variables

2022-04-01 Thread Martijn Visser
Hi Fred,

Thanks for opening the discussion. There are probably multiple ways to
solve it, you could also think about addressing this problem in a catalogue
and hide any password/keywords in there. But then again, I do think your
solution could also work and is straightforward.

If there are no people who have a better alternative or think otherwise, I
think it should be fine to open a Jira ticket and if you're willing to make
a contribution, that would be great!

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser


On Wed, 30 Mar 2022 at 10:25, Teunissen, F.G.J. (Fred)
 wrote:

> Hi devs,
>
> Some SQL Table properties contain sensitive data, like passwords that we
> do not want to expose in the VVP ui to other users. Also, having them clear
> text in a SQL statement is not secure. For example,
>
> CREATE TABLE Orders (
> `user` BIGINT,
> product STRING,
> order_time TIMESTAMP(3)
> ) WITH (
> 'connector' = 'kafka',
>
> 'properties.bootstrap.servers' = 'kafka-host-1:9093,kafka-host-2:9093',
> 'properties.security.protocol' = 'SSL',
> 'properties.ssl.key.password' = 'should-be-a-secret',
> 'properties.ssl.keystore.location' = '/tmp/secrets/my-keystore.jks',
> 'properties.ssl.keystore.password' = 'should-also-be-a-secret',
> 'properties.ssl.truststore.location' =
> '/tmp/secrets/my-truststore.jks',
> 'properties.ssl.truststore.password' = 'should-again-be-a-secret',
> 'scan.startup.mode' = 'earliest-offset'
> );
>
> I would like to bring up for a discussion a proposal to provide these
> secrets values via environment variables since these can be populated from
> a K8s configMap or secrets.
>
> For implementing the SQL Table properties, the ConfigOption class is
> used in connectors and formatters. This class could be extended that it
> checks whether the config-value contains certain tokens, like
> ‘${env-var-name}’. If it does, it could fetch the value from the
> environment variable and use that to replace that token in the config-value.
>
> The above SQL statement would then look like,
>
> CREATE TABLE Orders (
> `user` BIGINT,
> product STRING,
> order_time TIMESTAMP(3)
> ) WITH (
> 'connector' = 'kafka',
>
> 'properties.bootstrap.servers' = 'kafka-host-1:9093,kafka-host-2:9093',
> 'properties.security.protocol' = 'SSL',
> 'properties.ssl.key.password' = '${secret_kafka_ssl_key_password}',
> 'properties.ssl.keystore.location' = '/tmp/secrets/my-keystore.jks',
> 'properties.ssl.keystore.password' =
> '${secret_kafka_ssl_keystore_password}',
> 'properties.ssl.truststore.location' =
> '/tmp/secrets/my-truststore.jks',
> 'properties.ssl.truststore.password' =
> '${secret_kafka_ssl_truststore_password}',
> 'scan.startup.mode' = 'earliest-offset'
> );
>
> For the purpose of secrets I don’t think you need any complex processing
> of tokens but perhaps there are other usages as well. For instance,
>
> 'properties.bootstrap.servers' =
> 'kafka-${otap_env}-1:9093,kafka-${otap_env}-2:9093',
>
> Because it is possible that (but I think unlikely) someone wants a
> property value like ‘${not-an-env-var}’ you need to be able to escape this
> ’$’ token like ‘$${not-an-env-var}’. This also means that in theory it
> would break compatibility.
>
> Looking forward for your feedback!
>
> Best,
> Fred Teunissen
>
> -
> ATTENTION:
> The information in this e-mail is confidential and only meant for the
> intended recipient. If you are not the intended recipient, don't use or
> disclose it in any way. Please let the sender know and delete the message
> immediately.
> -
>


[DISCUSS] assign SQL Table properties from environment variables

2022-03-30 Thread Teunissen, F.G.J. (Fred)
Hi devs,

Some SQL Table properties contain sensitive data, like passwords that we do not 
want to expose in the VVP ui to other users. Also, having them clear text in a 
SQL statement is not secure. For example,

CREATE TABLE Orders (
`user` BIGINT,
product STRING,
order_time TIMESTAMP(3)
) WITH (
'connector' = 'kafka',

'properties.bootstrap.servers' = 'kafka-host-1:9093,kafka-host-2:9093',
'properties.security.protocol' = 'SSL',
'properties.ssl.key.password' = 'should-be-a-secret',
'properties.ssl.keystore.location' = '/tmp/secrets/my-keystore.jks',
'properties.ssl.keystore.password' = 'should-also-be-a-secret',
'properties.ssl.truststore.location' = '/tmp/secrets/my-truststore.jks',
'properties.ssl.truststore.password' = 'should-again-be-a-secret',
'scan.startup.mode' = 'earliest-offset'
);

I would like to bring up for a discussion a proposal to provide these secrets 
values via environment variables since these can be populated from a K8s 
configMap or secrets.

For implementing the SQL Table properties, the ConfigOption class is used in 
connectors and formatters. This class could be extended that it checks whether 
the config-value contains certain tokens, like ‘${env-var-name}’. If it does, 
it could fetch the value from the environment variable and use that to replace 
that token in the config-value.

The above SQL statement would then look like,

CREATE TABLE Orders (
`user` BIGINT,
product STRING,
order_time TIMESTAMP(3)
) WITH (
'connector' = 'kafka',

'properties.bootstrap.servers' = 'kafka-host-1:9093,kafka-host-2:9093',
'properties.security.protocol' = 'SSL',
'properties.ssl.key.password' = '${secret_kafka_ssl_key_password}',
'properties.ssl.keystore.location' = '/tmp/secrets/my-keystore.jks',
'properties.ssl.keystore.password' = 
'${secret_kafka_ssl_keystore_password}',
'properties.ssl.truststore.location' = '/tmp/secrets/my-truststore.jks',
'properties.ssl.truststore.password' = 
'${secret_kafka_ssl_truststore_password}',
'scan.startup.mode' = 'earliest-offset'
);

For the purpose of secrets I don’t think you need any complex processing of 
tokens but perhaps there are other usages as well. For instance,

'properties.bootstrap.servers' = 
'kafka-${otap_env}-1:9093,kafka-${otap_env}-2:9093',

Because it is possible that (but I think unlikely) someone wants a property 
value like ‘${not-an-env-var}’ you need to be able to escape this ’$’ token 
like ‘$${not-an-env-var}’. This also means that in theory it would break 
compatibility.

Looking forward for your feedback!

Best,
Fred Teunissen

-
ATTENTION:
The information in this e-mail is confidential and only meant for the intended 
recipient. If you are not the intended recipient, don't use or disclose it in 
any way. Please let the sender know and delete the message immediately.
-