[ https://issues.apache.org/jira/browse/FLINK-38035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18004446#comment-18004446 ]
Robert Metzger commented on FLINK-38035:
----------------------------------------
Thanks. I assigned you to the ticket.
I would go with this approach: "Redact known sensitive keys (SECRET, TOKEN,
KEY, PASSWORD, etc.) before logging."
AFAIK there's already some code in Flink for redacting connector properties in
Flink SQL; you can probably reuse some of that logic.
> Security Vulnerability in PyFlink Logging Mechanism (PythonEnvUtils.java)
> -------------------------------------------------------------------------
>
> Key: FLINK-38035
> URL: https://issues.apache.org/jira/browse/FLINK-38035
> Project: Flink
> Issue Type: Bug
> Components: API / Python
> Affects Versions: 1.19.1, 1.20.1
> Reporter: Niha
> Assignee: Niha
> Priority: Major
>
> The logging statement in {{PythonEnvUtils.java}} is a potential security
> vulnerability: it may expose environment variables, including
> Kubernetes-mounted secrets, during PyFlink job submission.
> The class
> [{{org.apache.flink.client.python.PythonEnvUtils}}|https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonEnvUtils.java#L372-L377]
> logs all environment variables at job startup with the following line:
>
> {{LOG.info("Starting Python process with environment variables: {}",
> environment);}}
> This line is problematic because it indiscriminately logs *all environment
> variables*, which may contain *sensitive credentials*.
> h4. *Context: Kubernetes Operator Users Are Especially at Risk*
> When Flink is deployed using the *Flink Kubernetes Operator*, secrets are
> commonly passed into pods as *environment variables* (via Kubernetes {{env}}
> or {{envFrom}} fields, e.g. from {{secretRef}}).
> This includes:
> * Database credentials
> * Cloud service keys (e.g., {{AWS_SECRET_ACCESS_KEY}})
> * Tokens and encryption keys
> * Custom user-defined secrets
> Logging these secrets in plain text within the Flink JobManager or
> TaskManager logs violates Kubernetes security best practices, which
> explicitly discourage exposing sensitive environment variables in logs, and
> poses a serious risk in production environments.
> h4. *Proposed Fix*
> * Redact known sensitive keys ({{SECRET}}, {{TOKEN}}, {{KEY}},
> {{PASSWORD}}, etc.) before logging.
> Example fix snippet:
> {{Map<String, String> redactedEnv = redactSensitive(environment);}}
> {{LOG.info("Starting Python process with environment variables: {}",
> redactedEnv);}}
> * Consider an opt-in mechanism (e.g., {{log.python.env=true}}) for full
> environment visibility in safe/test setups.
> h4. *Steps to Reproduce*
> # Set Kubernetes secrets as environment variables in a FlinkDeployment
> (e.g., via {{envFrom.secretRef}}).
> # Launch a PyFlink job using the Flink Kubernetes Operator.
> # Examine the JobManager logs.
> # Observe secrets printed via {{PythonEnvUtils.java}}.
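
The redaction described under "Proposed Fix" could look roughly like the sketch below. It is a standalone illustration and not the code in Flink: the class name EnvRedactor, the "******" placeholder, and the case-insensitive substring match are assumptions. Only the method name redactSensitive and the four key fragments come from the issue text. An actual patch would more likely reuse Flink's existing redaction logic, as suggested in the comment.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of Flink.
final class EnvRedactor {
    // Key fragments listed in the issue; a real list would likely be longer.
    private static final List<String> SENSITIVE_FRAGMENTS =
            Arrays.asList("SECRET", "TOKEN", "KEY", "PASSWORD");

    // Assumed placeholder shown instead of the real value.
    private static final String PLACEHOLDER = "******";

    private EnvRedactor() {}

    // True if the variable name contains any sensitive fragment, ignoring case.
    static boolean isSensitive(String name) {
        String upper = name.toUpperCase(Locale.ROOT);
        for (String fragment : SENSITIVE_FRAGMENTS) {
            if (upper.contains(fragment)) {
                return true;
            }
        }
        return false;
    }

    // Returns a copy of the environment with sensitive values masked.
    // The input map is left untouched so the process still gets real values.
    static Map<String, String> redactSensitive(Map<String, String> environment) {
        Map<String, String> redacted = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : environment.entrySet()) {
            String value = isSensitive(entry.getKey()) ? PLACEHOLDER : entry.getValue();
            redacted.put(entry.getKey(), value);
        }
        return redacted;
    }
}
```

With such a helper, the log call would pass EnvRedactor.redactSensitive(environment) instead of environment. Substring matching on a fragment as short as KEY also masks harmless variables whose names happen to contain it; the sketch accepts that over-redaction in exchange for not missing real secrets.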
--
This message was sent by Atlassian Jira
(v8.20.10#820010)