[
https://issues.apache.org/jira/browse/FLINK-38035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18012002#comment-18012002
]
Dian Fu commented on FLINK-38035:
---------------------------------
[~vsakuru] Since it's a security issue, it makes sense to backport to other
active branches.
Merged to:
- release-1.20 via 5218f0950de207452545671c596572d74ed10199
- release-2.0 via a06b43be4aef6df04be741d2be2a5cba2587a0f3
- release-2.1 via ce0265320180e0d139f4d87b4a40c7cbf6a6e55f
> Security Vulnerability in PyFlink Logging Mechanism (PythonEnvUtils.java)
> -------------------------------------------------------------------------
>
> Key: FLINK-38035
> URL: https://issues.apache.org/jira/browse/FLINK-38035
> Project: Flink
> Issue Type: Bug
> Components: API / Python
> Affects Versions: 1.19.1, 1.20.1
> Reporter: Niha
> Assignee: Niha
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.2.0
>
>
> There is a potential security vulnerability in the logging statement within
> {{PythonEnvUtils.java}}: it may expose environment variables, including
> Kubernetes-mounted secrets, during PyFlink job submission.
> The class
> [{{org.apache.flink.client.python.PythonEnvUtils}}|https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonEnvUtils.java#L372-L377]
> logs all environment variables at job startup with the following line:
>
> {{LOG.info("Starting Python process with environment variables: {}",
> environment);}}
> This line is problematic because it indiscriminately logs {*}all environment
> variables{*}, which may contain {*}sensitive credentials{*}.
> h4. *Context: Kubernetes Operator Users Are Especially at Risk*
> When Flink is deployed using the {*}Flink Kubernetes Operator{*}, secrets are
> commonly passed into pods as *environment variables* (via Kubernetes {{env}}
> or {{envFrom}} fields, e.g. from {{secretRef}}).
> This includes:
> * Database credentials
> * Cloud service keys (e.g., {{AWS_SECRET_ACCESS_KEY}})
> * Tokens and encryption keys
> * Custom user-defined secrets
> Logging these secrets in plain text within the Flink JobManager or
> TaskManager logs violates Kubernetes security best practices, which
> explicitly discourage exposing sensitive environment variables in logs, and
> poses a serious risk in production environments.
> h4. *Proposed Fix*
> * Redact known sensitive keys ({{SECRET}}, {{TOKEN}}, {{KEY}},
> {{PASSWORD}}, etc.) before logging.
> Example fix snippet:
> {{Map<String, String> redactedEnv = redactSensitive(environment);
> LOG.info("Starting Python process with environment variables: {}",
> redactedEnv);}}
> * Consider an opt-in mechanism (e.g., {{log.python.env=true}}) for full
> environment visibility in safe/test setups.
> h4. *Steps to Reproduce*
> # Set Kubernetes secrets as environment variables in a FlinkDeployment
> (e.g., via {{envFrom.secretRef}}).
> # Launch a PyFlink job using the Flink Kubernetes Operator.
> # Examine the JobManager logs.
> # Observe secrets printed via {{PythonEnvUtils.java}}.
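
To illustrate the redaction idea proposed in the quoted description, here is a minimal sketch. It is not the code merged in the commits listed above. The class name EnvRedactor, the mask string, and the exact marker list are assumptions made for illustration; only the redactSensitive name and the SECRET/TOKEN/KEY/PASSWORD markers come from the issue text.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Flink's PythonEnvUtils. */
public class EnvRedactor {

    // Assumed marker list, taken from the substrings named in the issue description.
    private static final List<String> SENSITIVE_MARKERS =
            Arrays.asList("SECRET", "TOKEN", "KEY", "PASSWORD");

    // Assumed placeholder written in place of a sensitive value.
    private static final String MASK = "******";

    /** Returns a copy of the environment with the values of sensitive-looking keys masked. */
    public static Map<String, String> redactSensitive(Map<String, String> environment) {
        Map<String, String> redacted = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : environment.entrySet()) {
            // Compare case-insensitively so "db_password" is caught as well.
            String upperKey = entry.getKey().toUpperCase(Locale.ROOT);
            boolean sensitive = SENSITIVE_MARKERS.stream().anyMatch(upperKey::contains);
            redacted.put(entry.getKey(), sensitive ? MASK : entry.getValue());
        }
        return redacted;
    }

    public static void main(String[] args) {
        Map<String, String> env = new LinkedHashMap<>();
        env.put("PATH", "/usr/bin");
        env.put("AWS_SECRET_ACCESS_KEY", "abc123");
        env.put("db_password", "hunter2");
        // Only the redacted copy is printed; the original map is left untouched.
        System.out.println(redactSensitive(env));
    }
}
```

Substring matching of this kind errs on the side of over-redaction (a key such as KEYBOARD_LAYOUT would also be masked), which is usually the safer failure mode for log output.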