Zingeryo opened a new issue, #67459:
URL: https://github.com/apache/airflow/issues/67459
### Under which category would you file this issue?
Providers
### Apache Airflow version
3.2.1
### What happened and how to reproduce it?
## Issue Description
`SparkSubmitOperator` shows sensitive data in the Rendered Template view when the
rendered `conf` field is truncated. The truncated rendered template below is
displayed with the sensitive values in place (replaced here with `SENSITIVE_DATA`):
```
Truncated. You can change this behaviour in [core]
. {'spark.driver.cores': '4', 'spark.driver.memory': '8g',
'spark.eventLog.dir': 's3a://spark/logs', 'spark.eventLog.enabled': 'true',
'spark.executor.cores': '4', 'spark.executor.instances': '10',
'spark.executor.memory': '8g', 'spark.executor.processTreeMetrics.enabled':
'true', 'spark.hadoop.fs.s3a.bucket.spark.access.key': SENSITIVE_DATA,
'spark.hadoop.fs.s3a.bucket.spark.aws.credentials.provider':
'org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider',
'spark.hadoop.fs.s3a.bucket.spark.secret.key': 'SENSITIVE_DATA',
'spark.hadoop.fs.s3a.endpoint': 'https://s3.local', 'spark.hadoop.fs.s3a.impl':
'org.apache.hadoop.fs.s3a.S3AFileSystem',
'spark.hadoop.fs.s3a.path.style.access': 'true', 'spark.jars.ivy':
'/tmp/spark-ivy-cache',
'spark.kubernetes.authenticate.driver.serviceAccountName': 'spark',
'spark.kubernetes.authenticate.executor.serviceAccountName': 'spark',
'spark.kubernetes.container.image': 'spark:latest',
'spark.kubernetes.container.image.pullPolicy': 'IfNotPresent',
'spark.kubernetes.container.image.pullSecrets': 'regcred',
'spark.kubernetes.driver.container.image': 'spark:latest',
'spark.kubernetes.driver.container.image.pullSecrets': 'regcred',
'spark.kubernetes.driver.image.pullPolicy': 'IfNotPresent',
'spark.kubernetes.driver.limit.cores': '4',
'spark.kubernetes.driver.limit.memory': '8g',
'spark.kubernetes.executor.limit.cores': '4',
'spark.kubernetes.executor.limit.memory': '8g', 'spark.kubernetes.namespace':
'spark', 'spark.master': 'k8s://https://k8s-local:6443',
'spark.metrics.appStatusSource.enabled': 'true',
'spark.metrics.conf.*.sink.prometheusServlet.class':
'org.apache.spark.metrics.sink.PrometheusServlet',
'spark.metrics.conf.*.sink.prometheusServlet.path': '/metrics/prometheus',
'spark.pyspark.driver.python': 'python3', 'spark.pyspark.python': 'python3',
'spark.sql.adaptive.coalescePartitions.enabled': 'true',
'spark.sql.adaptive.enabled': 'true', 'spark.sql.adaptive.skewJoin.enabled':
'false', 'spark.sql.catalog.kometa':
'org.apache.iceberg.spark.SparkCatalog',
'spark.sql.catalog.kometa.client.region':
'ru-central1', 'spark.sql.catalog.kometa.header.X-Iceberg-Access-Delegation':
'vended-credentials', 'spark.sql.catalog.kometa.io-impl':
'org.apache.iceberg.aws.s3.S3FileIO',
'spark.sql.catalog.kometa.oauth2-server-uri':
'https://keycloak.local/realms/local', 'spark.sql.catalog.kometa.prefix':
'main', 'spark.sql.catalog.kometa.s3.endpoint': 'https://s3.local',
'spark.sql.catalog.kometa.s3.path-style-access': 'true',
'spark.sql.catalog.kometa.s3.region': 'ru-central1',
'spark.sql.catalog.kometa.token': SENSITIVE_DATA
```
Changing the `max_templated_field_length` setting did not change the output.
Here is my DAG code for reference:
```
import json
import pendulum
from airflow.exceptions import AirflowException
from airflow.providers.apache.spark.operators.spark_submit import (
    SparkSubmitOperator,
)
from airflow.providers.standard.operators.empty import EmptyOperator
from airflow.sdk import DAG, Variable
SPARK_DATALAKE_CATALOG = "kometa"
SPARK_LAKE_CATALOG_NAME = "lake"
TOKEN = "{{ var.value.get('JWT_TOKEN', {}).get('JWT_TOKEN', '') }}"
default_args = {
    "owner": "owner",
    "depends_on_past": False,
    "start_date": pendulum.datetime(2026, 2, 27, 1, 0, tz="UTC"),
    "retries": 4,
    "retry_delay": pendulum.duration(minutes=3),
    "retry_exponential_backoff": True,
}

SPARK_IMAGE = "spark:latest"
SPARK_CONF = {
    "spark.pyspark.python": "python3",
    "spark.pyspark.driver.python": "python3",
    "spark.submit.deployMode": "cluster",
    "spark.kubernetes.container.image": SPARK_IMAGE,
    "spark.kubernetes.container.image.pullSecrets": "regcred",
    "spark.kubernetes.container.image.pullPolicy": "IfNotPresent",
    "spark.kubernetes.driver.image.pullPolicy": "IfNotPresent",
    "spark.kubernetes.driver.container.image": SPARK_IMAGE,
    "spark.kubernetes.driver.container.image.pullSecrets": "regcred",
    "spark.kubernetes.driver.limit.cores": "4",
    "spark.kubernetes.driver.limit.memory": "8g",
    "spark.kubernetes.executor.limit.cores": "4",
    "spark.kubernetes.executor.limit.memory": "8g",
    "spark.kubernetes.namespace": "spark",
    "spark.kubernetes.authenticate.driver.serviceAccountName": "spark",
    "spark.kubernetes.authenticate.executor.serviceAccountName": "spark",
    "spark.executor.instances": "10",
    "spark.executor.cores": "4",
    "spark.executor.memory": "8g",
    "spark.driver.cores": "4",
    "spark.driver.memory": "8g",
    "spark.sql.shuffle.partitions": "400",
    "spark.jars.ivy": "/tmp/spark-ivy-cache",
}


def get_spark_lake_conf() -> dict[str, str]:
    # Endpoints and S3 credentials are read from Airflow Variables when this
    # function is called, i.e. while the DAG file is being parsed.
    endpoints = Variable.get("SPARK_LAKE_ENDPOINTS", deserialize_json=True)
    s3_spark_hs_secrets = Variable.get("S3_LAKE_SPARK_HS", deserialize_json=True)
    metastore = str(endpoints["METASTORE"]).rstrip("/")
    conf = {
        "spark.sql.warehouse.dir": "/tmp/spark-warehouse",
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        "spark.sql.catalog.kometa": "org.apache.iceberg.spark.SparkCatalog",
        "spark.sql.defaultCatalog": "kometa",
        "spark.sql.catalog.kometa.type": "rest",
        "spark.sql.catalog.kometa.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        "spark.sql.catalog.kometa.uri": f"{metastore}/{SPARK_LAKE_CATALOG_NAME}",
        "spark.sql.catalog.kometa.prefix": "main",
        "spark.sql.catalog.kometa.warehouse": str(endpoints["WAREHOUSE"]),
        "spark.sql.catalog.kometa.s3.endpoint": str(endpoints["S3_ENDPOINT"]),
        "spark.master": str(endpoints["MASTER"]),
        "spark.sql.catalog.kometa.s3.region": "ru-central1",
        "spark.sql.catalog.kometa.client.region": "ru-central1",
        "spark.sql.catalog.kometa.s3.path-style-access": "true",
        "spark.sql.catalog.kometa.oauth2-server-uri": str(endpoints["KEYCLOAK"]),
        # Sensitive: JWT token, rendered from a Jinja template at run time.
        "spark.sql.catalog.kometa.token": TOKEN,
        "spark.sql.catalog.kometa.header.X-Iceberg-Access-Delegation":
            "vended-credentials",
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
        "spark.sql.adaptive.enabled": "true",
        "spark.sql.adaptive.coalescePartitions.enabled": "true",
        "spark.sql.adaptive.skewJoin.enabled": "false",
        "spark.eventLog.dir": "s3a://spark/logs",
        "spark.hadoop.fs.s3a.bucket.spark.aws.credentials.provider":
            "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider",
        # Sensitive: S3 credentials, placed in the conf dict as plain strings.
        "spark.hadoop.fs.s3a.bucket.spark.access.key":
            str(s3_spark_hs_secrets["S3_ACCES_KEY"]),
        "spark.hadoop.fs.s3a.bucket.spark.secret.key":
            str(s3_spark_hs_secrets["S3_SECRET_KEY"]),
        "spark.hadoop.fs.s3a.endpoint": "https://s3.local",
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.eventLog.enabled": "true",
        "spark.ui.prometheus.enabled": "true",
        "spark.metrics.appStatusSource.enabled": "true",
        "spark.executor.processTreeMetrics.enabled": "true",
        "spark.metrics.conf.*.sink.prometheusServlet.class":
            "org.apache.spark.metrics.sink.PrometheusServlet",
        "spark.metrics.conf.*.sink.prometheusServlet.path": "/metrics/prometheus",
    }
    return conf


predict_from_template = (
    "{{ (data_interval_start - macros.timedelta(hours=4))"
    ".strftime('%Y-%m-%d %H:%M:%S') }}"
)
predict_to_template = "{{ (data_interval_start).strftime('%Y-%m-%d %H:%M:%S') }}"

with DAG(
    dag_id="spark",
    default_args=default_args,
    catchup=False,
    schedule="30 */4 * * *",
    max_active_runs=1,
    tags=["esc"],
    description="DAG analytcs tasks",
) as dag:
    start = EmptyOperator(task_id="start")
    end = EmptyOperator(task_id="end")
    feature_matrix = SparkSubmitOperator(
        task_id="feature_matrix",
        conn_id="SPARK_PROD",
        application="local:///opt/spark/work-dir/src/main.py",
        name="feature_matrix",
        conf={**SPARK_CONF, **get_spark_lake_conf()},
        execution_timeout=pendulum.duration(hours=3),
        env_vars={
            "DATE_START": predict_from_template,
            "DATE_END": predict_to_template,
        },
    )
    start >> feature_matrix >> end
```
This is the `conf` field in the Rendered Template view, which shows the sensitive data:
<img width="615" height="56" alt="Image"
src="https://github.com/user-attachments/assets/0700f22d-9269-4229-b6ff-3c06f26c111d"
/>
### What you think should happen instead?
The Rendered Template view should not show sensitive data, including when the field is truncated.
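
To illustrate the expected behaviour, here is a minimal, self-contained sketch. It does not use Airflow's code and is not a claim about how Airflow implements masking or truncation. It assumes, as a guess at the cause, that key-based masking cannot apply once a dict has already been turned into a truncated string. `SENSITIVE_FRAGMENTS`, `MAX_LENGTH`, `redact_by_key`, `truncate`, `render_truncate_first` and `render_redact_first` are all hypothetical names invented for this example.

```python
from typing import Any

# Hypothetical list of key fragments treated as sensitive (not Airflow's list).
SENSITIVE_FRAGMENTS = ("secret", "token", "password", "access.key")
# Hypothetical stand-in for a templated-field length limit.
MAX_LENGTH = 120


def redact_by_key(conf: dict[str, Any]) -> dict[str, Any]:
    """Replace values whose key contains a sensitive fragment with '***'."""
    return {
        key: "***" if any(frag in key.lower() for frag in SENSITIVE_FRAGMENTS) else value
        for key, value in conf.items()
    }


def truncate(text: str, limit: int = MAX_LENGTH) -> str:
    """Cut text to at most `limit` characters, marking the cut with '...'."""
    return text if len(text) <= limit else text[:limit] + "..."


def render_truncate_first(conf: dict[str, Any]) -> str:
    # The dict is stringified and cut before any masking: the key names are
    # now just part of a string, so key-based masking has nothing to act on.
    return truncate(str(conf))


def render_redact_first(conf: dict[str, Any]) -> str:
    # Masking runs while the structure (keys and values) is still available,
    # and only the already-masked result is stringified and truncated.
    return truncate(str(redact_by_key(conf)))


conf = {
    "spark.driver.cores": "4",
    "spark.hadoop.fs.s3a.bucket.spark.secret.key": "EXAMPLE_SECRET",
    "spark.sql.catalog.kometa.token": "EXAMPLE_TOKEN",
}

print(render_truncate_first(conf))
print(render_redact_first(conf))
```

With this ordering, the second call prints the secret key's value as `***` even though the output is still truncated, which is the behaviour expected from the Rendered Template view.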
### Operating System
_No response_
### Deployment
Docker-Compose
### Apache Airflow Provider(s)
apache-spark
### Versions of Apache Airflow Providers
```
apache-airflow==3.2.1
apache-airflow-core==3.2.1
apache-airflow-providers-amazon==9.27.0
apache-airflow-providers-apache-spark==5.5.0
apache-airflow-providers-celery==3.19.0
apache-airflow-providers-cncf-kubernetes==10.16.1
apache-airflow-providers-common-compat==1.14.3
apache-airflow-providers-common-io==1.7.2
apache-airflow-providers-common-sql==1.34.0
apache-airflow-providers-docker==4.5.5
apache-airflow-providers-fab==3.5.0
apache-airflow-providers-hashicorp==4.6.0
apache-airflow-providers-http==6.0.2
apache-airflow-providers-openlineage==2.15.0
apache-airflow-providers-smtp==2.4.5
apache-airflow-providers-standard==1.12.3
apache-airflow-providers-trino==6.5.2
apache-airflow-task-sdk==1.2.1
```
### Official Helm Chart version
Not Applicable
### Kubernetes Version
_No response_
### Helm Chart configuration
_No response_
### Docker Image customizations
_No response_
### Anything else?
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)