dkulichkin opened a new issue, #55561: URL: https://github.com/apache/airflow/issues/55561
### Apache Airflow Provider(s)

cncf-kubernetes

### Versions of Apache Airflow Providers

apache-airflow-providers-amazon 9.12.0
apache-airflow-providers-apache-spark 5.3.2
apache-airflow-providers-cncf-kubernetes 10.8.0
apache-airflow-providers-common-compat 1.7.3
apache-airflow-providers-common-io 1.6.0
apache-airflow-providers-common-sql 1.28.0
apache-airflow-providers-fab 2.4.1
apache-airflow-providers-hashicorp 3.8.0
apache-airflow-providers-http 5.3.4
apache-airflow-providers-postgres 6.2.3
apache-airflow-providers-smtp 2.2.1
apache-airflow-providers-standard 1.3.0

### Apache Airflow version

3.0.6

### Operating System

Ubuntu 22.04.5 LTS (Jammy Jellyfish)

### Deployment

Official Apache Airflow Helm Chart

### Deployment details

Chart version 1.18

### What happened

With the new way of running tasks introduced in Airflow 3, via the Python SDK and a JSON payload, the official Spark image's container [entrypoint](https://github.com/apache/spark/blob/master/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh#L118) can no longer handle the task command:

<img width="1014" height="206" alt="Image" src="https://github.com/user-attachments/assets/30e7dc51-152f-4c8e-b347-b06ed6141735" />

The JSON ends up being passed with its double quotes stripped:

<img width="1397" height="490" alt="Image" src="https://github.com/user-attachments/assets/55b0a396-496d-4cb2-81bb-9be195a37f13" />

When I add escaping backslashes, it is able to move on:

<img width="1424" height="213" alt="Image" src="https://github.com/user-attachments/assets/8659c643-dc63-4218-adea-743232b91b63" />

<img width="1424" height="411" alt="Image" src="https://github.com/user-attachments/assets/78e7bbc3-9386-4a73-b303-8f6a86f2fd83" />

I do not believe there is a fundamental problem with the k8s provider, but I would still appreciate any tips on how this can be handled, ideally without patching the Spark image entrypoint, although as a last resort that also counts.
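The quote stripping described above can be reproduced outside the cluster with plain shell word-splitting and quote-removal rules. A minimal sketch (the payload contents here are made up for illustration, not the real task SDK payload):

```python
import json
import shlex

# A JSON payload passed as a single CLI argument (illustrative contents):
payload = json.dumps({"task_id": "run_spark_pi", "args": ["10000"]})

# A shell re-parsing that argument (roughly what happens when the
# entrypoint expands it unquoted) performs quote removal:
mangled = " ".join(shlex.split(payload))
print(mangled)  # {task_id: run_spark_pi, args: [10000]}  -- no longer valid JSON

# The workaround from the screenshots: escape the double quotes so they
# survive one round of shell parsing.
escaped = payload.replace('"', '\\"')
restored = " ".join(shlex.split(escaped))
assert json.loads(restored) == json.loads(payload)
```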
Thanks!

### What you think should happen instead

The Spark workload should be executed correctly in the k8s environment.

### How to reproduce

The setup is somewhat long and complicated, hence just a high-level description:

1. Create a dummy DAG with some hello-world Spark workload using `SparkSubmitOperator`, for example the Pi calculation. It should look something like this:

   ```python
   from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

   SparkSubmitOperator(
       task_id="run_spark_pi",
       application="/opt/spark/examples/jars/spark-examples_2.12-3.3.1.jar",
       java_class="org.apache.spark.examples.SparkPi",
       name="spark_pi_job",
       application_args=["10000"],  # number of samples to use in the Pi calculation
       verbose=True,
   )
   ```

2. Get a [Spark image](https://spark.apache.org/docs/latest/running-on-kubernetes.html#docker-images) built for executing Spark in a k8s environment. Add the apache/airflow image on top of it with your DAGs, as per [the Airflow Helm chart docs](https://airflow.apache.org/docs/helm-chart/stable/quick-start.html#extending-airflow-image). Make sure you are using chart version 1.18, which installs Airflow 3.0.6. Make sure the image is pullable in that k8s environment, e.g. if using a kind cluster, the image must be loaded there.

3. Specify this image as the default Airflow image and deploy the chart:

   ```shell
   helm upgrade $RELEASE_NAME apache-airflow/airflow --namespace $NAMESPACE \
     --set images.airflow.repository=my-dags \
     --set images.airflow.tag=0.0.1
   ```

4. Trigger the DAG.

### Anything else

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
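As an addendum for step 2 of the repro: one way of combining the two images is sketched below. The base image tags, the `/opt/spark` layout, and the `dags/` path are assumptions for illustration, not taken from my actual setup:

```dockerfile
# Tags are assumptions; match them to your actual Spark/Airflow versions.
FROM spark:3.3.1 AS spark

FROM apache/airflow:3.0.6
# Copy the Spark distribution (including the example jars) from the Spark image.
COPY --from=spark /opt/spark /opt/spark
ENV SPARK_HOME=/opt/spark
ENV PATH="$SPARK_HOME/bin:$PATH"
# Bake the DAGs into the image, as per the Helm chart "extending the image" docs.
COPY dags/ /opt/airflow/dags/
```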
