dkulichkin opened a new issue, #55561:
URL: https://github.com/apache/airflow/issues/55561

   ### Apache Airflow Provider(s)
   
   cncf-kubernetes
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon          9.12.0
   apache-airflow-providers-apache-spark    5.3.2
   apache-airflow-providers-cncf-kubernetes 10.8.0
   apache-airflow-providers-common-compat   1.7.3
   apache-airflow-providers-common-io       1.6.0
   apache-airflow-providers-common-sql      1.28.0
   apache-airflow-providers-fab             2.4.1
   apache-airflow-providers-hashicorp       3.8.0
   apache-airflow-providers-http            5.3.4
   apache-airflow-providers-postgres        6.2.3
   apache-airflow-providers-smtp            2.2.1
   apache-airflow-providers-standard        1.3.0
   
   ### Apache Airflow version
   
   3.0.6
   
   ### Operating System
   
   Ubuntu 22.04.5 LTS (Jammy Jellyfish)
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   version 1.18
   
   ### What happened
   
   With the new way of running tasks introduced in Airflow 3 (via the Python 
SDK and a JSON payload), the official Spark image's container 
[entrypoint](https://github.com/apache/spark/blob/master/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh#L118)
 can no longer handle the task command:
   
   <img width="1014" height="206" alt="Image" 
src="https://github.com/user-attachments/assets/30e7dc51-152f-4c8e-b347-b06ed6141735"
 />
   
   The JSON ends up being passed with its double quotes stripped:
   
   <img width="1397" height="490" alt="Image" 
src="https://github.com/user-attachments/assets/55b0a396-496d-4cb2-81bb-9be195a37f13"
 />
   
   When I escape the double quotes with backslashes, it is able to proceed:
   
   <img width="1424" height="213" alt="Image" 
src="https://github.com/user-attachments/assets/8659c643-dc63-4218-adea-743232b91b63"
 />
   
   <img width="1424" height="411" alt="Image" 
src="https://github.com/user-attachments/assets/78e7bbc3-9386-4a73-b303-8f6a86f2fd83"
 />
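
For illustration, the quote stripping and the effect of escaping can be reproduced outside of Kubernetes. The sketch below is not taken from Airflow or from the Spark entrypoint: the payload is a hypothetical stand-in for the real workload JSON, and `reparse` only simulates one extra round of POSIX-shell-style parsing using Python's `shlex`. Whether the entrypoint loses the quotes in exactly this way is an assumption.

```python
import json
import shlex

# Hypothetical payload; the real workload JSON produced by Airflow has more fields.
payload = json.dumps({"task_id": "run_spark_pi", "try_number": 1})


def reparse(arg: str) -> str:
    """Simulate one extra round of POSIX-shell-style parsing of an argument."""
    return " ".join(shlex.split(arg))


def is_valid_json(text: str) -> bool:
    """Return True if the text can still be decoded as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Unprotected: the parser consumes the double quotes.
stripped = reparse(payload)
# Backslash-escaped double quotes survive the extra parsing round.
escaped = reparse(payload.replace('"', '\\"'))
# Single-quoting the whole argument survives as well.
quoted = reparse(shlex.quote(payload))

print(stripped)  # {task_id: run_spark_pi, try_number: 1}
print(is_valid_json(stripped), is_valid_json(escaped), is_valid_json(quoted))
```

In this sketch only the unprotected payload stops being valid JSON, which matches the behaviour shown in the screenshots: the escaped form reaches the consumer intact.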
   
   I do not believe there is a fundamental problem with the k8s provider, but 
I would still appreciate any tips on how this can be handled, ideally without 
patching the Spark image entrypoint, although as a last resort that is also an 
option.
   
   Thanks
   
   ### What you think should happen instead
   
   The Spark workload should execute correctly in a k8s environment.
   
   ### How to reproduce
   
   The setup is somewhat long and complicated, so here is just a high-level 
description:
   
   1. Create a dummy DAG with a hello-world Spark workload using 
SparkSubmitOperator, for example the Pi calculation. It should look 
something like this:
   
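   ```
   from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

   # Hello-world Spark job: estimate Pi with the bundled SparkPi example.
   SparkSubmitOperator(
       task_id="run_spark_pi",
       application="/opt/spark/examples/jars/spark-examples_2.12-3.3.1.jar",
       java_class="org.apache.spark.examples.SparkPi",
       name="spark_pi_job",
       application_args=["10000"],  # SparkPi argument: number of slices (partitions) to sample over
       verbose=True,
   )
   ```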
   
   2. Get a [spark 
image](https://spark.apache.org/docs/latest/running-on-kubernetes.html#docker-images)
 built for executing Spark in a k8s environment. Add the apache/airflow image on 
top of it together with your DAGs, as described in [the Airflow helm chart 
docs](https://airflow.apache.org/docs/helm-chart/stable/quick-start.html#extending-airflow-image).
 Make sure you are using chart version 1.18, which installs Airflow 3.0.6. 
Make sure the image is pullable in that k8s environment; for example, if you 
are using a kind cluster, the image must be loaded into it.
   
   3. Specify this image as the default Airflow image and deploy the chart:
   
   `helm upgrade $RELEASE_NAME apache-airflow/airflow --namespace $NAMESPACE \
       --set images.airflow.repository=my-dags \
       --set images.airflow.tag=0.0.1`
   
   4. Trigger the DAG
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

