rlebreto opened a new issue, #46334:
URL: https://github.com/apache/airflow/issues/46334

   ### Apache Airflow Provider(s)
   
   apache-spark
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-apache-spark    5.0.0
   pyspark                                  3.5.4
   
   
   ### Apache Airflow version
   
   2.10.3
   
   ### Operating System
   
   Ubuntu Server 22.04
   
   ### Deployment
   
   Other
   
   ### Deployment details
   
   NA
   
   ### What happened
   
   Hi
   
   SparkSubmitOperator fails when interacting with a Spark standalone server in deploy-mode=cluster.
   
   In that configuration, the _resolve_should_track_driver_status function in /usr/local/lib/python3.12/site-packages/airflow/providers/apache/spark/hooks/spark_submit.py returns True, which is what leads to the failure.
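   
   For reference, a minimal paraphrase of the check that I believe drives this behaviour (approximated from the 5.0.0 hook source, not a verbatim copy):
   
   ```
   # Approximate logic of SparkSubmitHook._resolve_should_track_driver_status
   # (paraphrased from the provider source; see spark_submit.py for the real code).
   def _resolve_should_track_driver_status(master: str, deploy_mode: str | None) -> bool:
       # Driver status polling is only enabled for a standalone master
       # ("spark://...") combined with deploy-mode=cluster -- exactly my setup.
       return "spark://" in master and deploy_mode == "cluster"
   ```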
   
   I have exactly the same problem as the one discussed here:
   https://github.com/apache/airflow/discussions/21799
   
   I could not find an existing issue already opened for this, hence this report.
   
   ### What you think should happen instead
   
   The task errors out when _resolve_should_track_driver_status returns True.
   If I modify the function to return False, testing the task passes, but then the final Spark job status is never confirmed by the Spark driver.
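   
   For reference, the workaround I tested amounts to forcing that method to return False. Expressed as a subclass instead of editing the installed file, it would look roughly like this (a sketch only; the class name is mine, and the operator would still need to be pointed at this hook):
   
   ```
   # Rough sketch of the workaround: disable driver-status tracking entirely.
   # With this, `airflow tasks test` passes, but the final job status is taken
   # from spark-submit's exit code instead of being confirmed by the driver.
   from airflow.providers.apache.spark.hooks.spark_submit import SparkSubmitHook
   
   
   class NoDriverTrackingSparkSubmitHook(SparkSubmitHook):
       def _resolve_should_track_driver_status(self) -> bool:
           return False
   ```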
   
   ### How to reproduce
   
   ```
   # sparkoperator_test.py
   import airflow
   from datetime import datetime, timedelta
   
   from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator
   
   default_args = {
       'owner': 'xxxxxxxx',
       'start_date': datetime(2025, 1, 31),
   }
   
   with airflow.DAG('sparkoperator_test',
                     default_args=default_args,
                     schedule_interval='@hourly',
                     description='test of sparkoperator in airflow by Omar and Regis',
                     ) as dag:
   
       spark_conf={
           'spark.standalone.submit.waitAppCompletion':'true',
       }
   
       spark_compute_pi = SparkSubmitOperator(
           task_id='spark-compute-pi',
           conn_id='spark_qa1',
           java_class='org.apache.spark.examples.SparkPi',
           application='/opt/spark/examples/jars/spark-examples_2.12-3.5.3.jar',
           name='compute_pi_from_airflow',
           application_args=["30"],
           total_executor_cores=4,
           executor_memory='1g',
           executor_cores=1,
           verbose=True,
           conf=spark_conf,
           execution_timeout=timedelta(minutes=10),
           retries=0,
       )
   
       spark_compute_pi
   
   ```
   
   Spark version: 3.5.3
   Single node (the same host runs the master and one worker)
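   
   The `spark_qa1` connection points at the standalone master with deploy-mode=cluster in its extras. As a connection URI it is roughly equivalent to the following (host and port here are placeholders, not my real values):
   
   ```
   # Hypothetical equivalent of the `spark_qa1` connection, set via environment
   # variable. The percent-encoded host decodes to "spark://spark-master", so the
   # resolved master is "spark://spark-master:7077" with deploy-mode=cluster.
   import os
   
   os.environ["AIRFLOW_CONN_SPARK_QA1"] = (
       "spark://spark%3A%2F%2Fspark-master:7077?deploy-mode=cluster"
   )
   ```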
   
   The error happens when testing the task with `airflow tasks test ...`
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

