andallo opened a new issue, #41211:
URL: https://github.com/apache/airflow/issues/41211

   ### Apache Airflow Provider(s)
   
   cncf-kubernetes
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-cncf-kubernetes 8.3.1
   
   ### Apache Airflow version
   
   2.9.2
   
   ### Operating System
   
   Debian GNU/Linux 12 (bookworm)
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   When the reattach_on_restart option is on, SparkKubernetesOperator tries to 
find an already launched driver pod by the labels **dag_id**, **task_id**, and 
**run_id**. But the operator doesn't add these labels to the driver, so there 
is no guarantee it can find the driver even when it exists. The operator can 
find an already launched driver only if these labels were explicitly specified 
for the driver in the operator's own parameters.
   
   ### What you think should happen instead
   
   SparkKubernetesOperator should add the labels **dag_id**, **task_id**, and 
**run_id** to the driver and executor sections of the SparkApplication 
specification. The specification comes from the application_file or 
template_spec parameter and then becomes the template_body parameter. It is 
easy to add labels to template_body, because the operator has a context that 
holds the values for all of these labels.
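
A minimal sketch of the proposed label injection, not the operator's actual code. It assumes the SparkApplication manifest is a plain dict with `spec.driver` and `spec.executor` sections. The helper name `add_airflow_labels` and the sample values are hypothetical, and the label values are assumed to already be valid Kubernetes label values (a real run_id would likely need sanitizing first).

```python
import copy


def add_airflow_labels(manifest: dict, labels: dict) -> dict:
    """Return a copy of a SparkApplication manifest with the given labels
    merged into both the driver and the executor sections.

    Hypothetical helper for illustration only.
    """
    patched = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    spec = patched.setdefault("spec", {})
    for role in ("driver", "executor"):
        role_spec = spec.setdefault(role, {})
        existing = role_spec.get("labels") or {}
        # Keep user-supplied labels; the task identity labels win on conflict.
        role_spec["labels"] = {**existing, **labels}
    return patched


# Sample manifest; field values are made up for the example.
manifest = {
    "apiVersion": "sparkoperator.k8s.io/v1beta2",
    "kind": "SparkApplication",
    "spec": {
        "driver": {"cores": 1, "labels": {"team": "data"}},
        "executor": {"instances": 2},
    },
}

# In the operator these values would come from the task context.
task_labels = {
    "dag_id": "example_dag",
    "task_id": "submit_spark",
    "run_id": "manual_run_1",
}

patched = add_airflow_labels(manifest, task_labels)
```

Merging instead of overwriting keeps any labels the user already set for the driver or executor, while making sure the three identifying labels are always present.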
   
   ### How to reproduce
   
   Start a SparkApplication using SparkKubernetesOperator. Don't specify the 
**dag_id**, **task_id**, and **run_id** labels in the driver and executor 
parameters (for example, in the application_file parameter). The task pod that 
submits the SparkApplication will then have these labels, but the driver and 
executor pods will not. 
   
   That is a problem for the reattach_on_restart logic, because it searches 
for the driver by those labels.
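
The mismatch can be illustrated with a small simulation of an equality-based label lookup. This is a sketch under stated assumptions, not the operator's lookup code: the pod list, pod names, and helper functions are hypothetical, and the extra `spark-role=driver` filter is assumed here as a way to single out the driver pod.

```python
def build_label_selector(labels: dict) -> str:
    """Build a Kubernetes-style equality selector string, e.g. 'a=1,b=2'."""
    return ",".join(f"{key}={value}" for key, value in sorted(labels.items()))


def find_matching_pods(pods: list, wanted: dict) -> list:
    """Return names of pods whose labels contain every wanted key/value pair."""
    return [
        pod["name"]
        for pod in pods
        if all(pod["labels"].get(key) == value for key, value in wanted.items())
    ]


task_labels = {
    "dag_id": "example_dag",
    "task_id": "submit_spark",
    "run_id": "manual_run_1",
}
# Assumption: the search also filters on spark-role=driver.
wanted = {**task_labels, "spark-role": "driver"}
selector = build_label_selector(wanted)

# The task pod carries the identifying labels; the driver pod does not.
pods = [
    {"name": "airflow-task-pod", "labels": dict(task_labels)},
    {"name": "spark-app-driver", "labels": {"spark-role": "driver"}},
]
found_without_labels = find_matching_pods(pods, wanted)

# Once the driver also carries the identifying labels, the lookup succeeds.
pods[1]["labels"].update(task_labels)
found_with_labels = find_matching_pods(pods, wanted)
```

In the first lookup no pod satisfies the selector, so there is nothing to reattach to; after the driver is labeled, the same selector matches it.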
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.