Re: [I] Track SparkSubmitHook Yarn Cluster application with Yarn CLI [airflow]

via GitHub Thu, 21 May 2026 21:48:45 -0700


amoghrajesh commented on issue #24171:
URL: https://github.com/apache/airflow/issues/24171#issuecomment-4515061930


   Yes, this is directly related. Quick summary of what I think can be done:
   
   https://github.com/apache/airflow/pull/65991 solves the memory problem here: 
it terminates `spark-submit` early once YARN accepts the submission, then polls 
via `yarn application -status`. That is the "non-blocking submit" split.
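
   To make the split concrete, here is a rough sketch of its two halves: pulling the application ID out of `spark-submit` output, then building the status command. It assumes the ID appears somewhere in the captured output; `extract_app_id` and `status_command` are hypothetical names for illustration, not what #65991 actually implements.

```python
import re
from typing import List, Optional

# YARN application IDs have the form application_<clusterTimestamp>_<sequence>.
APP_ID_RE = re.compile(r"application_\d+_\d+")


def extract_app_id(log_text: str) -> Optional[str]:
    """Return the first YARN application ID found in the given text, if any."""
    match = APP_ID_RE.search(log_text)
    return match.group(0) if match else None


def status_command(app_id: str) -> List[str]:
    """Build the argv for the CLI status check (hypothetical helper)."""
    return ["yarn", "application", "-status", app_id]
```

   Once `extract_app_id` returns a value, the `spark-submit` process is no longer needed and the argv from `status_command` can be run on a timer instead.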
   
   https://github.com/apache/airflow/issues/67168 is intended to solve the 
crash recovery problem on top of that split: once `spark-submit` returns the 
app ID immediately, we persist it to `task_state` and reconnect on retry 
instead of resubmitting a duplicate job.
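
   A minimal sketch of the persist-and-reconnect idea, with a plain mapping standing in for `task_state` (whose real interface is not shown here). The key name `yarn_app_id` and the function `submit_or_reconnect` are assumptions made up for this example.

```python
from typing import Callable, MutableMapping, Tuple

APP_ID_KEY = "yarn_app_id"  # hypothetical key name, not from the issue


def submit_or_reconnect(
    state: MutableMapping[str, str],
    submit: Callable[[], str],
) -> Tuple[str, bool]:
    """Return (app_id, reconnected).

    If a previous attempt already stored an app ID, reuse it and skip the
    submission; otherwise submit once and persist the returned ID.
    """
    existing = state.get(APP_ID_KEY)
    if existing:
        return existing, True
    app_id = submit()
    state[APP_ID_KEY] = app_id
    return app_id, False
```

   On a retry after a worker crash, the second call finds the stored ID and goes straight to polling, so no duplicate job is submitted.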
   
   The two are complementary layers. #65991 is a prerequisite for my work in a 
sense, i.e., it makes the hook return the app ID early, which is what we need to 
persist.
   
   One coordination point worth discussing: #65991 uses `yarn application 
-status` (CLI subprocess) for polling. My plan was to use the YARN RM REST API 
(GET `/ws/v1/cluster/apps/{id}`). REST avoids spawning a subprocess and does 
not require yarn CLI on the worker, but it is worth aligning rather than having 
two different polling mechanisms in the same codebase.
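
   For comparison, a rough sketch of what the REST side could look like using only the standard library. The endpoint and the `state` / `finalStatus` fields are from the ResourceManager REST API; the function names, the timeout, and the lack of authentication or RM failover handling are simplifying assumptions for this example.

```python
import json
import urllib.request
from typing import Tuple

# YARN application states after which no further transitions happen.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def fetch_app(rm_url: str, app_id: str, timeout: float = 10.0) -> dict:
    """GET /ws/v1/cluster/apps/{id} from the ResourceManager and decode the JSON."""
    url = f"{rm_url.rstrip('/')}/ws/v1/cluster/apps/{app_id}"
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def parse_app(payload: dict) -> Tuple[str, str]:
    """Extract (state, finalStatus) from the response body."""
    app = payload["app"]
    return app["state"], app["finalStatus"]


def is_terminal(state: str) -> bool:
    """True once the application has reached a terminal YARN state."""
    return state in TERMINAL_STATES
```

   A polling loop would call `fetch_app` on an interval until `is_terminal` is true, then treat a `finalStatus` of `SUCCEEDED` as success. Secured clusters (Kerberos/SPNEGO) would need more than this.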
   
   Happy to sync if useful @nailo2c


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
