nailo2c commented on PR #65991:
URL: https://github.com/apache/airflow/pull/65991#issuecomment-4522416066
Hi @amoghrajesh, I've refactored the code to use the YARN RM REST API.
I tested it end to end on a local Hadoop Docker cluster with Breeze Airflow, and the run
looked clean. Details are in the updated PR description.
Two design choices I'd like to sync on:
1. **Auth defaults to none.**
I followed the same pattern as
[`LivyHook`](https://github.com/apache/airflow/blob/0920c770c8e13e844b53fce93b7de4daf4390c0f/providers/apache/livy/src/airflow/providers/apache/livy/hooks/livy.py#L52):
no auth by default, with a `yarn_rm_auth: requests.auth.AuthBase | None = None`
parameter so Kerberized clusters can pass `HTTPKerberosAuth()`
themselves. This adds no new `requests-kerberos` dependency.
2. **RM URL goes in the Spark connection's `extra`** under the key
`yarn_resourcemanager_webapp_address`.
It is required when `yarn_track_via_rm_api=True`. Any preference on the key
name?
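
To make the two choices concrete, here is a rough sketch of how they could fit together. It is not the code in the PR: the function names (`rm_app_url`, `parse_app_report`, `fetch_app_report`) are hypothetical, and it assumes the `extra` value is a full base URL including the scheme (for example `http://rm-host:8088`). The only parts taken from outside this comment are the YARN RM REST endpoint `/ws/v1/cluster/apps/{appid}` and its `app.state` / `app.finalStatus` response fields.

```python
from __future__ import annotations

import json
from typing import Any

# Key name proposed in this comment; still open for discussion.
EXTRA_KEY = "yarn_resourcemanager_webapp_address"


def rm_app_url(extra: dict[str, Any], app_id: str) -> str:
    """Build the RM REST URL for one application from the connection extra.

    Assumes the stored value already includes the scheme, e.g. http://rm-host:8088.
    """
    base = extra.get(EXTRA_KEY)
    if not base:
        raise ValueError(f"connection extra must set {EXTRA_KEY!r}")
    return f"{base.rstrip('/')}/ws/v1/cluster/apps/{app_id}"


def parse_app_report(body: str) -> tuple[str, str]:
    """Extract (state, finalStatus) from the RM's JSON application report."""
    app = json.loads(body)["app"]
    return app["state"], app["finalStatus"]


def fetch_app_report(
    extra: dict[str, Any],
    app_id: str,
    yarn_rm_auth: Any = None,  # a requests.auth.AuthBase instance, or None for no auth
    timeout: float = 10.0,
) -> tuple[str, str]:
    """Query the RM once; auth is whatever the caller passes in (default: none)."""
    import requests  # imported lazily so the helpers above work without it

    resp = requests.get(rm_app_url(extra, app_id), auth=yarn_rm_auth, timeout=timeout)
    resp.raise_for_status()
    return parse_app_report(resp.text)
```

On a Kerberized cluster the caller would install `requests-kerberos` themselves and pass `yarn_rm_auth=HTTPKerberosAuth()`, so the provider itself never imports it.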
Please let me know if there's anything that needs to be modified on my side.
Thanks!