Copilot commented on code in PR #63775:
URL: https://github.com/apache/airflow/pull/63775#discussion_r3025383633
##########
providers/databricks/src/airflow/providers/databricks/utils/openlineage.py:
##########
@@ -24,7 +24,7 @@
import requests
from airflow.providers.common.compat.openlineage.check import require_openlineage_version
-from airflow.utils import timezone
+from airflow.utils import timezone # type: ignore[attr-defined]
Review Comment:
`from airflow.utils import timezone # type: ignore[attr-defined]` relies on
Airflow’s runtime deprecation redirect and silences typing. Elsewhere in the
repo the cross-version pattern is to import timezone via
`airflow.providers.common.compat.sdk` (or `try: from airflow.sdk import
timezone`). Consider switching to the compat import to avoid the type ignore
and keep a consistent Airflow 2/3 import style.
```suggestion
from airflow.providers.common.compat.sdk import timezone
```
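The "try new location, fall back to old" pattern the comment refers to can be sketched generically. The module names below are stdlib stand-ins chosen so the snippet runs anywhere; in the PR the two locations would be `airflow.sdk` and `airflow.utils` respectively:

```python
# Generic sketch of the cross-version import fallback the review suggests.
# zoneinfo is only a stand-in for illustration, not an Airflow module.
try:
    from zoneinfo import ZoneInfo  # newer location (Python 3.9+)
except ImportError:
    from backports.zoneinfo import ZoneInfo  # fallback package for older interpreters

print(ZoneInfo("UTC").key)  # UTC
```

The compat-module approach (`airflow.providers.common.compat.sdk`) centralizes this fallback in one place, so each provider file needs only a single typed import instead of its own try/except.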
##########
providers/databricks/src/airflow/providers/databricks/hooks/databricks_base.py:
##########
@@ -1156,18 +1214,25 @@ async def _a_do_api_call(self, endpoint_info: tuple[str, str], json: dict[str, A
        method, endpoint = endpoint_info
        full_endpoint = f"api/{endpoint}"
-        url = self._endpoint_url(full_endpoint)
+        url = await self._a_endpoint_url(full_endpoint)
        aad_headers = await self._a_get_aad_headers()
        headers = {**self.user_agent_header, **aad_headers}
-        auth: aiohttp.BasicAuth
+        auth: aiohttp.BasicAuth | BearerAuth
        token = await self._a_get_token()
        if token:
            auth = BearerAuth(token)
        else:
            self.log.info("Using basic auth.")
-            auth = aiohttp.BasicAuth(self._get_connection_attr("login"), self.databricks_conn.password)
+            conn = await self.a_databricks_conn()
+
+            # We access login and password directly from the connection object to avoid
+            # calling self.databricks_conn which would trigger sync get_connection().
+            # Behavior matches sync _get_connection_attr() but without the cache-clash.
Review Comment:
The inline comment mentions a “cache-clash”, but the actual issue being
avoided is triggering the synchronous `get_connection()` path (and AsyncToSync
failures) inside the triggerer event loop. Consider rewording this comment to
reflect that motivation, and optionally reuse `_get_connection_attr("login",
conn)` here for consistency with other async auth paths.
```suggestion
            # Access login and password directly on the async connection object to avoid
            # calling the synchronous get_connection() path (and AsyncToSync failures)
            # inside the triggerer event loop. This mirrors sync _get_connection_attr()
            # behavior without invoking the sync hook property.
```
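The failure mode both wordings are circling can be shown in a self-contained sketch. The class and method names below are stand-ins, not the real `DatabricksBaseHook` API: a sync helper that bridges back into asyncio (as AsyncToSync does) cannot run while the triggerer's event loop is already running, whereas awaiting the async path directly is safe.

```python
import asyncio

# Stand-in hook: a_get_conn() is the async connection lookup, get_conn() the
# sync bridge that the review comment says must be avoided inside the loop.
class Hook:
    async def a_get_conn(self) -> dict:
        await asyncio.sleep(0)  # stands in for the real async connection fetch
        return {"login": "user", "password": "secret"}

    def get_conn(self) -> dict:
        # Sync path: spins up its own event loop (AsyncToSync-style), which
        # raises RuntimeError if a loop is already running.
        return asyncio.run(self.a_get_conn())

async def build_auth(hook: Hook) -> tuple[str, str]:
    conn = await hook.a_get_conn()  # async path: safe inside the running loop
    return conn["login"], conn["password"]

async def main() -> None:
    hook = Hook()
    print(await build_auth(hook))  # ('user', 'secret')
    try:
        hook.get_conn()  # the pattern the comment warns against
    except RuntimeError as exc:
        print(f"sync path fails in a running loop: {exc}")

asyncio.run(main())
```

This is why the diff fetches `conn = await self.a_databricks_conn()` and reads `login`/`password` off it directly instead of touching the sync property.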
##########
providers/databricks/src/airflow/providers/databricks/hooks/databricks_base.py:
##########
@@ -1156,18 +1214,25 @@ async def _a_do_api_call(self, endpoint_info: tuple[str, str], json: dict[str, A
        method, endpoint = endpoint_info
        full_endpoint = f"api/{endpoint}"
-        url = self._endpoint_url(full_endpoint)
+        url = await self._a_endpoint_url(full_endpoint)
        aad_headers = await self._a_get_aad_headers()
        headers = {**self.user_agent_header, **aad_headers}
-        auth: aiohttp.BasicAuth
+        auth: aiohttp.BasicAuth | BearerAuth
        token = await self._a_get_token()
        if token:
            auth = BearerAuth(token)
        else:
            self.log.info("Using basic auth.")
-            auth = aiohttp.BasicAuth(self._get_connection_attr("login"), self.databricks_conn.password)
+            conn = await self.a_databricks_conn()
+
+            # We access login and password directly from the connection object to avoid
+            # calling self.databricks_conn which would trigger sync get_connection().
+            # Behavior matches sync _get_connection_attr() but without the cache-clash.
Review Comment:
The comment says this avoids a “cache-clash”, but the main concern here is
avoiding the sync `get_connection()` path inside the triggerer event loop.
Consider updating the wording to reflect the actual risk (AsyncToSync in a
running event loop) and/or reference that `a_databricks_conn()` primes the
cached_property.
```suggestion
            # We access login and password directly from the async connection object to avoid
            # going through self.databricks_conn, which would invoke the sync get_connection()
            # (AsyncToSync) path inside the triggerer event loop. a_databricks_conn() also primes
            # the databricks_conn cached_property so later access does not re-enter get_connection().
```
##########
providers/databricks/src/airflow/providers/databricks/hooks/databricks_base.py:
##########
@@ -1156,18 +1214,25 @@ async def _a_do_api_call(self, endpoint_info: tuple[str, str], json: dict[str, A
        method, endpoint = endpoint_info
        full_endpoint = f"api/{endpoint}"
-        url = self._endpoint_url(full_endpoint)
+        url = await self._a_endpoint_url(full_endpoint)
        aad_headers = await self._a_get_aad_headers()
        headers = {**self.user_agent_header, **aad_headers}
Review Comment:
`headers` is already computed as `{**self.user_agent_header,
**aad_headers}`; later, the request passes `headers={**headers,
**self.user_agent_header}`, re-merging the user-agent header and making
precedence harder to reason about. Consider passing `headers=headers` (or
explicitly document why user-agent should override AAD headers).
##########
providers/databricks/src/airflow/providers/databricks/hooks/databricks_base.py:
##########
@@ -1156,18 +1214,25 @@ async def _a_do_api_call(self, endpoint_info: tuple[str, str], json: dict[str, A
        method, endpoint = endpoint_info
        full_endpoint = f"api/{endpoint}"
-        url = self._endpoint_url(full_endpoint)
+        url = await self._a_endpoint_url(full_endpoint)
        aad_headers = await self._a_get_aad_headers()
        headers = {**self.user_agent_header, **aad_headers}
Review Comment:
`headers` is built as `{**self.user_agent_header, **aad_headers}` but later
the request call re-merges `self.user_agent_header` again. Consider making the
header precedence explicit in one place (either keep the merge here and pass
`headers=headers`, or build `headers` in the final form once) to avoid
redundant dict copies and reduce ambiguity about which values win.
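The precedence point in both comments comes down to dict-merge ordering; a minimal illustration (header names and values here are made up, not the hook's actual headers):

```python
# In {**a, **b} the right-hand dict wins any overlapping key, so re-merging
# user_agent_header a second time at the call site silently flips precedence
# back for keys both dicts define.
user_agent_header = {"user-agent": "airflow", "x-shared": "from-user-agent"}
aad_headers = {"authorization": "Bearer token", "x-shared": "from-aad"}

headers = {**user_agent_header, **aad_headers}  # aad_headers wins the overlap
re_merged = {**headers, **user_agent_header}    # user_agent_header wins it back

print(headers["x-shared"])    # from-aad
print(re_merged["x-shared"])  # from-user-agent
```

Merging once and passing `headers=headers` keeps the winning dict obvious at a single site, which is what the review asks for.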
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]