rawwar opened a new issue, #43080:
URL: https://github.com/apache/airflow/issues/43080

   ### Description
   
   When an SSL handshake fails (usually intermittently), every Databricks 
operator running in deferrable mode fails without retrying, because 
`aiohttp.client_exceptions.ClientConnectorError` is not treated as a retryable 
error. 
   
   Currently, only `aiohttp.ClientResponseError` is considered retryable. 
I would like to make `aiohttp.client_exceptions.ClientConnectorError` 
retryable as well.
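
To illustrate the idea, here is a minimal, self-contained sketch of a retry predicate that accepts connection-level errors in addition to HTTP response errors. It is not the provider's code: `is_retryable`, `call_with_retry`, and `demo` are hypothetical names, the two exception classes are stand-ins for the aiohttp types so the sketch runs without aiohttp installed, and the status-code rule (429 or 5xx) is an assumption.

```python
import asyncio


# Stand-ins for the aiohttp exception types named in this issue, so the
# sketch runs without aiohttp installed. They are NOT the real classes.
class ClientResponseError(Exception):
    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


class ClientConnectorError(OSError):
    pass


def is_retryable(exc: BaseException) -> bool:
    """Hypothetical predicate: decide whether an error deserves another attempt."""
    if isinstance(exc, ClientConnectorError):
        # Proposed addition: connection-level failures such as an
        # SSL handshake timeout are treated as transient.
        return True
    if isinstance(exc, ClientResponseError):
        # Assumption: only 429 and 5xx responses are transient.
        return exc.status == 429 or exc.status >= 500
    return False


async def call_with_retry(func, retry_limit: int = 3, retry_delay: float = 0.0):
    """Await func() up to retry_limit times; re-raise non-retryable errors at once."""
    for attempt in range(1, retry_limit + 1):
        try:
            return await func()
        except Exception as exc:
            if attempt == retry_limit or not is_retryable(exc):
                raise
            await asyncio.sleep(retry_delay)


def demo() -> tuple[str, int]:
    """Simulate two failed connection attempts followed by a successful call."""
    calls = 0

    async def flaky_get_run_state() -> str:
        nonlocal calls
        calls += 1
        if calls < 3:
            raise ClientConnectorError("Cannot connect to host")
        return "RUNNING"

    result = asyncio.run(call_with_retry(flaky_get_run_state))
    return result, calls
```

Calling `demo()` exercises the path described in this issue: the first two attempts raise the stand-in `ClientConnectorError`, and the third succeeds instead of failing the trigger on the first error.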
   
   ### Use case/motivation
   
   When the SSL handshake takes longer than the timeout (60 seconds by 
default), the trigger fails with the error below:
   
   ```
   [2024-10-16, 09:27:20 UTC] {warnings.py:109} WARNING - /usr/local/lib/python3.10/site-packages/airflow/models/baseoperator.py:1214: AirflowProviderDeprecationWarning: Call to deprecated class DatabricksRunNowDeferrableOperator. (`DatabricksRunNowDeferrableOperator` has been deprecated. Please use `airflow.providers.databricks.operators.DatabricksRunNowOperator` with `deferrable=True` instead.)
     result = cls.__new__(cls)
   [2024-10-16, 09:27:20 UTC] {taskinstance.py:1598} ERROR - Trigger failed:
   Traceback (most recent call last):
     File "/usr/local/lib/python3.10/site-packages/aiohttp/connector.py", line 1098, in _wrap_create_connection
       return await self._loop.create_connection(*args, **kwargs, sock=sock)
     File "/usr/local/lib/python3.10/asyncio/base_events.py", line 1103, in create_connection
       transport, protocol = await self._create_connection_transport(
     File "/usr/local/lib/python3.10/asyncio/base_events.py", line 1133, in _create_connection_transport
       await waiter
   ConnectionAbortedError: SSL handshake is taking longer than 60.0 seconds: aborting the connection
   The above exception was the direct cause of the following exception:
   Traceback (most recent call last):
     File "/usr/local/lib/python3.10/site-packages/airflow/jobs/triggerer_job_runner.py", line 529, in cleanup_finished_triggers
       result = details["task"].result()
     File "/usr/local/lib/python3.10/site-packages/airflow/jobs/triggerer_job_runner.py", line 607, in run_trigger
       async for event in trigger.run():
     File "/usr/local/lib/python3.10/site-packages/airflow/providers/databricks/triggers/databricks.py", line 86, in run
       run_state = await self.hook.a_get_run_state(self.run_id)
     File "/usr/local/lib/python3.10/site-packages/airflow/providers/databricks/hooks/databricks.py", line 417, in a_get_run_state
       response = await self._a_do_api_call(GET_RUN_ENDPOINT, json)
     File "/usr/local/lib/python3.10/site-packages/airflow/providers/databricks/hooks/databricks_base.py", line 651, in _a_do_api_call
       async for attempt in self._a_get_retry_object():
     File "/usr/local/lib/python3.10/site-packages/tenacity/_asyncio.py", line 71, in __anext__
       do = self.iter(retry_state=self._retry_state)
     File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 314, in iter
       return fut.result()
     File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result
       return self.__get_result()
     File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
       raise self._exception
     File "/usr/local/lib/python3.10/site-packages/airflow/providers/databricks/hooks/databricks_base.py", line 653, in _a_do_api_call
       async with request_func(
     File "/usr/local/lib/python3.10/site-packages/aiohttp/client.py", line 1359, in __aenter__
       self._resp: _RetType = await self._coro
     File "/usr/local/lib/python3.10/site-packages/aiohttp/client.py", line 663, in _request
       conn = await self._connector.connect(
     File "/usr/local/lib/python3.10/site-packages/aiohttp/connector.py", line 563, in connect
       proto = await self._create_connection(req, traces, timeout)
     File "/usr/local/lib/python3.10/site-packages/aiohttp/connector.py", line 1032, in _create_connection
       _, proto = await self._create_direct_connection(req, traces, timeout)
     File "/usr/local/lib/python3.10/site-packages/aiohttp/connector.py", line 1366, in _create_direct_connection
       raise last_exc
     File "/usr/local/lib/python3.10/site-packages/aiohttp/connector.py", line 1335, in _create_direct_connection
       transp, proto = await self._wrap_create_connection(
     File "/usr/local/lib/python3.10/site-packages/aiohttp/connector.py", line 1106, in _wrap_create_connection
       raise client_error(req.connection_key, exc) from exc
   aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host adb-******.***REDACTED****.azuredatabricks.net:443 ssl:default [None]
   ```
   
   The failure is intermittent, so making this error retryable would help.
   
   ### Related issues
   
   NA
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

