johnhoran commented on code in PR #61778:
URL: https://github.com/apache/airflow/pull/61778#discussion_r2803250607


##########
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/triggers/pod.py:
##########
@@ -183,7 +184,7 @@ async def run(self) -> AsyncIterator[TriggerEvent]:
                 event = await self._wait_for_container_completion()
             yield event
             return
-        except PodLaunchTimeoutException as e:
+        except (PodLaunchTimeoutException, PodLaunchFailedException) as e:

Review Comment:
   No, that shouldn't happen.  In that scenario the triggerer would exit with a 
`timeout` state.  Because of `detect_pod_terminate_early_issues`, it would do 
so the first time it saw the image pull failure, before the `startup_timeout` 
expires.  The `timeout` state then accounts for the gap in time between the 
triggerer exiting and the operator starting back up: when the operator 
resumes, a final check is made to see whether the pod is in a running or 
terminal state.  In this scenario it wouldn't be, so the task fails. 
 
   
   I think there is a case for renaming the `timeout` state.  The triggerer 
can return one of `error`, `fatal`, `timeout`, and `success`.  `timeout` is 
essentially for situations where the pod didn't start up in time, but if it 
has started by the time it gets to the operator, I think it's better to let it 
run rather than fail the task and retry.  So if I could think of a pithy name 
for "fatal unless recovered", I'd rename it to that.  
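
   To make the "fatal unless recovered" idea concrete, here is a rough sketch 
of how I think about the four states.  It is not the provider's actual code: 
`Action`, `RECOVERED_PHASES` and `resolve_event` are hypothetical names made up 
for illustration, and the pod phases are the standard Kubernetes ones. 

```python
from enum import Enum


class Action(Enum):
    """What the operator does once the trigger event arrives (hypothetical)."""

    PROCEED = "proceed"  # carry on with normal handling of the pod
    FAIL = "fail"        # fail the task


# Assumption: "running or terminal" maps onto these Kubernetes pod phases.
RECOVERED_PHASES = frozenset({"Running", "Succeeded", "Failed"})


def resolve_event(status: str, pod_phase: str) -> Action:
    """Map a trigger status plus the pod's current phase to an action."""
    if status == "success":
        return Action.PROCEED
    if status == "timeout":
        # "Fatal unless recovered": only fail if the pod still has not started
        # by the time the operator performs its final check.
        return Action.PROCEED if pod_phase in RECOVERED_PHASES else Action.FAIL
    if status in ("error", "fatal"):
        return Action.FAIL
    raise ValueError(f"unknown trigger status: {status!r}")
```

   With an image pull failure the pod is still `Pending` at the final check, so 
`resolve_event("timeout", "Pending")` gives `Action.FAIL`, whereas a pod that 
started during the gap gives `Action.PROCEED`. 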


