krisfur commented on code in PR #34627:
URL: https://github.com/apache/airflow/pull/34627#discussion_r1338860691


##########
airflow/providers/microsoft/azure/operators/container_instances.py:
##########
@@ -319,6 +319,9 @@ def _monitor_logging(self, resource_group: str, name: str) 
-> int:
                 if state in ["Running", "Terminated", "Succeeded"]:
                     try:
                         logs = self._ci_hook.get_logs(resource_group, name)
+                        if logs is None:
+                            self.log.exception("Container log is broken, 
marking as failed.")

Review Comment:
   I agree, it is possible that returning None logs is just a symptom of a more 
complex failure of getting something from the instance view or beyond? I tried 
debugging those bits but fell short of finding anything useful. 
   
   The solution in this PR stops the results of this issue from causing an 
infinite loop of restarts (which incurs costs if it happens in prod), but I 
agree it probably doesn't resolve the underlying logic problem, which I fail to 
investigate fully with my limited skills of a data engineer and limited 
knowledge of the airflow code.
   
   Another solution (which worked) that I tested is checking if last_state == 
Running and state == Pending which also catches the problem, but this state 
change only happens after those errors from None logs are thrown, so stopping 
it at the error point seems better.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to