Re: [PR] Handle SystemExit raised in the task. [airflow]

via GitHub Wed, 24 Jan 2024 12:31:57 -0800


potiuk commented on code in PR #36986:
URL: https://github.com/apache/airflow/pull/36986#discussion_r1465501481



##########
airflow/models/taskinstance.py:
##########
@@ -2393,6 +2393,12 @@ def _run_raw_task(
                 self.handle_failure(e, test_mode, context, session=session)
                 session.commit()
                 raise
+            except SystemExit as code:
+                self.handle_failure(
+                    f"Task failed due to SystemExit({code})", test_mode, 
context, session=session
+                )
+                session.commit()
+                raise

Review Comment:
   It's not the matter if we use it or not. But our users can choose to do so. 
Airflow task is very special in the sense that it has to be rather resilient to 
anything our users (often in a very inventive way) do in their tasks. If 
someone chooses to do `sys.exit(0)` in their PythonOperator's callable to 
finish their task quickly, they are perfectly OK to do so and we cannot do 
anything about it. 
   
   And if we handle it properly in this try/finally this is also perfectly 
valid from Airflow point of view. There is a reason why `sys.exit()` - works 
the way it works, by raising SystemExit exception. This is done specifically to 
have all the try/finally clauses to be able to finalize whatever they were 
supposed to finalize if the code was called from "somewhere else" (this is o 
top of the fact that we are also forking when executing tasks but this is a 
differnet story for another day). 
   
   Raising AirflowException is neither forced not even required, nor 
AirflowException is handled in any other way than any other exception. It's 
merely a base class that all our "specialized" exceptions derive from (for 
example AirflowSkipException when raised causing the task to be skipped or 
AirflowFailException that causes task to not retry. But there is nowhere 
expectation or requirement that user's code must or even should try 
AirflowException in case oi error. It can throw any kind of exception or it can 
call sys.exit(1) . Similarly there is no strong requirement that `execute()` 
method  returns "as usual". Yes, if you want to return some value (and then 
store it as xcom automatically), surely the best way is to just return it from 
the `execute()` method (BTW. you can also manually save the xcom value manually 
and then sys.exit(0) so this is also not the only way). But if you do not want 
to do it, doiing sys.exit(0) is perfectly good way for the users to finish 
their task and s
 ignal success. 
   
   BTW. Side commen. There is this great `core.py` podcast by Łukasz Langa and 
Pablo Galindo and the last episode (whicih I just started listening) called 
"The Exceptional Episode" is all about that - how exceptions are handled in 
Python. On spotify here https://open.spotify.com/episode/76eZ8LVhzQLHrS2GRhOjG7.



##########
airflow/models/taskinstance.py:
##########
@@ -2393,6 +2393,12 @@ def _run_raw_task(
                 self.handle_failure(e, test_mode, context, session=session)
                 session.commit()
                 raise
+            except SystemExit as code:
+                self.handle_failure(
+                    f"Task failed due to SystemExit({code})", test_mode, 
context, session=session
+                )
+                session.commit()
+                raise

Review Comment:
   It's not the matter if we use it or not. But our users can choose to do so. 
Airflow task is very special in the sense that it has to be rather resilient to 
anything our users (often in a very inventive way) do in their tasks. If 
someone chooses to do `sys.system.exit(0)` in their PythonOperator's callable 
to finish their task quickly, they are perfectly OK to do so and we cannot do 
anything about it. 
   
   And if we handle it properly in this try/finally this is also perfectly 
valid from Airflow point of view. There is a reason why `sys.exit()` - works 
the way it works, by raising SystemExit exception. This is done specifically to 
have all the try/finally clauses to be able to finalize whatever they were 
supposed to finalize if the code was called from "somewhere else" (this is o 
top of the fact that we are also forking when executing tasks but this is a 
differnet story for another day). 
   
   Raising AirflowException is neither forced not even required, nor 
AirflowException is handled in any other way than any other exception. It's 
merely a base class that all our "specialized" exceptions derive from (for 
example AirflowSkipException when raised causing the task to be skipped or 
AirflowFailException that causes task to not retry. But there is nowhere 
expectation or requirement that user's code must or even should try 
AirflowException in case oi error. It can throw any kind of exception or it can 
call sys.exit(1) . Similarly there is no strong requirement that `execute()` 
method  returns "as usual". Yes, if you want to return some value (and then 
store it as xcom automatically), surely the best way is to just return it from 
the `execute()` method (BTW. you can also manually save the xcom value manually 
and then sys.exit(0) so this is also not the only way). But if you do not want 
to do it, doiing sys.exit(0) is perfectly good way for the users to finish 
their task and s
 ignal success. 
   
   BTW. Side commen. There is this great `core.py` podcast by Łukasz Langa and 
Pablo Galindo and the last episode (whicih I just started listening) called 
"The Exceptional Episode" is all about that - how exceptions are handled in 
Python. On spotify here https://open.spotify.com/episode/76eZ8LVhzQLHrS2GRhOjG7.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] Handle SystemExit raised in the task. [airflow]

Reply via email to