I-am-Uchenna commented on code in PR #67901:
URL: https://github.com/apache/airflow/pull/67901#discussion_r3342579478


##########
providers/openlineage/src/airflow/providers/openlineage/plugins/listener.py:
##########
@@ -834,7 +836,17 @@ def _terminate_with_wait(self, process: psutil.Process):
 
     def _fork_execute(self, callable, callable_name: str):
         self.log.debug("Will fork to execute OpenLineage process.")
-        pid = os.fork()
+        with warnings.catch_warnings():
+            # On Python 3.12+, os.fork() in a multi-threaded process emits a
+            # DeprecationWarning.  The fork here is intentional and the child
+            # takes precautions (ORM reconfiguration, os._exit) so the warning
+            # is safe to suppress.
+            warnings.filterwarnings(
+                "ignore",
+                message=".*use of fork\\(\\) may lead to deadlocks in the 
child",
+                category=DeprecationWarning,
+            )

Review Comment:
   You're right, suppressing the warning is not the proper fix. I'll rework 
this.
   
   Looking at the code, `_fork_execute` uses `os.fork()` because the inner 
callables (closures capturing ORM models, extractors, etc.) are not picklable, 
which rules out `spawn`/`forkserver`. But on Python 3.12+, forking in a 
multi-threaded process risks deadlocks when a thread holds a lock at fork time.
   
   The approach I'm considering: replace `_fork_execute` with a 
`threading.Thread`-based implementation. A daemon thread with 
`join(timeout=...)` gives the same timeout protection as the current fork, 
threads share memory so no pickling is needed, and there is no `os.fork()` call 
to trigger the warning.
   
   The trade-off is weaker isolation compared to a subprocess (a crash in the 
extractor could affect the parent), but the existing code already catches 
exceptions, and the real risk of `os.fork()` in a multi-threaded process 
(deadlock from copied lock state) is arguably worse.
   
   I'll push an updated commit shortly. Does this direction make sense, or 
would you prefer a different approach?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to