anmolxlight opened a new pull request, #67400:
URL: https://github.com/apache/airflow/pull/67400

   ## Summary
   
   The OpenLineage listener uses a `ProcessPoolExecutor` to asynchronously emit 
lineage events from the scheduler. When a child process in the pool terminates 
abruptly, Python's `concurrent.futures` marks the pool as permanently broken. 
After that point, every subsequent OpenLineage event fails with 
`BrokenProcessPool`, and lineage data stops flowing until the scheduler is 
restarted.
   
   ## Fix
   
   `submit_callable` now catches `BrokenProcessPool`, shuts down the broken 
executor, creates a fresh one, and retries the submission. This makes the 
listener self-healing: lineage reporting recovers automatically without a 
scheduler restart.
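
   The recovery path described above can be sketched roughly as follows. This 
is a minimal illustration, not the actual `listener.py` code: the 
`SelfHealingSubmitter` class, its `executor_factory` parameter, and the choice 
of `shutdown(wait=False)` are assumptions made for the example. Only 
`submit_callable`, `ProcessPoolExecutor`, and `BrokenProcessPool` come from the 
description.

```python
from concurrent.futures import Executor, Future, ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool
from typing import Any, Callable, Optional


class SelfHealingSubmitter:
    """Hypothetical stand-in for the listener; not the real Airflow class."""

    def __init__(
        self, executor_factory: Callable[[], Executor] = ProcessPoolExecutor
    ) -> None:
        # The factory is injectable (an assumption of this sketch) so the
        # recovery logic can be exercised without spawning real processes.
        self._executor_factory = executor_factory
        self._executor: Optional[Executor] = None

    @property
    def executor(self) -> Executor:
        # Create the pool lazily, and again after it has been discarded.
        if self._executor is None:
            self._executor = self._executor_factory()
        return self._executor

    def submit_callable(
        self, fn: Callable[..., Any], *args: Any, **kwargs: Any
    ) -> Future:
        try:
            return self.executor.submit(fn, *args, **kwargs)
        except BrokenProcessPool:
            # A broken pool never recovers, so release it and build a new one.
            # wait=False is an assumption here: do not block on dead workers.
            if self._executor is not None:
                self._executor.shutdown(wait=False)
            self._executor = None
            # Retry exactly once; a second failure propagates to the caller.
            return self.executor.submit(fn, *args, **kwargs)
```

   In this sketch the retry happens only once, so a pool that breaks again 
immediately still surfaces the error to the caller instead of looping.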
   
   ### Changes
   
   - `listener.py`: catch `BrokenProcessPool` in `submit_callable`, recreate 
the executor, and retry
   - `test_listener.py`: add 
`test_submit_callable_recreates_executor_on_broken_pool` that verifies the 
broken pool is shut down, a new executor is created, and the submission is 
retried
   
   ## Test Plan
   
   - [x] New unit test passes
   - [x] All existing OpenLineage listener unit tests pass (26 passed, 35 
skipped, 0 failed)
   
   Closes #67283
   

