sergiobuj opened a new issue, #67283:
URL: https://github.com/apache/airflow/issues/67283

   ### Under which category would you file this issue?
   
   Providers
   
   ### Apache Airflow version
   
   3.1.8+astro.1
   
   ### What happened and how to reproduce it?
   
   The OpenLineage listener plugin uses a `ProcessPoolExecutor` to emit lineage 
events asynchronously from the scheduler. When a child process in the pool 
terminates abruptly, Python's `concurrent.futures` marks the pool as 
permanently broken. After this point, **every subsequent OpenLineage event 
fails** with `BrokenProcessPool` until the scheduler process is restarted.
   
   This causes extended periods of missing lineage data with no self-recovery. 
The warning is logged but the pool is never recreated, so the problem persists 
indefinitely.
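
The failure mode can be reproduced outside Airflow with the standard library alone. The sketch below is an illustration, not provider code: `demonstrate_broken_pool` is a hypothetical name, and `os._exit` only stands in for whatever actually kills the worker in the scheduler, which this report does not establish. The optional `mp_context` argument is passed straight through to `ProcessPoolExecutor`.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def demonstrate_broken_pool(mp_context=None) -> list[str]:
    """Kill a pool worker abruptly, then show that later submits fail."""
    observed = []
    pool = ProcessPoolExecutor(max_workers=1, mp_context=mp_context)
    try:
        # os._exit skips all cleanup, so the worker disappears abruptly.
        # Assumption: this stands in for the real, unknown cause of the crash.
        crashed = pool.submit(os._exit, 1)
        try:
            crashed.result(timeout=30)
        except BrokenProcessPool:
            observed.append("in-flight future failed")

        # The same executor object now rejects every new submission.
        for _ in range(2):
            try:
                pool.submit(abs, -1)
            except BrokenProcessPool:
                observed.append("submit rejected")
    finally:
        pool.shutdown(wait=False)
    return observed


if __name__ == "__main__":
    print(demonstrate_broken_pool())
```

Once the pool is marked broken, `submit` raises before any work is queued, which matches the traceback from `submit_callable` in the scheduler logs.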
   
   
   ### Scheduler logs showing the error
   
   ```
    2026-05-21T08:01:02.690533Z [warning] OpenLineage received exception in 
method on_dag_run_success
        [airflow.providers.openlineage.plugins.listener] loc=listener.py:918
   
    Traceback (most recent call last):
      File ".../airflow/providers/openlineage/plugins/listener.py", line 896, 
in on_dag_run_success
        self.submit_callable(
      File ".../airflow/providers/openlineage/plugins/listener.py", line 974, 
in submit_callable
        fut = self.executor.submit(callable, *args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.12/concurrent/futures/process.py", line 805, 
in submit
        raise BrokenProcessPool(self._broken)
    concurrent.futures.process.BrokenProcessPool: A child process terminated 
abruptly, the process pool is not usable anymore
   ```
   
   ### What you think should happen instead?
   
   The OpenLineage integration could be self-healing and prevent extended 
outages in lineage reporting.
   
   When a `BrokenProcessPool` exception is raised in `submit_callable`, the 
listener could detect the broken pool state, create a new `ProcessPoolExecutor` 
instance, and retry the submission.
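
A minimal sketch of that recovery logic, assuming a standalone wrapper rather than the provider's actual listener class. `SelfHealingSubmitter`, `_recreate_executor`, and the `max_workers` / `mp_context` parameters are hypothetical names chosen for illustration; only `ProcessPoolExecutor` and `BrokenProcessPool` come from the standard library.

```python
import threading
from concurrent.futures import Future, ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


class SelfHealingSubmitter:
    """Hypothetical wrapper: replaces a broken pool and retries once."""

    def __init__(self, max_workers: int = 1, mp_context=None) -> None:
        self._max_workers = max_workers
        self._mp_context = mp_context
        self._lock = threading.Lock()
        self._executor = self._new_executor()

    def _new_executor(self) -> ProcessPoolExecutor:
        return ProcessPoolExecutor(
            max_workers=self._max_workers, mp_context=self._mp_context
        )

    def _recreate_executor(self, broken: ProcessPoolExecutor) -> None:
        # Swap only if another thread has not already replaced the pool.
        with self._lock:
            if self._executor is broken:
                broken.shutdown(wait=False)
                self._executor = self._new_executor()

    def submit_callable(self, fn, *args, **kwargs) -> Future:
        executor = self._executor
        try:
            return executor.submit(fn, *args, **kwargs)
        except BrokenProcessPool:
            # The pool is unusable: build a fresh one and retry once.
            # A second failure propagates to the caller.
            self._recreate_executor(executor)
            return self._executor.submit(fn, *args, **kwargs)

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)
```

The retry is limited to a single attempt so that a persistently crashing callable cannot loop forever. Events that were already in flight when the worker died are still lost; this sketch only restores the ability to submit new ones.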
   
   ### Operating System
   
   Debian GNU/Linux 12 (bookworm) — Linux 5.15.0-1110-azure (containerized on 
Azure)
   
   ### Deployment
   
   Astronomer
   
   ### Apache Airflow Provider(s)
   
   openlineage
   
   ### Versions of Apache Airflow Providers
   
   I think these are the relevant ones from the freeze output:
   ```
   openlineage-integration-common==1.41.0
   openlineage-python==1.45.0
   openlineage_sql==1.41.0
   ```
   
   ### Official Helm Chart version
   
   Not Applicable
   
   ### Kubernetes Version
   
   Not Applicable
   
   ### Helm Chart configuration
   
   _No response_
   
   ### Docker Image customizations
   
   ```Dockerfile
   FROM astrocrpublic.azurecr.io/runtime:3.1-14
   
   ENV AIRFLOW__CORE__MAX_MAP_LENGTH=3072
   ENV AIRFLOW__PROVIDERS_JDBC__ALLOW_DRIVER_CLASS_IN_EXTRA=true
   ENV AIRFLOW__PROVIDERS_JDBC__ALLOW_DRIVER_PATH_IN_EXTRA=true
   ENV JAVA_HOME=/usr/lib/jvm/default-java
   ENV AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT=50
   ENV AIRFLOW__CORE__ALLOWED_DESERIALIZATION_CLASSES="[redacted]"
   
   # .jar copies
   # [redacted COPY]
   # apt installs of ODBC Driver
   # [redacted apt-get]
   
   USER astro
   ```
   
   ### Anything else?
   
   **Environment details:**
   - Python 3.12.13
   - Running on Astronomer Runtime (Medium: Scheduler (1 vCPU, 2GiB RAM), DAG 
Processor (1 vCPU, 2GiB RAM))
   
   **Impact:** Downstream consumers of OpenLineage events see extended periods 
of zero events. The only recovery is a scheduler restart (Git deploy on Astro), 
even though the scheduler otherwise functions normally (task execution is 
unaffected).
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

