sergiobuj opened a new issue, #67283:
URL: https://github.com/apache/airflow/issues/67283
### Under which category would you file this issue?
Providers
### Apache Airflow version
3.1.8+astro.1
### What happened and how to reproduce it?
The OpenLineage listener plugin uses a `ProcessPoolExecutor` to emit lineage
events asynchronously from the scheduler. When a child process in the pool
terminates abruptly, Python's `concurrent.futures` marks the pool as
permanently broken. After this point, **every subsequent OpenLineage event
fails** with `BrokenProcessPool` until the scheduler process is restarted.
This causes extended periods of missing lineage data with no self-recovery.
The warning is logged but the pool is never recreated, so the problem persists
indefinitely.
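The underlying `concurrent.futures` behaviour can be reproduced outside Airflow. The following is a minimal standalone sketch, not the listener code: the function name `demonstrate_broken_pool` is made up for illustration, and the `fork` start method is an assumption that limits it to POSIX systems. It kills the only worker with `os._exit` and then shows that later `submit` calls on the same pool are rejected.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def demonstrate_broken_pool() -> list[str]:
    """Kill a pool worker, then record what happens to later submissions."""
    outcomes = []
    # Assumption: "fork" start method (POSIX only), chosen to keep the sketch simple.
    ctx = multiprocessing.get_context("fork")
    pool = ProcessPoolExecutor(max_workers=1, mp_context=ctx)
    try:
        try:
            # os._exit terminates the worker abruptly, without any cleanup.
            pool.submit(os._exit, 1).result()
        except BrokenProcessPool:
            outcomes.append("worker died")

        # The pool is now flagged as broken; submit() raises immediately
        # instead of starting a replacement worker.
        for _ in range(2):
            try:
                pool.submit(abs, -1)
            except BrokenProcessPool:
                outcomes.append("submit rejected")
    finally:
        pool.shutdown(wait=False)
    return outcomes
```

In the scheduler the trigger for the worker's death may be different (for example an out-of-memory kill), but the resulting pool state is the same.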
### Scheduler logs showing the error
```
2026-05-21T08:01:02.690533Z [warning] OpenLineage received exception in
method on_dag_run_success
[airflow.providers.openlineage.plugins.listener] loc=listener.py:918
Traceback (most recent call last):
File ".../airflow/providers/openlineage/plugins/listener.py", line 896,
in on_dag_run_success
self.submit_callable(
File ".../airflow/providers/openlineage/plugins/listener.py", line 974,
in submit_callable
fut = self.executor.submit(callable, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/concurrent/futures/process.py", line 805,
in submit
raise BrokenProcessPool(self._broken)
concurrent.futures.process.BrokenProcessPool: A child process terminated
abruptly, the process pool is not usable anymore
```
### What you think should happen instead?
The OpenLineage integration could be self-healing and prevent extended
outages in lineage reporting.
When a `BrokenProcessPool` exception is raised in `submit_callable`, the
listener could detect the broken pool state, create a new `ProcessPoolExecutor`
instance, and retry the submission.
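A rough sketch of what that recovery could look like, assuming the executor is created lazily through a factory. The class `SelfHealingSubmitter`, its `executor_factory` and `max_retries` parameters, and the helper method names are all hypothetical; only `submit_callable` mirrors the method name seen in the traceback, and the real listener is structured differently.

```python
import threading
from concurrent.futures import Future, ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


class SelfHealingSubmitter:
    """Hypothetical wrapper that recreates a broken process pool on submit."""

    def __init__(self, executor_factory=ProcessPoolExecutor, max_retries=1):
        self._factory = executor_factory  # assumption: zero-argument callable
        self._max_retries = max_retries
        self._lock = threading.Lock()
        self._executor = None

    def _get_executor(self):
        # Create the pool lazily so a discarded pool is replaced on next use.
        with self._lock:
            if self._executor is None:
                self._executor = self._factory()
            return self._executor

    def _discard_executor(self, broken):
        # Only discard the pool that actually failed; another thread may
        # already have swapped in a fresh one.
        with self._lock:
            if self._executor is broken:
                broken.shutdown(wait=False)
                self._executor = None

    def submit_callable(self, fn, *args, **kwargs) -> Future:
        for attempt in range(self._max_retries + 1):
            executor = self._get_executor()
            try:
                return executor.submit(fn, *args, **kwargs)
            except BrokenProcessPool:
                self._discard_executor(executor)
                if attempt == self._max_retries:
                    raise  # give up after the configured number of retries
```

Bounding the retries keeps a pool that breaks immediately on every recreation from looping forever; in that case the exception still surfaces and can be logged as it is today.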
### Operating System
Debian GNU/Linux 12 (bookworm) — Linux 5.15.0-1110-azure (containerized on
Azure)
### Deployment
Astronomer
### Apache Airflow Provider(s)
openlineage
### Versions of Apache Airflow Providers
I think these are the relevant ones from `pip freeze`:
```
openlineage-integration-common==1.41.0
openlineage-python==1.45.0
openlineage_sql==1.41.0
```
### Official Helm Chart version
Not Applicable
### Kubernetes Version
Not Applicable
### Helm Chart configuration
_No response_
### Docker Image customizations
```Dockerfile
FROM astrocrpublic.azurecr.io/runtime:3.1-14
ENV AIRFLOW__CORE__MAX_MAP_LENGTH=3072
ENV AIRFLOW__PROVIDERS_JDBC__ALLOW_DRIVER_CLASS_IN_EXTRA=true
ENV AIRFLOW__PROVIDERS_JDBC__ALLOW_DRIVER_PATH_IN_EXTRA=true
ENV JAVA_HOME=/usr/lib/jvm/default-java
ENV AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT=50
ENV AIRFLOW__CORE__ALLOWED_DESERIALIZATION_CLASSES="[redacted]"
# .jar copies
# [redacted COPY]
# apt installs of ODBC Driver
# [redacted apt-get]
USER astro
```
### Anything else?
**Environment details:**
- Python 3.12.13
- Running on Astronomer Runtime (Medium: Scheduler (1 vCPU, 2GiB RAM), DAG
Processor (1 vCPU, 2GiB RAM))
**Impact:** Downstream consumers of OpenLineage events see extended periods
of zero events. The only recovery is a scheduler restart (Git deploy on
Astro), even though the scheduler otherwise functions normally (task
execution is unaffected).
### Are you willing to submit PR?
- [x] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)