kaxil opened a new pull request, #68735:
URL: https://github.com/apache/airflow/pull/68735
On Airflow < 3.2, `common.compat`'s `get_hook_lineage_collector()` polyfills `add_extra` by
monkeypatching the `collected_assets` and `has_collected` **class** properties of the lineage
collector. `_add_extra_polyfill` was not idempotent: the trigger gate (`_lacks_add_extra_method`)
checks **instance-level** `_extra`, so a fresh collector instance of an already-patched class
re-enters the polyfill and re-wraps the class property, capturing the previously installed wrapper
as the "original". Each call stacks another layer onto the getter chain, so once enough collectors
are created the next `collected_assets` / `has_collected` access exceeds the recursion limit and
raises `RecursionError`.
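
The stacking described above can be reproduced with a minimal sketch. Everything here is a hypothetical stand-in rather than Airflow's code: `Collector`, `naive_polyfill`, and the `depth` bookkeeping attribute exist only for illustration, and the gate is simplified to a plain `hasattr` check on the instance.

```python
import sys


class Collector:
    """Hypothetical stand-in for the lineage collector (not Airflow's class)."""

    @property
    def has_collected(self) -> bool:
        return False


def naive_polyfill(collector: Collector) -> None:
    """Non-idempotent patch: gated on instance state, but it mutates the class."""
    if hasattr(collector, "_extra"):  # instance-level gate; fresh instances always pass
        return
    collector._extra = []
    cls = type(collector)
    original = cls.has_collected  # may already be a wrapper from an earlier call

    def wrapped(self) -> bool:
        return bool(self._extra) or original.fget(self)

    # Illustration-only bookkeeping: how many wrappers sit on the getter chain.
    wrapped.depth = getattr(original.fget, "depth", 0) + 1
    cls.has_collected = property(wrapped)


def wrapper_depth(cls: type) -> int:
    """Number of stacked wrappers currently installed on the class property."""
    return getattr(cls.has_collected.fget, "depth", 0)


def access_overflows(n: int = sys.getrecursionlimit() + 100) -> bool:
    """Polyfill n fresh collectors (n >= 1), then check whether one access overflows."""
    last = Collector()
    for _ in range(n):
        last = Collector()
        naive_polyfill(last)
    try:
        last.has_collected
    except RecursionError:
        return True
    return False
```

Each fresh instance passes the gate because `_extra` lives on the instance, while the wrapper is installed on the class, so `wrapper_depth(Collector)` grows by one per collector until a single property access needs more frames than the interpreter allows.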

## Where it shows up

In production the global collector is a process singleton (the accessor is `@cache`-decorated) and
the polyfill only runs on Airflow < 3.2, so it is applied exactly once -- no impact. The failure
surfaces on the **Compat 3.1.x provider test matrix**, where a fresh `HookLineageCollector` is built
per test: the class accumulates one wrapper per test until the lineage edge-case tests blow the
recursion limit. This is a test-suite/robustness fix, not a production incident.

## Fix

Make `_add_extra_polyfill` idempotent -- patch each collector class exactly once:

- Skip re-patching when the class's own `__dict__` already carries a `_compat_extra_polyfilled`
  marker (checked on the exact class, not via inheritance, so a subclass that overrides these
  properties is still patched in its own right).
- Set the marker only after all three patches (`collected_assets`, `has_collected`, `add_extra`)
  are installed, so a failure mid-patch leaves the class unmarked and retryable rather than
  half-patched.
- Initialize the instance `_extra` / `_extra_counts` only when missing, so re-applying never clears
  a collector that already accumulated extras.
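
The three rules above might look roughly like the following sketch. It is not the code from this PR: `Collector`, `idempotent_polyfill`, and the two-argument `add_extra` signature are simplified, hypothetical stand-ins, and only `has_collected` is wrapped to keep the example short. The marker name is the one given in the description.

```python
_MARKER = "_compat_extra_polyfilled"  # marker name taken from the description above


class Collector:
    """Hypothetical stand-in for the lineage collector (not Airflow's class)."""

    def __init__(self) -> None:
        self._assets: list = []

    @property
    def has_collected(self) -> bool:
        return bool(self._assets)


def idempotent_polyfill(collector):
    """Patch the collector's class at most once; never reset instance state."""
    cls = type(collector)

    # vars(cls) looks only at the class's own __dict__, so a marker inherited
    # from a patched parent does not stop a subclass from being patched itself.
    if _MARKER not in vars(cls):
        original = cls.has_collected  # the property object visible on this class

        def has_collected(self) -> bool:
            return bool(self._extra) or original.fget(self)

        def add_extra(self, key, value) -> None:  # simplified, assumed signature
            self._extra.append((key, value))

        cls.has_collected = property(has_collected)
        cls.add_extra = add_extra
        # Set the marker last: if anything above raises, the class stays
        # unmarked and a later call retries the whole patch.
        setattr(cls, _MARKER, True)

    # Initialize only when missing so re-applying keeps accumulated extras.
    if not hasattr(collector, "_extra"):
        collector._extra = []
    return collector
```

With this shape, creating any number of collectors leaves exactly one wrapper on the class, and calling `idempotent_polyfill` again on a collector that already holds extras leaves them untouched.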

Behavior is otherwise unchanged: every collector still gets `add_extra` and the extended
`collected_assets` / `has_collected`.