jason810496 opened a new issue, #66413: URL: https://github.com/apache/airflow/issues/66413
Follow-up from PR #66002 ([review thread](https://github.com/apache/airflow/pull/66002#issuecomment-4377932354)). ### Background `airflow-core/tests/unit/serialization/test_encoders.py` has a `TestTriggerHashConsistency` suite that asserts the DAG-side hash (`BaseEventTrigger.hash(classpath, encode_trigger(trigger)["kwargs"])`) matches the DB-side hash after a real DB round-trip through `encrypt_kwargs` / `_decrypt_kwargs`. This mirrors the comparison done in `AssetModelOperation.add_asset_trigger_references` (`airflow-core/src/airflow/dag_processing/collection.py`) — if the two hashes diverge, the scheduler keeps recreating trigger rows on every heartbeat. ### Gap 1 — symmetric-but-wrong serialization is not caught The current hash assertion only catches *asymmetric* breakage between the two sides. If `encode_trigger` ever produced kwargs that contained non-JSON-safe Python objects but both sides happened to serialize them identically (e.g. via `repr()` or pickle-flavoured fallback), the hash check would still pass while violating the actual contract — that encoded kwargs are fully primitive / JSON-serializable, which is what `Trigger.encrypt_kwargs` (`serde.serialize → json.dumps → fernet`) requires. ### Gap 2 — `_TRIGGER_PARAMS` coverage is too narrow `_TRIGGER_PARAMS` today only exercises a handful of shapes: - `FileDeleteTrigger(filepath, poke_interval)` — primitive-only - `_CallableKwargsTrigger(topics=())` — empty tuple - `_CallableKwargsTrigger(topics=("fizz_buzz",), poll_timeout=1.0)` — single-element tuple - `_CallableKwargsTrigger(topics=[...], apply_function=..., apply_function_args=[...], apply_function_kwargs={...}, poll_timeout=3)` — mixed list/dict/string This leaves several realistic trigger-construction shapes unverified. We should expand fixtures to cover at least: - `datetime` / `timedelta` kwargs (the canonical `DateTimeTrigger` / `TimeDeltaTrigger` shapes). - Nested containers (`list[dict]`, `dict[str, list]`, `tuple[tuple, ...]`). - `set` / `frozenset` kwargs (these go through their own serde path). - `None` and falsy primitive values (`0`, `""`, `False`). - Enum values and dataclass / attrs instances if any trigger constructor accepts them. - Path-like (`pathlib.Path`) values. The intent is that any trigger shape a real provider could plausibly construct should be in `_TRIGGER_PARAMS`, so the JSON-guard from Gap 1 is exercised broadly. ### Proposed change (test-only) In `airflow-core/tests/unit/serialization/test_encoders.py`: 1. Add a small `_assert_fully_serialized(encoded_kwargs)` helper that runs `json.dumps(encoded_kwargs)` and asserts it succeeds (and, if useful, that the structure stays under the existing wrapper-depth invariant — the `<` guard discussed in the review thread). 2. Call it from each parametrized test in `TestEncodeTrigger` and `TestTriggerHashConsistency` driven by `_TRIGGER_PARAMS`, so every fixture is checked. 3. Extend `_TRIGGER_PARAMS` with additional fixtures covering the shapes listed in Gap 2. 4. No production-code changes are expected — this is a test-strengthening change only. ### Acceptance criteria - [ ] All `_TRIGGER_PARAMS` cases pass through `json.dumps` on the encoded kwargs without raising. - [ ] The new guard fires (i.e. fails) if a regression reintroduces non-JSON-safe values into `encode_trigger` output. - [ ] `_TRIGGER_PARAMS` is broadened to cover datetime/timedelta, nested containers, set/frozenset, falsy primitives, and any other realistic trigger-construction shapes. - [ ] No change to `airflow-core/src/`; the PR touches only `airflow-core/tests/unit/serialization/test_encoders.py` (and possibly a tiny shared helper if reused elsewhere). ### Out of scope - Any change to `encode_trigger`, `BaseEventTrigger.hash`, or `_decode_start_trigger_args` semantics — those are covered by PR #66002. --- Drafted-by: Claude Code (Opus 4.7); reviewed by @jason810496 before posting -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
