jason810496 opened a new issue, #66413:
URL: https://github.com/apache/airflow/issues/66413

   Follow-up from PR #66002 ([review 
thread](https://github.com/apache/airflow/pull/66002#issuecomment-4377932354)).
   
   ### Background
   
   `airflow-core/tests/unit/serialization/test_encoders.py` has a 
`TestTriggerHashConsistency` suite that asserts the DAG-side hash 
(`BaseEventTrigger.hash(classpath, encode_trigger(trigger)["kwargs"])`) matches 
the DB-side hash after a real DB round-trip through `encrypt_kwargs` / 
`_decrypt_kwargs`. This mirrors the comparison done in 
`AssetModelOperation.add_asset_trigger_references` 
(`airflow-core/src/airflow/dag_processing/collection.py`) — if the two hashes 
diverge, the scheduler keeps recreating trigger rows on every heartbeat.
   
   ### Gap 1 — symmetric-but-wrong serialization is not caught
   
   The current hash assertion only catches *asymmetric* breakage between the 
two sides. If `encode_trigger` ever produced kwargs that contained 
non-JSON-safe Python objects but both sides happened to serialize them 
identically (e.g. via `repr()` or pickle-flavoured fallback), the hash check 
would still pass while violating the actual contract — that encoded kwargs are 
fully primitive / JSON-serializable, which is what `Trigger.encrypt_kwargs` 
(`serde.serialize → json.dumps → fernet`) requires.
   
   ### Gap 2 — `_TRIGGER_PARAMS` coverage is too narrow
   
   `_TRIGGER_PARAMS` today only exercises a handful of shapes:
   
   - `FileDeleteTrigger(filepath, poke_interval)` — primitive-only
   - `_CallableKwargsTrigger(topics=())` — empty tuple
   - `_CallableKwargsTrigger(topics=("fizz_buzz",), poll_timeout=1.0)` — 
single-element tuple
   - `_CallableKwargsTrigger(topics=[...], apply_function=..., 
apply_function_args=[...], apply_function_kwargs={...}, poll_timeout=3)` — 
mixed list/dict/string
   
   This leaves several realistic trigger-construction shapes unverified. We 
should expand fixtures to cover at least:
   
   - `datetime` / `timedelta` kwargs (the canonical `DateTimeTrigger` / 
`TimeDeltaTrigger` shapes).
   - Nested containers (`list[dict]`, `dict[str, list]`, `tuple[tuple, ...]`).
   - `set` / `frozenset` kwargs (these go through their own serde path).
   - `None` and falsy primitive values (`0`, `""`, `False`).
   - Enum values and dataclass / attrs instances if any trigger constructor 
accepts them.
   - Path-like (`pathlib.Path`) values.
   
   The intent is that any trigger shape a real provider could plausibly 
construct should be in `_TRIGGER_PARAMS`, so the JSON-guard from Gap 1 is 
exercised broadly.
   
   ### Proposed change (test-only)
   
   In `airflow-core/tests/unit/serialization/test_encoders.py`:
   
   1. Add a small `_assert_fully_serialized(encoded_kwargs)` helper that runs 
`json.dumps(encoded_kwargs)` and asserts it succeeds (and, if useful, that the 
structure stays under the existing wrapper-depth invariant — the `<` guard 
discussed in the review thread).
   2. Call it from each parametrized test in `TestEncodeTrigger` and 
`TestTriggerHashConsistency` driven by `_TRIGGER_PARAMS`, so every fixture is 
checked.
   3. Extend `_TRIGGER_PARAMS` with additional fixtures covering the shapes 
listed in Gap 2.
   4. No production-code changes are expected — this is a test-strengthening 
change only.
   
   ### Acceptance criteria
   
   - [ ] All `_TRIGGER_PARAMS` cases pass through `json.dumps` on the encoded 
kwargs without raising.
   - [ ] The new guard fires (i.e. fails) if a regression reintroduces 
non-JSON-safe values into `encode_trigger` output.
   - [ ] `_TRIGGER_PARAMS` is broadened to cover datetime/timedelta, nested 
containers, set/frozenset, falsy primitives, and any other realistic 
trigger-construction shapes.
   - [ ] No change to `airflow-core/src/`; the PR touches only 
`airflow-core/tests/unit/serialization/test_encoders.py` (and possibly a tiny 
shared helper if reused elsewhere).
   
   ### Out of scope
   
   - Any change to `encode_trigger`, `BaseEventTrigger.hash`, or 
`_decode_start_trigger_args` semantics — those are covered by PR #66002.
   
   ---
   Drafted-by: Claude Code (Opus 4.7); reviewed by @jason810496 before posting


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to