zachliu commented on PR #68268:
URL: https://github.com/apache/airflow/pull/68268#issuecomment-4671726163

   +1 on this! We just had a production incident that shows why blocking the 
save matters.
   
   A malformed Variable (a single trailing comma in JSON) got saved through 
this form because the `Invalid JSON` warning is easy to miss and `Save` stays 
enabled. The variable drives parse-time task generation in many of our DAGs, so 
on the next parse those DAGs serialized with zero tasks, and every scheduled 
run silently `succeeded` in ~0.1s with all task instances marked removed. Blast 
radius was far larger than the one variable suggested, and root-causing it took 
a while because nothing actually errored.
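   
   For anyone hitting the same failure mode on the DAG side, here is a minimal 
sketch of failing loudly at parse time instead of generating zero tasks. The 
helper name `load_task_specs` and the "non-empty JSON list" shape are 
assumptions for illustration, not our actual code; the point is only that an 
uncaught exception during parsing is reported as an import error rather than 
producing an empty DAG.

```python
import json


def load_task_specs(raw: str) -> list:
    """Parse the Variable's raw JSON text, failing loudly on bad input.

    Hypothetical helper: the name and the expected list shape are assumptions.
    """
    try:
        specs = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Re-raise so the DAG file fails to import instead of yielding no tasks.
        raise ValueError(f"task-spec Variable is not valid JSON: {exc}") from exc
    if not isinstance(specs, list) or not specs:
        # An empty or non-list payload would also generate zero tasks.
        raise ValueError("task-spec Variable must be a non-empty JSON list")
    return specs
```

   A DAG file would call this on the raw Variable string before looping over 
the result to create tasks.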
   
   Guarding `Save` behind JSON validation, as in this PR, would have stopped it 
at the source.
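   
   To illustrate the check itself (in Python rather than the UI code this PR 
touches), a rough sketch of the gate: saving is allowed only if the value 
parses. The function name `can_save` and the `is_json` flag are made up for 
this example and do not reflect the PR's implementation.

```python
import json


def can_save(value: str, is_json: bool) -> bool:
    """Return True only when the value is safe to persist.

    Hypothetical gate: `is_json` stands in for "the user marked this as JSON".
    """
    if not is_json:
        return True  # plain-text values need no JSON validation
    try:
        json.loads(value)
    except json.JSONDecodeError:
        return False  # e.g. a single trailing comma lands here
    return True
```

   With this, `can_save('{"a": 1,}', is_json=True)` is false, which is the 
exact trailing-comma case that bit us.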
   
   One tangential note in case it's useful: even after we fixed the variable, 
tasks defined inside a `TaskGroup` stayed `removed`, while bare top-level tasks 
recovered on reparse. We only got them back by forcing a new serialized DAG 
version (a structural change) and clearing the runs. That smells like a 
possible `TaskGroup` reconciliation bug in Airflow 3.x, separate from this UI 
fix. Happy to file a dedicated issue if it'd help.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
