patniharshit opened a new issue, #44738: URL: https://github.com/apache/airflow/issues/44738
### Apache Airflow version

Other Airflow 2 version (please specify below)

### If "Other Airflow 2 version" selected, which one?

2.6.3

### What happened?

We inadvertently committed a change that introduced some Airflow task names longer than 250 characters. This caused the `validate_key` check ([code: search for validate_key](https://airflow.apache.org/docs/apache-airflow/1.10.3/_modules/airflow/utils/helpers.html)), which requires all task names to be less than 250 characters, to fail in the Airflow scheduler. A corrupted version of the serialized DAG was cached in the Airflow DB. The scheduler was stuck in a loop, continuously hitting this exception even after the long task names were reverted and the scheduler was restarted. This was a very confusing state to be in, as it made no sense that reverting the change and restarting would not fix it. The issue was eventually resolved by running `airflow dags reserialize -v -S <path-to-dags>` to force reserialization outside of the scheduler process: https://airflow.apache.org/docs/apache-airflow/stable/cli-and-env-variables-ref.html#reserialize

### What you think should happen instead?

There are two main problems here:

1. The scheduler should never get caught in a crash loop - problems caused by bad serialization should go away once the bad changes are reverted and the scheduler is restarted.
2. The issue stems from a bug in Airflow's task validation process. Specifically, the validation check for task integrity was not performed before the task was serialized and persisted into the DagBag. As a result, an invalid task was written to the DagBag, creating a corrupted state.

Minor:

1. Improve error messages. The error message from `validate_key` that we were seeing was `airflow.exceptions.AirflowException: The key has to be less than 250 characters`. It should also print the name of the offending key, for easier debugging.

### How to reproduce

* Create a task with a `task_id` greater than 250 characters.
* Let the scheduler pick this up.
* Notice the crash in the scheduler logs showing validation errors from the `validate_key` function.
* Revert the original change to restore the `task_id` to less than 250 characters.
* The scheduler still keeps crashing with the same errors.
* Run `airflow dags reserialize -v -S <path-to-dags>` to fix this.

### Operating System

Debian GNU/Linux 11.7 (Bullseye)

### Versions of Apache Airflow Providers

```
>>> pip freeze | g apache-airflow-providers
apache-airflow-providers-celery==3.5.1
apache-airflow-providers-common-sql==1.10.0
apache-airflow-providers-elasticsearch==4.4.0
apache-airflow-providers-ftp==3.7.0
apache-airflow-providers-http==4.8.0
apache-airflow-providers-imap==3.5.0
apache-airflow-providers-sftp==4.8.1
apache-airflow-providers-sqlite==3.7.0
apache-airflow-providers-ssh==3.10.0
```

### Deployment

Virtualenv installation

### Deployment details

_No response_

### Anything else?

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
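To illustrate the "Minor" request above, here is a sketch of the kind of length check `validate_key` performs, with the error message extended to include the offending key. This is an illustration only (hypothetical function body and exception types), not Airflow's actual implementation:

```python
# Sketch of a validate_key-style check that names the offending key.
# Illustration only -- not Airflow's actual code.

MAX_KEY_LENGTH = 250  # the limit the scheduler enforced in our incident


def validate_key(key: str, max_length: int = MAX_KEY_LENGTH) -> None:
    """Raise if the key is not a string or exceeds max_length characters."""
    if not isinstance(key, str):
        raise TypeError(f"The key has to be a string, got {type(key).__name__}")
    if len(key) > max_length:
        # Including a (truncated) copy of the key makes the failing task
        # easy to find in a large DAG, unlike the current message.
        raise ValueError(
            f"The key {key[:50]!r}... has to be less than {max_length} characters"
        )


validate_key("my_task")  # a valid task_id passes silently
try:
    validate_key("t" * 251)  # a 251-character task_id is rejected
except ValueError as exc:
    print(exc)
```

With this shape of message, the reproduce steps above would have pointed straight at the offending task instead of requiring a search through every `task_id` in the DAG.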

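The "validate before persisting" ordering requested in point 2 of "What you think should happen instead?" can be sketched as follows. All names here (`persist_dag`, the `store` dict) are hypothetical, not Airflow's real API; the point is only the ordering: every `task_id` is checked before anything is written, so an invalid DAG can never reach the cache:

```python
# Sketch of validating all task_ids *before* persisting a serialized DAG,
# so a corrupted entry is never written to the store.
# Hypothetical names (persist_dag, store) -- not Airflow's real API.

MAX_KEY_LENGTH = 250


def persist_dag(dag_id: str, task_ids: list, store: dict) -> None:
    """Validate every task_id, then write the serialized DAG to the store."""
    for task_id in task_ids:
        if len(task_id) > MAX_KEY_LENGTH:
            # Fail fast, before any write, and name the offending key.
            raise ValueError(
                f"task_id {task_id[:50]!r}... exceeds {MAX_KEY_LENGTH} characters"
            )
    # Only reached when every task_id passed validation.
    store[dag_id] = {"tasks": list(task_ids)}


store = {}
persist_dag("good_dag", ["a", "b"], store)
try:
    persist_dag("bad_dag", ["x" * 251], store)
except ValueError:
    pass
assert "bad_dag" not in store  # the corrupted state is never cached
```

Under this ordering, reverting the bad change and restarting the scheduler would have been sufficient, since no corrupted serialized DAG would have been cached in the first place.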