patniharshit opened a new issue, #44738:
URL: https://github.com/apache/airflow/issues/44738

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### If "Other Airflow 2 version" selected, which one?
   
   2.6.3
   
   ### What happened?
   
   We inadvertently committed a change that introduced some Airflow task names 
longer than 250 characters. This caused the `validate_key` check ([code: 
search for 
validate_key](https://airflow.apache.org/docs/apache-airflow/1.10.3/_modules/airflow/utils/helpers.html)),
 which requires all task names to be fewer than 250 characters, to fail in the 
Airflow scheduler.
   
   A corrupted version of the serialized DAG was cached in the Airflow DB. The 
scheduler was stuck in a loop, continuously hitting this exception even after 
the long task names were reverted and the scheduler was restarted. This was a 
very confusing state to be in, as it didn't make sense that reverting the 
change and restarting wouldn't fix it.
   
   The issue was eventually resolved by running `airflow dags 
reserialize -v -S <path-to-dags>` to force reserialization outside of the 
scheduler process. 
   
https://airflow.apache.org/docs/apache-airflow/stable/cli-and-env-variables-ref.html#reserialize
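
For context, here is a simplified, self-contained sketch of the length check that `validate_key` performs. The real function lives in `airflow.utils.helpers`; the exception class below is a stand-in, not Airflow's own code.

```python
# Simplified, standalone sketch of the length check performed by
# `validate_key` (the real implementation is in `airflow.utils.helpers`;
# the exception class here is a stand-in, not Airflow's own).
class AirflowException(Exception):
    """Stand-in for airflow.exceptions.AirflowException."""

def validate_key(k: str, max_length: int = 250) -> None:
    if not isinstance(k, str):
        raise TypeError(f"The key has to be a string and is {type(k)}:{k}")
    if len(k) > max_length:
        # Note: as of 2.6.3 the message does not mention the offending key.
        raise AirflowException(f"The key has to be less than {max_length} characters")

# A task_id longer than 250 characters triggers the exception:
try:
    validate_key("t" * 251)
except AirflowException as exc:
    print(exc)  # The key has to be less than 250 characters
```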
   
    
   
   ### What you think should happen instead?
   
   There are two main problems here:
   1. The scheduler should never get caught in a crash loop - problems caused 
by bad serialization should go away once the bad changes are reverted and the 
scheduler is restarted.
   2. The issue stems from a gap in Airflow's task validation process. 
Specifically, the integrity check for a task was not performed before the 
task was serialized and persisted into the DagBag. As a result, an invalid 
task was written to the DagBag, creating a corrupted state.
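
A minimal sketch of what point 2 suggests: validate every task_id *before* anything is persisted, so that a failure leaves no corrupted state behind. All names below are illustrative, not Airflow's actual serialization API.

```python
# Hypothetical validate-before-persist flow: every task_id is checked
# *before* the serialized DAG is written, so a failure leaves the
# "database" untouched (names are illustrative, not Airflow's API).
class AirflowException(Exception):
    pass

def validate_key(k: str, max_length: int = 250) -> None:
    # Simplified stand-in for airflow.utils.helpers.validate_key.
    if len(k) > max_length:
        raise AirflowException(f"The key has to be less than {max_length} characters")

def serialize_and_persist(task_ids: list[str], db: dict) -> None:
    for task_id in task_ids:
        validate_key(task_id)  # fail fast: nothing has been persisted yet
    db["serialized_dag"] = list(task_ids)  # reached only if every task is valid

db: dict = {}
try:
    serialize_and_persist(["ok_task", "x" * 300], db)
except AirflowException:
    pass
assert "serialized_dag" not in db  # the corrupted DAG was never cached
```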
   
   Minor: 
   1. Improve error messages: the message from `validate_key` that we 
were seeing was `airflow.exceptions.AirflowException: The key has to be less 
than 250 characters`. It should also print the offending key to make 
debugging easier for the user.
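
A sketch of what a more debuggable message could look like (a suggestion only, not what Airflow currently emits):

```python
# Hypothetical improved check: include the offending key and its length
# in the error message (suggestion only, not Airflow's code).
def validate_key(k: str, max_length: int = 250) -> None:
    if len(k) > max_length:
        raise ValueError(
            f"The key {k!r} has to be less than {max_length} characters; "
            f"it is {len(k)} characters long"
        )
```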
   
   ### How to reproduce
   
   * Create a task with a task_id greater than 250 characters.
   * Let the scheduler pick this up.
   * Notice the crash in the scheduler logs showing validation errors from 
the `validate_key` function.
   * Revert the original change to restore the task_id to fewer than 250 
characters.
   * The scheduler still keeps crashing with the same errors.
   * Run `airflow dags reserialize -v -S <path-to-dags>` to fix this.
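
The stale-cache behaviour in the steps above can be sketched as a toy simulation. Every name here is hypothetical and far simpler than Airflow's real scheduler and serialized-DAG table.

```python
# Toy simulation of the stale serialized-DAG crash loop described above
# (hypothetical names; not Airflow's real scheduler or DB schema).
cache: dict = {}

def parse_and_serialize(task_ids_on_disk: list[str]) -> list[str]:
    # The first parse writes the serialized DAG to the "DB"; later
    # reads return the cached copy even if the file on disk changed.
    cache.setdefault("dag", list(task_ids_on_disk))
    return cache["dag"]

def scheduler_tick(task_ids_on_disk: list[str]) -> None:
    for task_id in parse_and_serialize(task_ids_on_disk):
        if len(task_id) > 250:
            raise ValueError("The key has to be less than 250 characters")

bad, good = ["x" * 300], ["ok_task"]

# 1. Bad commit: the first tick caches the invalid DAG and crashes.
# 2. Revert + restart: still crashes, because the cached copy is read.
for on_disk in (bad, good):
    try:
        scheduler_tick(on_disk)
    except ValueError:
        pass  # crash loop: both ticks hit the cached bad DAG

# 3. Clearing the cache is what `airflow dags reserialize` effectively does.
cache.clear()
scheduler_tick(good)  # no exception: the loop is broken
```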
   
   ### Operating System
   
   Debian GNU/Linux 11.7 (Bullseye)
   
   ### Versions of Apache Airflow Providers
   
   ```
   $ pip freeze | grep apache-airflow-providers
   apache-airflow-providers-celery==3.5.1
   apache-airflow-providers-common-sql==1.10.0
   apache-airflow-providers-elasticsearch==4.4.0
   apache-airflow-providers-ftp==3.7.0
   apache-airflow-providers-http==4.8.0
   apache-airflow-providers-imap==3.5.0
   apache-airflow-providers-sftp==4.8.1
   apache-airflow-providers-sqlite==3.7.0
   apache-airflow-providers-ssh==3.10.0
   ```
   
   ### Deployment
   
   Virtualenv installation
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

