ersalil opened a new issue, #68018:
URL: https://github.com/apache/airflow/issues/68018

   ### Under which category would you file this issue?
   
   Airflow Core
   
   ### Apache Airflow version
   
   3.x (confirmed on Airflow v3.1.8)
   
   ### What happened and how to reproduce it?
   
   When OTel metrics are enabled and a DAG contains a TaskGroup or Task ID
   with a non-ASCII character (e.g. `ç`, `ã`, `é`), the scheduler enters
   CrashLoopBackOff on every restart instead of skipping the invalid metric
   and continuing normally.
   
   The crash happens in the scheduler's critical section when it tries to
   queue task instances and emit a state-change metric for them:
   ```
   File "airflow/jobs/scheduler_job_runner.py", in _executable_task_instances_to_queued
   ti.emit_state_change_metric(TaskInstanceState.QUEUED)
   File "airflow/models/taskinstance.py", in emit_state_change_metric
   Stats.timing(f"dag.{self.dag_id}.{self.task_id}.{metric_name}", timing)
   File "airflow_shared/observability/metrics/otel_logger.py", in timing
   if self.metrics_validator.test(stat) and name_is_otel_safe(self.prefix, stat):
   File "airflow_shared/observability/metrics/otel_logger.py", in name_is_otel_safe
   return bool(stat_name_otel_handler(prefix, name, max_length=OTEL_NAME_MAX_LENGTH))
   File "airflow_shared/observability/metrics/validators.py", in stat_name_otel_handler
   stat_name_default_handler(proposed_stat_name, ...)
   File "airflow_shared/observability/metrics/validators.py", in stat_name_default_handler
   raise InvalidStatsNameException(...)

   airflow.exceptions.InvalidStatsNameException: The stat name (airflow.dag.sql_server_el.kmt_lista_preços_tasks.run_kmt_lista_precos_pipeline.scheduled_duration) has to be composed of ASCII alphabets, numbers, or the underscore, dot, or dash characters.
   ```
   
   ### How to reproduce
   
   1. Enable OTel metrics on an Airflow 3.x deployment.
   2. Create a DAG with a TaskGroup or task ID containing a non-ASCII character
      (e.g. `ç`, `ã`, `ö`):
   
   ```python
   with TaskGroup("kmt_lista_preços_tasks") as tg:
       ...
   ```

   3. Let the DAG create a scheduled run so task instances exist in the database.
   4. Observe the scheduler entering CrashLoopBackOff with `InvalidStatsNameException`.
   
   ### What you think should happen instead?
   
   `name_is_otel_safe()` has a return type of `-> bool`. A function with that
   contract must never raise; it should return `False` for invalid names and
   let the caller skip the metric. The scheduler must never crash because a
   metric could not be emitted.
   
   The validation logic in `stat_name_otel_handler` and `stat_name_default_handler`
   is correct: non-ASCII characters are genuinely invalid per the OTel instrument
   name spec. The problem is that
   [`name_is_otel_safe()`](https://github.com/apache/airflow/blob/c9d7f367ac4c7043454502b16d41340b3ec2c66d/shared/observability/src/airflow_shared/observability/metrics/otel_logger.py#L99)
   does not catch the `InvalidStatsNameException` that
   [`stat_name_otel_handler`](https://github.com/apache/airflow/blob/eeb5c9250417e761c401df8cfd421db576a0dbd1/shared/observability/src/airflow_shared/observability/metrics/validators.py#L135-L150)
   is documented to raise.
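
   A minimal sketch of the behaviour I have in mind is below. It is not the
   actual Airflow code: `InvalidStatsNameException` and `stat_name_otel_handler`
   here are simplified stand-ins I wrote for illustration (the stand-in only
   checks the allowed character set and ignores the length limit), and the
   two-argument signature is an assumption. The only point is that the boolean
   wrapper converts the validator's exception into `False`.

```python
import re


class InvalidStatsNameException(Exception):
    """Stand-in for airflow.exceptions.InvalidStatsNameException (illustration only)."""


# ASCII letters, digits, underscore, dot and dash, as described in the error message.
_ALLOWED_CHARS = re.compile(r"[A-Za-z0-9_.\-]+")


def stat_name_otel_handler(prefix: str, name: str) -> str:
    """Simplified stand-in for the real validator: raises on an invalid name."""
    proposed_stat_name = f"{prefix}.{name}"
    if not _ALLOWED_CHARS.fullmatch(proposed_stat_name):
        raise InvalidStatsNameException(
            f"The stat name ({proposed_stat_name}) has to be composed of ASCII "
            "alphabets, numbers, or the underscore, dot, or dash characters."
        )
    return proposed_stat_name


def name_is_otel_safe(prefix: str, name: str) -> bool:
    """Return False for an invalid name instead of letting the exception escape."""
    try:
        return bool(stat_name_otel_handler(prefix, name))
    except InvalidStatsNameException:
        return False
```

   With this shape, the existing `if ... and name_is_otel_safe(self.prefix, stat):`
   guard in `timing` would skip the metric for a name such as
   `kmt_lista_preços_tasks` rather than propagate the exception into the
   scheduler loop.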
   
   ### Operating System
   
   _No response_
   
   ### Deployment
   
   Astronomer
   
   ### Apache Airflow Provider(s)
   
   _No response_
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Official Helm Chart version
   
   Not Applicable
   
   ### Kubernetes Version
   
   _No response_
   
   ### Helm Chart configuration
   
   _No response_
   
   ### Docker Image customizations
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
