kisssam opened a new issue, #39567:
URL: https://github.com/apache/airflow/issues/39567

   ### Apache Airflow Provider(s)
   
   google
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-google==10.17.0
   
   ### Apache Airflow version
   
   airflow-2.7.3   
   
   ### Operating System
   
   Running on Google Cloud Composer
   
   ### Deployment
   
   Google Cloud Composer
   
   ### Deployment details
   
   apache-airflow-providers-google==10.17.0
   
   ### What happened
   
   When a task using `BigQueryInsertJobOperator` has a `task_id` of exactly 64 characters, the task fails with the following error:
   
   ```
   [2024-05-10TXX:XX:XX.XXX+0000] {standard_task_runner.py:104} ERROR - Failed 
to execute job XXXXXXXX for task 
task_id_with_exactly_64_characters_00000000000000000000000000000 (400 POST 
https://bigquery.googleapis.com/bigquery/v2/projects/<PROJECT_ID>/jobs?prettyPrint=false:
 Label value "task_id_with_exactly_64_characters_00000000000000000000000000000" 
has invalid characters.
   
   ```
   
   This occurs when the provider package `apache-airflow-providers-google` is at version 10.17.0.
   
   
   
   ### What you think should happen instead
   
   If the `task_id` does not satisfy the conditions for BigQuery label values (values can be empty, have a maximum length of 63 characters, and may contain only lowercase letters, numeric characters, underscores, and dashes), then the BigQuery job should still be created successfully, simply without the default labels, instead of failing as currently observed for a `task_id` of exactly 64 characters.
   
   ### How to reproduce
   
   * Create an Airflow environment with `apache-airflow-providers-google==10.17.0`.
   
   * Create a task with the `task_id` `"task_id_with_exactly_64_characters_00000000000000000000000000000"` using `BigQueryInsertJobOperator` to create any BigQuery query job.
   
   * Observe that the job fails with the error `Label value 
"task_id_with_exactly_64_characters_00000000000000000000000000000" has invalid 
characters.`
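
   For convenience, such a 64-character `task_id` (one character over BigQuery's 63-character label-value limit) can be generated programmatically rather than counted by hand; this is just a helper sketch, not part of the provider:

   ```python
   # Build a task_id that is exactly 64 characters long, matching the
   # reproduction steps above (the prefix is 35 characters, padded with zeros).
   prefix = "task_id_with_exactly_64_characters_"
   task_id = prefix + "0" * (64 - len(prefix))

   print(len(task_id))  # 64
   ```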
   
   ### Anything else
   
   This is occurring as a result of the validation introduced in #37736.
   
   #37736 automatically sets `airflow-dag` and `airflow-task` as job labels on the created BigQuery job, as long as these identifiers match the regex pattern `LABEL_REGEX = re.compile(r"^[a-z][\w-]{0,63}$")`. This pattern matches values that start with a lowercase letter, have a maximum length of 64 characters, and contain only alphanumeric characters, underscores, or hyphens.
   Otherwise, `BigQueryInsertJobOperator` creates the job without adding any default labels (for example, when the `task_id` is longer than 64 characters).
   
   However, as per the [BigQuery documentation for labels](https://cloud.google.com/bigquery/docs/labels-intro#requirements), values can be empty, have a maximum length of 63 characters, and can contain only lowercase letters, numeric characters, underscores, and dashes.
   
   Hence, the current validation regex `LABEL_REGEX` does not satisfy the 
conditions for BigQuery label values.
   
   For the edge case of a `task_id` with exactly 64 characters, the value passes the `LABEL_REGEX` validation, but because BigQuery label values support at most 63 characters, the BigQuery job creation fails.
   
   
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

