junyeong0619 opened a new issue, #67474:
URL: https://github.com/apache/airflow/issues/67474

   ### Description
   
   airflow.utils.sqlalchemy.random_db_uuid is a FunctionElement that compiles 
to different SQL expressions per dialect. The current name suggests a uuid 
return, but only PostgreSQL actually returns a native uuid — MySQL returns 
CHAR(36), SQLite returns TEXT. The function's own type = String() annotation 
also contradicts the PostgreSQL runtime type.
   
   I'd like to discuss renaming the function to something that accurately 
reflects what it returns across all three dialects (e.g. random_db_identifier).
   
   ### Use case/motivation
   
   The name `random_db_uuid` is honest only on PostgreSQL. The `type = 
String()` annotation is honest only on MySQL/SQLite. They disagree with each 
other:
   
   | Dialect    | Compiled SQL        | Runtime type | Matches "uuid" in name? 
| Matches `type = String()`? |
   
|------------|---------------------|--------------|-------------------------|----------------------------|
   | PostgreSQL | `gen_random_uuid()` | `uuid`       | yes                     
| no                         |
   | MySQL      | `UUID()`            | `CHAR(36)`   | no                      
| yes                        |
   | SQLite     | `uuid4()`           | `TEXT`       | no                      
| yes                        |
   
   The fix in #67388 addressed a PostgreSQL-specific symptom — the `CASE` 
THEN/ELSE arms had mismatched types (`uuid` from `gen_random_uuid()` vs. `text` 
from the `external_executor_id` column), and PostgreSQL refused to unify them. 
Wrapping the call site with `sql_cast(..., Text)` resolved it. MySQL coerces 
`CHAR(36)` and `TEXT` permissively, and SQLite treats both as `TEXT`, so 
neither surfaced the issue.
   
   This issue is a separate observation I made while reading that PR: the 
function name promises a uuid return type that only PostgreSQL actually 
provides, and the `type = String()` annotation breaks that promise on 
PostgreSQL. The rename is **not** a fix for #67388 — that's already merged. 
It's a proposal to clean up the latent inconsistency between the name, the type 
annotation, and the actual runtime behavior.
   
   Two approaches were discussed in the #67388 review:
   
   1. Wrap each call site with a cast (`sql_cast(random_db_uuid(), Text)`) — 
the merged approach.
   2. Bake `::text` into the PostgreSQL `@compiles` function — proposed by 
kaxil, declined by ashb on the principle that *"a function called uuid should 
return exactly that — a uuid."*
   
   Both positions are defensible under the current name. If the function were 
renamed to express its role (identifier) rather than its implementation (uuid) 
— e.g. `random_db_identifier` — the type contract @ashb was protecting would no 
longer apply, and the compile-time cast @kaxil suggested would become a natural 
follow-up. That follow-up is intentionally out of scope here; this issue is 
about the rename only.
   
   **Goals:**
   - A name that accurately describes what every dialect actually returns.
   - Resolve the contradiction with the `type = String()` annotation.
   - Leave room for future identifier schemes (ULID, snowflake, etc.) without 
the name lying.
   
   **Scope of the change is small:**
   - `airflow-core/src/airflow/utils/sqlalchemy.py` — class definition + 3 
`@compiles` registrations.
   - `airflow-core/src/airflow/jobs/scheduler_job_runner.py` — 1 import + 3 
call sites.
   - No documentation references, no provider imports. Internal utility only.
   
   I might be missing context for why the current name was chosen — happy to 
hear it.
   
   
   ### Related issues
   
   - #67388 — Correctly pre-allocate `external_executor_id` with multiple 
executors — the PR where the naming question first came up in review.
   - #67385 — Scheduler fails in setting external executor if multiple 
executors in use — the bug report that motivated #67388.
   
   ### Are you willing to submit a PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to