kaxil opened a new issue, #62734:
URL: https://github.com/apache/airflow/issues/62734
## LLMSchemaCompareOperator / @task.llm_schema_compare
Cross-system schema drift detection powered by LLM reasoning.
### What
Compare schemas across different database systems (PostgreSQL, Snowflake, S3
Parquet, etc.) and identify mismatches that would break data loading. The LLM
handles complex cross-system type mapping that simple equality checks miss
(e.g., `varchar(255)` vs `string`, `timestamp` vs `timestamptz`).
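To make the gap concrete, here is a minimal sketch contrasting plain string equality with a small hand-written type-family lookup. The `TYPE_FAMILY` table and the helper names are hypothetical and exist only for illustration; the proposal uses LLM reasoning rather than a static table like this one, in part because such tables are hard to keep complete across systems.

```python
import re

# Hypothetical lookup from dialect-specific type names to a shared family.
# Illustrative only; a real table would need far more entries.
TYPE_FAMILY = {
    "varchar": "string",
    "text": "string",
    "string": "string",
    "timestamp": "timestamp_without_tz",
    "timestamptz": "timestamp_with_tz",
}


def type_family(declared: str) -> str:
    """Strip length/precision arguments and map the base name to a family."""
    base = re.sub(r"\(.*\)", "", declared).strip().lower()
    return TYPE_FAMILY.get(base, base)


def naive_equal(a: str, b: str) -> bool:
    """Plain string comparison of the declared types."""
    return a == b


def family_equal(a: str, b: str) -> bool:
    """Comparison after normalising both sides to a type family."""
    return type_family(a) == type_family(b)


pairs = [("varchar(255)", "string"), ("timestamp", "timestamptz")]
for left, right in pairs:
    print(left, right, naive_equal(left, right), family_equal(left, right))
```

Plain equality reports both pairs as different. The family lookup treats `varchar(255)` and `string` as compatible while still separating `timestamp` from `timestamptz`, which is the kind of distinction the operator is meant to draw without a hand-maintained table.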
### Design
- Accepts multiple `data_sources` (or `db_conn_ids` + `table_names`) for
cross-system comparison
- Schema introspection from each source via the appropriate hook (DbApiHook,
S3Hook, etc.)
- System prompt includes schema context from all sources with clear labeling
(database name, dialect)
- `reasoning_mode=True` strongly recommended — complex cross-system type
mapping benefits from step-by-step analysis
- `context_strategy="full"` for thorough analysis (includes constraints,
indexes, clustering keys)
- Structured output: list of mismatches, severity, suggested migration
actions
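The "clear labeling" point above could look roughly like the following sketch, which renders each introspected source as a labelled block of text for the system prompt. `SourceSchema`, `build_schema_context`, and the sample column data are all assumptions made for illustration; the issue does not specify the actual prompt layout or the introspection result type.

```python
from dataclasses import dataclass


@dataclass
class SourceSchema:
    """Hypothetical container for one introspected source."""

    name: str  # connection, database, or bucket name
    dialect: str  # e.g. "postgres", "snowflake", "parquet"
    table: str
    columns: dict[str, str]  # column name -> declared type


def build_schema_context(sources: list[SourceSchema]) -> str:
    """Render every source as a clearly labelled block for the system prompt."""
    blocks = []
    for src in sources:
        header = f"### Source: {src.name} (dialect: {src.dialect}, table: {src.table})"
        cols = "\n".join(f"- {col}: {typ}" for col, typ in src.columns.items())
        blocks.append(f"{header}\n{cols}")
    return "\n\n".join(blocks)


# Made-up sample schemas, not taken from any real system.
context = build_schema_context(
    [
        SourceSchema(
            "postgres_source",
            "postgres",
            "customers",
            {"id": "integer", "email": "varchar(255)", "created_at": "timestamptz"},
        ),
        SourceSchema(
            "snowflake_target",
            "snowflake",
            "customers",
            {"id": "NUMBER(38,0)", "email": "STRING", "created_at": "TIMESTAMP_NTZ"},
        ),
    ]
)
print(context)
```

Labelling each block with the dialect gives the model what it needs to interpret a type name in the right system. With `context_strategy="full"`, constraints and indexes would presumably be appended to each block in the same way.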
### Use Cases
- Detect breaking schema changes before ETL runs
- Generate migration plans during maintenance windows
- Validate schema consistency across data warehouse replicas
- Compare source system schemas against downstream expectations
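For the first use case, a downstream step needs something machine-readable to act on. The sketch below assumes the structured output (mismatches, severity, suggested action) is shaped like the hypothetical `Mismatch` dataclass here; the field names, the `Severity` levels, and the sample report are illustrative and not part of the proposal.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    """Hypothetical severity levels for a reported mismatch."""

    INFO = 1
    WARNING = 2
    BREAKING = 3


@dataclass
class Mismatch:
    """Hypothetical shape of one entry in the operator's structured output."""

    column: str
    source_type: str
    target_type: str
    severity: Severity
    suggested_action: str


def has_breaking_changes(mismatches: list[Mismatch]) -> bool:
    """Return True if any mismatch would break data loading."""
    return any(m.severity is Severity.BREAKING for m in mismatches)


# Made-up report, standing in for what the operator might return.
report = [
    Mismatch("email", "varchar(255)", "string", Severity.INFO, "No action needed"),
    Mismatch(
        "created_at",
        "timestamptz",
        "timestamp",
        Severity.BREAKING,
        "Convert to UTC before loading, or widen the target column type",
    ),
]

if has_breaking_changes(report):
    print("Blocking ETL run: breaking schema drift detected")
```

In a DAG, a check like `has_breaking_changes` could sit in a task between the comparison and the load, failing or short-circuiting the run when it returns `True`.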
### Example
```python
schema_drift = LLMSchemaCompareOperator(
    task_id="detect_schema_drift",
    data_sources=[customer_s3, customer_postgres, customer_snowflake],
    prompt="Identify schema mismatches that would break data loading between systems",
    reasoning_mode=True,
    context_strategy="full",
    llm_conn_id="openai_default",
)


# Decorator version
@task.llm_schema_compare(
    db_conn_ids=["postgres_source", "snowflake_target"],
    table_names=["customers"],
)
def check_migration_readiness():
    is_maintenance = check_migration_window()
    if is_maintenance:
        return "Compare schemas and generate migration plan for maintenance window"
    return "Compare schemas and flag breaking changes; no migrations allowed"
```
### Dependencies
- LLMOperator (merged)
- Multi-datasource support (for cross-database introspection)
### Phase
Phase 3