kaxil opened a new issue, #62734:
URL: https://github.com/apache/airflow/issues/62734

   ## LLMSchemaCompareOperator / @task.llm_schema_compare
   
   Cross-system schema drift detection powered by LLM reasoning.
   
   ### What
   
   Compare schemas across different database systems (PostgreSQL, Snowflake, S3 
Parquet, etc.) and identify mismatches that would break data loading. The LLM 
handles complex cross-system type mapping that simple equality checks miss 
(e.g., `varchar(255)` vs `string`, `timestamp` vs `timestamptz`).
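
To make the gap concrete, here is a minimal deterministic sketch of why raw string equality is not enough. It is not the proposed LLM-based comparison: the `TYPE_FAMILIES` table and the `normalize_type`, `naive_equal`, and `family_equal` helpers are hypothetical names invented for illustration, and the mapping covers only a handful of types.

```python
import re

# Hypothetical, deliberately tiny mapping from dialect-specific type names
# to coarse "families". A real comparison needs far more nuance (length,
# precision, time zone semantics), which is the gap the LLM is meant to fill.
TYPE_FAMILIES = {
    "varchar": "text",
    "string": "text",
    "text": "text",
    "timestamp": "timestamp",
    "timestamptz": "timestamp_tz",
    "int": "integer",
    "integer": "integer",
    "bigint": "integer",
}


def normalize_type(raw: str) -> str:
    """Strip length/precision arguments and map the base name to a family."""
    base = re.sub(r"\(.*\)", "", raw).strip().lower()
    return TYPE_FAMILIES.get(base, base)


def naive_equal(a: str, b: str) -> bool:
    """The simple equality check: compare the raw type strings."""
    return a == b


def family_equal(a: str, b: str) -> bool:
    """Compare two types after normalizing each to its family."""
    return normalize_type(a) == normalize_type(b)
```

With this sketch, `varchar(255)` and `string` fail the naive check but land in the same family, while `timestamp` and `timestamptz` look nearly identical as strings yet stay distinct because their time zone semantics differ. A hand-maintained table like this grows quickly across dialects, which is the motivation for delegating the mapping to the LLM.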
   
   ### Design
   
   - Accepts multiple `data_sources` (or `db_conn_ids` + `table_names`) for 
cross-system comparison
   - Schema introspection from each source via the appropriate hook (DbApiHook, 
S3Hook, etc.)
   - System prompt includes schema context from all sources with clear labeling 
(database name, dialect)
   - `reasoning_mode=True` is strongly recommended: complex cross-system type 
mapping benefits from step-by-step analysis
   - `context_strategy="full"` for thorough analysis (includes constraints, 
indexes, clustering keys)
   - Structured output: list of mismatches, severity, suggested migration 
actions
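
The labeled schema context and the structured output described above could look roughly like the following. This is a sketch under stated assumptions, not the operator's actual implementation: the `Severity` levels, the `SchemaMismatch` and `SchemaCompareResult` classes, the `build_schema_context` helper, and the exact prompt layout are all hypothetical, since the issue does not define them.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Severity(str, Enum):
    # Hypothetical severity levels; the issue does not specify them.
    INFO = "info"
    WARNING = "warning"
    BREAKING = "breaking"


@dataclass
class SchemaMismatch:
    column: str
    sources: dict[str, str]  # source label -> declared type in that source
    severity: Severity
    suggested_action: str


@dataclass
class SchemaCompareResult:
    mismatches: list[SchemaMismatch] = field(default_factory=list)


def build_schema_context(schemas: dict[tuple[str, str], dict[str, str]]) -> str:
    """Render each source's columns under a header naming the source and dialect.

    `schemas` maps (source name, dialect) to {column name: column type}.
    """
    sections = []
    for (name, dialect), columns in schemas.items():
        lines = [f"## Source: {name} (dialect: {dialect})"]
        lines += [f"- {col}: {col_type}" for col, col_type in columns.items()]
        sections.append("\n".join(lines))
    return "\n\n".join(sections)
```

Keeping every source under its own labeled header lets the model attribute each column type to the right system, and a typed result makes it straightforward for downstream tasks to filter by severity.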
   
   ### Use Cases
   
   - Detect breaking schema changes before ETL runs
   - Generate migration plans during maintenance windows
   - Validate schema consistency across data warehouse replicas
   - Compare source system schemas against downstream expectations
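
For the first use case, a downstream check could refuse to start the load when the comparison reports a breaking mismatch. The sketch below assumes the result is available as a list of dictionaries with `column` and `severity` keys and that `"breaking"` is one of the severity values; none of these are defined by the issue, and both function names are hypothetical.

```python
from __future__ import annotations


def breaking_mismatches(mismatches: list[dict]) -> list[dict]:
    """Return only the mismatches flagged with the assumed 'breaking' severity."""
    return [m for m in mismatches if m.get("severity") == "breaking"]


def assert_safe_to_load(mismatches: list[dict]) -> None:
    """Raise if any breaking mismatch is present, so the ETL step never starts."""
    breaking = breaking_mismatches(mismatches)
    if breaking:
        columns = ", ".join(m["column"] for m in breaking)
        raise ValueError(f"Breaking schema drift detected in: {columns}")
```

In a DAG, a check like this would sit in a task between the schema comparison and the load, so a raised exception fails the run before any data moves.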
   
   ### Example
   
   ```python
   schema_drift = LLMSchemaCompareOperator(
       task_id="detect_schema_drift",
       data_sources=[customer_s3, customer_postgres, customer_snowflake],
       prompt="Identify schema mismatches that would break data loading between systems",
       reasoning_mode=True,
       context_strategy="full",
       llm_conn_id="openai_default",
   )
   
   # Decorator version
   @task.llm_schema_compare(
       db_conn_ids=["postgres_source", "snowflake_target"],
       table_names=["customers"],
   )
   def check_migration_readiness():
       # check_migration_window() is assumed to be a user-defined helper.
       is_maintenance = check_migration_window()
       if is_maintenance:
           return "Compare schemas and generate migration plan for maintenance window"
       return "Compare schemas and flag breaking changes; no migrations allowed"
   ```
   
   ### Dependencies
   
   - LLMOperator (merged)
   - Multi-datasource support (for cross-database introspection)
   
   ### Phase
   
   Phase 3

