SameerMesiah97 opened a new pull request, #67045:
URL: https://github.com/apache/airflow/pull/67045

   **Description**
   
   This change adds a new `PostgresHook.upsert_rows` method that provides 
native PostgreSQL UPSERT support using `INSERT ... ON CONFLICT`.
   
   The new method supports configurable conflict targets through 
`conflict_fields` and selective updates through `update_fields`. When 
`update_fields` is omitted or empty, conflicting rows are ignored using `DO 
NOTHING`.
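   The sketch below shows the statement shape described above: column list, conflict target, and either `DO UPDATE SET col = EXCLUDED.col` or `DO NOTHING`. It is a minimal illustration, not the code in this PR. `build_upsert_sql` is a hypothetical helper, and the rule that `conflict_fields` must be a subset of `target_fields` is an assumed validation. For brevity it interpolates identifiers unquoted; real code should quote them, for example with `psycopg2.sql.Identifier`. PostgreSQL also requires the `ON CONFLICT` target columns to match a unique index or constraint on the table.

```python
from typing import Optional, Sequence


def build_upsert_sql(
    table: str,
    target_fields: Sequence[str],
    conflict_fields: Sequence[str],
    update_fields: Optional[Sequence[str]] = None,
    placeholder: str = "%s",
) -> str:
    """Build a single-row INSERT ... ON CONFLICT statement (hypothetical helper)."""
    if not target_fields:
        raise ValueError("target_fields must not be empty")
    if not conflict_fields:
        raise ValueError("conflict_fields must not be empty")
    # Assumed rule: the conflict target must be among the inserted columns.
    unknown = set(conflict_fields) - set(target_fields)
    if unknown:
        raise ValueError(f"conflict_fields not in target_fields: {sorted(unknown)}")

    columns = ", ".join(target_fields)
    values = ", ".join([placeholder] * len(target_fields))
    conflict = ", ".join(conflict_fields)
    sql = f"INSERT INTO {table} ({columns}) VALUES ({values}) ON CONFLICT ({conflict})"

    if not update_fields:
        # Nothing to update: rows that hit the conflict target are skipped.
        return f"{sql} DO NOTHING"
    # EXCLUDED is PostgreSQL's name for the row that was proposed for insertion.
    assignments = ", ".join(f"{col} = EXCLUDED.{col}" for col in update_fields)
    return f"{sql} DO UPDATE SET {assignments}"
```

For example, `build_upsert_sql("users", ["id", "name"], ["id"])` ends in `ON CONFLICT (id) DO NOTHING`, because no `update_fields` were given.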
   
   `upsert_rows` reuses the batching, transaction handling, serialization, and 
lineage logic of `insert_rows`. It adds the PostgreSQL-specific UPSERT 
semantics that the generic insert abstraction does not currently expose.
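   The `commit_every` batching that `upsert_rows` inherits can be sketched as follows. This illustration rests on several assumptions. `chunked` and `upsert_in_batches` are hypothetical names. `commit_every=0` is assumed to mean "one transaction for all rows", following the `insert_rows` convention. The standard-library `sqlite3` module stands in for PostgreSQL, because SQLite 3.24+ accepts the same `ON CONFLICT (...) DO UPDATE SET col = excluded.col` form, though with `?` placeholders instead of `%s`.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Sequence


def chunked(rows: Iterable[Sequence], size: int) -> Iterator[List[Sequence]]:
    """Yield lists of at most `size` rows; size <= 0 yields everything as one batch."""
    it = iter(rows)
    if size <= 0:
        batch = list(it)
        if batch:
            yield batch
        return
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def upsert_in_batches(conn, sql: str, rows: Iterable[Sequence], commit_every: int = 1000) -> int:
    """Run `sql` for every row, committing once per batch; return the number of commits.

    Hypothetical helper: an empty `rows` iterable executes nothing and commits nothing.
    """
    commits = 0
    cur = conn.cursor()
    try:
        for batch in chunked(rows, commit_every):
            cur.executemany(sql, batch)
            conn.commit()  # each batch is its own transaction
            commits += 1
    finally:
        cur.close()
    return commits


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    sql = (
        "INSERT INTO users (id, name) VALUES (?, ?) "
        "ON CONFLICT (id) DO UPDATE SET name = excluded.name"
    )
    rows = [(1, "a"), (2, "b"), (1, "a2"), (3, "c"), (2, "b2")]
    print(upsert_in_batches(conn, sql, rows, commit_every=2))  # → 3
    print(conn.execute("SELECT id, name FROM users ORDER BY id").fetchall())
    # → [(1, 'a2'), (2, 'b2'), (3, 'c')]
```

Committing per batch bounds transaction size. The trade-off is that a failure partway through leaves earlier batches committed. The empty-input path matches the expectation, stated in the tests below, that empty row collections generate no SQL.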
   
   This PR depends on #66893, which must be merged first.
   
   **Rationale**
   
   `DbApiHook.insert_rows` currently supports a generic `replace=True` option 
whose SQL is produced by dialect-specific generation. However, PostgreSQL 
UPSERT semantics require concepts that the existing API cannot express, such 
as explicit conflict targets and selective update columns.
   
   Supporting PostgreSQL-native UPSERT behavior through `insert_rows` would 
require introducing PostgreSQL-specific arguments such as `conflict_fields` and 
`update_fields` into the shared public `DbApiHook.insert_rows` API. Since 
`DbApiHook` is inherited broadly across providers, expanding the generic insert 
abstraction with provider-specific UPSERT semantics would increase API 
complexity and introduce ambiguous behavior for non-PostgreSQL hooks.
   
   Adding a dedicated `PostgresHook.upsert_rows` method keeps PostgreSQL `ON 
CONFLICT` semantics explicit and self-contained while avoiding backwards 
compatibility and abstraction concerns in the shared `DbApiHook` interface.
   
   The implementation uses PostgreSQL-native `INSERT ... ON CONFLICT` rather 
than `MERGE`. `ON CONFLICT` has been available since PostgreSQL 9.5, whereas 
`MERGE` was only introduced in PostgreSQL 15. This makes `ON CONFLICT` the 
established and more broadly compatible UPSERT mechanism across supported 
PostgreSQL versions.
   
   **Tests**
   
   Added unit tests verifying that:
   
   * Standard UPSERT operations correctly generate `ON CONFLICT DO UPDATE` SQL.
   * UPSERT operations correctly support single and composite conflict fields.
   * UPSERT operations correctly support single and multiple update fields.
   * `DO NOTHING` behavior is generated when `update_fields` is omitted.
   * `fast_executemany=True` uses `psycopg2.extras.execute_batch`.
   * `commit_every` correctly chunks UPSERT operations across transactions.
   * Empty row collections do not generate SQL or emit lineage.
   * Empty or invalid `target_fields` and `conflict_fields` raise validation 
errors.
   
   **Backwards Compatibility**
   
   This change introduces a new provider-specific API and does not modify 
existing `insert_rows` behavior or shared `DbApiHook` interfaces.


-- 
This is an automated message from the Apache Git Service.