davidzollo commented on issue #10293:
URL: https://github.com/apache/seatunnel/issues/10293#issuecomment-3723958192

   
   Thanks for the feature request.
   
   Today SeaTunnel’s PostgreSQL CDC source enforces `REPLICA IDENTITY FULL` for 
every captured table (see `PostgresDialect#checkAllTablesEnabledCapture`):
   ```
       public void checkAllTablesEnabledCapture(
               JdbcConnection jdbcConnection, List<TableId> tableIds) throws SQLException {
           PostgresConnection postgresConnection = (PostgresConnection) jdbcConnection;
           for (TableId tableId : tableIds) {
               ServerInfo.ReplicaIdentity replicaIdentity =
                       postgresConnection.readReplicaIdentityInfo(tableId);
               if (!ServerInfo.ReplicaIdentity.FULL.equals(replicaIdentity)) {
                   throw new SeaTunnelException(
                           String.format(
                                   "Table %s does not have a full replica identity, please execute: ALTER TABLE %s REPLICA IDENTITY FULL;",
                                   tableId, tableId));
               }
           }
       }
   ```
   
   The intention is correctness for UPDATE/DELETE: without FULL, PostgreSQL logs 
only the replica identity columns of the old row (the primary key under 
`DEFAULT`, the indexed columns under `USING INDEX`, nothing under `NOTHING`), 
which can break downstream upsert/delete semantics and leave row images 
incomplete.
   That said, your outbox use case (append-only from the consumer perspective) 
is valid: forcing FULL can indeed increase WAL volume during periodic cleanup 
DELETEs without adding value if you plan to ignore non-INSERT changes.
   
   ### Current workarounds
   
   - Keep the current requirement and run:
     - `ALTER TABLE <schema>.<table> REPLICA IDENTITY FULL;`
   - If you want to ignore cleanup and only forward inserts, add a transform to 
keep INSERT only (because cleanup still produces DELETE events):
     - `FilterRowKind { include_kinds = [INSERT] }`
     - If the table truly never produces UPDATE/DELETE at all, you don’t need 
this transform, but with “regular cleanup” you typically do.
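
   To make the effect of an INSERT-only filter concrete, here is a minimal, 
self-contained Java sketch. It is not SeaTunnel's transform code: the `Kind` 
enum, the `Change` class and `keepInserts` are hypothetical stand-ins that only 
illustrate how cleanup DELETEs are dropped before they reach the sink.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Hypothetical stand-ins for illustration; not SeaTunnel's actual classes.
   final class InsertOnlyFilterSketch {

       // Row kinds a CDC stream can carry (names assumed for this sketch).
       enum Kind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

       static final class Change {
           final Kind kind;
           final String payload;

           Change(Kind kind, String payload) {
               this.kind = kind;
               this.payload = payload;
           }
       }

       // Keep only INSERT events; cleanup DELETEs and any UPDATEs are dropped.
       static List<Change> keepInserts(List<Change> changes) {
           List<Change> kept = new ArrayList<>();
           for (Change change : changes) {
               if (change.kind == Kind.INSERT) {
                   kept.add(change);
               }
           }
           return kept;
       }

       public static void main(String[] args) {
           List<Change> stream = List.of(
                   new Change(Kind.INSERT, "outbox-event-1"),
                   new Change(Kind.DELETE, "outbox-event-0"), // periodic cleanup
                   new Change(Kind.INSERT, "outbox-event-2"));
           for (Change change : keepInserts(stream)) {
               System.out.println(change.kind + " " + change.payload);
           }
       }
   }
   ```

   Note that the filter only hides the DELETE events downstream. PostgreSQL 
still writes them to the WAL, so the WAL volume concern is unaffected by this 
workaround.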
   
   ### Proposed SeaTunnel improvement
   
   - Add a config switch like `require_replica_identity_full = true|false` 
(default `true` for compatibility). When set to `false`, skip the hard check 
and document the implications:
     - UPDATE/DELETE row images may be partial, depending on PostgreSQL replica 
identity settings.
     - Tables without a proper identity may not be safely replayable to sinks 
that require keys.
   - Optionally, relax the check to allow `DEFAULT/USING INDEX` when a primary 
key (or replica identity index) exists, and only require FULL when no suitable 
identity is available.
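
   One possible shape for the relaxed check is sketched below. It is 
deliberately self-contained and does not use SeaTunnel or Debezium types: 
`ReplicaIdentity`, `TableInfo`, `isAcceptable` and `findViolations` are 
hypothetical names, and `requireFull` stands in for the proposed 
`require_replica_identity_full` option. The sketch also assumes one particular 
reading of the two bullets above: with the flag set to `false` it falls back to 
the relaxed rule instead of skipping validation entirely.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Hypothetical sketch: none of these types are SeaTunnel or Debezium classes.
   final class ReplicaIdentityCheckSketch {

       // Mirrors PostgreSQL's four replica identity modes.
       enum ReplicaIdentity { NOTHING, DEFAULT, INDEX, FULL }

       static final class TableInfo {
           final String name;
           final ReplicaIdentity identity;
           final boolean hasPrimaryKey;

           TableInfo(String name, ReplicaIdentity identity, boolean hasPrimaryKey) {
               this.name = name;
               this.identity = identity;
               this.hasPrimaryKey = hasPrimaryKey;
           }
       }

       // requireFull = true keeps today's behaviour; false applies the relaxed rule.
       static boolean isAcceptable(TableInfo table, boolean requireFull) {
           if (requireFull) {
               return table.identity == ReplicaIdentity.FULL;
           }
           switch (table.identity) {
               case FULL:
               case INDEX: // USING INDEX names an explicit identity index
                   return true;
               case DEFAULT: // DEFAULT identifies rows only if a primary key exists
                   return table.hasPrimaryKey;
               default: // NOTHING
                   return false;
           }
       }

       // Collect the names of all tables that fail the check.
       static List<String> findViolations(List<TableInfo> tables, boolean requireFull) {
           List<String> violations = new ArrayList<>();
           for (TableInfo table : tables) {
               if (!isAcceptable(table, requireFull)) {
                   violations.add(table.name);
               }
           }
           return violations;
       }

       public static void main(String[] args) {
           List<TableInfo> tables = List.of(
                   new TableInfo("public.outbox", ReplicaIdentity.DEFAULT, true),
                   new TableInfo("public.audit_log", ReplicaIdentity.DEFAULT, false),
                   new TableInfo("public.orders", ReplicaIdentity.FULL, true));
           System.out.println("strict:  " + findViolations(tables, true));
           System.out.println("relaxed: " + findViolations(tables, false));
       }
   }
   ```

   Collecting all violations before failing, rather than throwing on the first 
table, is a design choice of this sketch so that a user can fix every table in 
one pass. The current check throws on the first offending table.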
   
   What do you think?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

