davidzollo commented on issue #10293:
URL: https://github.com/apache/seatunnel/issues/10293#issuecomment-3723958192
Thanks for the feature request.
Today SeaTunnel’s PostgreSQL CDC source enforces `REPLICA IDENTITY FULL` for
every captured table (see `PostgresDialect#checkAllTablesEnabledCapture`).
```
public void checkAllTablesEnabledCapture(JdbcConnection jdbcConnection, List<TableId> tableIds)
        throws SQLException {
    PostgresConnection postgresConnection = (PostgresConnection) jdbcConnection;
    for (TableId tableId : tableIds) {
        ServerInfo.ReplicaIdentity replicaIdentity =
                postgresConnection.readReplicaIdentityInfo(tableId);
        if (!ServerInfo.ReplicaIdentity.FULL.equals(replicaIdentity)) {
            throw new SeaTunnelException(
                    String.format(
                            "Table %s does not have a full replica identity, please execute: ALTER TABLE %s REPLICA IDENTITY FULL;",
                            tableId, tableId));
        }
    }
}
```
The check exists for UPDATE/DELETE correctness. Without FULL, the old-row
image that PostgreSQL writes to the WAL carries at most the key columns (with
`DEFAULT` or `USING INDEX`) and nothing at all with `NOTHING`, so downstream
consumers receive incomplete row images and upsert/delete semantics can break.
That said, your outbox use case (append-only from the consumer perspective)
is valid: forcing FULL can indeed increase WAL volume during periodic cleanup
DELETEs without adding value if you plan to ignore non-INSERT changes.
### Current workarounds
- Keep the current requirement and run:
  - `ALTER TABLE <schema>.<table> REPLICA IDENTITY FULL;`
- If you want to ignore cleanup and only forward inserts, add a transform to
  keep INSERT only (because cleanup still produces DELETE events):
  - `FilterRowKind { include_kinds = [INSERT] }`
  - If the table truly never produces UPDATE/DELETE at all, then you don’t
    need this transform, but with “regular cleanup” you typically do.
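
To make the effect of the INSERT-only filter concrete, here is a minimal standalone sketch of the semantics. It is not SeaTunnel code: `RowKindFilterSketch`, `ChangeEvent`, and the local `RowKind` enum are hypothetical names used only for illustration, and the real transform is configured in the job file rather than written by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Standalone illustration only; RowKindFilterSketch and ChangeEvent are
// hypothetical names, not SeaTunnel classes.
final class RowKindFilterSketch {

    // The change kinds a CDC stream can carry (assumed here for illustration).
    enum RowKind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

    // A change event reduced to its kind and an opaque payload.
    static final class ChangeEvent {
        final RowKind kind;
        final String payload;

        ChangeEvent(RowKind kind, String payload) {
            this.kind = kind;
            this.payload = payload;
        }
    }

    // Keeps only the events whose kind is listed in includeKinds and
    // drops everything else, preserving the original order.
    static List<ChangeEvent> filter(List<ChangeEvent> events, Set<RowKind> includeKinds) {
        List<ChangeEvent> kept = new ArrayList<>();
        for (ChangeEvent event : events) {
            if (includeKinds.contains(event.kind)) {
                kept.add(event);
            }
        }
        return kept;
    }
}
```

Calling `filter(events, EnumSet.of(RowKind.INSERT))` on a stream that mixes outbox inserts with cleanup DELETEs would return only the inserts, which is the behaviour the outbox consumer wants.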
### Proposed SeaTunnel improvement
- Add a config switch like `require_replica_identity_full = true|false`
  (default `true` for compatibility). When set to `false`, skip the hard check
  and document the implications:
  - UPDATE/DELETE row images may be partial, depending on PostgreSQL replica
    identity settings.
  - Tables without a proper identity may not be safely replayable to sinks
    that require keys.
- Optionally, relax the check to allow `DEFAULT/USING INDEX` when a primary
  key (or replica identity index) exists, and only require FULL when no suitable
  identity is available.
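
A rough sketch of how the two ideas could combine is below. It is one possible reading, not a proposed patch: the class, the local `ReplicaIdentity` enum, and the `requireFull` / `hasPrimaryKey` parameters are all hypothetical, and here `requireFull = false` falls back to the relaxed rule instead of skipping the check entirely.

```java
// Standalone illustration only; every name here is hypothetical and none of
// it is existing SeaTunnel code.
final class ReplicaIdentityPolicySketch {

    // PostgreSQL's four replica identity settings.
    enum ReplicaIdentity { NOTHING, DEFAULT, INDEX, FULL }

    // Returns true when a table with the given identity may be captured.
    // requireFull = true reproduces today's strict behaviour.
    static boolean isAcceptable(
            ReplicaIdentity identity, boolean hasPrimaryKey, boolean requireFull) {
        if (identity == ReplicaIdentity.FULL) {
            return true;
        }
        if (requireFull) {
            return false;
        }
        switch (identity) {
            case DEFAULT:
                // DEFAULT only identifies rows when a primary key exists.
                return hasPrimaryKey;
            case INDEX:
                // USING INDEX names an explicit unique index as the identity.
                return true;
            default:
                // NOTHING: no old-row information is available.
                return false;
        }
    }

    // Fails fast with a message similar to the current one when the
    // identity is not acceptable under the chosen mode.
    static void check(
            String tableId, ReplicaIdentity identity, boolean hasPrimaryKey, boolean requireFull) {
        if (!isAcceptable(identity, hasPrimaryKey, requireFull)) {
            throw new IllegalStateException(
                    String.format(
                            "Table %s has replica identity %s, which is not sufficient; "
                                    + "please execute: ALTER TABLE %s REPLICA IDENTITY FULL;",
                            tableId, identity, tableId));
        }
    }
}
```

With this shape, an outbox table that has a primary key and `DEFAULT` identity would pass when the switch is `false`, while the default `true` keeps existing jobs behaving exactly as they do today.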
What do you think?