JNSimba opened a new pull request, #61267:
URL: https://github.com/apache/doris/pull/61267
### What problem does this PR solve?
Add column-level filtering support for PostgreSQL CDC streaming jobs via the
`table.<tableName>.exclude_columns` property. Users can specify a comma-separated
list of columns to exclude from synchronization.
**Syntax example:**
```sql
CREATE JOB my_job
ON STREAMING
FROM POSTGRES (
...
"table.my_table.exclude_columns" = "secret,internal_col"
)
TO DATABASE my_db (...)
```
#### Changes
FE (validation & table creation)
- DataSourceConfigKeys: add TABLE and TABLE_EXCLUDE_COLUMNS_SUFFIX constants
- DataSourceConfigValidator: recognize `table.<name>.exclude_columns` as a valid per-table config key (using suffix allowlist)
- StreamingJobUtils.generateCreateTableCmds(): parse excluded columns, validate they exist in the upstream PG table and are not PK columns, then exclude them from the Doris CREATE TABLE statement

cdc_client (DML filtering & schema change handling)
- ConfigUtil: add parseExcludeColumns(config, tableName) utility
- DebeziumJsonDeserializer: skip excluded fields when building INSERT/UPDATE/DELETE rows
- PostgresDebeziumJsonDeserializer: skip DROP/ADD DDL for excluded columns during schema change detection, so the Doris table is never modified for columns it was never meant to have
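
The parsing and DML-filtering steps listed above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name `ExcludeColumnsSketch`, the `filterRow` helper, and the trimming and case-sensitive matching rules are assumptions; only the key layout `table.<tableName>.exclude_columns` and the `parseExcludeColumns(config, tableName)` signature come from the description.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch; the real ConfigUtil / DebeziumJsonDeserializer may differ.
public class ExcludeColumnsSketch {
    // Key layout from the PR description: table.<tableName>.exclude_columns
    static final String TABLE_PREFIX = "table.";
    static final String EXCLUDE_COLUMNS_SUFFIX = ".exclude_columns";

    // Parse the comma-separated property into an ordered set of column names.
    // Assumption: surrounding whitespace is trimmed and empty entries are dropped.
    static Set<String> parseExcludeColumns(Map<String, String> config, String tableName) {
        String raw = config.get(TABLE_PREFIX + tableName + EXCLUDE_COLUMNS_SUFFIX);
        if (raw == null || raw.trim().isEmpty()) {
            return Collections.emptySet();
        }
        return Arrays.stream(raw.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // Copy a change-event row, leaving out excluded fields.
    // Assumption: column names are matched case-sensitively.
    static Map<String, Object> filterRow(Map<String, Object> row, Set<String> excluded) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : row.entrySet()) {
            if (!excluded.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```

With the property from the syntax example, `parseExcludeColumns(config, "my_table")` would yield the set `secret`, `internal_col`, and `filterRow` would drop those two fields from each row before it is written.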
#### Behavior
| Scenario | Behavior |
|--------------------------------|------------------------------------------------------------|
| Snapshot / incremental DML | Excluded column values are not written to Doris |
| PG DROP excluded column | DDL skipped; stored schema updated; sync continues |
| PG ADD excluded column back | DDL skipped; sync continues; Doris never gains the column |
| Exclude non-existent column | CREATE JOB fails with clear error |
| Exclude PK column | CREATE JOB fails with clear error |
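
The two failure rows and the two DDL rows of the table can be illustrated with a short sketch. Everything here is hypothetical (class name, method names, exception type, and error messages are assumptions for illustration); the PR only states that CREATE JOB fails with a clear error and that DROP/ADD DDL for excluded columns is skipped.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the checks described in the table; not the PR's code.
public class ExcludeColumnsValidationSketch {
    // Reject excluded columns that are missing upstream or are part of the primary key.
    static void validate(String tableName, List<String> upstreamColumns,
                         Set<String> primaryKeys, Set<String> excluded) {
        for (String col : excluded) {
            if (!upstreamColumns.contains(col)) {
                throw new IllegalArgumentException(
                        "Excluded column '" + col + "' does not exist in table " + tableName);
            }
            if (primaryKeys.contains(col)) {
                throw new IllegalArgumentException(
                        "Excluded column '" + col + "' is a primary key column of table " + tableName);
            }
        }
    }

    // A detected DROP or ADD of an excluded column is not forwarded to Doris.
    static boolean shouldSkipColumnDdl(String columnName, Set<String> excluded) {
        return excluded.contains(columnName);
    }
}
```

Failing at job creation rather than at sync time keeps a misconfigured exclusion list from producing a Doris table that silently lacks a key column.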
#### Tests
- test_streaming_postgres_job_col_filter.groovy: covers validation errors,
snapshot filtering, incremental DML filtering, DROP excluded column, re-ADD
excluded column; uses Awaitility polling instead of fixed sleeps
### Release note
None
### Check List (For Author)
- Test <!-- At least one of them must be included. -->
- [ ] Regression test
- [ ] Unit Test
- [ ] Manual test (add detailed scripts or steps below)
- [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
- [ ] Previous test can cover this change.
- [ ] No code files have been changed.
- [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
- [ ] No.
- [ ] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
- [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg: https://github.com/apache/doris-website/pull/1214 -->
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]