yzeng1618 opened a new pull request, #10093: URL: https://github.com/apache/seatunnel/pull/10093
https://github.com/apache/seatunnel/issues/10092

### Purpose of this pull request

This pull request fixes a bug where `CatalogUtils.getCatalogTable(Connection, String, JdbcDialectTypeMapper)` did not populate primary key information when the JDBC source was configured only with `query`. As a result, downstream connectors such as Doris, which rely on primary key metadata in their default `save_mode_create_template`, failed with COMMON-24 when trying to auto-create tables.

The fix makes `CatalogUtils.getCatalogTable(Connection, String, JdbcDialectTypeMapper)`:

- Use `ResultSetMetaData` to build the base `CatalogTable` (preserving existing behavior);
- Attempt to resolve the underlying table name (catalog/schema/table) from `ResultSetMetaData`;
- Use `DatabaseMetaData.getPrimaryKeys(...)` (via existing `CatalogUtils.getPrimaryKey(...)`) to obtain the primary key;
- Apply the primary key to the `CatalogTable` **only if** all primary key columns are present in the query result;
- Fall back to the original behavior when the table name cannot be resolved, when the driver does not provide metadata, or when the query does not include all primary key columns.

This allows Doris sink (and potentially other sinks) to use the default UNIQUE KEY template with `${rowtype_primary_key}` even when the upstream JDBC source is configured using `query` instead of `table_path`.

### Does this PR introduce any user-facing change?

Yes.

**Previous behavior**

- For JDBC sources configured only with `query`, the `CatalogTable` produced by `CatalogUtils.getCatalogTable(Connection, String, JdbcDialectTypeMapper)` had no primary key information.
- Doris sink using the default `save_mode_create_template` failed with COMMON-24 when the Doris table did not exist, even if the upstream table had a primary key.
**New behavior**

- For simple single-table queries where the JDBC driver provides table metadata via `ResultSetMetaData` and `DatabaseMetaData.getPrimaryKeys(...)`, `CatalogTable` will now contain the primary key (if all PK columns are included in the query).
- Doris sink can successfully auto-create tables using the default UNIQUE KEY template.
- If the table name cannot be determined, or the query does not include all primary key columns, the behavior remains unchanged (no primary key is set).

This is a user-facing improvement but should be fully backward compatible for drivers or queries where primary key metadata cannot be resolved.

### How was this patch tested?

1. **Unit tests**

   Added unit tests in `seatunnel-connectors-v2/connector-jdbc/src/test/java/org/apache/seatunnel/connectors/seatunnel/jdbc/catalog/utils/CatalogUtilsTest.java`:

   - `testGetCatalogTableWithPrimaryKeyFromQuery`
     - Mocks `Connection`, `PreparedStatement`, and `ResultSetMetaData` so that:
       - `ResultSetMetaData` returns a table name (`test_table`) with columns `id` and `name`;
       - `connection.getMetaData()` uses existing `TestDatabaseMetaData`, which returns a primary key on `id`.
     - Verifies that `CatalogUtils.getCatalogTable(connection, "select id, name from test_table", typeMapper)` returns a `CatalogTable` whose `TableSchema` contains a non-null primary key named `testfdawe_` on column `id`.
   - `testGetCatalogTableNotApplyPrimaryKeyWhenMissingColumns`
     - Mocks the same table, but the query result contains only column `name` (no `id`);
     - Verifies that the returned `CatalogTable` does **not** have a primary key, ensuring we only apply PK when all PK columns are present in the query result.

2. **E2E tests**

   Extended `JdbcHanaIT` in `seatunnel-e2e/seatunnel-connector-v2-e2e/connector-jdbc-e2e/connector-jdbc-e2e-part-6`:

   - Existing test: `testCatalog()` already verifies that `CatalogUtils.getCatalogTable(connection, TablePath.of(SOURCE_TABLE), new SapHanaTypeMapper())` can obtain the correct primary key and columns.
   - New test: `testCatalogWithQuery()`
     - Builds a simple query: `SELECT * FROM TEST.ALLDATATYPES`;
     - Calls `CatalogUtils.getCatalogTable(connection, query, new SapHanaTypeMapper())`;
     - Asserts that:
       - The returned `TableSchema` has a non-null primary key with exactly one column;
       - The number of columns matches the table schema (25 columns).
     - This verifies the query-only path against a real HANA database and driver with save-mode create table enabled.

3. **Manual verification (outside of this PR)**

   - Manually ran a MySQL → Doris pipeline where:
     - JDBC source is configured only with `query` on a table that has `PRIMARY KEY(id)`;
     - Doris sink uses the default `save_mode_create_template` and `CREATE_SCHEMA_WHEN_NOT_EXIST`.
   - Before this patch, the job failed with COMMON-24 and did not create the Doris table.
   - After this patch, the job successfully created the Doris table and completed normally.

### Check list

* [x] No new jar binary packages are introduced.
* [x] No documentation changes are strictly required, but we may consider adding a note in JDBC/Doris connector docs that primary key metadata is now inferred for query-only sources when possible.
* [x] No incompatibilities are introduced; behavior is additive and best-effort.
* [x] Connector e2e tests are updated (`JdbcHanaIT`).
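### Illustrative sketch of the primary key guard

For readers who want the "apply the primary key only if all primary key columns are present in the query result" rule in concrete form, here is a minimal sketch. It is not the code in this PR: the class `QueryPrimaryKeySketch` and both method names are hypothetical, and the exact, case-sensitive column-name comparison is an assumption (the real implementation may normalize names differently). The only library calls used are the standard JDBC `DatabaseMetaData.getPrimaryKeys(...)` and its `COLUMN_NAME` / `KEY_SEQ` result columns.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical helper illustrating the guard; not part of SeaTunnel. */
final class QueryPrimaryKeySketch {

    /** Reads primary key column names, ordered by KEY_SEQ, via standard JDBC metadata. */
    public static List<String> readPrimaryKeyColumns(
            DatabaseMetaData metaData, String catalog, String schema, String table)
            throws SQLException {
        TreeMap<Short, String> bySequence = new TreeMap<>();
        try (ResultSet rs = metaData.getPrimaryKeys(catalog, schema, table)) {
            while (rs.next()) {
                bySequence.put(rs.getShort("KEY_SEQ"), rs.getString("COLUMN_NAME"));
            }
        }
        return new ArrayList<>(bySequence.values());
    }

    /**
     * Returns the primary key columns only when every one of them appears in the
     * query result; otherwise returns empty, meaning "leave the schema without a PK".
     * Assumption: names are compared exactly (case-sensitive).
     */
    public static Optional<List<String>> primaryKeyIfFullyCovered(
            List<String> primaryKeyColumns, List<String> queryColumns) {
        if (primaryKeyColumns == null || primaryKeyColumns.isEmpty()) {
            return Optional.empty(); // table has no PK, or metadata was unavailable
        }
        if (!new HashSet<>(queryColumns).containsAll(primaryKeyColumns)) {
            return Optional.empty(); // query omits at least one PK column
        }
        return Optional.of(primaryKeyColumns);
    }

    public static void main(String[] args) {
        // "select id, name from test_table" with PK(id): the PK is applied.
        System.out.println(
                primaryKeyIfFullyCovered(Arrays.asList("id"), Arrays.asList("id", "name")));
        // "select name from test_table" with PK(id): the PK is skipped.
        System.out.println(
                primaryKeyIfFullyCovered(Arrays.asList("id"), Arrays.asList("name")));
    }
}
```

The two calls in `main` mirror the two unit-test scenarios described above: the first prints `Optional[[id]]`, the second prints `Optional.empty`.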
