Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/19383 )
Change subject: WIP IMPALA-11809: Support non unique primary key for Kudu ...................................................................... Patch Set 10: (13 comments) Added some comments on testing and the commit message. http://gerrit.cloudera.org:8080/#/c/19383/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19383/10//COMMIT_MSG@30 PS10, Line 30: Kudu assign values for auto-incrementing column automatically : when inserting rows so insertion statements don't need to specify : values for auto-incrementing column. Kudu recently enables the auto-incrementing column feature. The feature works by appending a system generated column to the primary key columns to guarantee the uniqueness on primary key, when the primary key columns can be non-unique. This auto column is of type big int and the assignment to it during insertion is automatic. http://gerrit.cloudera.org:8080/#/c/19383/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/19383/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@2365 PS10, Line 2365: partition by hash may add some more test cases with the following. 1. No partition by clause 2. Non unique primary key includes all selected columns It seems even when primary key list is empty, the auto incrementing column can still function as the primary key. Do we allow such a case? http://gerrit.cloudera.org:8080/#/c/19383/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java: http://gerrit.cloudera.org:8080/#/c/19383/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java@88 PS10, Line 88: AnalyzesOk("create table tab (x int non unique primary key) partition by hash(x) " + : "partitions 8 stored as kudu", isExternalPurgeTbl); : AnalyzesOk("create table tab (x int, non unique primary key(x)) " + : "partition by hash(x) partitions 8 stored as kudu", isExternalPurgeTbl); : AnalyzesOk("create table tab (x int, y int, non unique primary key (x, y)) " + : "partition by hash(x, y) partitions 8 stored as kudu", isExternalPurgeTbl); : AnalyzesOk("create table tab (x int, y int, non unique primary key (x)) " + : "partition by hash(x) partitions 8 stored as kudu", isExternalPurgeTbl); : AnalyzesOk("create table tab (x int, y int, non unique primary key(x, y)) " + : "partition by hash(y) partitions 8 stored as kudu", isExternalPurgeTbl); : AnalyzesOk("create table tab (x timestamp, y timestamp, non unique primary key(x)) "+ : "partition by hash(x) partitions 8 stored as kudu", isExternalPurgeTbl); : // Promote all partition columns as non unique primary key columns if primary keys : // are not declared, but partition columns must be the first columns in the table. : AnalyzesOk("create table tab (x int, y int) partition by hash(x) partitions 8 " + : "stored as kudu", isExternalPurgeTbl); : AnalyzesOk("create table tab (x int, y int) partition by hash(x, y) partitions 8 " + : "stored as kudu", isExternalPurgeTbl); : AnalysisError("create table tab (x int, y int) partition by hash(y) partitions 8 " + : "stored as kudu", "Specify primary key or non unique primary key for the Kudu " + : "table, or create partitions with the beginning columns of the table.", : isExternalPurgeTbl); In some of the new tests, may try partition by range and no partition to cover more cases. See https://kudu.apache.org/docs/kudu_impala_integration.html. http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test File testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test: http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test@204 PS10, Line 204: order by id Wonder if we can order DESC by auto_incrementing_id which can be useful to get the latest rows. http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test File testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test: http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test@709 PS10, Line 709: auto-incrementing column auto_incrementing_id nit. The column name appeared twice. http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test@714 PS10, Line 714: primary key nit. system generated primary key http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test File testdata/workloads/functional-query/queries/QueryTest/kudu_create.test: http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test@512 PS10, Line 512: a int, b string, May add a new test case where the type of a and b is swapped (i.e., a string, b int). Do we allow create able <t> (a string, non unique primary key(a))? http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test File testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test: http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test@416 PS10, Line 416: by hash (id) Wonder if partition by range (auto_incrementing_id) is allowed, which can be quite useful to precisely control the rows in each partition. http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test File testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test: http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test@139 PS10, Line 139: bigint It may be useful to allow the type of the auto column to be specified by the user, for performance reasons. For example, a type of int on the auto column only reads in 4 bytes instead of 8 bytes. Not sure if Kudu allows such a flexibility. http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test File testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test: http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test@646 PS10, Line 646: (1, 10, 'ten'), (2, 20, 'twenty'); No duplication on id. http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_stats.test File testdata/workloads/functional-query/queries/QueryTest/kudu_stats.test: http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_stats.test@43 PS10, Line 43: Computing stats Should we also get a report on the stats for the auto column here? The stats is useful as the auto column can appear in predicates or in select list. http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_update.test File testdata/workloads/functional-query/queries/QueryTest/kudu_update.test: http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_update.test@431 PS10, Line 431: Key column 'auto_incrementing_id' cannot be updated. This is a cool test. nit. System generated primary key column http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_update.test@441 PS10, Line 441: order by nit. May test the auto column in GROUP by, ORDER BY, or in sequence functions. -- To view, visit http://gerrit.cloudera.org:8080/19383 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4d7882bf3d01a3492cc9827c072d1f3200d9eebd Gerrit-Change-Number: 19383 Gerrit-PatchSet: 10 Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com> Gerrit-Reviewer: Abhishek Chennaka <achenn...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <ale...@apache.org> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com> Gerrit-Reviewer: Marton Greber <greber...@gmail.com> Gerrit-Reviewer: Qifan Chen <qfc...@hotmail.com> Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com> Gerrit-Comment-Date: Tue, 17 Jan 2023 15:50:36 +0000 Gerrit-HasComments: Yes