Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19383 )

Change subject: WIP IMPALA-11809: Support non unique primary key for Kudu
......................................................................


Patch Set 10:

(13 comments)

Added some comments on testing and the commit message.

http://gerrit.cloudera.org:8080/#/c/19383/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19383/10//COMMIT_MSG@30
PS10, Line 30: Kudu assign values for auto-incrementing column automatically
             : when inserting rows so insertion statements don't need to specify
             : values for auto-incrementing column.
Kudu recently enables the auto-incrementing column feature.  The feature works 
by appending a system generated column to the primary key columns to guarantee 
the uniqueness on primary key, when the primary key columns can be non-unique.

This auto column is of type big int and the assignment to it during insertion 
is automatic.


http://gerrit.cloudera.org:8080/#/c/19383/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/19383/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@2365
PS10, Line 2365: partition by hash
may add some more test cases with the following.

1. No partition by clause
2. Non unique primary key includes all selected columns

It seems even when primary key list is empty, the auto incrementing column can 
still  function as the primary key.  Do we allow such a case?


http://gerrit.cloudera.org:8080/#/c/19383/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/19383/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java@88
PS10, Line 88: AnalyzesOk("create table tab (x int non unique primary key) 
partition by hash(x) " +
             :         "partitions 8 stored as kudu", isExternalPurgeTbl);
             :     AnalyzesOk("create table tab (x int, non unique primary 
key(x)) " +
             :         "partition by hash(x) partitions 8 stored as kudu", 
isExternalPurgeTbl);
             :     AnalyzesOk("create table tab (x int, y int, non unique 
primary key (x, y)) " +
             :         "partition by hash(x, y) partitions 8 stored as kudu", 
isExternalPurgeTbl);
             :     AnalyzesOk("create table tab (x int, y int, non unique 
primary key (x)) " +
             :         "partition by hash(x) partitions 8 stored as kudu", 
isExternalPurgeTbl);
             :     AnalyzesOk("create table tab (x int, y int, non unique 
primary key(x, y)) " +
             :         "partition by hash(y) partitions 8 stored as kudu", 
isExternalPurgeTbl);
             :     AnalyzesOk("create table tab (x timestamp, y timestamp, non 
unique primary key(x)) "+
             :         "partition by hash(x) partitions 8 stored as kudu", 
isExternalPurgeTbl);
             :     // Promote all partition columns as non unique primary key 
columns if primary keys
             :     // are not declared, but partition columns must be the first 
columns in the table.
             :     AnalyzesOk("create table tab (x int, y int) partition by 
hash(x) partitions 8 " +
             :         "stored as kudu", isExternalPurgeTbl);
             :     AnalyzesOk("create table tab (x int, y int) partition by 
hash(x, y) partitions 8 " +
             :         "stored as kudu", isExternalPurgeTbl);
             :     AnalysisError("create table tab (x int, y int) partition by 
hash(y) partitions 8 " +
             :         "stored as kudu", "Specify primary key or non unique 
primary key for the Kudu " +
             :         "table, or create partitions with the beginning columns 
of the table.",
             :         isExternalPurgeTbl);
In some of the new tests, may try partition by range and no partition to cover 
more cases.  See https://kudu.apache.org/docs/kudu_impala_integration.html.


http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test
File testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test:

http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test@204
PS10, Line 204: order by id
Wonder if we can order DESC by auto_incrementing_id which can be useful to get 
the latest rows.


http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test:

http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test@709
PS10, Line 709: auto-incrementing column auto_incrementing_id
nit. The column name appeared twice.


http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test@714
PS10, Line 714: primary key
nit. system generated primary key


http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_create.test:

http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test@512
PS10, Line 512: a int, b string,
May  add a new test case where the type of a and b is swapped (i.e., a string, 
b int).

Do we allow
create able <t> (a string, non unique primary key(a))?


http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test:

http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test@416
PS10, Line 416: by hash (id)
Wonder if partition by range (auto_incrementing_id) is allowed, which can be 
quite useful to precisely control the rows in each partition.


http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test:

http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test@139
PS10, Line 139: bigint
It may be useful to allow the type of the auto column to be specified by the 
user, for performance reasons.  For example, a type of int on the auto column 
only reads in 4 bytes instead of 8 bytes. Not sure if Kudu allows such a 
flexibility.


http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test:

http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test@646
PS10, Line 646: (1, 10, 'ten'), (2, 20, 'twenty');
No duplication on id.


http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_stats.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_stats.test:

http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_stats.test@43
PS10, Line 43: Computing stats
Should we also get a report on the stats for the auto column here? The stats is 
useful as the auto column can appear in predicates or in select list.


http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_update.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_update.test:

http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_update.test@431
PS10, Line 431: Key column 'auto_incrementing_id' cannot be updated.
This is a cool test.

nit. System generated primary key column


http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_update.test@441
PS10, Line 441: order by
nit. May test the auto column in GROUP by, ORDER BY, or in sequence functions.



--
To view, visit http://gerrit.cloudera.org:8080/19383
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d7882bf3d01a3492cc9827c072d1f3200d9eebd
Gerrit-Change-Number: 19383
Gerrit-PatchSet: 10
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Abhishek Chennaka <achenn...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <ale...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Marton Greber <greber...@gmail.com>
Gerrit-Reviewer: Qifan Chen <qfc...@hotmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 17 Jan 2023 15:50:36 +0000
Gerrit-HasComments: Yes

Reply via email to