[ https://issues.apache.org/jira/browse/KUDU-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17745372#comment-17745372 ]
ASF subversion and git services commented on KUDU-1945:
-------------------------------------------------------

Commit 94f246fa82026888bd6a5c730eaf90a912a6f3c1 in kudu's branch refs/heads/branch-1.17.x from Marton Greber
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=94f246fa8 ]

KUDU-1945 Add Python example for non-unique PK

In commit 3019848d00ac721b30d98a1aeb77bc205352b7b5, a C++ client
example was added to showcase the usage of the non-unique PK.
This patch translates that example to Python.

The C++ examples are tested during upstream submission; however, the
Java and Python client examples are not tested automatically.
Therefore, I tested manually and created a ticket to track this
improvement: KUDU-3478.

Manual testing has been done using the following configurations:
Ubuntu 18.04 x86_64: Python2.7
Ubuntu 18.04 x86_64: Python3.7
macOS Monterey M1 (arm64): Python3.7
(I couldn't set up the Python2.7 interpreter properly on my Mac.)

STDOUT of the example:
kudu.Schema {
  non_unique_key int32 NOT NULL
  auto_incrementing_id int64 NOT NULL
  int_val int32 NOT NULL
  PRIMARY KEY (non_unique_key, auto_incrementing_id)
}
Demonstrating scanning ...
(non_unique_key: 3, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 3, auto_incrementing_id: 2, int_val: 1)
(non_unique_key: 3, auto_incrementing_id: 3, int_val: 2)
(non_unique_key: 4, auto_incrementing_id: 4, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 2, int_val: 1)
(non_unique_key: 1, auto_incrementing_id: 3, int_val: 2)
(non_unique_key: 2, auto_incrementing_id: 4, int_val: 0)
(non_unique_key: 2, auto_incrementing_id: 5, int_val: 1)
(non_unique_key: 2, auto_incrementing_id: 6, int_val: 2)
Scanned some row(s) WHERE non_unique_key = 1
(non_unique_key: 1, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 2, int_val: 1)
(non_unique_key: 1, auto_incrementing_id: 3, int_val: 2)
Demonstrating UPDATE ...
Updated row(s) WHERE non_unique_key = 1 AND int_val = 2 to int_val = 98
(non_unique_key: 3, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 3, auto_incrementing_id: 2, int_val: 1)
(non_unique_key: 3, auto_incrementing_id: 3, int_val: 2)
(non_unique_key: 4, auto_incrementing_id: 4, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 2, int_val: 1)
(non_unique_key: 1, auto_incrementing_id: 3, int_val: 98)
(non_unique_key: 2, auto_incrementing_id: 4, int_val: 0)
(non_unique_key: 2, auto_incrementing_id: 5, int_val: 1)
(non_unique_key: 2, auto_incrementing_id: 6, int_val: 2)
Updated row(s) WHERE non_unique_key = 2 to int_val = 99
(non_unique_key: 3, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 3, auto_incrementing_id: 2, int_val: 1)
(non_unique_key: 3, auto_incrementing_id: 3, int_val: 2)
(non_unique_key: 4, auto_incrementing_id: 4, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 2, int_val: 1)
(non_unique_key: 1, auto_incrementing_id: 3, int_val: 98)
(non_unique_key: 2, auto_incrementing_id: 4, int_val: 99)
(non_unique_key: 2, auto_incrementing_id: 5, int_val: 99)
(non_unique_key: 2, auto_incrementing_id: 6, int_val: 99)
Updated row(s) WHERE non_unique_key = 2 AND auto_incrementing_id = 5 to int_val = 100
(non_unique_key: 3, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 3, auto_incrementing_id: 2, int_val: 1)
(non_unique_key: 3, auto_incrementing_id: 3, int_val: 2)
(non_unique_key: 4, auto_incrementing_id: 4, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 2, int_val: 1)
(non_unique_key: 1, auto_incrementing_id: 3, int_val: 98)
(non_unique_key: 2, auto_incrementing_id: 4, int_val: 99)
(non_unique_key: 2, auto_incrementing_id: 5, int_val: 100)
(non_unique_key: 2, auto_incrementing_id: 6, int_val: 99)
Demonstrating DELETE ...
Deleted row(s) WHERE non_unique_key = 3 AND int_val = 1
(non_unique_key: 3, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 3, auto_incrementing_id: 3, int_val: 2)
(non_unique_key: 4, auto_incrementing_id: 4, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 2, int_val: 1)
(non_unique_key: 1, auto_incrementing_id: 3, int_val: 98)
(non_unique_key: 2, auto_incrementing_id: 4, int_val: 99)
(non_unique_key: 2, auto_incrementing_id: 5, int_val: 100)
(non_unique_key: 2, auto_incrementing_id: 6, int_val: 99)
Deleted row(s) WHERE non_unique_key = 2
(non_unique_key: 3, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 3, auto_incrementing_id: 3, int_val: 2)
(non_unique_key: 4, auto_incrementing_id: 4, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 2, int_val: 1)
(non_unique_key: 1, auto_incrementing_id: 3, int_val: 98)
Deleted row(s) WHERE non_unique_key = 3 AND auto_incrementing_id = 3
(non_unique_key: 3, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 4, auto_incrementing_id: 4, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 2, int_val: 1)
(non_unique_key: 1, auto_incrementing_id: 3, int_val: 98)
Deleted the table
Done

Change-Id: I1dd862c4f26b3d79ec268727363fae3426135f52
Reviewed-on: http://gerrit.cloudera.org:8080/19874
Tested-by: Kudu Jenkins
Reviewed-by: Wenzhe Zhou <wz...@cloudera.com>
Reviewed-by: Attila Bukor <abu...@apache.org>
(cherry picked from commit 6d48cd6f90e7820fbabd0560139ad48fe9d66a29)
Reviewed-on: http://gerrit.cloudera.org:8080/20226
Reviewed-by: Marton Greber <greber...@gmail.com>
Reviewed-by: Yifan Zhang <chinazhangyi...@163.com>
Tested-by: Yingchun Lai <laiyingc...@apache.org>


> Support generation of surrogate primary keys (or tables with no PK)
> -------------------------------------------------------------------
>
> Key: KUDU-1945
> URL: https://issues.apache.org/jira/browse/KUDU-1945
> Project: Kudu
> Issue Type: New Feature
> Components: client, master, tablet
> Reporter: Todd Lipcon
> Priority: Major
> Labels: roadmap-candidate
>
> Many use cases have data where there is no "natural" primary key. For
> example, a web log use case mostly cares about partitioning and not about
> precise sorting by timestamp, and timestamps themselves are not necessarily
> unique. Rather than forcing users to come up with their own surrogate primary
> keys, Kudu should support some kind of "auto_increment" equivalent which
> generates primary keys on insertion. Alternatively, Kudu could support tables
> which are partitioned but not internally sorted.
> The advantages would be:
> - Kudu can pick primary keys on insertion to guarantee that no compaction is
> required on the table (e.g., always assign a new key higher than any
> existing key in the local tablet). This can improve write throughput
> substantially, especially compared to naive PK generation schemes that a user
> might pick, such as UUIDs, which would generate a uniform random-insert
> workload (the worst case for performance).
> - Make Kudu easier to use for such use cases (no extra client code necessary).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
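The first advantage in the issue description (always assign a new key higher than any existing key in the local tablet, rather than random UUIDs) can be illustrated with a toy model. This is a sketch under loud assumptions, not Kudu code: it treats a tablet as a single sorted Python list, and the helpers count_appends, counter_keys and uuid_keys are hypothetical names. It counts how many inserts land after every key already present, i.e. could be appended without touching previously written sorted data.

```python
import bisect
import random
import uuid


def count_appends(keys):
    """Insert keys one at a time into a sorted list (a toy stand-in for a
    tablet's sorted key space) and count how many land after every key
    already present, i.e. could be appended without touching older data."""
    tablet = []
    appends = 0
    for key in keys:
        pos = bisect.bisect_left(tablet, key)
        if pos == len(tablet):
            appends += 1
        tablet.insert(pos, key)
    return appends


def counter_keys(n):
    """Hypothetical per-tablet generator: each new key is the current max + 1."""
    return list(range(1, n + 1))


def uuid_keys(n, seed=0):
    """Random version-4 UUIDs from a seeded RNG, so runs are reproducible."""
    rng = random.Random(seed)
    return [uuid.UUID(int=rng.getrandbits(128), version=4) for _ in range(n)]


if __name__ == "__main__":
    n = 1000
    print("counter keys appended:", count_appends(counter_keys(n)), "of", n)
    print("UUID keys appended:   ", count_appends(uuid_keys(n)), "of", n)
```

With the counter, every insert is an append. With random UUIDs, only inserts that happen to exceed all earlier keys are appends; for n random keys that averages the harmonic number H_n (about 7.5 for n = 1000), so nearly every write lands in the middle of existing data. Kudu's real write path (MemRowSets flushed to DiskRowSets, later compacted) is not modeled here.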