[jira] [Commented] (KUDU-1945) Support generation of surrogate primary keys (or tables with no PK)

ASF subversion and git services (Jira) Thu, 20 Jul 2023 21:14:09 -0700


    [ 
https://issues.apache.org/jira/browse/KUDU-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17745372#comment-17745372
 ]


ASF subversion and git services commented on KUDU-1945:
-------------------------------------------------------

Commit 94f246fa82026888bd6a5c730eaf90a912a6f3c1 in kudu's branch 
refs/heads/branch-1.17.x from Marton Greber
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=94f246fa8 ]

KUDU-1945 Add Python example for non-unique PK

Commit 3019848d00ac721b30d98a1aeb77bc205352b7b5 added a C++ client example
showcasing the use of non-unique primary keys. This patch translates that
example to Python.

The C++ examples are tested during upstream submission, but the Java
and Python client examples are not tested automatically. Therefore,
I tested this example manually and created a ticket to track that
improvement: KUDU-3478.

Manual testing was done using the following configurations:
Ubuntu 18.04 x86_64: Python 2.7
Ubuntu 18.04 x86_64: Python 3.7
macOS Monterey M1 (arm64): Python 3.7
(I couldn't set up the Python 2.7 interpreter properly on my Mac.)

STDOUT of the example:
kudu.Schema {
  non_unique_key        int32 NOT NULL
  auto_incrementing_id  int64 NOT NULL
  int_val               int32 NOT NULL
  PRIMARY KEY (non_unique_key, auto_incrementing_id)
}
Demonstrating scanning ...
(non_unique_key: 3, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 3, auto_incrementing_id: 2, int_val: 1)
(non_unique_key: 3, auto_incrementing_id: 3, int_val: 2)
(non_unique_key: 4, auto_incrementing_id: 4, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 2, int_val: 1)
(non_unique_key: 1, auto_incrementing_id: 3, int_val: 2)
(non_unique_key: 2, auto_incrementing_id: 4, int_val: 0)
(non_unique_key: 2, auto_incrementing_id: 5, int_val: 1)
(non_unique_key: 2, auto_incrementing_id: 6, int_val: 2)
Scanned some row(s) WHERE non_unique_key = 1
(non_unique_key: 1, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 2, int_val: 1)
(non_unique_key: 1, auto_incrementing_id: 3, int_val: 2)
Demonstrating UPDATE ...
Updated row(s) WHERE non_unique_key = 1 AND int_val = 2 to int_val = 98
(non_unique_key: 3, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 3, auto_incrementing_id: 2, int_val: 1)
(non_unique_key: 3, auto_incrementing_id: 3, int_val: 2)
(non_unique_key: 4, auto_incrementing_id: 4, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 2, int_val: 1)
(non_unique_key: 1, auto_incrementing_id: 3, int_val: 98)
(non_unique_key: 2, auto_incrementing_id: 4, int_val: 0)
(non_unique_key: 2, auto_incrementing_id: 5, int_val: 1)
(non_unique_key: 2, auto_incrementing_id: 6, int_val: 2)
Updated row(s) WHERE non_unique_key = 2 to int_val = 99
(non_unique_key: 3, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 3, auto_incrementing_id: 2, int_val: 1)
(non_unique_key: 3, auto_incrementing_id: 3, int_val: 2)
(non_unique_key: 4, auto_incrementing_id: 4, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 2, int_val: 1)
(non_unique_key: 1, auto_incrementing_id: 3, int_val: 98)
(non_unique_key: 2, auto_incrementing_id: 4, int_val: 99)
(non_unique_key: 2, auto_incrementing_id: 5, int_val: 99)
(non_unique_key: 2, auto_incrementing_id: 6, int_val: 99)
Updated row(s) WHERE non_unique_key = 2 AND auto_incrementing_id = 5\
to int_val = 100
(non_unique_key: 3, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 3, auto_incrementing_id: 2, int_val: 1)
(non_unique_key: 3, auto_incrementing_id: 3, int_val: 2)
(non_unique_key: 4, auto_incrementing_id: 4, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 2, int_val: 1)
(non_unique_key: 1, auto_incrementing_id: 3, int_val: 98)
(non_unique_key: 2, auto_incrementing_id: 4, int_val: 99)
(non_unique_key: 2, auto_incrementing_id: 5, int_val: 100)
(non_unique_key: 2, auto_incrementing_id: 6, int_val: 99)
Demonstrating DELETE ...
Deleted row(s) WHERE non_unique_key = 3 AND int_val = 1
(non_unique_key: 3, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 3, auto_incrementing_id: 3, int_val: 2)
(non_unique_key: 4, auto_incrementing_id: 4, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 2, int_val: 1)
(non_unique_key: 1, auto_incrementing_id: 3, int_val: 98)
(non_unique_key: 2, auto_incrementing_id: 4, int_val: 99)
(non_unique_key: 2, auto_incrementing_id: 5, int_val: 100)
(non_unique_key: 2, auto_incrementing_id: 6, int_val: 99)
Deleted row(s) WHERE non_unique_key = 2
(non_unique_key: 3, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 3, auto_incrementing_id: 3, int_val: 2)
(non_unique_key: 4, auto_incrementing_id: 4, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 2, int_val: 1)
(non_unique_key: 1, auto_incrementing_id: 3, int_val: 98)
Deleted row(s) WHERE non_unique_key = 3 AND auto_incrementing_id = 3
(non_unique_key: 3, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 4, auto_incrementing_id: 4, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 1, int_val: 0)
(non_unique_key: 1, auto_incrementing_id: 2, int_val: 1)
(non_unique_key: 1, auto_incrementing_id: 3, int_val: 98)
Deleted the table
Done

Change-Id: I1dd862c4f26b3d79ec268727363fae3426135f52
Reviewed-on: http://gerrit.cloudera.org:8080/19874
Tested-by: Kudu Jenkins
Reviewed-by: Wenzhe Zhou <wz...@cloudera.com>
Reviewed-by: Attila Bukor <abu...@apache.org>
(cherry picked from commit 6d48cd6f90e7820fbabd0560139ad48fe9d66a29)
Reviewed-on: http://gerrit.cloudera.org:8080/20226
Reviewed-by: Marton Greber <greber...@gmail.com>
Reviewed-by: Yifan Zhang <chinazhangyi...@163.com>
Tested-by: Yingchun Lai <laiyingc...@apache.org>
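
In the STDOUT above, auto_incrementing_id restarts at 1 for two groups of
keys (3 and 4, then 1 and 2). This is consistent with a counter kept per
partition. Updates and deletes addressed by the non-unique key alone touch
every matching row. The following is a minimal pure-Python sketch of that
behavior, not the Kudu client API. The Tablet and Table classes, the
two-partition layout, and the partition_of function are hypothetical
stand-ins chosen to mirror the output. The scan-then-write approach is an
assumption about how the full primary key gets resolved.

```python
from itertools import count


class Tablet:
    """Toy stand-in for one partition with its own auto-increment counter."""

    def __init__(self):
        self._next_id = count(1)  # counter is local to this tablet
        self.rows = {}  # (non_unique_key, auto_incrementing_id) -> int_val

    def insert(self, key, int_val):
        # The full primary key is the user key plus the generated id.
        full_pk = (key, next(self._next_id))
        self.rows[full_pk] = int_val
        return full_pk


class Table:
    """Toy table: routes rows to tablets via a hypothetical partition_of."""

    def __init__(self, num_tablets, partition_of):
        self.tablets = [Tablet() for _ in range(num_tablets)]
        self.partition_of = partition_of

    def insert(self, key, int_val):
        return self.tablets[self.partition_of(key)].insert(key, int_val)

    def scan(self, predicate=lambda key, auto_id, int_val: True):
        # Returns (key, auto_id, int_val) tuples, tablet by tablet.
        return [(k, i, v)
                for tablet in self.tablets
                for (k, i), v in tablet.rows.items()
                if predicate(k, i, v)]

    def update(self, predicate, new_val):
        # Scan first to learn the full primary keys, then write by full key.
        for k, i, _ in self.scan(predicate):
            self.tablets[self.partition_of(k)].rows[(k, i)] = new_val

    def delete(self, predicate):
        # Same idea: resolve full keys via a scan, then delete each row.
        for k, i, _ in self.scan(predicate):
            del self.tablets[self.partition_of(k)].rows[(k, i)]


# Hypothetical layout: keys 3 and 4 share one tablet, keys 1 and 2 the other.
table = Table(2, lambda key: 0 if key >= 3 else 1)
for key, n_rows in [(3, 3), (4, 1), (1, 3), (2, 3)]:
    for int_val in range(n_rows):
        table.insert(key, int_val)

table.update(lambda k, i, v: k == 1 and v == 2, 98)
table.delete(lambda k, i, v: k == 3 and v == 1)
for row in table.scan():
    print(row)
```

Because each toy tablet owns its counter, the same auto_incrementing_id
value can appear in both partitions, while the pair (non_unique_key,
auto_incrementing_id) stays unique across the table.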


> Support generation of surrogate primary keys (or tables with no PK)
> -------------------------------------------------------------------
>
>                 Key: KUDU-1945
>                 URL: https://issues.apache.org/jira/browse/KUDU-1945
>             Project: Kudu
>          Issue Type: New Feature
>          Components: client, master, tablet
>            Reporter: Todd Lipcon
>            Priority: Major
>              Labels: roadmap-candidate
>
> Many use cases have data where there is no "natural" primary key. For 
> example, a web log use case mostly cares about partitioning and not about 
> precise sorting by timestamp, and timestamps themselves are not necessarily 
> unique. Rather than forcing users to come up with their own surrogate primary 
> keys, Kudu should support some kind of "auto_increment" equivalent which 
> generates primary keys on insertion. Alternatively, Kudu could support tables 
> which are partitioned but not internally sorted.
> The advantages would be:
> - Kudu can pick primary keys on insertion to guarantee that no compaction 
> is required on the table (e.g., by always assigning a new key higher than 
> any existing key in the local tablet). This can improve write throughput 
> substantially, especially compared to naive PK generation schemes that a 
> user might pick, such as UUIDs, which generate a uniform random-insert 
> workload (the worst case for performance).
> - Kudu becomes easier to use for such use cases (no extra client code is 
> necessary).
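
The contrast the description draws between ascending keys and UUID keys can
be illustrated with a small pure-Python sketch. It is a toy model of a
sorted key list, not Kudu's storage engine. The append_fraction helper and
the sample size are hypothetical names and values introduced here for
illustration. It only measures how often a new key lands at the end of the
existing sorted order, which is the property the "always assign a higher
key" idea relies on.

```python
import bisect
import uuid
from itertools import count


def append_fraction(keys):
    """Fraction of inserts that land at the end of the sorted key list."""
    sorted_keys = []
    appends = 0
    for key in keys:
        pos = bisect.bisect_right(sorted_keys, key)
        if pos == len(sorted_keys):
            appends += 1  # new key is higher than every existing key
        sorted_keys.insert(pos, key)
    return appends / len(keys)


n = 1000  # arbitrary sample size for the illustration
counter = count(1)
auto_increment_keys = [next(counter) for _ in range(n)]
uuid_keys = [uuid.uuid4().hex for _ in range(n)]

print("auto-increment:", append_fraction(auto_increment_keys))
print("uuid:", append_fraction(uuid_keys))
```

With ascending keys, every insert is an append. With random UUIDs, nearly
all inserts fall somewhere inside the existing key range, which is the
random-insert pattern the description calls the worst case.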



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
