[jira] [Commented] (KUDU-1945) Support generation of surrogate primary keys (or tables with no PK)

ASF subversion and git services (Jira) Tue, 23 May 2023 23:25:09 -0700


    [ https://issues.apache.org/jira/browse/KUDU-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725646#comment-17725646 ]


ASF subversion and git services commented on KUDU-1945:
-------------------------------------------------------

Commit 65bf631fabc39be35afde7305a0e610a383f65c4 in kudu's branch 
refs/heads/master from Abhishek Chennaka
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=65bf631fa ]

[tools] KUDU-1945: Kudu table copy and perf loadgen

This patch adds support to the 'table copy' CLI tool for tables with
auto-incrementing columns. While scanning the source table, the tool
does not scan the auto-incrementing column; while writing to the
destination table, it writes all the scanned rows. The
auto-incrementing column is then populated on the server side during
each row's write.
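
The copy flow described above can be sketched with a toy in-memory model.
This is only an illustration, not the Kudu client API or the tool's actual
code: ToyTable, copy_table, and the column name in AUTO_INC are all
hypothetical, and the single per-table counter is a simplifying assumption
about how the server side assigns values.

```python
from itertools import count

# Hypothetical column name, chosen only for this sketch.
AUTO_INC = "auto_incrementing_id"


class ToyTable:
    """In-memory stand-in for a table with one auto-incrementing column."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = []
        # Simplifying assumption: one counter per table.
        self._next_id = count(1)

    def scan(self, projection):
        """Yield each row restricted to the projected columns."""
        for row in self.rows:
            yield {col: row[col] for col in projection}

    def insert(self, row):
        """Store a row; the 'server side' fills the auto-incrementing column."""
        if AUTO_INC in row:
            raise ValueError("client must not set the auto-incrementing column")
        stored = dict(row)
        stored[AUTO_INC] = next(self._next_id)
        self.rows.append(stored)


def copy_table(src, dst):
    """Copy rows from src to dst, leaving the auto-incrementing column out of the scan."""
    projection = [col for col in src.columns if col != AUTO_INC]
    copied = 0
    for row in src.scan(projection):
        dst.insert(row)
        copied += 1
    return copied
```

In this model the destination assigns its own values, so the
auto-incrementing values in the copy need not match those in the source;
only the other columns are carried over.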

It also adds support to the 'perf loadgen' CLI tool for inserting into
tables with auto-incrementing columns. This currently works only with
the insert and insert_ignore write operations; upsert and upsert_ignore
are not yet supported.

Change-Id: I754a7e84c16d1f3b2d52be937e1eb50b3d00d759
Reviewed-on: http://gerrit.cloudera.org:8080/19890
Reviewed-by: Alexey Serbin <ale...@apache.org>
Tested-by: Kudu Jenkins


> Support generation of surrogate primary keys (or tables with no PK)
> -------------------------------------------------------------------
>
>                 Key: KUDU-1945
>                 URL: https://issues.apache.org/jira/browse/KUDU-1945
>             Project: Kudu
>          Issue Type: New Feature
>          Components: client, master, tablet
>            Reporter: Todd Lipcon
>            Priority: Major
>              Labels: roadmap-candidate
>
> Many use cases have data where there is no "natural" primary key. For 
> example, a web log use case mostly cares about partitioning and not about 
> precise sorting by timestamp, and timestamps themselves are not necessarily 
> unique. Rather than forcing users to come up with their own surrogate primary 
> keys, Kudu should support some kind of "auto_increment" equivalent which 
> generates primary keys on insertion. Alternatively, Kudu could support tables 
> which are partitioned but not internally sorted.
> The advantages would be:
> - Kudu can pick primary keys on insertion to guarantee that no compaction 
> is required on the table (e.g., always assign a new key higher than any 
> existing key in the local tablet). This can improve write throughput 
> substantially, especially compared to naive PK generation schemes that a user 
> might pick, such as UUIDs, which would generate a uniform random-insert 
> workload (the worst case for performance).
> - Make Kudu easier to use for such use cases (no extra client code necessary)
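
The difference between the two key-generation schemes mentioned in the
description can be illustrated with a small sketch. It does not model
Kudu's storage engine; it only counts how often a new key lands at the end
of the existing sorted key order. The helper append_fraction is
hypothetical, and 128-bit random integers stand in for UUIDs so that the
run is reproducible.

```python
import bisect
import random


def append_fraction(keys):
    """Return the fraction of inserts that land at the end of the sorted order."""
    sorted_keys = []
    appends = 0
    for key in keys:
        pos = bisect.bisect_right(sorted_keys, key)
        if pos == len(sorted_keys):
            appends += 1  # new key is at least as large as every existing key
        sorted_keys.insert(pos, key)
    return appends / len(keys)


n = 1000
rng = random.Random(0)  # fixed seed so the run is reproducible

# Keys assigned in increasing order, as an auto-increment scheme would.
sequential_keys = list(range(n))
# Uniformly random 128-bit keys, standing in for UUIDs.
random_keys = [rng.getrandbits(128) for _ in range(n)]

print("sequential:", append_fraction(sequential_keys))
print("random:    ", append_fraction(random_keys))
```

With increasing keys every insert is an append, while with uniformly random
keys only a small fraction of inserts land at the end and the rest fall
somewhere inside the existing key range.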



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
