This is an automated email from the ASF dual-hosted git repository.

alexey pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git

The following commit(s) were added to refs/heads/master by this push:
     new cf550d6d7 KUDU-2671: Update upstream docs
cf550d6d7 is described below

commit cf550d6d7cdd61f6c65f9ef75a1706cb91839876
Author: Mahesh Reddy <mre...@cloudera.com>
AuthorDate: Tue Mar 5 15:20:33 2024 -0800

    KUDU-2671: Update upstream docs

    This patch updates the upstream docs to include range
    specific hash schemas within the partitioning section.
    An example with the proper sql syntax is also included
    in the kudu impala integration doc.

    Change-Id: I8da554851a124d1d357be65d8bcc2c6c37875dcc
    Reviewed-on: http://gerrit.cloudera.org:8080/21108
    Tested-by: Kudu Jenkins
    Reviewed-by: Alexey Serbin <ale...@apache.org>
---
 docs/kudu_impala_integration.adoc | 42 +++++++++++++++++++++++++++++++++++++++
 docs/schema_design.adoc           | 16 +++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/docs/kudu_impala_integration.adoc b/docs/kudu_impala_integration.adoc
index 0def0477c..de01c3d59 100755
--- a/docs/kudu_impala_integration.adoc
+++ b/docs/kudu_impala_integration.adoc
@@ -485,6 +485,48 @@ The example creates 16 partitions. You could also use `HASH (id, sku) PARTITIONS
 However, a scan for `sku` values would almost always impact all 16 partitions,
 rather than possibly being limited to 4.
 
+.Range-Specific Hash Schemas
+As of 1.17, Kudu supports range-specific hash schemas for tables. It's possible to
+add ranges with a hash schema independent of the table-wide hash schema. This can be
+done while creating or altering the table. The number of hash partition levels must
+be the same across all ranges in a table.
+
+[source, sql]
+----
+CREATE TABLE cust_behavior (
+  id BIGINT,
+  sku STRING,
+  salary STRING,
+  edu_level INT,
+  usergender STRING,
+  `group` STRING,
+  city STRING,
+  postcode STRING,
+  last_purchase_price FLOAT,
+  last_purchase_date BIGINT,
+  category STRING,
+  rating INT,
+  fulfilled_date BIGINT,
+  PRIMARY KEY (id, sku)
+)
+PARTITION BY HASH (id) PARTITIONS 4
+RANGE (sku)
+(
+  PARTITION VALUES < 'g'
+  PARTITION 'g' <= VALUES < 'o'
+    HASH (id) PARTITIONS 6
+  PARTITION 'o' <= VALUES < 'u'
+    HASH (id) PARTITIONS 8
+  PARTITION 'u' <= VALUES
+)
+STORED AS KUDU;
+----
+
+This example uses the range-specific hash schema feature for the middle two
+ranges. The table-wide hash schema has 4 buckets while the hash schemas
+for the middle two ranges have 6 and 8 buckets respectively. This can be done
+in cases where we expect a higher workload in such ranges.
+
 .Non-Covering Range Partitions
 Kudu 1.0 and higher supports the use of non-covering range partitions, which address
 scenarios like the following:
diff --git a/docs/schema_design.adoc b/docs/schema_design.adoc
index 95d4d251c..906682b86 100644
--- a/docs/schema_design.adoc
+++ b/docs/schema_design.adoc
@@ -435,6 +435,22 @@ NOTE: see the <<hash-range-partitioning-example>> and the
 <<hash-hash-partitioning-example>> for further discussion of multilevel
 partitioning.
 
+[[flexible-partitioning]]
+=== Flexible Partitioning
+
+As of 1.17, Kudu supports range-specific hash schema for tables. It's now
+possible to add ranges with their own unique hash schema independent of the
+table-wide hash schema. This can be done while creating or altering the table.
+This feature helps mitigate potential hotspotting as more buckets can be
+added for a hash schema of a range that expects more workload.
+
+[[same-number-of-hash-levels]]
+[IMPORTANT]
+.Same Number of Hash Levels
+The number of hash partition levels must be the same across for all the ranges
+in a table. See <<multilevel-partitioning>> for more details on hash partition
+levels.
+
 [[partition-pruning]]
 === Partition Pruning
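
As a rough illustration of the partitioning described in the patch above, the following Python sketch models the `cust_behavior` example: it checks the "same number of hash levels" rule and counts the resulting partitions. It is not part of the commit and does not use any Kudu or Impala API. The names `TABLE_WIDE`, `RANGES`, `check_hash_levels`, `tablets_per_range`, and `total_tablets` are hypothetical, and the sketch assumes that each range contributes one partition per combination of its hash buckets.

```python
from math import prod

# Bucket counts per hash level, taken from the CREATE TABLE example in the
# patch. All names here are illustrative, not Kudu or Impala identifiers.
TABLE_WIDE = (4,)  # HASH (id) PARTITIONS 4

RANGES = {
    "VALUES < 'g'": TABLE_WIDE,        # falls back to the table-wide schema
    "'g' <= VALUES < 'o'": (6,),       # HASH (id) PARTITIONS 6
    "'o' <= VALUES < 'u'": (8,),       # HASH (id) PARTITIONS 8
    "'u' <= VALUES": TABLE_WIDE,       # falls back to the table-wide schema
}


def check_hash_levels(ranges):
    """Raise ValueError unless every range has the same number of hash levels."""
    levels = {len(buckets) for buckets in ranges.values()}
    if len(levels) > 1:
        raise ValueError(f"mismatched hash level counts: {sorted(levels)}")


def tablets_per_range(ranges):
    """Assumption: a range yields one partition per hash-bucket combination."""
    return {name: prod(buckets) for name, buckets in ranges.items()}


def total_tablets(ranges):
    """Validate the hash level rule, then sum the partitions over all ranges."""
    check_hash_levels(ranges)
    return sum(tablets_per_range(ranges).values())


if __name__ == "__main__":
    for name, count in tablets_per_range(RANGES).items():
        print(f"{name}: {count}")
    print("total:", total_tablets(RANGES))
```

Under these assumptions the four ranges contribute 4, 6, 8, and 4 partitions, 22 in total, compared with 16 if every range used the table-wide schema of 4 buckets.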