This is an automated email from the ASF dual-hosted git repository.

alexey pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git


The following commit(s) were added to refs/heads/master by this push:
     new cf550d6d7 KUDU-2671: Update upstream docs
cf550d6d7 is described below

commit cf550d6d7cdd61f6c65f9ef75a1706cb91839876
Author: Mahesh Reddy <mre...@cloudera.com>
AuthorDate: Tue Mar 5 15:20:33 2024 -0800

    KUDU-2671: Update upstream docs
    
    This patch updates the upstream docs to include range-specific
    hash schemas in the partitioning section. An example with the
    proper SQL syntax is also included in the Kudu Impala
    integration doc.
    
    Change-Id: I8da554851a124d1d357be65d8bcc2c6c37875dcc
    Reviewed-on: http://gerrit.cloudera.org:8080/21108
    Tested-by: Kudu Jenkins
    Reviewed-by: Alexey Serbin <ale...@apache.org>
---
 docs/kudu_impala_integration.adoc | 42 +++++++++++++++++++++++++++++++++++++++
 docs/schema_design.adoc           | 16 +++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/docs/kudu_impala_integration.adoc b/docs/kudu_impala_integration.adoc
index 0def0477c..de01c3d59 100755
--- a/docs/kudu_impala_integration.adoc
+++ b/docs/kudu_impala_integration.adoc
@@ -485,6 +485,48 @@ The example creates 16 partitions. You could also use `HASH (id, sku) PARTITIONS
 However, a scan for `sku` values would almost always impact all 16 partitions, rather
 than possibly being limited to 4.
 
+.Range-Specific Hash Schemas
+As of version 1.17, Kudu supports range-specific hash schemas. A range can be
+added with its own hash schema, independent of the table-wide hash schema,
+either when creating the table or when altering it. The number of hash
+partition levels must be the same across all ranges in a table.
+
+[source, sql]
+----
+CREATE TABLE cust_behavior (
+  id BIGINT,
+  sku STRING,
+  salary STRING,
+  edu_level INT,
+  usergender STRING,
+  `group` STRING,
+  city STRING,
+  postcode STRING,
+  last_purchase_price FLOAT,
+  last_purchase_date BIGINT,
+  category STRING,
+  rating INT,
+  fulfilled_date BIGINT,
+  PRIMARY KEY (id, sku)
+)
+PARTITION BY HASH (id) PARTITIONS 4,
+RANGE (sku)
+(
+  PARTITION VALUES < 'g',
+  PARTITION 'g' <= VALUES < 'o'
+      HASH (id) PARTITIONS 6,
+  PARTITION 'o' <= VALUES < 'u'
+      HASH (id) PARTITIONS 8,
+  PARTITION 'u' <= VALUES
+)
+STORED AS KUDU;
+----
+
+This example uses range-specific hash schemas for the middle two ranges. The
+table-wide hash schema has 4 buckets, while the middle two ranges use 6 and 8
+buckets respectively. Giving more buckets to ranges that are expected to
+receive a heavier workload spreads that workload across more tablets.
+
 .Non-Covering Range Partitions
 Kudu 1.0 and higher supports the use of non-covering range partitions,
 which address scenarios like the following:
diff --git a/docs/schema_design.adoc b/docs/schema_design.adoc
index 95d4d251c..906682b86 100644
--- a/docs/schema_design.adoc
+++ b/docs/schema_design.adoc
@@ -435,6 +435,22 @@ NOTE: see the <<hash-range-partitioning-example>> and the
 <<hash-hash-partitioning-example>> for further discussion of multilevel
 partitioning.
 
+[[flexible-partitioning]]
+=== Flexible Partitioning
+
+As of version 1.17, Kudu supports range-specific hash schemas. A range can be
+added with its own hash schema, independent of the table-wide hash schema,
+either when creating the table or when altering it. This helps mitigate
+hotspotting: a range that is expected to receive a heavier workload can be
+given more hash buckets than the rest of the table.
+
+[[same-number-of-hash-levels]]
+[IMPORTANT]
+.Same Number of Hash Levels
+The number of hash partition levels must be the same for all the ranges
+in a table. See <<multilevel-partitioning>> for more details on hash partition
+levels.
+
 [[partition-pruning]]
 === Partition Pruning
 
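
As a rough illustration of the `cust_behavior` example added to kudu_impala_integration.adoc, the Python sketch below counts the tablets that partitioning scheme would create and shows which range a given `sku` value falls into. It assumes that, with a single hash level, every range gets one tablet per hash bucket, and that `sku` values are plain ASCII strings compared byte-wise. `RangeSpec`, `tablets_per_range`, and `range_for_sku` are hypothetical helpers for illustration only, not part of any Kudu or Impala API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical model of the cust_behavior example; not a Kudu or Impala API.
TABLE_WIDE_BUCKETS = 4  # PARTITION BY HASH (id) PARTITIONS 4


@dataclass
class RangeSpec:
    lower: Optional[str]           # inclusive lower bound; None means unbounded
    upper: Optional[str]           # exclusive upper bound; None means unbounded
    buckets: Optional[int] = None  # range-specific HASH (id) PARTITIONS, if any


RANGES = [
    RangeSpec(None, "g"),            # uses the table-wide hash schema
    RangeSpec("g", "o", buckets=6),
    RangeSpec("o", "u", buckets=8),
    RangeSpec("u", None),            # uses the table-wide hash schema
]


def tablets_per_range(ranges: List[RangeSpec], default_buckets: int) -> List[int]:
    """Assuming one hash level, each range yields one tablet per hash bucket."""
    return [r.buckets if r.buckets is not None else default_buckets for r in ranges]


def range_for_sku(ranges: List[RangeSpec], sku: str) -> int:
    """Return the index of the range whose [lower, upper) interval contains sku.

    Python compares str by code point, which matches byte-wise order for ASCII.
    """
    for i, r in enumerate(ranges):
        if (r.lower is None or r.lower <= sku) and (r.upper is None or sku < r.upper):
            return i
    raise ValueError(f"no range covers {sku!r}")


if __name__ == "__main__":
    counts = tablets_per_range(RANGES, TABLE_WIDE_BUCKETS)
    print(counts, sum(counts))            # [4, 6, 8, 4] 22
    print(range_for_sku(RANGES, "milk"))  # 1, the range with 6 buckets
```

Under these assumptions the four ranges yield 4, 6, 8, and 4 tablets (22 in total), compared with 16 if every range used the table-wide 4 buckets.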
