This is an automated email from the ASF dual-hosted git repository.
morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 46f1e31c1ed [fix](sql-functions) provide setup data for BITMAP_HASH /
BITMAP_UNION examples (#3900)
46f1e31c1ed is described below
commit 46f1e31c1ed41c90cbefbd9c4839e0d6ba41d58a
Author: boluor <[email protected]>
AuthorDate: Tue Jun 9 20:53:57 2026 -0700
[fix](sql-functions) provide setup data for BITMAP_HASH / BITMAP_UNION
examples (#3900)
Both bitmap example pages query tables that the older copies never
define, so a reader who runs the examples hits `table does not exist`.
- **bitmap-hash** (dev + 3.x + 2.1, EN+ZH): these copies still showed
the old `words` example with an **unreproducible** expected count
(`33263478`) and no table behind it. Port the **version-4.x** rewrite —
a concrete `words` table (6 rows, 4 distinct) with the matching
reproducible result (`4`), plus the note that a real-scale corpus
returns far larger numbers. (4.x is unchanged.)
- **bitmap-union** (3.x + 2.1, EN+ZH): the example reads an
aggregate-model `pv_bitmap` (the page documents that table near the
bottom) but never creates it. Add a runnable `-- setup` that builds the
aggregate table (`user_id BITMAP BITMAP_UNION`, `AGGREGATE
KEY(dt,page)`) and loads the `to_bitmap` rows, so the dt/page result
**and** the dedup count (`3`) reproduce. An aggregate model is required
— a duplicate-model table would not collapse to the two rows the doc
prints. (dev/4.x already carry setup.)
No rendered prose, expected output, or `ja-source/` is altered beyond
replacing the unreproducible bitmap-hash count with the reproducible 4.x
value.
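The aggregate-model point in the bitmap-union bullet above can be illustrated with a small Python model. This is only a sketch under stated assumptions, not Doris code: plain Python sets stand in for BITMAP values, set union stands in for `BITMAP_UNION`, and the helper names (`load_duplicate_model`, `load_aggregate_model`, `distinct_users`) are hypothetical. The rows mirror the `-- setup` INSERT added by this commit.

```python
from collections import defaultdict

# Rows mirror the `-- setup` INSERT: (dt, page, user_id).
ROWS = [
    (1, "100", 100),
    (1, "100", 200),
    (1, "100", 300),
    (2, "200", 300),
]


def load_duplicate_model(rows):
    """Stand-in for a duplicate-model table: every inserted row is kept as-is."""
    return [(dt, page, {uid}) for dt, page, uid in rows]


def load_aggregate_model(rows):
    """Stand-in for AGGREGATE KEY(dt, page): rows sharing a key are merged and
    the value column is combined by set union (the BITMAP_UNION analogue)."""
    merged = defaultdict(set)
    for dt, page, uid in rows:
        merged[(dt, page)].add(uid)
    return [(dt, page, users) for (dt, page), users in sorted(merged.items())]


def distinct_users(table):
    """Analogue of bitmap_count(bitmap_union(user_id)) over the whole table."""
    return len(set().union(*(users for _, _, users in table)))


duplicate_rows = load_duplicate_model(ROWS)
aggregate_rows = load_aggregate_model(ROWS)

print(len(duplicate_rows))             # 4 stored rows, nothing collapses
print(len(aggregate_rows))             # 2 stored rows, one per (dt, page) key
print(distinct_users(aggregate_rows))  # 3 distinct user ids: 100, 200, 300
```

In this model the per-key merge, not the final count, is what separates the two table types: both yield 3 distinct users, but only the aggregate stand-in stores the two rows that the doc prints for the plain `select`.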
**Verification** — every touched example executed end-to-end on fresh
single-BE clusters, reproducing the doc's printed output cell-for-cell:
| version | cluster | bitmap-hash | bitmap-union |
|---|---|---|---|
| dev | master daily (doris-0.0.0-2e72603618c) | P5 F0 (EN+ZH) | unchanged (control: P4 F0) |
| 3.x | 3.1.4-rc02 | P5 F0 (EN+ZH) | P3 F0 (EN+ZH) |
| 2.1 | 2.1.11-rc01 | P5 F0 (EN+ZH) | P3 F0 (EN+ZH) |
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.8 (1M context) <[email protected]>
---
.../bitmap-functions/bitmap-hash.md | 12 ++++++---
.../bitmap-functions/bitmap-hash.md | 12 ++++++---
.../aggregate-functions/bitmap-union.md | 16 ++++++++++++
.../bitmap-functions/bitmap-hash.md | 29 ++++++++++++++++++----
.../aggregate-functions/bitmap-union.md | 16 ++++++++++++
.../bitmap-functions/bitmap-hash.md | 29 ++++++++++++++++++----
.../aggregate-functions/bitmap-union.md | 16 ++++++++++++
.../bitmap-functions/bitmap-hash.md | 26 ++++++++++++++++---
.../aggregate-functions/bitmap-union.md | 16 ++++++++++++
.../bitmap-functions/bitmap-hash.md | 26 ++++++++++++++++---
10 files changed, 172 insertions(+), 26 deletions(-)
diff --git a/docs/sql-manual/sql-functions/scalar-functions/bitmap-functions/bitmap-hash.md b/docs/sql-manual/sql-functions/scalar-functions/bitmap-functions/bitmap-hash.md
index 182ba7f0658..fd6f371d404 100644
--- a/docs/sql-manual/sql-functions/scalar-functions/bitmap-functions/bitmap-hash.md
+++ b/docs/sql-manual/sql-functions/scalar-functions/bitmap-functions/bitmap-hash.md
@@ -53,19 +53,23 @@ The result will be:
+-------------------------------------------------------------+
```
-To count the distinct values in a column using bitmaps, which can be more efficient than `count distinct` in some scenarios:
+To count the distinct values in a column using bitmaps, which can be more efficient than `count distinct` in some scenarios. The example below populates a `words` table with 6 rows containing 4 distinct values; at the doc-corpus scale a real query of this shape can return numbers in the millions:
```sql
+CREATE TABLE `words` (`word` VARCHAR(64))
+DISTRIBUTED BY HASH(`word`) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+
+INSERT INTO `words` VALUES ('apple'), ('banana'), ('cherry'), ('apple'), ('date'), ('banana');
+
select bitmap_count(bitmap_union(bitmap_hash(`word`))) from `words`;
```
-The result will be:
-
```text
+-------------------------------------------------+
| bitmap_count(bitmap_union(bitmap_hash(`word`))) |
+-------------------------------------------------+
-| 33263478 |
+| 4 |
+-------------------------------------------------+
```
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/bitmap-functions/bitmap-hash.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/bitmap-functions/bitmap-hash.md
index 58f13acdf75..9768c4a8dad 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/bitmap-functions/bitmap-hash.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/bitmap-functions/bitmap-hash.md
@@ -53,19 +53,23 @@ select bitmap_to_array(bitmap_hash('hello'))[1];
+-------------------------------------------------------------+
```
-如果你想统计某一列去重后的个数,可以使用位图的方式,某些场景下性能比 `count distinct` 好很多:
+如果你想统计某一列去重后的个数,可以使用位图的方式,某些场景下性能比 `count distinct` 好很多。下面的示例先建一张 `words` 表,插入 6 行(其中 4 个去重值);在真实大数据集上同样形态的查询可以返回到百万级别:
```sql
+CREATE TABLE `words` (`word` VARCHAR(64))
+DISTRIBUTED BY HASH(`word`) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+
+INSERT INTO `words` VALUES ('apple'), ('banana'), ('cherry'), ('apple'), ('date'), ('banana');
+
select bitmap_count(bitmap_union(bitmap_hash(`word`))) from `words`;
```
-结果如下:
-
```text
+-------------------------------------------------+
| bitmap_count(bitmap_union(bitmap_hash(`word`))) |
+-------------------------------------------------+
-| 33263478 |
+| 4 |
+-------------------------------------------------+
```
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/aggregate-functions/bitmap-union.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/aggregate-functions/bitmap-union.md
index e8dbbef13d7..bccb102a30a 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/aggregate-functions/bitmap-union.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/aggregate-functions/bitmap-union.md
@@ -28,6 +28,22 @@ BITMAP_UNION(<expr>)
## 举例
+```sql
+-- setup
+CREATE TABLE pv_bitmap (
+ dt INT,
+ page VARCHAR(10),
+ user_id BITMAP BITMAP_UNION
+) AGGREGATE KEY(dt, page)
+DISTRIBUTED BY HASH(dt) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+INSERT INTO pv_bitmap VALUES
+ (1, '100', to_bitmap(100)),
+ (1, '100', to_bitmap(200)),
+ (1, '100', to_bitmap(300)),
+ (2, '200', to_bitmap(300));
+```
+
```sql
select dt,page,bitmap_to_string(user_id) from pv_bitmap;
```
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/scalar-functions/bitmap-functions/bitmap-hash.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/scalar-functions/bitmap-functions/bitmap-hash.md
index 26a6c03490a..8d8ac260d0e 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/scalar-functions/bitmap-functions/bitmap-hash.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/scalar-functions/bitmap-functions/bitmap-hash.md
@@ -38,7 +38,7 @@ MurMur3 算法是一种高性能的、低碰撞率的散列算法,其计算出
如果你想计算某个值的 MurMur3,你可以:
-```
+```sql
select bitmap_to_array(bitmap_hash('hello'))[1];
```
@@ -52,18 +52,37 @@ select bitmap_to_array(bitmap_hash('hello'))[1];
+-------------------------------------------------------------+
```
-如果你想统计某一列去重后的个数,可以使用位图的方式,某些场景下性能比 `count distinct` 好很多:
+如果你想统计某一列去重后的个数,可以使用位图的方式,某些场景下性能比 `count distinct` 好很多。下面的示例先建一张 `words` 表,插入 6 行(其中 4 个去重值);在真实大数据集上同样形态的查询可以返回到百万级别:
```sql
+CREATE TABLE `words` (`word` VARCHAR(64))
+DISTRIBUTED BY HASH(`word`) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+
+INSERT INTO `words` VALUES ('apple'), ('banana'), ('cherry'), ('apple'), ('date'), ('banana');
+
select bitmap_count(bitmap_union(bitmap_hash(`word`))) from `words`;
```
-结果如下:
-
```text
+-------------------------------------------------+
| bitmap_count(bitmap_union(bitmap_hash(`word`))) |
+-------------------------------------------------+
-| 33263478 |
+| 4 |
+-------------------------------------------------+
```
+
+
+```sql
+select bitmap_to_string(bitmap_hash(NULL)) AS res;
+```
+
+结果如下:
+
+```text
++------+
+| res |
++------+
+| |
++------+
+```
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/sql-manual/sql-functions/aggregate-functions/bitmap-union.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/sql-manual/sql-functions/aggregate-functions/bitmap-union.md
index e8dbbef13d7..bccb102a30a 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/sql-manual/sql-functions/aggregate-functions/bitmap-union.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/sql-manual/sql-functions/aggregate-functions/bitmap-union.md
@@ -28,6 +28,22 @@ BITMAP_UNION(<expr>)
## 举例
+```sql
+-- setup
+CREATE TABLE pv_bitmap (
+ dt INT,
+ page VARCHAR(10),
+ user_id BITMAP BITMAP_UNION
+) AGGREGATE KEY(dt, page)
+DISTRIBUTED BY HASH(dt) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+INSERT INTO pv_bitmap VALUES
+ (1, '100', to_bitmap(100)),
+ (1, '100', to_bitmap(200)),
+ (1, '100', to_bitmap(300)),
+ (2, '200', to_bitmap(300));
+```
+
```sql
select dt,page,bitmap_to_string(user_id) from pv_bitmap;
```
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/sql-manual/sql-functions/scalar-functions/bitmap-functions/bitmap-hash.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/sql-manual/sql-functions/scalar-functions/bitmap-functions/bitmap-hash.md
index e0b87b91211..178c8383c28 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/sql-manual/sql-functions/scalar-functions/bitmap-functions/bitmap-hash.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/sql-manual/sql-functions/scalar-functions/bitmap-functions/bitmap-hash.md
@@ -38,7 +38,7 @@ MurMur3 算法是一种高性能的、低碰撞率的散列算法,其计算出
如果你想计算某个值的 MurMur3,你可以:
-```
+```sql
select bitmap_to_array(bitmap_hash('hello'))[1];
```
@@ -52,18 +52,37 @@ select bitmap_to_array(bitmap_hash('hello'))[1];
+-------------------------------------------------------------+
```
-如果你想统计某一列去重后的个数,可以使用位图的方式,某些场景下性能比 `count distinct` 好很多:
+如果你想统计某一列去重后的个数,可以使用位图的方式,某些场景下性能比 `count distinct` 好很多。下面的示例先建一张 `words` 表,插入 6 行(其中 4 个去重值);在真实大数据集上同样形态的查询可以返回到百万级别:
```sql
+CREATE TABLE `words` (`word` VARCHAR(64))
+DISTRIBUTED BY HASH(`word`) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+
+INSERT INTO `words` VALUES ('apple'), ('banana'), ('cherry'), ('apple'), ('date'), ('banana');
+
select bitmap_count(bitmap_union(bitmap_hash(`word`))) from `words`;
```
-结果如下:
-
```text
+-------------------------------------------------+
| bitmap_count(bitmap_union(bitmap_hash(`word`))) |
+-------------------------------------------------+
-| 33263478 |
+| 4 |
+-------------------------------------------------+
```
+
+
+```sql
+select bitmap_to_string(bitmap_hash(NULL)) AS res;
+```
+
+结果如下:
+
+```text
++------+
+| res |
++------+
+| |
++------+
+```
diff --git a/versioned_docs/version-2.1/sql-manual/sql-functions/aggregate-functions/bitmap-union.md b/versioned_docs/version-2.1/sql-manual/sql-functions/aggregate-functions/bitmap-union.md
index 0da3707f478..31fc1e853f0 100644
--- a/versioned_docs/version-2.1/sql-manual/sql-functions/aggregate-functions/bitmap-union.md
+++ b/versioned_docs/version-2.1/sql-manual/sql-functions/aggregate-functions/bitmap-union.md
@@ -28,6 +28,22 @@ The data type of the return value is BITMAP.
## Example
+```sql
+-- setup
+CREATE TABLE pv_bitmap (
+ dt INT,
+ page VARCHAR(10),
+ user_id BITMAP BITMAP_UNION
+) AGGREGATE KEY(dt, page)
+DISTRIBUTED BY HASH(dt) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+INSERT INTO pv_bitmap VALUES
+ (1, '100', to_bitmap(100)),
+ (1, '100', to_bitmap(200)),
+ (1, '100', to_bitmap(300)),
+ (2, '200', to_bitmap(300));
+```
+
```sql
select dt,page,bitmap_to_string(user_id) from pv_bitmap;
```
diff --git a/versioned_docs/version-2.1/sql-manual/sql-functions/scalar-functions/bitmap-functions/bitmap-hash.md b/versioned_docs/version-2.1/sql-manual/sql-functions/scalar-functions/bitmap-functions/bitmap-hash.md
index 0f021c11ded..cbc1d406d15 100644
--- a/versioned_docs/version-2.1/sql-manual/sql-functions/scalar-functions/bitmap-functions/bitmap-hash.md
+++ b/versioned_docs/version-2.1/sql-manual/sql-functions/scalar-functions/bitmap-functions/bitmap-hash.md
@@ -52,18 +52,36 @@ The result will be:
+-------------------------------------------------------------+
```
-To count the distinct values in a column using bitmaps, which can be more efficient than `count distinct` in some scenarios:
+To count the distinct values in a column using bitmaps, which can be more efficient than `count distinct` in some scenarios. The example below populates a `words` table with 6 rows containing 4 distinct values; at the doc-corpus scale a real query of this shape can return numbers in the millions:
```sql
+CREATE TABLE `words` (`word` VARCHAR(64))
+DISTRIBUTED BY HASH(`word`) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+
+INSERT INTO `words` VALUES ('apple'), ('banana'), ('cherry'), ('apple'), ('date'), ('banana');
+
select bitmap_count(bitmap_union(bitmap_hash(`word`))) from `words`;
```
-The result will be:
-
```text
+-------------------------------------------------+
| bitmap_count(bitmap_union(bitmap_hash(`word`))) |
+-------------------------------------------------+
-| 33263478 |
+| 4 |
+-------------------------------------------------+
```
+
+```sql
+select bitmap_to_string(bitmap_hash(NULL)) AS res;
+```
+
+The result will be:
+
+```text
++------+
+| res |
++------+
+| |
++------+
+```
diff --git a/versioned_docs/version-3.x/sql-manual/sql-functions/aggregate-functions/bitmap-union.md b/versioned_docs/version-3.x/sql-manual/sql-functions/aggregate-functions/bitmap-union.md
index 0da3707f478..31fc1e853f0 100644
--- a/versioned_docs/version-3.x/sql-manual/sql-functions/aggregate-functions/bitmap-union.md
+++ b/versioned_docs/version-3.x/sql-manual/sql-functions/aggregate-functions/bitmap-union.md
@@ -28,6 +28,22 @@ The data type of the return value is BITMAP.
## Example
+```sql
+-- setup
+CREATE TABLE pv_bitmap (
+ dt INT,
+ page VARCHAR(10),
+ user_id BITMAP BITMAP_UNION
+) AGGREGATE KEY(dt, page)
+DISTRIBUTED BY HASH(dt) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+INSERT INTO pv_bitmap VALUES
+ (1, '100', to_bitmap(100)),
+ (1, '100', to_bitmap(200)),
+ (1, '100', to_bitmap(300)),
+ (2, '200', to_bitmap(300));
+```
+
```sql
select dt,page,bitmap_to_string(user_id) from pv_bitmap;
```
diff --git a/versioned_docs/version-3.x/sql-manual/sql-functions/scalar-functions/bitmap-functions/bitmap-hash.md b/versioned_docs/version-3.x/sql-manual/sql-functions/scalar-functions/bitmap-functions/bitmap-hash.md
index 0f021c11ded..cbc1d406d15 100644
--- a/versioned_docs/version-3.x/sql-manual/sql-functions/scalar-functions/bitmap-functions/bitmap-hash.md
+++ b/versioned_docs/version-3.x/sql-manual/sql-functions/scalar-functions/bitmap-functions/bitmap-hash.md
@@ -52,18 +52,36 @@ The result will be:
+-------------------------------------------------------------+
```
-To count the distinct values in a column using bitmaps, which can be more efficient than `count distinct` in some scenarios:
+To count the distinct values in a column using bitmaps, which can be more efficient than `count distinct` in some scenarios. The example below populates a `words` table with 6 rows containing 4 distinct values; at the doc-corpus scale a real query of this shape can return numbers in the millions:
```sql
+CREATE TABLE `words` (`word` VARCHAR(64))
+DISTRIBUTED BY HASH(`word`) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+
+INSERT INTO `words` VALUES ('apple'), ('banana'), ('cherry'), ('apple'), ('date'), ('banana');
+
select bitmap_count(bitmap_union(bitmap_hash(`word`))) from `words`;
```
-The result will be:
-
```text
+-------------------------------------------------+
| bitmap_count(bitmap_union(bitmap_hash(`word`))) |
+-------------------------------------------------+
-| 33263478 |
+| 4 |
+-------------------------------------------------+
```
+
+```sql
+select bitmap_to_string(bitmap_hash(NULL)) AS res;
+```
+
+The result will be:
+
+```text
++------+
+| res |
++------+
+| |
++------+
+```