This is an automated email from the ASF dual-hosted git repository.
dataroaring pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new f16d73bc64e add best practice for fact table (#3441)
f16d73bc64e is described below
commit f16d73bc64e77f2d47298ac1c76a615198ac37f1
Author: lsy3993 <[email protected]>
AuthorDate: Tue Mar 10 09:20:13 2026 +0800
add best practice for fact table (#3441)
## Versions
- [x] dev
- [x] 4.x
- [x] 3.x
- [x] 2.1
## Languages
- [x] Chinese
- [x] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---
docs/table-design/best-practice.md | 2 +-
.../current/table-design/best-practice.md | 2 +-
.../version-2.0/table-design/best-practice.md | 2 +-
.../version-2.1/table-design/best-practice.md | 2 +-
.../version-3.x/table-design/best-practice.md | 2 +-
.../version-4.x/table-design/best-practice.md | 2 +-
versioned_docs/version-2.0/table-design/best-practice.md | 2 +-
versioned_docs/version-2.1/table-design/best-practice.md | 2 +-
versioned_docs/version-3.x/table-design/best-practice.md | 2 +-
versioned_docs/version-4.x/table-design/best-practice.md | 2 +-
10 files changed, 10 insertions(+), 10 deletions(-)
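As a reading aid for the change below: the fact-table recommendation this commit adds (a DATE or DATETIME partitioning column, one partition per day, and bucketing on a frequently queried column with an even distribution) could be sketched in Doris DDL roughly as follows. This is a minimal sketch and not part of the commit. The table name `sales_fact`, its columns, the bucket count of 16, and the 30-day retention window are all hypothetical placeholders chosen for illustration.

```sql
-- Hypothetical fact table; names and sizes are assumptions, not from the commit.
CREATE TABLE IF NOT EXISTS sales_fact (
    order_date DATE           NOT NULL COMMENT "partitioning column",
    user_id    BIGINT         NOT NULL COMMENT "frequently filtered, assumed evenly distributed",
    order_id   BIGINT         NOT NULL,
    amount     DECIMAL(18, 2)
)
DUPLICATE KEY(order_date, user_id, order_id)
-- Range partitions on the date column; dynamic partitioning creates them daily.
PARTITION BY RANGE(order_date) ()
-- Bucket on the column most queries filter by.
DISTRIBUTED BY HASH(user_id) BUCKETS 16
PROPERTIES (
    "dynamic_partition.enable"    = "true",
    "dynamic_partition.time_unit" = "DAY",  -- one partition per day
    "dynamic_partition.start"     = "-30",  -- assumed retention: drop partitions older than 30 days
    "dynamic_partition.end"       = "3",    -- pre-create partitions 3 days ahead
    "dynamic_partition.prefix"    = "p",
    "dynamic_partition.buckets"   = "16"
);
```

The bucket count and retention window depend on data volume and cluster size, so treat the values above as starting points to tune rather than recommendations.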
diff --git a/docs/table-design/best-practice.md b/docs/table-design/best-practice.md
index 3a70c2b72ab..5e314e220fb 100644
--- a/docs/table-design/best-practice.md
+++ b/docs/table-design/best-practice.md
@@ -425,7 +425,7 @@ show partitions from tbl_unique_merge_on_write_p;
2. When the bucketing mode for a table is set to RANDOM, since there is
no bucketing column, querying the table will scan all buckets in the hit
partitions instead of querying specific buckets based on the values of the
bucketing column. This setting is suitable for overall aggregation and analysis
queries rather than high-concurrency point queries.
3. If an OLAP table has a random distribution of data, setting the
`load_to_single_tablet` parameter to true during data ingestion allows each
task to write to a single tablet. This improves concurrency and throughput
during large-scale data ingestion. It can also reduce the write amplification
caused by data ingestion and compaction and ensure cluster stability.
5. Dimension tables, which grow slowly, can use a single partition and
apply bucketing based on commonly used query conditions (where the data
distribution of the bucketing field is relatively even).
- 6. Fact tables.
+ 6. Fact tables: We recommend using DATE or DATETIME as the partitioning
column. In most cases, one partition per day is sufficient. For the bucketing
strategy, use frequently queried columns with relatively even data distribution
as the bucketing columns.
5. For scenarios where there is a large amount of historical partitioned data
but the historical data is relatively small, unbalanced, or queried
infrequently, you can use the following approach to place the data in special
partitions. You can create historical partitions for historical data of small
sizes (e.g., yearly partitions, monthly partitions). For example, you can
create historical partitions for data `FROM ("2000-01-01") TO ("2022-01-01")
INTERVAL 1 YEAR`:
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/best-practice.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/best-practice.md
index 77bb630ad67..b2d9b4ebce2 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/best-practice.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/best-practice.md
@@ -493,7 +493,7 @@ show partitions from tbl_unique_merge_on_write_p;
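e. Dimension tables: these grow slowly, so they can use a single partition, with bucketing based on commonly used query conditions (where the data distribution of that field is relatively even).
- f. Fact tables
+ f. Fact tables: we recommend a date/datetime column as the partitioning column, and one partition per day is generally sufficient; for the bucketing strategy, bucket on commonly used query conditions (where the data distribution of that field is relatively even).
5. For cases with a large number of historical partitions where the historical data is small, unbalanced, or rarely queried, use the following approach to place the data in special partitions.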
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/best-practice.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/best-practice.md
index 984c873e0e8..ad274eaaf33 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/best-practice.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/best-practice.md
@@ -492,7 +492,7 @@ show partitions from tbl_unique_merge_on_write_p;
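e. Dimension tables: these grow slowly, so they can use a single partition, with bucketing based on commonly used query conditions (where the data distribution of that field is relatively even).
- f. Fact tables
+ f. Fact tables: we recommend a date/datetime column as the partitioning column, and one partition per day is generally sufficient; for the bucketing strategy, bucket on commonly used query conditions (where the data distribution of that field is relatively even).
5. For cases with a large number of historical partitions where the historical data is small, unbalanced, or rarely queried, use the following approach to place the data in special partitions.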
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/best-practice.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/best-practice.md
index 791a02fa8a3..a530944039c 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/best-practice.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/best-practice.md
@@ -493,7 +493,7 @@ show partitions from tbl_unique_merge_on_write_p;
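e. Dimension tables: these grow slowly, so they can use a single partition, with bucketing based on commonly used query conditions (where the data distribution of that field is relatively even).
- f. Fact tables
+ f. Fact tables: we recommend a date/datetime column as the partitioning column, and one partition per day is generally sufficient; for the bucketing strategy, bucket on commonly used query conditions (where the data distribution of that field is relatively even).
5. For cases with a large number of historical partitions where the historical data is small, unbalanced, or rarely queried, use the following approach to place the data in special partitions.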
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/table-design/best-practice.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/table-design/best-practice.md
index 3065592f52c..625bfc6e829 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/table-design/best-practice.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/table-design/best-practice.md
@@ -493,7 +493,7 @@ show partitions from tbl_unique_merge_on_write_p;
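e. Dimension tables: these grow slowly, so they can use a single partition, with bucketing based on commonly used query conditions (where the data distribution of that field is relatively even).
- f. Fact tables
+ f. Fact tables: we recommend a date/datetime column as the partitioning column, and one partition per day is generally sufficient; for the bucketing strategy, bucket on commonly used query conditions (where the data distribution of that field is relatively even).
5. For cases with a large number of historical partitions where the historical data is small, unbalanced, or rarely queried, use the following approach to place the data in special partitions.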
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/best-practice.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/best-practice.md
index 77bb630ad67..b2d9b4ebce2 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/best-practice.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/best-practice.md
@@ -493,7 +493,7 @@ show partitions from tbl_unique_merge_on_write_p;
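e. Dimension tables: these grow slowly, so they can use a single partition, with bucketing based on commonly used query conditions (where the data distribution of that field is relatively even).
- f. Fact tables
+ f. Fact tables: we recommend a date/datetime column as the partitioning column, and one partition per day is generally sufficient; for the bucketing strategy, bucket on commonly used query conditions (where the data distribution of that field is relatively even).
5. For cases with a large number of historical partitions where the historical data is small, unbalanced, or rarely queried, use the following approach to place the data in special partitions.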
diff --git a/versioned_docs/version-2.0/table-design/best-practice.md b/versioned_docs/version-2.0/table-design/best-practice.md
index f91d0c28b84..772479e3bc8 100644
--- a/versioned_docs/version-2.0/table-design/best-practice.md
+++ b/versioned_docs/version-2.0/table-design/best-practice.md
@@ -421,7 +421,7 @@ show partitions from tbl_unique_merge_on_write_p;
2. When the bucketing mode for a table is set to RANDOM, since there is
no bucketing column, querying the table will scan all buckets in the hit
partitions instead of querying specific buckets based on the values of the
bucketing column. This setting is suitable for overall aggregation and analysis
queries rather than high-concurrency point queries.
3. If an OLAP table has a random distribution of data, setting the
`load_to_single_tablet` parameter to true during data ingestion allows each
task to write to a single tablet. This improves concurrency and throughput
during large-scale data ingestion. It can also reduce the write amplification
caused by data ingestion and compaction and ensure cluster stability.
5. Dimension tables, which grow slowly, can use a single partition and
apply bucketing based on commonly used query conditions (where the data
distribution of the bucketing field is relatively even).
- 6. Fact tables.
+ 6. Fact tables: We recommend using DATE or DATETIME as the partitioning
column. In most cases, one partition per day is sufficient. For the bucketing
strategy, use frequently queried columns with relatively even data distribution
as the bucketing columns.
5. For scenarios where there is a large amount of historical partitioned data
but the historical data is relatively small, unbalanced, or queried
infrequently, you can use the following approach to place the data in special
partitions. You can create historical partitions for historical data of small
sizes (e.g., yearly partitions, monthly partitions). For example, you can
create historical partitions for data `FROM ("2000-01-01") TO ("2022-01-01")
INTERVAL 1 YEAR`:
diff --git a/versioned_docs/version-2.1/table-design/best-practice.md b/versioned_docs/version-2.1/table-design/best-practice.md
index 20534f61ba2..4d2b7fa5352 100644
--- a/versioned_docs/version-2.1/table-design/best-practice.md
+++ b/versioned_docs/version-2.1/table-design/best-practice.md
@@ -423,7 +423,7 @@ show partitions from tbl_unique_merge_on_write_p;
2. When the bucketing mode for a table is set to RANDOM, since there is
no bucketing column, querying the table will scan all buckets in the hit
partitions instead of querying specific buckets based on the values of the
bucketing column. This setting is suitable for overall aggregation and analysis
queries rather than high-concurrency point queries.
3. If an OLAP table has a random distribution of data, setting the
`load_to_single_tablet` parameter to true during data ingestion allows each
task to write to a single tablet. This improves concurrency and throughput
during large-scale data ingestion. It can also reduce the write amplification
caused by data ingestion and compaction and ensure cluster stability.
5. Dimension tables, which grow slowly, can use a single partition and
apply bucketing based on commonly used query conditions (where the data
distribution of the bucketing field is relatively even).
- 6. Fact tables.
+ 6. Fact tables: We recommend using DATE or DATETIME as the partitioning
column. In most cases, one partition per day is sufficient. For the bucketing
strategy, use frequently queried columns with relatively even data distribution
as the bucketing columns.
5. For scenarios where there is a large amount of historical partitioned data
but the historical data is relatively small, unbalanced, or queried
infrequently, you can use the following approach to place the data in special
partitions. You can create historical partitions for historical data of small
sizes (e.g., yearly partitions, monthly partitions). For example, you can
create historical partitions for data `FROM ("2000-01-01") TO ("2022-01-01")
INTERVAL 1 YEAR`:
diff --git a/versioned_docs/version-3.x/table-design/best-practice.md b/versioned_docs/version-3.x/table-design/best-practice.md
index e15ec9550fc..5f2e67651e9 100644
--- a/versioned_docs/version-3.x/table-design/best-practice.md
+++ b/versioned_docs/version-3.x/table-design/best-practice.md
@@ -423,7 +423,7 @@ show partitions from tbl_unique_merge_on_write_p;
2. When the bucketing mode for a table is set to RANDOM, since there is
no bucketing column, querying the table will scan all buckets in the hit
partitions instead of querying specific buckets based on the values of the
bucketing column. This setting is suitable for overall aggregation and analysis
queries rather than high-concurrency point queries.
3. If an OLAP table has a random distribution of data, setting the
`load_to_single_tablet` parameter to true during data ingestion allows each
task to write to a single tablet. This improves concurrency and throughput
during large-scale data ingestion. It can also reduce the write amplification
caused by data ingestion and compaction and ensure cluster stability.
5. Dimension tables, which grow slowly, can use a single partition and
apply bucketing based on commonly used query conditions (where the data
distribution of the bucketing field is relatively even).
- 6. Fact tables.
+ 6. Fact tables: We recommend using DATE or DATETIME as the partitioning
column. In most cases, one partition per day is sufficient. For the bucketing
strategy, use frequently queried columns with relatively even data distribution
as the bucketing columns.
5. For scenarios where there is a large amount of historical partitioned data
but the historical data is relatively small, unbalanced, or queried
infrequently, you can use the following approach to place the data in special
partitions. You can create historical partitions for historical data of small
sizes (e.g., yearly partitions, monthly partitions). For example, you can
create historical partitions for data `FROM ("2000-01-01") TO ("2022-01-01")
INTERVAL 1 YEAR`:
diff --git a/versioned_docs/version-4.x/table-design/best-practice.md b/versioned_docs/version-4.x/table-design/best-practice.md
index 3a70c2b72ab..5e314e220fb 100644
--- a/versioned_docs/version-4.x/table-design/best-practice.md
+++ b/versioned_docs/version-4.x/table-design/best-practice.md
@@ -425,7 +425,7 @@ show partitions from tbl_unique_merge_on_write_p;
2. When the bucketing mode for a table is set to RANDOM, since there is
no bucketing column, querying the table will scan all buckets in the hit
partitions instead of querying specific buckets based on the values of the
bucketing column. This setting is suitable for overall aggregation and analysis
queries rather than high-concurrency point queries.
3. If an OLAP table has a random distribution of data, setting the
`load_to_single_tablet` parameter to true during data ingestion allows each
task to write to a single tablet. This improves concurrency and throughput
during large-scale data ingestion. It can also reduce the write amplification
caused by data ingestion and compaction and ensure cluster stability.
5. Dimension tables, which grow slowly, can use a single partition and
apply bucketing based on commonly used query conditions (where the data
distribution of the bucketing field is relatively even).
- 6. Fact tables.
+ 6. Fact tables: We recommend using DATE or DATETIME as the partitioning
column. In most cases, one partition per day is sufficient. For the bucketing
strategy, use frequently queried columns with relatively even data distribution
as the bucketing columns.
5. For scenarios where there is a large amount of historical partitioned data
but the historical data is relatively small, unbalanced, or queried
infrequently, you can use the following approach to place the data in special
partitions. You can create historical partitions for historical data of small
sizes (e.g., yearly partitions, monthly partitions). For example, you can
create historical partitions for data `FROM ("2000-01-01") TO ("2022-01-01")
INTERVAL 1 YEAR`: