This is an automated email from the ASF dual-hosted git repository.

dataroaring pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
     new f16d73bc64e add best practice for fact table (#3441)
f16d73bc64e is described below

commit f16d73bc64e77f2d47298ac1c76a615198ac37f1
Author: lsy3993 <[email protected]>
AuthorDate: Tue Mar 10 09:20:13 2026 +0800

    add best practice for fact table (#3441)
    
    ## Versions
    
    - [x] dev
    - [x] 4.x
    - [x] 3.x
    - [x] 2.1
    
    ## Languages
    
    - [x] Chinese
    - [x] English
    
    ## Docs Checklist
    
    - [ ] Checked by AI
    - [ ] Test Cases Built
---
 docs/table-design/best-practice.md                                      | 2 +-
 .../current/table-design/best-practice.md                               | 2 +-
 .../version-2.0/table-design/best-practice.md                           | 2 +-
 .../version-2.1/table-design/best-practice.md                           | 2 +-
 .../version-3.x/table-design/best-practice.md                           | 2 +-
 .../version-4.x/table-design/best-practice.md                           | 2 +-
 versioned_docs/version-2.0/table-design/best-practice.md                | 2 +-
 versioned_docs/version-2.1/table-design/best-practice.md                | 2 +-
 versioned_docs/version-3.x/table-design/best-practice.md                | 2 +-
 versioned_docs/version-4.x/table-design/best-practice.md                | 2 +-
 10 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/docs/table-design/best-practice.md b/docs/table-design/best-practice.md
index 3a70c2b72ab..5e314e220fb 100644
--- a/docs/table-design/best-practice.md
+++ b/docs/table-design/best-practice.md
@@ -425,7 +425,7 @@ show partitions from tbl_unique_merge_on_write_p;
       2. When the bucketing mode for a table is set to RANDOM, since there is 
no bucketing column, querying the table will scan all buckets in the hit 
partitions instead of querying specific buckets based on the values of the 
bucketing column. This setting is suitable for overall aggregation and analysis 
queries rather than high-concurrency point queries.
       3. If an OLAP table has a random distribution of data, setting the 
`load_to_single_tablet` parameter to true during data ingestion allows each 
task to write to a single tablet. This improves concurrency and throughput 
during large-scale data ingestion. It can also reduce the write amplification 
caused by data ingestion and compaction and ensure cluster stability.
    5. Dimension tables, which grow slowly, can use a single partition and 
apply bucketing based on commonly used query conditions (where the data 
distribution of the bucketing field is relatively even).
-   6. Fact tables.
+   6. Fact tables: We recommend using DATE or DATETIME as the partitioning 
column. In most cases, one partition per day is sufficient. For the bucketing 
strategy, use frequently queried columns with relatively even data distribution 
as the bucketing columns.
 
 
 5. For scenarios where there is a large amount of historical partitioned data 
but the historical data is relatively small, unbalanced, or queried 
infrequently, you can use the following approach to place the data in special 
partitions. You can create historical partitions for historical data of small 
sizes (e.g., yearly partitions, monthly partitions). For example, you can 
create historical partitions for data `FROM ("2000-01-01") TO ("2022-01-01") 
INTERVAL 1 YEAR`:
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/best-practice.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/best-practice.md
index 77bb630ad67..b2d9b4ebce2 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/best-practice.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/best-practice.md
@@ -493,7 +493,7 @@ show partitions from tbl_unique_merge_on_write_p;
 
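     e. Dimension tables: these grow slowly, so they can use a single partition and bucket on commonly used query conditions (where the data distribution of that field is relatively even).
     
-    f. Fact tables
+    f. Fact tables: we recommend using a DATE or DATETIME column as the partitioning column; one partition per day is generally sufficient. For the bucketing strategy, bucket on commonly used query conditions (where the data distribution of that field is relatively even).
 
 5.  For scenarios with a large amount of historical partitioned data where the historical data is relatively small, unbalanced, or rarely queried, use the following approach to place the data in special partitions.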
     
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/best-practice.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/best-practice.md
index 984c873e0e8..ad274eaaf33 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/best-practice.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/best-practice.md
@@ -492,7 +492,7 @@ show partitions from tbl_unique_merge_on_write_p;
 
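     e. Dimension tables: these grow slowly, so they can use a single partition and bucket on commonly used query conditions (where the data distribution of that field is relatively even).
     
-    f. Fact tables
+    f. Fact tables: we recommend using a DATE or DATETIME column as the partitioning column; one partition per day is generally sufficient. For the bucketing strategy, bucket on commonly used query conditions (where the data distribution of that field is relatively even).
 
 
 5.  For scenarios with a large amount of historical partitioned data where the historical data is relatively small, unbalanced, or rarely queried, use the following approach to place the data in special partitions.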
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/best-practice.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/best-practice.md
index 791a02fa8a3..a530944039c 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/best-practice.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/best-practice.md
@@ -493,7 +493,7 @@ show partitions from tbl_unique_merge_on_write_p;
 
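     e. Dimension tables: these grow slowly, so they can use a single partition and bucket on commonly used query conditions (where the data distribution of that field is relatively even).
     
-    f. Fact tables
+    f. Fact tables: we recommend using a DATE or DATETIME column as the partitioning column; one partition per day is generally sufficient. For the bucketing strategy, bucket on commonly used query conditions (where the data distribution of that field is relatively even).
 
 5.  For scenarios with a large amount of historical partitioned data where the historical data is relatively small, unbalanced, or rarely queried, use the following approach to place the data in special partitions.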
     
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/table-design/best-practice.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/table-design/best-practice.md
index 3065592f52c..625bfc6e829 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/table-design/best-practice.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/table-design/best-practice.md
@@ -493,7 +493,7 @@ show partitions from tbl_unique_merge_on_write_p;
 
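     e. Dimension tables: these grow slowly, so they can use a single partition and bucket on commonly used query conditions (where the data distribution of that field is relatively even).
     
-    f. Fact tables
+    f. Fact tables: we recommend using a DATE or DATETIME column as the partitioning column; one partition per day is generally sufficient. For the bucketing strategy, bucket on commonly used query conditions (where the data distribution of that field is relatively even).
 
 
 5.  For scenarios with a large amount of historical partitioned data where the historical data is relatively small, unbalanced, or rarely queried, use the following approach to place the data in special partitions.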
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/best-practice.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/best-practice.md
index 77bb630ad67..b2d9b4ebce2 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/best-practice.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/best-practice.md
@@ -493,7 +493,7 @@ show partitions from tbl_unique_merge_on_write_p;
 
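     e. Dimension tables: these grow slowly, so they can use a single partition and bucket on commonly used query conditions (where the data distribution of that field is relatively even).
     
-    f. Fact tables
+    f. Fact tables: we recommend using a DATE or DATETIME column as the partitioning column; one partition per day is generally sufficient. For the bucketing strategy, bucket on commonly used query conditions (where the data distribution of that field is relatively even).
 
 5.  For scenarios with a large amount of historical partitioned data where the historical data is relatively small, unbalanced, or rarely queried, use the following approach to place the data in special partitions.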
     
diff --git a/versioned_docs/version-2.0/table-design/best-practice.md b/versioned_docs/version-2.0/table-design/best-practice.md
index f91d0c28b84..772479e3bc8 100644
--- a/versioned_docs/version-2.0/table-design/best-practice.md
+++ b/versioned_docs/version-2.0/table-design/best-practice.md
@@ -421,7 +421,7 @@ show partitions from tbl_unique_merge_on_write_p;
       2. When the bucketing mode for a table is set to RANDOM, since there is 
no bucketing column, querying the table will scan all buckets in the hit 
partitions instead of querying specific buckets based on the values of the 
bucketing column. This setting is suitable for overall aggregation and analysis 
queries rather than high-concurrency point queries.
       3. If an OLAP table has a random distribution of data, setting the 
`load_to_single_tablet` parameter to true during data ingestion allows each 
task to write to a single tablet. This improves concurrency and throughput 
during large-scale data ingestion. It can also reduce the write amplification 
caused by data ingestion and compaction and ensure cluster stability.
    5. Dimension tables, which grow slowly, can use a single partition and 
apply bucketing based on commonly used query conditions (where the data 
distribution of the bucketing field is relatively even).
-   6. Fact tables.
+   6. Fact tables: We recommend using DATE or DATETIME as the partitioning 
column. In most cases, one partition per day is sufficient. For the bucketing 
strategy, use frequently queried columns with relatively even data distribution 
as the bucketing columns.
 
 
 5. For scenarios where there is a large amount of historical partitioned data 
but the historical data is relatively small, unbalanced, or queried 
infrequently, you can use the following approach to place the data in special 
partitions. You can create historical partitions for historical data of small 
sizes (e.g., yearly partitions, monthly partitions). For example, you can 
create historical partitions for data `FROM ("2000-01-01") TO ("2022-01-01") 
INTERVAL 1 YEAR`:
diff --git a/versioned_docs/version-2.1/table-design/best-practice.md b/versioned_docs/version-2.1/table-design/best-practice.md
index 20534f61ba2..4d2b7fa5352 100644
--- a/versioned_docs/version-2.1/table-design/best-practice.md
+++ b/versioned_docs/version-2.1/table-design/best-practice.md
@@ -423,7 +423,7 @@ show partitions from tbl_unique_merge_on_write_p;
       2. When the bucketing mode for a table is set to RANDOM, since there is 
no bucketing column, querying the table will scan all buckets in the hit 
partitions instead of querying specific buckets based on the values of the 
bucketing column. This setting is suitable for overall aggregation and analysis 
queries rather than high-concurrency point queries.
       3. If an OLAP table has a random distribution of data, setting the 
`load_to_single_tablet` parameter to true during data ingestion allows each 
task to write to a single tablet. This improves concurrency and throughput 
during large-scale data ingestion. It can also reduce the write amplification 
caused by data ingestion and compaction and ensure cluster stability.
    5. Dimension tables, which grow slowly, can use a single partition and 
apply bucketing based on commonly used query conditions (where the data 
distribution of the bucketing field is relatively even).
-   6. Fact tables.
+   6. Fact tables: We recommend using DATE or DATETIME as the partitioning 
column. In most cases, one partition per day is sufficient. For the bucketing 
strategy, use frequently queried columns with relatively even data distribution 
as the bucketing columns.
 
 
 5. For scenarios where there is a large amount of historical partitioned data 
but the historical data is relatively small, unbalanced, or queried 
infrequently, you can use the following approach to place the data in special 
partitions. You can create historical partitions for historical data of small 
sizes (e.g., yearly partitions, monthly partitions). For example, you can 
create historical partitions for data `FROM ("2000-01-01") TO ("2022-01-01") 
INTERVAL 1 YEAR`:
diff --git a/versioned_docs/version-3.x/table-design/best-practice.md b/versioned_docs/version-3.x/table-design/best-practice.md
index e15ec9550fc..5f2e67651e9 100644
--- a/versioned_docs/version-3.x/table-design/best-practice.md
+++ b/versioned_docs/version-3.x/table-design/best-practice.md
@@ -423,7 +423,7 @@ show partitions from tbl_unique_merge_on_write_p;
       2. When the bucketing mode for a table is set to RANDOM, since there is 
no bucketing column, querying the table will scan all buckets in the hit 
partitions instead of querying specific buckets based on the values of the 
bucketing column. This setting is suitable for overall aggregation and analysis 
queries rather than high-concurrency point queries.
       3. If an OLAP table has a random distribution of data, setting the 
`load_to_single_tablet` parameter to true during data ingestion allows each 
task to write to a single tablet. This improves concurrency and throughput 
during large-scale data ingestion. It can also reduce the write amplification 
caused by data ingestion and compaction and ensure cluster stability.
    5. Dimension tables, which grow slowly, can use a single partition and 
apply bucketing based on commonly used query conditions (where the data 
distribution of the bucketing field is relatively even).
-   6. Fact tables.
+   6. Fact tables: We recommend using DATE or DATETIME as the partitioning 
column. In most cases, one partition per day is sufficient. For the bucketing 
strategy, use frequently queried columns with relatively even data distribution 
as the bucketing columns.
 
 
 5. For scenarios where there is a large amount of historical partitioned data 
but the historical data is relatively small, unbalanced, or queried 
infrequently, you can use the following approach to place the data in special 
partitions. You can create historical partitions for historical data of small 
sizes (e.g., yearly partitions, monthly partitions). For example, you can 
create historical partitions for data `FROM ("2000-01-01") TO ("2022-01-01") 
INTERVAL 1 YEAR`:
diff --git a/versioned_docs/version-4.x/table-design/best-practice.md b/versioned_docs/version-4.x/table-design/best-practice.md
index 3a70c2b72ab..5e314e220fb 100644
--- a/versioned_docs/version-4.x/table-design/best-practice.md
+++ b/versioned_docs/version-4.x/table-design/best-practice.md
@@ -425,7 +425,7 @@ show partitions from tbl_unique_merge_on_write_p;
       2. When the bucketing mode for a table is set to RANDOM, since there is 
no bucketing column, querying the table will scan all buckets in the hit 
partitions instead of querying specific buckets based on the values of the 
bucketing column. This setting is suitable for overall aggregation and analysis 
queries rather than high-concurrency point queries.
       3. If an OLAP table has a random distribution of data, setting the 
`load_to_single_tablet` parameter to true during data ingestion allows each 
task to write to a single tablet. This improves concurrency and throughput 
during large-scale data ingestion. It can also reduce the write amplification 
caused by data ingestion and compaction and ensure cluster stability.
    5. Dimension tables, which grow slowly, can use a single partition and 
apply bucketing based on commonly used query conditions (where the data 
distribution of the bucketing field is relatively even).
-   6. Fact tables.
+   6. Fact tables: We recommend using DATE or DATETIME as the partitioning 
column. In most cases, one partition per day is sufficient. For the bucketing 
strategy, use frequently queried columns with relatively even data distribution 
as the bucketing columns.
 
 
 5. For scenarios where there is a large amount of historical partitioned data 
but the historical data is relatively small, unbalanced, or queried 
infrequently, you can use the following approach to place the data in special 
partitions. You can create historical partitions for historical data of small 
sizes (e.g., yearly partitions, monthly partitions). For example, you can 
create historical partitions for data `FROM ("2000-01-01") TO ("2022-01-01") 
INTERVAL 1 YEAR`:
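
Editorial note, not part of the commit: the fact table recommendation added above (a DATE or DATETIME partitioning column, one partition per day, and bucketing on a frequently queried, evenly distributed column) could look roughly like the following Doris DDL. This is only a sketch under stated assumptions. The table name `fact_orders_example`, its columns, the date range, the bucket count of 16, and the `replication_num` setting are all hypothetical and do not come from the changed docs.

```sql
-- Hypothetical fact table: every name and value here is illustrative only.
CREATE TABLE IF NOT EXISTS fact_orders_example (
    order_date DATE NOT NULL,      -- partitioning column (DATE, as recommended)
    user_id    BIGINT NOT NULL,    -- assumed to be frequently filtered and evenly distributed
    order_id   BIGINT NOT NULL,
    amount     DECIMAL(18, 2)
)
DUPLICATE KEY(order_date, user_id)
-- One partition per day, created in a batch over an assumed one-week range.
PARTITION BY RANGE(order_date) (
    FROM ("2026-03-01") TO ("2026-03-08") INTERVAL 1 DAY
)
-- Bucket on the commonly queried column; 16 is an arbitrary placeholder.
DISTRIBUTED BY HASH(user_id) BUCKETS 16
PROPERTIES (
    -- Assumption: a single-replica test cluster; adjust for production.
    "replication_num" = "1"
);
```

With a layout like this, a query that filters on both `order_date` and `user_id` should be able to skip partitions and buckets that cannot match, which is the motivation behind the recommendation. The right bucket count depends on data volume and cluster size and is not specified by the commit.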

