This is an automated email from the ASF dual-hosted git repository.

luzhijing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
     new ad5e34ab9c [Doc](statistics) supplement stats doc (regression test and automatic collection) (#20071)
ad5e34ab9c is described below

commit ad5e34ab9c7651f70af8d6227e973e2719da5888
Author: ElvinWei <[email protected]>
AuthorDate: Sat Jun 3 17:25:33 2023 +0800

    [Doc](statistics) supplement stats doc (regression test and automatic collection) (#20071)
---
 docs/en/docs/query-acceleration/statistics.md      | 47 +++++++++++++++++++++-
 docs/zh-CN/docs/query-acceleration/statistics.md   | 45 ++++++++++++++++++++-
 .../java/org/apache/doris/statistics/README.md     | 35 ++++++++++++++++
 3 files changed, 125 insertions(+), 2 deletions(-)

diff --git a/docs/en/docs/query-acceleration/statistics.md b/docs/en/docs/query-acceleration/statistics.md
index c9106d9e75..e769177753 100644
--- a/docs/en/docs/query-acceleration/statistics.md
+++ b/docs/en/docs/query-acceleration/statistics.md
@@ -403,7 +403,52 @@ mysql> ANALYZE TABLE stats_test.example_tbl UPDATE HISTOGRAM WITH PERIOD 86400;
 
 #### Automatic collection
 
-To be added.
+Statistics can be "invalidated" when tables are changed, which can cause the optimizer to select the wrong execution plan.
+
+Table statistics may become invalid due to the following causes:
+
+- New column: the new column has no statistics.
+- Column change: the original statistics are no longer usable.
+- New partition: the new partition has no statistics.
+- Partition change: the original statistics are invalid.
+- Data change (insert, delete, or update): the statistics become inaccurate.
+
+The main operations involved include:
+
+- update: updates data
+- delete: deletes data
+- drop: drops a partition
+- load: imports data and adds partitions
+- insert: inserts data and adds partitions
+- alter: changes columns, changes partitions, or adds partitions
+
+When a database, table, partition, or column is dropped, the system automatically clears the corresponding invalid statistics. Adjusting the column order or changing a column type does not affect the statistics.
+
+The system decides whether to collect statistics again based on the health of the table (as defined above). You set a health threshold, and when the table's health falls below it, the system collects the table's statistics again. Put simply, for a table whose statistics have already been collected, a change in the amount of data in a partition, or the addition or deletion of a partition, may trigger automatic collection. After the statistics are collected again, the statistics a [...]
+
+Currently, statistics are collected automatically only for tables that the user has configured for automatic collection; no other tables are collected automatically.
+
+Example:
+
+- Automatically collect statistics for the `example_tbl` table using the following syntax:
+
+```SQL
+-- use with auto
+mysql> ANALYZE TABLE stats_test.example_tbl WITH AUTO;
++--------+
+| job_id |
++--------+
+| 52539  |
++--------+
+
+-- configure automatic
+mysql> ANALYZE TABLE stats_test.example_tbl PROPERTIES("automatic" = "true");
++--------+
+| job_id |
++--------+
+| 52565  |
++--------+
+```
 
 ### Manage job
 
diff --git a/docs/zh-CN/docs/query-acceleration/statistics.md b/docs/zh-CN/docs/query-acceleration/statistics.md
index c0091507b7..a232d87eb0 100644
--- a/docs/zh-CN/docs/query-acceleration/statistics.md
+++ b/docs/zh-CN/docs/query-acceleration/statistics.md
@@ -434,7 +434,50 @@ mysql> ANALYZE TABLE stats_test.example_tbl UPDATE HISTOGRAM WITH PERIOD 86400;
 
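 #### Automatic collection
 
-To be added.
+Changes to a table can "invalidate" its statistics, which may cause the optimizer to choose the wrong execution plan.
+
+Table statistics can become invalid for the following reasons:
+
+- New column: the new column has no statistics
+- Column change: the original statistics are unusable
+- New partition: the new partition has no statistics
+- Partition change: the original statistics are invalid
+- Data change (insert | delete | update): the statistics are inaccurate
+
+The main operations involved include:
+
+- update: updates data
+- delete: deletes data
+- drop: drops a partition
+- load: imports data, adds partitions
+- insert: inserts data, adds partitions
+- alter: column change, partition change, new partition
+
+When a database, table, partition, or column is dropped, the system automatically clears these invalid statistics. Adjusting the column order or changing a column type has no effect.
+
+The system decides whether statistics need to be collected again based on the table's health (see the definition above). A health threshold is set, and when the health drops below that value the system collects the table's statistics again. Put simply, for a table whose statistics have been collected, if the data in a partition grows or shrinks, or a partition is added or dropped, automatic collection may be triggered; after collection, the table's statistics and health are updated. Currently, only tables that the user has configured for automatic collection are collected; other tables are not collected automatically.
+
+Example:
+
+- To automatically collect statistics for the `example_tbl` table, use the following syntax:
+
+```SQL
+-- use WITH AUTO
+mysql> ANALYZE TABLE stats_test.example_tbl WITH AUTO;
++--------+
+| job_id |
++--------+
+| 52539  |
++--------+
+
+-- configure automatic
+mysql> ANALYZE TABLE stats_test.example_tbl PROPERTIES("automatic" = "true");
++--------+
+| job_id |
++--------+
+| 52565  |
++--------+
+```
 
 ### Manage job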
 
diff --git a/fe/fe-core/src/main/java/org/apache/doris/statistics/README.md b/fe/fe-core/src/main/java/org/apache/doris/statistics/README.md
index e3a577528a..ef9340b281 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/statistics/README.md
+++ b/fe/fe-core/src/main/java/org/apache/doris/statistics/README.md
@@ -116,6 +116,41 @@ end
 
 # Test
 
+The regression tests currently cover the following areas:
+
+- Analyze stats: verifies the `ANALYZE` statement and its related features. Some of these features depend on external factors (such as system metadata reporting time) and may be unstable, so those cases are placed in p1.
+- Manage stats: verifies injecting, deleting, and displaying statistics, along with other related operations.
+
+For more, see [statistics_p0](https://github.com/apache/doris/tree/master/regression-test/suites/statistics) and [statistics_p1](https://github.com/apache/doris/tree/master/regression-test/suites/statistics_p1).
+
+## Analyze stats
+
+p0 tests:
+
+1. Universal analysis
+
+p1 tests:
+
+1. Universal analysis
+2. Sampled analysis 
+3. Incremental analysis 
+4. Automatic analysis 
+5. Periodic analysis
+
+## Manage stats
+
+p0 tests:
+
+1. Alter table stats 
+2. Show table stats 
+3. Alter column stats 
+4. Show column stats 
+5. Show column histogram 
+6. Drop column stats 
+7. Drop expired stats
+
+Any change to the statistics module must keep all of the above cases passing.
+
 # Feature note
 
 20230508:
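
Reading aid for the "Automatic collection" text added in this commit: the health-based trigger it describes can be sketched roughly as below. This is only an illustration, not code from the Doris repository. The health formula (a 0-100 score derived from updated rows versus total rows), the function names `table_health` and `needs_reanalyze`, and the default threshold of 80 are all assumptions made for the example; the commit itself only says that health is "defined above" and that collection is re-run when health falls below a configured threshold.

```python
def table_health(updated_rows: int, row_count: int) -> int:
    """Return a 0-100 health score (assumed formula, not taken from the commit)."""
    if row_count <= 0:
        # Assumption: an empty table counts as healthy unless rows were updated.
        return 100 if updated_rows <= 0 else 0
    if updated_rows >= row_count:
        return 0
    # Integer arithmetic avoids floating-point rounding surprises.
    return (row_count - updated_rows) * 100 // row_count


def needs_reanalyze(updated_rows: int, row_count: int,
                    threshold: int = 80, automatic: bool = True) -> bool:
    """Decide whether statistics should be collected again (hypothetical helper)."""
    if not automatic:
        # Only tables configured for automatic collection are considered.
        return False
    return table_health(updated_rows, row_count) < threshold
```

Under these assumptions, a table with 100 rows of which 30 changed has health 70, so `needs_reanalyze(30, 100)` returns `True`, while a table that was never configured for automatic collection is skipped regardless of its health.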

