This is an automated email from the ASF dual-hosted git repository.
zhangchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 1816a0eeaa [doc](data update)Address comment on update of agg model and translate en doc by LLM (#1721)
1816a0eeaa is described below
commit 1816a0eeaa066f8655ac80d5fb998193fd7e82af
Author: zhannngchen <[email protected]>
AuthorDate: Tue Jan 14 14:08:07 2025 +0800
[doc](data update)Address comment on update of agg model and translate en doc by LLM (#1721)
## Versions
- [x] dev
- [x] 3.0
- [x] 2.1
- [ ] 2.0
## Languages
- [x] Chinese
- [x] English
## Docs Checklist
- [x] Checked by AI
- [x] Test Cases Built
---
.../update/update-of-aggregate-model.md | 50 +++++++++-------------
.../update/update-of-aggregate-model.md | 10 +----
.../update/update-of-aggregate-model.md | 10 +----
.../update/update-of-aggregate-model.md | 10 +----
.../update/update-of-aggregate-model.md | 50 +++++++++-------------
.../update/update-of-aggregate-model.md | 50 +++++++++-------------
6 files changed, 69 insertions(+), 111 deletions(-)
diff --git a/docs/data-operate/update/update-of-aggregate-model.md b/docs/data-operate/update/update-of-aggregate-model.md
index 4b5ef90675..3fde759a85 100644
--- a/docs/data-operate/update/update-of-aggregate-model.md
+++ b/docs/data-operate/update/update-of-aggregate-model.md
@@ -1,7 +1,7 @@
----
+---
{
- "title": "Updating Data on Aggregate Key Model",
- "language": "en"
+ "title": "Updating Data on Aggregate Key Model",
+ "language": "en"
}
---
@@ -24,23 +24,21 @@ specific language governing permissions and limitations
under the License.
-->
+This document primarily introduces how to update the Doris Aggregate model based on data load.
+## Whole Row Update
-This guide is about ingestion-based data updates for the Aggregate Key model in Doris.
-
-## Update all columns
-
-When importing data into an Aggregate Key model in Doris by methods like Stream Load, Broker Load, Routine Load, and Insert Into, the new values are combined with the old values to produce new aggregated values based on the column's aggregation function. These values might be generated during insertion or produced asynchronously during compaction. However, when querying, users will always receive the same returned values.
+When loading data into the Aggregate model table using Doris-supported methods such as Stream Load, Broker Load, Routine Load, Insert Into, etc., the new values will be aggregated with the old values according to the column's aggregation function to produce new aggregated values. This value may be produced at the time of insertion or during asynchronous compaction, but users will get the same return value when querying.
-## Partial column update for Aggregate Key model
+## Partial Column Update of Aggregate Model
-Tables in the Aggregate Key model are primarily used in cases with pre-aggregation requirements rather than data updates, but Doris allows partial column updates for them, too. Simply set the aggregation function to `REPLACE_IF_NOT_NULL`.
+The Aggregate table is mainly used in pre-aggregation scenarios rather than data update scenarios, but partial column updates can be achieved by setting the aggregation function to `REPLACE_IF_NOT_NULL`.
-**Create table**
+**Create Table**
-For the columns that need to be updated, set the aggregation function to `REPLACE_IF_NOT_NULL`.
+Set the aggregation function of the fields that need to be updated to `REPLACE_IF_NOT_NULL`.
-```Plain
+```sql
CREATE TABLE order_tbl (
order_id int(11) NULL,
order_amount int(11) REPLACE_IF_NOT_NULL NULL,
@@ -52,38 +50,32 @@ DISTRIBUTED BY HASH(order_id) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
-+----------+--------------+-----------------+
-| order_id | order_amount | order_status |
-+----------+--------------+-----------------+
-| 1 | 100 | Pending Payment |
-+----------+--------------+-----------------+
-1 row in set (0.01 sec)
```
-**Ingest data**
+**Data Insertion**
-For Stream Load, Broker Load, Routine Load, or INSERT INTO, you can directly write the updates to the fields.
+Whether it is Stream Load, Broker Load, Routine Load, or `INSERT INTO`, directly write the data of the fields to be updated.
**Example**
-Using the same example as above, the corresponding Stream Load command would be (no additional headers required):
+Similar to the previous example, the corresponding Stream Load command is (no additional header required):
```shell
$ cat update.csv
1,To be shipped
-$ curl --location-trusted -u root: -H "column_separator:," -H "columns:order_id,order_status" -T /tmp/update.csv http://127.0.0.1:8030/api/db1/order_tbl/_stream_load
+curl --location-trusted -u root: -H "column_separator:," -H "columns:order_id,order_status" -T /tmp/update.csv http://127.0.0.1:8030/api/db1/order_tbl/_stream_load
```
-The corresponding `INSERT INTO` statement would be (no additional session variables required):
+The corresponding `INSERT INTO` statement is (no additional session variable settings required):
-```Plain
-INSERT INTO order_tbl (order_id, order_status) values (1,'Delivery Pending');
+```sql
+INSERT INTO order_tbl (order_id, order_status) values (1,'Shipped');
```
-## Note
+## Notes on Partial Column Updates
-The Aggregate Key model does not perform additional data processing during data writing, so the writing performance in this model is the same as other models. However, aggregation during queries can result in performance loss. Typical aggregation queries can be 5~10 times slower than queries on Merge-on-Write tables in the Unique Key model.
+The Aggregate Key model does not perform any additional processing during the write process, so the write performance is not affected and is the same as normal data load. However, the cost of aggregation during query is relatively high, and the typical aggregation query performance is 5-10 times lower than the Merge-on-Write implementation of the Unique Key model.
-Under this circumstance, users cannot set a field from non-NULL to NULL, because NULL values written will be automatically neglected by the REPLACE_IF_NOT_NULL aggregation function.
+Since the `REPLACE_IF_NOT_NULL` aggregation function only takes effect when the value is not NULL, users cannot change a field value to NULL.
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/update-of-aggregate-model.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/update-of-aggregate-model.md
index fe2fbe3f98..b1ad17ce72 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/update-of-aggregate-model.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/update-of-aggregate-model.md
@@ -26,7 +26,7 @@ under the License.
这篇文档主要介绍 Doris 聚合模型上基于导入的更新。
-## 所有列更新
+## 整行更新
使用 Doris 支持的 Stream Load,Broker Load,Routine Load,Insert Into 等导入方式,往聚合模型(Agg 模型)中进行数据导入时,都会将新的值与旧的聚合值,根据列的聚合函数产出新的聚合值,这个值可能是插入时产出,也可能是异步 Compaction 时产出,但是用户查询时,都会得到一样的返回值。
@@ -50,12 +50,6 @@ DISTRIBUTED BY HASH(order_id) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
-+----------+--------------+--------------+
-| order_id | order_amount | order_status |
-+----------+--------------+--------------+
-| 1 | 100 | 待付款 |
-+----------+--------------+--------------+
-1 row in set (0.01 sec)
```
**数据写入**
@@ -84,4 +78,4 @@ INSERT INTO order_tbl (order_id, order_status) values (1,'待发货');
Aggregate Key 模型在写入过程中不做任何额外处理,所以写入性能不受影响,与普通的数据导入相同。但是在查询时进行聚合的代价较大,典型的聚合查询性能相比 Unique Key 模型的 Merge-on-Write 实现会有 5-10 倍的下降。
-用户无法通过将某个字段由非 NULL 设置为 NULL,写入的 NULL 值在`REPLACE_IF_NOT_NULL`聚合函数的处理中会自动忽略。
+由于 `REPLACE_IF_NOT_NULL` 聚合函数仅在非 NULL 值时才会生效,因此用户无法将某个字段值修改为NULL值。
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/update/update-of-aggregate-model.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/update/update-of-aggregate-model.md
index fe2fbe3f98..b1ad17ce72 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/update/update-of-aggregate-model.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/update/update-of-aggregate-model.md
@@ -26,7 +26,7 @@ under the License.
这篇文档主要介绍 Doris 聚合模型上基于导入的更新。
-## 所有列更新
+## 整行更新
使用 Doris 支持的 Stream Load,Broker Load,Routine Load,Insert Into 等导入方式,往聚合模型(Agg 模型)中进行数据导入时,都会将新的值与旧的聚合值,根据列的聚合函数产出新的聚合值,这个值可能是插入时产出,也可能是异步 Compaction 时产出,但是用户查询时,都会得到一样的返回值。
@@ -50,12 +50,6 @@ DISTRIBUTED BY HASH(order_id) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
-+----------+--------------+--------------+
-| order_id | order_amount | order_status |
-+----------+--------------+--------------+
-| 1 | 100 | 待付款 |
-+----------+--------------+--------------+
-1 row in set (0.01 sec)
```
**数据写入**
@@ -84,4 +78,4 @@ INSERT INTO order_tbl (order_id, order_status) values (1,'待发货');
Aggregate Key 模型在写入过程中不做任何额外处理,所以写入性能不受影响,与普通的数据导入相同。但是在查询时进行聚合的代价较大,典型的聚合查询性能相比 Unique Key 模型的 Merge-on-Write 实现会有 5-10 倍的下降。
-用户无法通过将某个字段由非 NULL 设置为 NULL,写入的 NULL 值在`REPLACE_IF_NOT_NULL`聚合函数的处理中会自动忽略。
+由于 `REPLACE_IF_NOT_NULL` 聚合函数仅在非 NULL 值时才会生效,因此用户无法将某个字段值修改为NULL值。
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/update/update-of-aggregate-model.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/update/update-of-aggregate-model.md
index fe2fbe3f98..b1ad17ce72 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/update/update-of-aggregate-model.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/update/update-of-aggregate-model.md
@@ -26,7 +26,7 @@ under the License.
这篇文档主要介绍 Doris 聚合模型上基于导入的更新。
-## 所有列更新
+## 整行更新
使用 Doris 支持的 Stream Load,Broker Load,Routine Load,Insert Into 等导入方式,往聚合模型(Agg 模型)中进行数据导入时,都会将新的值与旧的聚合值,根据列的聚合函数产出新的聚合值,这个值可能是插入时产出,也可能是异步 Compaction 时产出,但是用户查询时,都会得到一样的返回值。
@@ -50,12 +50,6 @@ DISTRIBUTED BY HASH(order_id) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
-+----------+--------------+--------------+
-| order_id | order_amount | order_status |
-+----------+--------------+--------------+
-| 1 | 100 | 待付款 |
-+----------+--------------+--------------+
-1 row in set (0.01 sec)
```
**数据写入**
@@ -84,4 +78,4 @@ INSERT INTO order_tbl (order_id, order_status) values (1,'待发货');
Aggregate Key 模型在写入过程中不做任何额外处理,所以写入性能不受影响,与普通的数据导入相同。但是在查询时进行聚合的代价较大,典型的聚合查询性能相比 Unique Key 模型的 Merge-on-Write 实现会有 5-10 倍的下降。
-用户无法通过将某个字段由非 NULL 设置为 NULL,写入的 NULL 值在`REPLACE_IF_NOT_NULL`聚合函数的处理中会自动忽略。
+由于 `REPLACE_IF_NOT_NULL` 聚合函数仅在非 NULL 值时才会生效,因此用户无法将某个字段值修改为NULL值。
diff --git a/versioned_docs/version-2.1/data-operate/update/update-of-aggregate-model.md b/versioned_docs/version-2.1/data-operate/update/update-of-aggregate-model.md
index 1a7dedcad7..3fde759a85 100644
--- a/versioned_docs/version-2.1/data-operate/update/update-of-aggregate-model.md
+++ b/versioned_docs/version-2.1/data-operate/update/update-of-aggregate-model.md
@@ -1,7 +1,7 @@
----
+---
{
- "title": "Updating Data on Aggregate Key Model",
- "language": "en"
+ "title": "Updating Data on Aggregate Key Model",
+ "language": "en"
}
---
@@ -24,23 +24,21 @@ specific language governing permissions and limitations
under the License.
-->
-# Update for Aggregate Load
-
-This guide is about ingestion-based data updates for the Aggregate Key model in Doris.
+This document primarily introduces how to update the Doris Aggregate model based on data load.
-## Update all columns
+## Whole Row Update
-When importing data into an Aggregate Key model in Doris by methods like Stream Load, Broker Load, Routine Load, and Insert Into, the new values are combined with the old values to produce new aggregated values based on the column's aggregation function. These values might be generated during insertion or produced asynchronously during compaction. However, when querying, users will always receive the same returned values.
+When loading data into the Aggregate model table using Doris-supported methods such as Stream Load, Broker Load, Routine Load, Insert Into, etc., the new values will be aggregated with the old values according to the column's aggregation function to produce new aggregated values. This value may be produced at the time of insertion or during asynchronous compaction, but users will get the same return value when querying.
-## Partial column update for Aggregate Key model
+## Partial Column Update of Aggregate Model
-Tables in the Aggregate Key model are primarily used in cases with pre-aggregation requirements rather than data updates, but Doris allows partial column updates for them, too. Simply set the aggregation function to `REPLACE_IF_NOT_NULL`.
+The Aggregate table is mainly used in pre-aggregation scenarios rather than data update scenarios, but partial column updates can be achieved by setting the aggregation function to `REPLACE_IF_NOT_NULL`.
-**Create table**
+**Create Table**
-For the columns that need to be updated, set the aggregation function to `REPLACE_IF_NOT_NULL`.
+Set the aggregation function of the fields that need to be updated to `REPLACE_IF_NOT_NULL`.
-```Plain
+```sql
CREATE TABLE order_tbl (
order_id int(11) NULL,
order_amount int(11) REPLACE_IF_NOT_NULL NULL,
@@ -52,38 +50,32 @@ DISTRIBUTED BY HASH(order_id) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
-+----------+--------------+-----------------+
-| order_id | order_amount | order_status |
-+----------+--------------+-----------------+
-| 1 | 100 | Pending Payment |
-+----------+--------------+-----------------+
-1 row in set (0.01 sec)
```
-**Ingest data**
+**Data Insertion**
-For Stream Load, Broker Load, Routine Load, or INSERT INTO, you can directly write the updates to the fields.
+Whether it is Stream Load, Broker Load, Routine Load, or `INSERT INTO`, directly write the data of the fields to be updated.
**Example**
-Using the same example as above, the corresponding Stream Load command would be (no additional headers required):
+Similar to the previous example, the corresponding Stream Load command is (no additional header required):
```shell
$ cat update.csv
1,To be shipped
-$ curl --location-trusted -u root: -H "column_separator:," -H "columns:order_id,order_status" -T /tmp/update.csv http://127.0.0.1:8030/api/db1/order_tbl/_stream_load
+curl --location-trusted -u root: -H "column_separator:," -H "columns:order_id,order_status" -T /tmp/update.csv http://127.0.0.1:8030/api/db1/order_tbl/_stream_load
```
-The corresponding `INSERT INTO` statement would be (no additional session variables required):
+The corresponding `INSERT INTO` statement is (no additional session variable settings required):
-```Plain
-INSERT INTO order_tbl (order_id, order_status) values (1,'Delivery Pending');
+```sql
+INSERT INTO order_tbl (order_id, order_status) values (1,'Shipped');
```
-## Note
+## Notes on Partial Column Updates
-The Aggregate Key model does not perform additional data processing during data writing, so the writing performance in this model is the same as other models. However, aggregation during queries can result in performance loss. Typical aggregation queries can be 5~10 times slower than queries on Merge-on-Write tables in the Unique Key model.
+The Aggregate Key model does not perform any additional processing during the write process, so the write performance is not affected and is the same as normal data load. However, the cost of aggregation during query is relatively high, and the typical aggregation query performance is 5-10 times lower than the Merge-on-Write implementation of the Unique Key model.
-Under this circumstance, users cannot set a field from non-NULL to NULL, because NULL values written will be automatically neglected by the REPLACE_IF_NOT_NULL aggregation function.
+Since the `REPLACE_IF_NOT_NULL` aggregation function only takes effect when the value is not NULL, users cannot change a field value to NULL.
diff --git a/versioned_docs/version-3.0/data-operate/update/update-of-aggregate-model.md b/versioned_docs/version-3.0/data-operate/update/update-of-aggregate-model.md
index 4b5ef90675..3fde759a85 100644
--- a/versioned_docs/version-3.0/data-operate/update/update-of-aggregate-model.md
+++ b/versioned_docs/version-3.0/data-operate/update/update-of-aggregate-model.md
@@ -1,7 +1,7 @@
----
+---
{
- "title": "Updating Data on Aggregate Key Model",
- "language": "en"
+ "title": "Updating Data on Aggregate Key Model",
+ "language": "en"
}
---
@@ -24,23 +24,21 @@ specific language governing permissions and limitations
under the License.
-->
+This document primarily introduces how to update the Doris Aggregate model based on data load.
+## Whole Row Update
-This guide is about ingestion-based data updates for the Aggregate Key model in Doris.
-
-## Update all columns
-
-When importing data into an Aggregate Key model in Doris by methods like Stream Load, Broker Load, Routine Load, and Insert Into, the new values are combined with the old values to produce new aggregated values based on the column's aggregation function. These values might be generated during insertion or produced asynchronously during compaction. However, when querying, users will always receive the same returned values.
+When loading data into the Aggregate model table using Doris-supported methods such as Stream Load, Broker Load, Routine Load, Insert Into, etc., the new values will be aggregated with the old values according to the column's aggregation function to produce new aggregated values. This value may be produced at the time of insertion or during asynchronous compaction, but users will get the same return value when querying.
-## Partial column update for Aggregate Key model
+## Partial Column Update of Aggregate Model
-Tables in the Aggregate Key model are primarily used in cases with pre-aggregation requirements rather than data updates, but Doris allows partial column updates for them, too. Simply set the aggregation function to `REPLACE_IF_NOT_NULL`.
+The Aggregate table is mainly used in pre-aggregation scenarios rather than data update scenarios, but partial column updates can be achieved by setting the aggregation function to `REPLACE_IF_NOT_NULL`.
-**Create table**
+**Create Table**
-For the columns that need to be updated, set the aggregation function to `REPLACE_IF_NOT_NULL`.
+Set the aggregation function of the fields that need to be updated to `REPLACE_IF_NOT_NULL`.
-```Plain
+```sql
CREATE TABLE order_tbl (
order_id int(11) NULL,
order_amount int(11) REPLACE_IF_NOT_NULL NULL,
@@ -52,38 +50,32 @@ DISTRIBUTED BY HASH(order_id) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
-+----------+--------------+-----------------+
-| order_id | order_amount | order_status |
-+----------+--------------+-----------------+
-| 1 | 100 | Pending Payment |
-+----------+--------------+-----------------+
-1 row in set (0.01 sec)
```
-**Ingest data**
+**Data Insertion**
-For Stream Load, Broker Load, Routine Load, or INSERT INTO, you can directly write the updates to the fields.
+Whether it is Stream Load, Broker Load, Routine Load, or `INSERT INTO`, directly write the data of the fields to be updated.
**Example**
-Using the same example as above, the corresponding Stream Load command would be (no additional headers required):
+Similar to the previous example, the corresponding Stream Load command is (no additional header required):
```shell
$ cat update.csv
1,To be shipped
-$ curl --location-trusted -u root: -H "column_separator:," -H "columns:order_id,order_status" -T /tmp/update.csv http://127.0.0.1:8030/api/db1/order_tbl/_stream_load
+curl --location-trusted -u root: -H "column_separator:," -H "columns:order_id,order_status" -T /tmp/update.csv http://127.0.0.1:8030/api/db1/order_tbl/_stream_load
```
-The corresponding `INSERT INTO` statement would be (no additional session variables required):
+The corresponding `INSERT INTO` statement is (no additional session variable settings required):
-```Plain
-INSERT INTO order_tbl (order_id, order_status) values (1,'Delivery Pending');
+```sql
+INSERT INTO order_tbl (order_id, order_status) values (1,'Shipped');
```
-## Note
+## Notes on Partial Column Updates
-The Aggregate Key model does not perform additional data processing during data writing, so the writing performance in this model is the same as other models. However, aggregation during queries can result in performance loss. Typical aggregation queries can be 5~10 times slower than queries on Merge-on-Write tables in the Unique Key model.
+The Aggregate Key model does not perform any additional processing during the write process, so the write performance is not affected and is the same as normal data load. However, the cost of aggregation during query is relatively high, and the typical aggregation query performance is 5-10 times lower than the Merge-on-Write implementation of the Unique Key model.
-Under this circumstance, users cannot set a field from non-NULL to NULL, because NULL values written will be automatically neglected by the REPLACE_IF_NOT_NULL aggregation function.
+Since the `REPLACE_IF_NOT_NULL` aggregation function only takes effect when the value is not NULL, users cannot change a field value to NULL.
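Editor's note: the `REPLACE_IF_NOT_NULL` behavior the updated docs describe (non-NULL values replace the old value, NULL values are ignored) can be sketched as a tiny simulation. This is plain Python, not Doris code; the function name and row dicts are illustrative only.

```python
# Hypothetical simulation of REPLACE_IF_NOT_NULL aggregation semantics:
# when a new row is loaded, each non-None value replaces the stored value,
# while None (NULL) values are ignored and the old value is kept.
def replace_if_not_null(old_row: dict, new_row: dict) -> dict:
    merged = dict(old_row)
    for col, value in new_row.items():
        if value is not None:  # NULL never overwrites an existing value
            merged[col] = value
    return merged

# Mirrors the docs' order_tbl example: only order_status is updated,
# so order_amount is sent as NULL and survives unchanged.
old = {"order_id": 1, "order_amount": 100, "order_status": "Pending Payment"}
update = {"order_id": 1, "order_amount": None, "order_status": "To be shipped"}
print(replace_if_not_null(old, update))
# → {'order_id': 1, 'order_amount': 100, 'order_status': 'To be shipped'}
```

This also shows why the docs' note holds: since None values are skipped, there is no way to drive a stored field back to NULL through this merge.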
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]