This is an automated email from the ASF dual-hosted git repository.
dataroaring pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 60b323ebe09 docs(table-design): add Merge-on-Write concept page; make
Unique Key Model task-first (#3939)
60b323ebe09 is described below
commit 60b323ebe09501660ccd58308983253e9c3cfec3
Author: Yongqiang YANG <[email protected]>
AuthorDate: Thu Jun 18 05:21:14 2026 -0700
docs(table-design): add Merge-on-Write concept page; make Unique Key Model
task-first (#3939)
## What
- **Add a dedicated `Merge-on-Write` page** under Table Design › Data
Model, as the single canonical source for how the merge-on-write and
merge-on-read implementations work, their read/write trade-offs, how to
choose, and how to enable each.
- **Rewrite the `Unique Key Model` page to be task-first**: it now leads
with `CREATE TABLE` and an upsert example, then links to the new page
for the underlying mechanism. The merge-on-write / merge-on-read
architecture explanation that previously sat between the intro and the
first runnable example has moved to the new page.
## Why
On the current Unique Key Model page, a reader who just wants to create
a unique-key table has to read the "How It Works" implementation
comparison and the update-semantics tables before reaching the first
`CREATE TABLE`. The merge-on-write mechanism is also re-explained inline
across many loading and update pages, with no canonical home to link to.
Separating the explanation (merge-on-write) from the how-to (creating
and upserting a unique-key table) follows the Diátaxis split: the task
page gets you to a runnable example quickly, and the concept page can be
linked from everywhere merge-on-write is referenced.
## Scope
- English only.
- Applies to both the current docs (`docs/`) and
`versioned_docs/version-4.x`; both sidebars updated.
- New doc id: `table-design/data-model/merge-on-write`, placed right
after `unique` in the Table Models category.
## Validation
- `version-4.x-sidebars.json` parses as valid JSON.
- Cross-link targets (`data-operate/update/update-of-unique-model`) and
the referenced image exist in both trees.
---
docs/table-design/data-model/merge-on-write.md | 99 +++++++++++++++++++
docs/table-design/data-model/unique.md | 107 ++++++--------------
.../table-design/data-model/merge-on-write.md | 99 +++++++++++++++++++
.../current/table-design/data-model/unique.md | 109 ++++++---------------
.../table-design/data-model/merge-on-write.md | 99 +++++++++++++++++++
.../version-4.x/table-design/data-model/unique.md | 109 ++++++---------------
sidebars.ts | 9 +-
.../table-design/data-model/merge-on-write.md | 99 +++++++++++++++++++
.../version-4.x/table-design/data-model/unique.md | 107 ++++++--------------
versioned_sidebars/version-4.x-sidebars.json | 12 ++-
10 files changed, 533 insertions(+), 316 deletions(-)
diff --git a/docs/table-design/data-model/merge-on-write.md
b/docs/table-design/data-model/merge-on-write.md
new file mode 100644
index 00000000000..94b01c804db
--- /dev/null
+++ b/docs/table-design/data-model/merge-on-write.md
@@ -0,0 +1,99 @@
+---
+{
+ "title": "Merge-on-Write",
+ "language": "en",
+ "description": "How the Doris Unique Key Model guarantees Key uniqueness:
the merge-on-write and merge-on-read implementations, their performance
trade-offs, and how to choose between them."
+}
+---
+
+**Merge-on-write** and **merge-on-read** are the two storage implementations
of the [Unique Key Model](./unique). Both keep Key columns unique and return
the same query results, but they resolve duplicate Keys at different times, and
that affects read and write performance. Merge-on-write is the default and
works best for most workloads.
+
+<!-- Knowledge type: Concept -->
+<!-- Applicable scenarios: Choosing a Unique Key implementation -->
+
+## Comparison
+
+| Implementation | Merge timing | Write performance | Query performance |
Applicable scenarios |
+| --- | --- | --- | --- | --- |
+| Merge-on-write (default) | At write time | Moderate | High | Most scenarios,
balancing query and write performance |
+| Merge-on-read | At query or compaction time | High | Lower | Write-heavy,
read-light scenarios |
+
+## How Merge-on-Write Works
+
+With merge-on-write, Doris resolves duplicate Keys as data is written. For
each incoming row, it checks whether the Key already exists and marks any
previous version as deleted in a delete bitmap. Queries use the bitmap to skip
the marked rows, so they read only the latest version. Compaction later removes
those rows from disk.
+
+Because duplicates are resolved at write time:
+
+- Queries read a single version per Key, with no read-time merge.
+- Doris can push filters down to the storage layer, so scans skip data that
can't match.
+- Read performance does not degrade as past updates accumulate.
+
+The cost is on the write side. Each upsert looks up the primary key to find
and mark the previous version, which adds some write overhead compared with
merge-on-read. For most workloads this trade-off is worth it, so merge-on-write
is the default.
+
+## How Merge-on-Read Works
+
+With merge-on-read, writes only append data. Doris keeps every version of a
Key and merges them at query time or during compaction, returning the latest
version.
+
+Because duplicates are resolved at read time:
+
+- Writes are lightweight, because they skip the primary-key lookup.
+- Every query has to merge the versions of each Key, and Doris can't push
filters down, so queries are slower.
+- Query latency grows as versions accumulate between compactions.
+
+It fits write-heavy, read-light pipelines, where write throughput matters more
than query speed.
+
+## Choosing Between Them
+
+- **Use merge-on-write (default)** for most workloads, including real-time
updates, dimension synchronization, and any case where query performance
matters. It also enables features such as partial column updates.
+- **Use merge-on-read** only when you have far more writes than reads and you
need to minimize write overhead.
+
+The implementation is fixed at table creation and **cannot be changed later
through schema change**, so decide before you create the table.
+
+## Enabling Each Implementation
+
+The `enable_unique_key_merge_on_write` table property controls the
implementation.
+
+Merge-on-write (default):
+
+```sql
+CREATE TABLE IF NOT EXISTS example_tbl_unique
+(
+ user_id LARGEINT NOT NULL,
+ user_name VARCHAR(50) NOT NULL,
+ city VARCHAR(20),
+ age SMALLINT,
+ sex TINYINT
+)
+UNIQUE KEY(user_id, user_name)
+DISTRIBUTED BY HASH(user_id) BUCKETS 10
+PROPERTIES (
+ "enable_unique_key_merge_on_write" = "true"
+);
+```
+
+Merge-on-read (set the property to `false`):
+
+```sql
+CREATE TABLE IF NOT EXISTS example_tbl_unique
+(
+ user_id LARGEINT NOT NULL,
+ user_name VARCHAR(50) NOT NULL,
+ city VARCHAR(20),
+ age SMALLINT,
+ sex TINYINT
+)
+UNIQUE KEY(user_id, user_name)
+DISTRIBUTED BY HASH(user_id) BUCKETS 10
+PROPERTIES (
+ "enable_unique_key_merge_on_write" = "false"
+);
+```
+
+## Capabilities That Require Merge-on-Write
+
+[Partial column update](../../data-operate/update/update-of-unique-model)
requires merge-on-write: it updates a subset of columns without rewriting the
whole row.
+
+## Related
+
+- [Unique Key Model](./unique)
+- [Partial Column Update](../../data-operate/update/update-of-unique-model)
diff --git a/docs/table-design/data-model/unique.md
b/docs/table-design/data-model/unique.md
index 06f6169902e..acd107cb77f 100644
--- a/docs/table-design/data-model/unique.md
+++ b/docs/table-design/data-model/unique.md
@@ -6,62 +6,22 @@
}
---
-The Unique Key Model is designed for business scenarios that require data
updates. This model guarantees the uniqueness of Key columns: when data is
inserted or updated, new data overwrites old data with the same Key, ensuring
that records always reflect the latest version.
+The Unique Key Model keeps Key columns unique: when you insert or update a
row, the new data overwrites any existing row with the same Key, so the table
always holds the latest version. Use it when your data is updated by primary
key.
<!-- Knowledge type: Data model -->
<!-- Applicable scenarios: Data update / High-frequency writes / Primary key
deduplication -->
-## Applicable Scenarios
+## When to Use
-The Unique Key Model is mainly suitable for the following three categories of
business scenarios:
+The Unique Key Model fits three main scenarios:
1. **High-frequency data updates**: Real-time synchronization of dimension
tables from upstream OLTP databases, which requires efficient UPSERT operations.
-2. **Efficient data deduplication**: In ad delivery, customer relationship
management (CRM), and similar systems, deduplication is performed efficiently
based on user IDs.
-3. **Partial column updates**: For profile tagging scenarios where dynamic
tags change frequently, or for order consumption scenarios where the
transaction status changes, the partial column update capability of the Unique
Key Model can be used.
+2. **Efficient data deduplication**: In ad delivery, customer relationship
management (CRM), and similar systems, records are deduplicated by user ID.
+3. **Partial column updates**: For profile tagging where dynamic tags change
frequently, or order scenarios where the transaction status changes, you can
update only the affected columns.
-## Core Features
+## Create a Unique Key Table
-The Unique Key Model provides the following core features:
-
-| Feature | Description |
-| --- | --- |
-| UPSERT based on primary key | Records with duplicate primary keys are
updated; records whose primary keys do not exist are inserted |
-| Deduplication based on primary key | Key columns are unique, and data is
deduplicated by the primary key columns |
-| High-frequency data updates | Supports high-frequency update scenarios while
balancing update performance and query performance |
-
-## How It Works
-
-The Doris Unique Key Model provides two implementations, compared as follows:
-
-| Implementation | Default since version | Merge timing | Query performance |
Predicate pushdown | Applicable scenarios |
-| --- | --- | --- | --- | --- | --- |
-| Merge-on-write | Default since 2.1 (introduced in 1.2) | Merged immediately
at write time | High | Supported | Most scenarios, balancing query and write
performance |
-| Merge-on-read | Default before 2.1 | Merged at query or compaction time |
Lower | Not supported | Write-heavy, read-light scenarios |
-
-- **Merge-on-write**: Records with the same Key are merged immediately at
write time, ensuring that the storage always holds the latest data. This mode
balances query and write performance, avoids merging data across multiple
versions, and supports predicate pushdown to the storage layer. **This mode is
recommended for most scenarios.**
-- **Merge-on-read**: Data is not merged at write time but is appended
incrementally, with multiple versions retained inside Doris. At query or
compaction time, versions with the same Key are merged. This mode suits
write-heavy, read-light scenarios, but queries must merge multiple versions and
predicates cannot be pushed down, which may affect query speed.
-
-## Update Semantics
-
-The Doris Unique Key Model supports two update semantics:
-
-| Update semantics | Description | Implementation requirement |
-| --- | --- | --- |
-| Whole-row update | The default UPSERT semantics of the Unique Key Model: if
the Key exists, the entire row is updated; otherwise, a new row is inserted |
Supported by default |
-| Partial column update | Only the specified columns are updated, and the
original values of unspecified columns are kept | Must use merge-on-write and
be enabled through a parameter |
-
-Notes:
-
-- Under whole-row `UPSERT` semantics, even if `INSERT INTO` specifies only
some columns, Doris fills the unspecified columns with NULL or default values
in the planner.
-- For how to use partial column updates, see [Partial Column
Update](../../data-operate/update/update-of-unique-model).
-
-## Table Creation Examples
-
-Use the `UNIQUE KEY` keyword to define a Unique Key table at table creation
time, and use the `enable_unique_key_merge_on_write` property to control the
implementation.
-
-### Merge-on-write
-
-Since Doris 2.1, merge-on-write is enabled by default:
+Declare the primary key with the `UNIQUE KEY` keyword. Merge-on-write is on by
default and is right for almost all workloads, so you don't need any extra
property:
```sql
CREATE TABLE IF NOT EXISTS example_tbl_unique
@@ -73,39 +33,18 @@ CREATE TABLE IF NOT EXISTS example_tbl_unique
sex TINYINT
)
UNIQUE KEY(user_id, user_name)
-DISTRIBUTED BY HASH(user_id) BUCKETS 10
-PROPERTIES (
- "enable_unique_key_merge_on_write" = "true"
-);
+DISTRIBUTED BY HASH(user_id) BUCKETS 10;
```
-### Merge-on-read
+Each row is identified by `(user_id, user_name)`. Writing a row whose Key
already exists overwrites it; a row with a new Key is inserted.
-Before Doris 2.1, merge-on-read was enabled by default. Starting from 2.1, you
must explicitly disable the `enable_unique_key_merge_on_write` property to use
it:
+## Upsert Data
-```sql
-CREATE TABLE IF NOT EXISTS example_tbl_unique
-(
- user_id LARGEINT NOT NULL,
- user_name VARCHAR(50) NOT NULL,
- city VARCHAR(20),
- age SMALLINT,
- sex TINYINT
-)
-UNIQUE KEY(user_id, user_name)
-DISTRIBUTED BY HASH(user_id) BUCKETS 10
-PROPERTIES (
- "enable_unique_key_merge_on_write" = "false"
-);
-```
-
-## Data Insertion and Storage
-
-In a Unique Key table, the Key columns are used not only for sorting but also
for deduplication. When data is inserted, records with the same Key are
overwritten.
+Doris upserts by Key no matter which load method you use: rows with an
existing Key are updated, and rows with a new Key are inserted. Doris uses the
Key columns for both sorting and deduplication. The example below uses
`INSERT`; Stream Load, Broker Load, and Routine Load work the same way.

-In the following example, the original table has 4 rows. After inserting 2
rows, the new data updates the table based on the primary key:
+In the following example, the original table has 4 rows. Re-inserting 2 rows
with existing Keys updates them in place:
```sql
-- Insert original data
@@ -115,7 +54,7 @@ INSERT INTO example_tbl_unique VALUES
(103, 'Juice', 'SH', 20, 2),
(104, 'Olivia', 'SZ', 22, 2);
--- Update based on Key
+-- Re-insert the same Keys with new values
INSERT INTO example_tbl_unique VALUES
(101, 'Tom', 'BJ', 27, 1),
(102, 'Jason', 'SH', 28, 1);
@@ -132,11 +71,23 @@ SELECT * FROM example_tbl_unique;
+---------+-----------+------+------+------+
```
+With a whole-row upsert, if your `INSERT INTO` lists only some columns, Doris
fills the rest with NULL or default values.
+
+## Update Only Some Columns
+
+To change a few fields without rewriting the whole row, use partial column
update. It needs merge-on-write (the default) and is turned on with a
parameter. See [Partial Column
Update](../../data-operate/update/update-of-unique-model).
+
+## Choose an Implementation
+
+The Unique Key Model has two storage implementations: **merge-on-write** (the
default, recommended for most workloads) and **merge-on-read** (suited to
write-heavy, read-light pipelines). The implementation is fixed at table
creation and cannot be changed later through schema change.
+
+For how each one works, their performance trade-offs, and how to enable
merge-on-read, see [Merge-on-Write](./merge-on-write).
+
## Notes
When using the Unique Key Model, note the following limitations:
-1. **The implementation cannot be changed**: The implementation of a Unique
table (merge-on-write / merge-on-read) can only be specified at table creation
time and cannot be modified through schema change.
-2. **Whole-row UPSERT fills in default values**: Under whole-row `UPSERT`
semantics, even if `INSERT INTO` specifies only some columns, Doris fills the
unspecified columns with NULL or default values in the planner.
-3. **Partial column updates require merge-on-write**: To update only some
columns, you must use the merge-on-write implementation and enable partial
column update support through a specific parameter. For details, see [Partial
Column Update](../../data-operate/update/update-of-unique-model).
-4. **Partition keys must be a subset of Key columns**: To guarantee data
uniqueness, partition keys must be a subset of the Key columns.
+1. **The implementation cannot be changed**: merge-on-write or merge-on-read
can only be set at table creation and cannot be modified through schema change.
+2. **Whole-row UPSERT fills in default values**: even if `INSERT INTO`
specifies only some columns, Doris fills the unspecified columns with NULL or
default values.
+3. **Partial column updates require merge-on-write**: to update only some
columns, use merge-on-write and turn on partial column update with a parameter.
See [Partial Column Update](../../data-operate/update/update-of-unique-model).
+4. **Partition keys must be a subset of Key columns**: Doris requires this to
guarantee data uniqueness.
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-model/merge-on-write.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-model/merge-on-write.md
new file mode 100644
index 00000000000..0e9b6222ea8
--- /dev/null
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-model/merge-on-write.md
@@ -0,0 +1,99 @@
+---
+{
+ "title": "写时合并",
+ "language": "zh-CN",
+ "description": "Doris 主键模型如何保证 Key 列唯一性:写时合并与读时合并两种实现、它们的性能权衡,以及如何选择。"
+}
+---
+
+**写时合并(merge-on-write)**与**读时合并(merge-on-read)**是[主键模型](./unique)用于保证 Key
列唯一性的两种存储实现。两者返回相同的查询结果,但在**何时**消解重复 Key 上有所不同,这决定了它们的读写性能。写时合并是默认实现,推荐用于大多数场景。
+
+<!-- 知识类型: 概念 -->
+<!-- 适用场景: 选择主键模型的实现方式 -->
+
+## 对比
+
+| 实现 | 合并时机 | 写入性能 | 查询性能 | 适用场景 |
+| --- | --- | --- | --- | --- |
+| 写时合并(默认) | 写入时 | 适中 | 高 | 大多数场景,兼顾查询与写入性能 |
+| 读时合并 | 查询或 Compaction 时 | 高 | 较低 | 写多读少场景 |
+
+## 写时合并的工作原理
+
+采用写时合并时,Doris 在数据写入时消解相同 Key 的记录。对于每一条写入的数据,Doris 会检查该 Key 是否已存在,并在 delete
bitmap 中将其旧版本标记为删除。查询时借助该 bitmap 跳过被标记的行,因此只读取最新版本。这些旧行在 Compaction 时才从磁盘物理删除。
+
+由于重复在写入时消解:
+
+- 查询每个 Key 只需读取单个版本,无需在读取时合并。
+- 过滤谓词可以下推到存储层,从而跳过无关数据。
+- 读取性能不会随历史更新的累积而下降。
+
+代价在写入侧:每次 upsert
都需要一次主键查找来定位并标记旧版本,相比读时合并增加了一定的写入开销。对绝大多数场景而言这一权衡是值得的,因此写时合并是默认实现。
+
+## 读时合并的工作原理
+
+采用读时合并时,写入只追加数据。Doris 内部保留同一 Key 的多个版本,并在查询时或 Compaction 时合并,保留最新版本。
+
+由于重复在读取时消解:
+
+- 写入很轻量,写入时无需主键查找。
+- 每次查询都必须合并同一 Key 的多个版本,且谓词无法下推,因此查询更慢。
+- 随着两次 Compaction 之间版本的累积,查询延迟会上升。
+
+该实现适合写多读少、写入吞吐比查询延迟更重要的场景。
+
+## 如何选择
+
+- **使用写时合并(默认)**:适用于绝大多数场景,包括实时更新、维度同步,以及任何对查询性能有要求的场景。它还支持部分列更新等能力。
+- **使用读时合并**:仅当写入量远大于读取量、且需要尽量降低写入开销时。
+
+实现方式在建表时确定,**之后无法通过 schema change 修改**,因此请在建表前选定。
+
+## 启用各实现
+
+实现由 `enable_unique_key_merge_on_write` 表属性控制。
+
+写时合并(默认):
+
+```sql
+CREATE TABLE IF NOT EXISTS example_tbl_unique
+(
+ user_id LARGEINT NOT NULL,
+ user_name VARCHAR(50) NOT NULL,
+ city VARCHAR(20),
+ age SMALLINT,
+ sex TINYINT
+)
+UNIQUE KEY(user_id, user_name)
+DISTRIBUTED BY HASH(user_id) BUCKETS 10
+PROPERTIES (
+ "enable_unique_key_merge_on_write" = "true"
+);
+```
+
+读时合并(将该属性设为 `false`):
+
+```sql
+CREATE TABLE IF NOT EXISTS example_tbl_unique
+(
+ user_id LARGEINT NOT NULL,
+ user_name VARCHAR(50) NOT NULL,
+ city VARCHAR(20),
+ age SMALLINT,
+ sex TINYINT
+)
+UNIQUE KEY(user_id, user_name)
+DISTRIBUTED BY HASH(user_id) BUCKETS 10
+PROPERTIES (
+ "enable_unique_key_merge_on_write" = "false"
+);
+```
+
+## 依赖写时合并的能力
+
+[部分列更新](../../data-operate/update/update-of-unique-model)
依赖写时合并实现:仅更新部分列而无需重写整行。
+
+## 相关文档
+
+- [主键模型](./unique)
+- [部分列更新](../../data-operate/update/update-of-unique-model)
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-model/unique.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-model/unique.md
index 718b44a0018..fdcbe64a77b 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-model/unique.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-model/unique.md
@@ -6,62 +6,22 @@
}
---
-主键模型(Unique Key Model)适用于需要数据更新的业务场景。该模型保证 Key 列的唯一性:插入或更新数据时,新数据会覆盖具有相同 Key
的旧数据,从而确保数据记录始终为最新版本。
+主键模型(Unique Key Model)保证 Key 列的唯一性:插入或更新一行数据时,新数据会覆盖具有相同 Key
的已有行,从而确保记录始终为最新版本。当数据需要按主键更新时,使用该模型。
<!-- 知识类型: 数据模型 -->
<!-- 适用场景: 数据更新 / 高频写入 / 主键去重 -->
## 适用场景
-主键模型主要适用于以下三类业务场景:
+主键模型主要适用于以下三类场景:
-1. **高频数据更新**:上游 OLTP 数据库中的维度表实时同步,需要高效执行 UPSERT 操作;
-2. **数据高效去重**:在广告投放、客户关系管理(CRM)等系统中,基于用户 ID 进行高效去重;
-3. **部分列更新**:在画像标签场景中变更频繁改动的动态标签,或在消费订单场景中改变交易状态,可通过主键模型的部分列更新能力完成。
+1. **高频数据更新**:上游 OLTP 数据库中的维度表实时同步,需要高效执行 UPSERT 操作。
+2. **数据高效去重**:在广告投放、客户关系管理(CRM)等系统中,按用户 ID 对记录去重。
+3. **部分列更新**:在画像标签场景中动态标签频繁变化,或在订单场景中交易状态变化,只需更新受影响的列。
-## 核心特性
+## 创建主键表
-主键模型具备以下核心特性:
-
-| 特性 | 说明 |
-| --- | --- |
-| 基于主键 UPSERT | 主键重复的数据会被更新;主键不存在的记录会被插入 |
-| 基于主键去重 | Key 列具有唯一性,根据主键列对数据进行去重 |
-| 高频数据更新 | 支持高频更新场景,平衡数据更新性能与查询性能 |
-
-## 工作原理
-
-Doris 主键模型提供两种实现方式,对比如下:
-
-| 实现方式 | 默认开启版本 | 合并时机 | 查询性能 | 谓词下推 | 适用场景 |
-| --- | --- | --- | --- | --- | --- |
-| 写时合并(merge-on-write) | 1.2 版本起默认 | 写入时立即合并 | 高 | 支持 | 大多数场景,兼顾查询与写入性能 |
-| 读时合并(merge-on-read) | 1.2 版本前默认 | 查询或 Compaction 时合并 | 较低 | 不支持 | 写多读少场景 |
-
-- **写时合并**:数据在写入时立即合并相同 Key
的记录,确保存储的始终是最新数据。该模式兼顾查询和写入性能,避免多个版本的数据合并,并支持谓词下推到存储层,**大多数场景推荐使用此模式**。
-- **读时合并**:数据在写入时并不进行合并,以增量方式被追加存储,在 Doris 内保留多个版本。查询或 Compaction 时,会对相同 Key
的版本进行合并。该模式适合写多读少的场景,但在查询时需要进行多版本合并,谓词无法下推,可能影响查询速度。
-
-## 更新语义
-
-Doris 主键模型支持两种更新语义:
-
-| 更新语义 | 说明 | 实现要求 |
-| --- | --- | --- |
-| 整行更新 | Unique Key 模型默认的 UPSERT 语义:Key 存在则更新整行,不存在则插入 | 默认支持 |
-| 部分列更新 | 仅更新指定字段,保留未指定字段的原值 | 必须使用写时合并,并通过参数开启 |
-
-说明:
-
-- 在整行 `UPSERT` 语义下,即使使用 `INSERT INTO` 指定部分列写入,Doris 也会在 Planner 中将未提供的列填充为
NULL 或默认值。
-- 部分列更新的使用方式请参阅 [部分列更新](../../data-operate/update/update-of-unique-model)。
-
-## 建表示例
-
-在建表时使用 `UNIQUE KEY` 关键字指定主键表,并通过 `enable_unique_key_merge_on_write` 属性控制实现方式。
-
-### 写时合并
-
-自 Doris 2.1 版本起,默认开启写时合并:
+使用 `UNIQUE KEY` 关键字声明主键。写时合并为默认实现,适用于几乎所有场景,因此无需额外设置属性:
```sql
CREATE TABLE IF NOT EXISTS example_tbl_unique
@@ -73,49 +33,28 @@ CREATE TABLE IF NOT EXISTS example_tbl_unique
sex TINYINT
)
UNIQUE KEY(user_id, user_name)
-DISTRIBUTED BY HASH(user_id) BUCKETS 10
-PROPERTIES (
- "enable_unique_key_merge_on_write" = "true"
-);
+DISTRIBUTED BY HASH(user_id) BUCKETS 10;
```
-### 读时合并
+每一行由 `(user_id, user_name)` 唯一标识。写入一行时,若其 Key 已存在则覆盖该行,否则插入新行。
-在 Doris 2.1 版本之前,默认开启读时合并;2.1 版本起需通过显式关闭 `enable_unique_key_merge_on_write`
属性指定:
+## 写入数据(Upsert)
-```sql
-CREATE TABLE IF NOT EXISTS example_tbl_unique
-(
- user_id LARGEINT NOT NULL,
- user_name VARCHAR(50) NOT NULL,
- city VARCHAR(20),
- age SMALLINT,
- sex TINYINT
-)
-UNIQUE KEY(user_id, user_name)
-DISTRIBUTED BY HASH(user_id) BUCKETS 10
-PROPERTIES (
- "enable_unique_key_merge_on_write" = "false"
-);
-```
-
-## 数据插入与存储
-
-在主键表中,Key 列不仅用于排序,还用于去重;插入数据时,相同 Key 的记录会被覆盖。
+无论使用哪种导入方式,Doris 都会按 Key 进行 upsert:Key 已存在的记录会被更新,Key 不存在的记录会被插入。Key
列同时用于排序和去重。下面的示例使用 `INSERT`,Stream Load、Broker Load、Routine Load 等方式的行为相同。

-如下示例所示,原表中有 4 行数据,插入 2 行后,新数据基于主键完成更新:
+下面的示例中,原表有 4 行。以已存在的 Key 重新写入 2 行后,它们被就地更新:
```sql
--- 插入原始数据
+-- 写入原始数据
INSERT INTO example_tbl_unique VALUES
(101, 'Tom', 'BJ', 26, 1),
(102, 'Jason', 'BJ', 27, 1),
(103, 'Juice', 'SH', 20, 2),
(104, 'Olivia', 'SZ', 22, 2);
--- 基于 Key 进行更新
+-- 以相同 Key 重新写入新值
INSERT INTO example_tbl_unique VALUES
(101, 'Tom', 'BJ', 27, 1),
(102, 'Jason', 'SH', 28, 1);
@@ -132,11 +71,23 @@ SELECT * FROM example_tbl_unique;
+---------+-----------+------+------+------+
```
+在整行 `UPSERT` 语义下,即使 `INSERT INTO` 只指定了部分列,Doris 也会用 NULL 或默认值填充未指定的列。
+
+## 仅更新部分列
+
+如需只修改少量字段而不重写整行,使用部分列更新。它要求采用写时合并实现(默认),并通过参数开启。详见[部分列更新](../../data-operate/update/update-of-unique-model)。
+
+## 选择实现方式
+
+主键模型有两种存储实现:**写时合并**(默认,推荐用于大多数场景)与**读时合并**(适合写多读少的场景)。实现方式在建表时确定,之后无法通过
schema change 修改。
+
+关于两种实现的工作原理、性能权衡,以及如何启用读时合并,见[写时合并](./merge-on-write)。
+
## 注意事项
使用主键模型时,请注意以下限制:
-1. **实现方式不可变更**:Unique 表的实现方式(写时合并 / 读时合并)只能在建表时指定,无法通过 schema change 修改;
-2. **整行 UPSERT 会填充缺省值**:在整行 `UPSERT` 语义下,即使使用 `INSERT INTO` 指定部分列写入,Doris 也会在
Planner 中将未提供的列填充为 NULL 或默认值;
-3. **部分列更新需启用写时合并**:如需更新部分字段,必须使用写时合并实现,并通过特定参数开启部分列更新支持,详见
[部分列更新](../../data-operate/update/update-of-unique-model);
-4. **分区键必须包含在 Key 列内**:为保证数据的唯一性,分区键必须是 Key 列的子集。
+1. **实现方式不可更改**:写时合并或读时合并只能在建表时指定,无法通过 schema change 修改。
+2. **整行 UPSERT 填充默认值**:即使 `INSERT INTO` 只指定部分列,Doris 也会用 NULL 或默认值填充未指定的列。
+3.
**部分列更新需要写时合并**:如需只更新部分列,使用写时合并并通过参数开启部分列更新。详见[部分列更新](../../data-operate/update/update-of-unique-model)。
+4. **分区列必须是 Key 列的子集**:Doris 以此保证数据唯一性。
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-model/merge-on-write.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-model/merge-on-write.md
new file mode 100644
index 00000000000..0e9b6222ea8
--- /dev/null
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-model/merge-on-write.md
@@ -0,0 +1,99 @@
+---
+{
+ "title": "写时合并",
+ "language": "zh-CN",
+ "description": "Doris 主键模型如何保证 Key 列唯一性:写时合并与读时合并两种实现、它们的性能权衡,以及如何选择。"
+}
+---
+
+**写时合并(merge-on-write)**与**读时合并(merge-on-read)**是[主键模型](./unique)用于保证 Key
列唯一性的两种存储实现。两者返回相同的查询结果,但在**何时**消解重复 Key 上有所不同,这决定了它们的读写性能。写时合并是默认实现,推荐用于大多数场景。
+
+<!-- 知识类型: 概念 -->
+<!-- 适用场景: 选择主键模型的实现方式 -->
+
+## 对比
+
+| 实现 | 合并时机 | 写入性能 | 查询性能 | 适用场景 |
+| --- | --- | --- | --- | --- |
+| 写时合并(默认) | 写入时 | 适中 | 高 | 大多数场景,兼顾查询与写入性能 |
+| 读时合并 | 查询或 Compaction 时 | 高 | 较低 | 写多读少场景 |
+
+## 写时合并的工作原理
+
+采用写时合并时,Doris 在数据写入时消解相同 Key 的记录。对于每一条写入的数据,Doris 会检查该 Key 是否已存在,并在 delete
bitmap 中将其旧版本标记为删除。查询时借助该 bitmap 跳过被标记的行,因此只读取最新版本。这些旧行在 Compaction 时才从磁盘物理删除。
+
+由于重复在写入时消解:
+
+- 查询每个 Key 只需读取单个版本,无需在读取时合并。
+- 过滤谓词可以下推到存储层,从而跳过无关数据。
+- 读取性能不会随历史更新的累积而下降。
+
+代价在写入侧:每次 upsert
都需要一次主键查找来定位并标记旧版本,相比读时合并增加了一定的写入开销。对绝大多数场景而言这一权衡是值得的,因此写时合并是默认实现。
+
+## 读时合并的工作原理
+
+采用读时合并时,写入只追加数据。Doris 内部保留同一 Key 的多个版本,并在查询时或 Compaction 时合并,保留最新版本。
+
+由于重复在读取时消解:
+
+- 写入很轻量,写入时无需主键查找。
+- 每次查询都必须合并同一 Key 的多个版本,且谓词无法下推,因此查询更慢。
+- 随着两次 Compaction 之间版本的累积,查询延迟会上升。
+
+该实现适合写多读少、写入吞吐比查询延迟更重要的场景。
+
+## 如何选择
+
+- **使用写时合并(默认)**:适用于绝大多数场景,包括实时更新、维度同步,以及任何对查询性能有要求的场景。它还支持部分列更新等能力。
+- **使用读时合并**:仅当写入量远大于读取量、且需要尽量降低写入开销时。
+
+实现方式在建表时确定,**之后无法通过 schema change 修改**,因此请在建表前选定。
+
+## 启用各实现
+
+实现由 `enable_unique_key_merge_on_write` 表属性控制。
+
+写时合并(默认):
+
+```sql
+CREATE TABLE IF NOT EXISTS example_tbl_unique
+(
+ user_id LARGEINT NOT NULL,
+ user_name VARCHAR(50) NOT NULL,
+ city VARCHAR(20),
+ age SMALLINT,
+ sex TINYINT
+)
+UNIQUE KEY(user_id, user_name)
+DISTRIBUTED BY HASH(user_id) BUCKETS 10
+PROPERTIES (
+ "enable_unique_key_merge_on_write" = "true"
+);
+```
+
+读时合并(将该属性设为 `false`):
+
+```sql
+CREATE TABLE IF NOT EXISTS example_tbl_unique
+(
+ user_id LARGEINT NOT NULL,
+ user_name VARCHAR(50) NOT NULL,
+ city VARCHAR(20),
+ age SMALLINT,
+ sex TINYINT
+)
+UNIQUE KEY(user_id, user_name)
+DISTRIBUTED BY HASH(user_id) BUCKETS 10
+PROPERTIES (
+ "enable_unique_key_merge_on_write" = "false"
+);
+```
+
+## 依赖写时合并的能力
+
+[部分列更新](../../data-operate/update/update-of-unique-model)
依赖写时合并实现:仅更新部分列而无需重写整行。
+
+## 相关文档
+
+- [主键模型](./unique)
+- [部分列更新](../../data-operate/update/update-of-unique-model)
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-model/unique.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-model/unique.md
index 718b44a0018..fdcbe64a77b 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-model/unique.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/data-model/unique.md
@@ -6,62 +6,22 @@
}
---
-主键模型(Unique Key Model)适用于需要数据更新的业务场景。该模型保证 Key 列的唯一性:插入或更新数据时,新数据会覆盖具有相同 Key
的旧数据,从而确保数据记录始终为最新版本。
+主键模型(Unique Key Model)保证 Key 列的唯一性:插入或更新一行数据时,新数据会覆盖具有相同 Key
的已有行,从而确保记录始终为最新版本。当数据需要按主键更新时,使用该模型。
<!-- 知识类型: 数据模型 -->
<!-- 适用场景: 数据更新 / 高频写入 / 主键去重 -->
## 适用场景
-主键模型主要适用于以下三类业务场景:
+主键模型主要适用于以下三类场景:
-1. **高频数据更新**:上游 OLTP 数据库中的维度表实时同步,需要高效执行 UPSERT 操作;
-2. **数据高效去重**:在广告投放、客户关系管理(CRM)等系统中,基于用户 ID 进行高效去重;
-3. **部分列更新**:在画像标签场景中变更频繁改动的动态标签,或在消费订单场景中改变交易状态,可通过主键模型的部分列更新能力完成。
+1. **高频数据更新**:上游 OLTP 数据库中的维度表实时同步,需要高效执行 UPSERT 操作。
+2. **数据高效去重**:在广告投放、客户关系管理(CRM)等系统中,按用户 ID 对记录去重。
+3. **部分列更新**:在画像标签场景中动态标签频繁变化,或在订单场景中交易状态变化,只需更新受影响的列。
-## 核心特性
+## 创建主键表
-主键模型具备以下核心特性:
-
-| 特性 | 说明 |
-| --- | --- |
-| 基于主键 UPSERT | 主键重复的数据会被更新;主键不存在的记录会被插入 |
-| 基于主键去重 | Key 列具有唯一性,根据主键列对数据进行去重 |
-| 高频数据更新 | 支持高频更新场景,平衡数据更新性能与查询性能 |
-
-## 工作原理
-
-Doris 主键模型提供两种实现方式,对比如下:
-
-| 实现方式 | 默认开启版本 | 合并时机 | 查询性能 | 谓词下推 | 适用场景 |
-| --- | --- | --- | --- | --- | --- |
-| 写时合并(merge-on-write) | 1.2 版本起默认 | 写入时立即合并 | 高 | 支持 | 大多数场景,兼顾查询与写入性能 |
-| 读时合并(merge-on-read) | 1.2 版本前默认 | 查询或 Compaction 时合并 | 较低 | 不支持 | 写多读少场景 |
-
-- **写时合并**:数据在写入时立即合并相同 Key
的记录,确保存储的始终是最新数据。该模式兼顾查询和写入性能,避免多个版本的数据合并,并支持谓词下推到存储层,**大多数场景推荐使用此模式**。
-- **读时合并**:数据在写入时并不进行合并,以增量方式被追加存储,在 Doris 内保留多个版本。查询或 Compaction 时,会对相同 Key
的版本进行合并。该模式适合写多读少的场景,但在查询时需要进行多版本合并,谓词无法下推,可能影响查询速度。
-
-## 更新语义
-
-Doris 主键模型支持两种更新语义:
-
-| 更新语义 | 说明 | 实现要求 |
-| --- | --- | --- |
-| 整行更新 | Unique Key 模型默认的 UPSERT 语义:Key 存在则更新整行,不存在则插入 | 默认支持 |
-| 部分列更新 | 仅更新指定字段,保留未指定字段的原值 | 必须使用写时合并,并通过参数开启 |
-
-说明:
-
-- 在整行 `UPSERT` 语义下,即使使用 `INSERT INTO` 指定部分列写入,Doris 也会在 Planner 中将未提供的列填充为
NULL 或默认值。
-- 部分列更新的使用方式请参阅 [部分列更新](../../data-operate/update/update-of-unique-model)。
-
-## 建表示例
-
-在建表时使用 `UNIQUE KEY` 关键字指定主键表,并通过 `enable_unique_key_merge_on_write` 属性控制实现方式。
-
-### 写时合并
-
-自 Doris 2.1 版本起,默认开启写时合并:
+使用 `UNIQUE KEY` 关键字声明主键。写时合并为默认实现,适用于几乎所有场景,因此无需额外设置属性:
```sql
CREATE TABLE IF NOT EXISTS example_tbl_unique
@@ -73,49 +33,28 @@ CREATE TABLE IF NOT EXISTS example_tbl_unique
sex TINYINT
)
UNIQUE KEY(user_id, user_name)
-DISTRIBUTED BY HASH(user_id) BUCKETS 10
-PROPERTIES (
- "enable_unique_key_merge_on_write" = "true"
-);
+DISTRIBUTED BY HASH(user_id) BUCKETS 10;
```
-### 读时合并
+每一行由 `(user_id, user_name)` 唯一标识。写入一行时,若其 Key 已存在则覆盖该行,否则插入新行。
-在 Doris 2.1 版本之前,默认开启读时合并;2.1 版本起需通过显式关闭 `enable_unique_key_merge_on_write`
属性指定:
+## 写入数据(Upsert)
-```sql
-CREATE TABLE IF NOT EXISTS example_tbl_unique
-(
- user_id LARGEINT NOT NULL,
- user_name VARCHAR(50) NOT NULL,
- city VARCHAR(20),
- age SMALLINT,
- sex TINYINT
-)
-UNIQUE KEY(user_id, user_name)
-DISTRIBUTED BY HASH(user_id) BUCKETS 10
-PROPERTIES (
- "enable_unique_key_merge_on_write" = "false"
-);
-```
-
-## 数据插入与存储
-
-在主键表中,Key 列不仅用于排序,还用于去重;插入数据时,相同 Key 的记录会被覆盖。
+无论使用哪种导入方式,Doris 都会按 Key 进行 upsert:Key 已存在的记录会被更新,Key 不存在的记录会被插入。Key
列同时用于排序和去重。下面的示例使用 `INSERT`,Stream Load、Broker Load、Routine Load 等方式的行为相同。

-如下示例所示,原表中有 4 行数据,插入 2 行后,新数据基于主键完成更新:
+下面的示例中,原表有 4 行。以已存在的 Key 重新写入 2 行后,它们被就地更新:
```sql
--- 插入原始数据
+-- 写入原始数据
INSERT INTO example_tbl_unique VALUES
(101, 'Tom', 'BJ', 26, 1),
(102, 'Jason', 'BJ', 27, 1),
(103, 'Juice', 'SH', 20, 2),
(104, 'Olivia', 'SZ', 22, 2);
--- 基于 Key 进行更新
+-- 以相同 Key 重新写入新值
INSERT INTO example_tbl_unique VALUES
(101, 'Tom', 'BJ', 27, 1),
(102, 'Jason', 'SH', 28, 1);
@@ -132,11 +71,23 @@ SELECT * FROM example_tbl_unique;
+---------+-----------+------+------+------+
```
+在整行 `UPSERT` 语义下,即使 `INSERT INTO` 只指定了部分列,Doris 也会用 NULL 或默认值填充未指定的列。
+
+## 仅更新部分列
+
+如需只修改少量字段而不重写整行,使用部分列更新。它要求采用写时合并实现(默认),并通过参数开启。详见[部分列更新](../../data-operate/update/update-of-unique-model)。
+
+## 选择实现方式
+
+主键模型有两种存储实现:**写时合并**(默认,推荐用于大多数场景)与**读时合并**(适合写多读少的场景)。实现方式在建表时确定,之后无法通过
schema change 修改。
+
+关于两种实现的工作原理、性能权衡,以及如何启用读时合并,见[写时合并](./merge-on-write)。
+
## 注意事项
使用主键模型时,请注意以下限制:
-1. **实现方式不可变更**:Unique 表的实现方式(写时合并 / 读时合并)只能在建表时指定,无法通过 schema change 修改;
-2. **整行 UPSERT 会填充缺省值**:在整行 `UPSERT` 语义下,即使使用 `INSERT INTO` 指定部分列写入,Doris 也会在
Planner 中将未提供的列填充为 NULL 或默认值;
-3. **部分列更新需启用写时合并**:如需更新部分字段,必须使用写时合并实现,并通过特定参数开启部分列更新支持,详见
[部分列更新](../../data-operate/update/update-of-unique-model);
-4. **分区键必须包含在 Key 列内**:为保证数据的唯一性,分区键必须是 Key 列的子集。
+1. **实现方式不可更改**:写时合并或读时合并只能在建表时指定,无法通过 schema change 修改。
+2. **整行 UPSERT 填充默认值**:即使 `INSERT INTO` 只指定部分列,Doris 也会用 NULL 或默认值填充未指定的列。
+3.
**部分列更新需要写时合并**:如需只更新部分列,使用写时合并并通过参数开启部分列更新。详见[部分列更新](../../data-operate/update/update-of-unique-model)。
+4. **分区列必须是 Key 列的子集**:Doris 以此保证数据唯一性。
diff --git a/sidebars.ts b/sidebars.ts
index 34b9d609ff1..d9cdc2c2fcd 100644
--- a/sidebars.ts
+++ b/sidebars.ts
@@ -199,7 +199,14 @@ const sidebars: SidebarsConfig = {
link: {type: 'doc', id: 'table-design/data-model/intro'},
items: [
'table-design/data-model/duplicate',
- 'table-design/data-model/unique',
+ {
+ type: 'category',
+ label: 'Unique Key Model',
+ link: {type: 'doc', id:
'table-design/data-model/unique'},
+ items: [
+ 'table-design/data-model/merge-on-write',
+ ],
+ },
'table-design/data-model/aggregate',
'table-design/data-model/tips',
],
diff --git
a/versioned_docs/version-4.x/table-design/data-model/merge-on-write.md
b/versioned_docs/version-4.x/table-design/data-model/merge-on-write.md
new file mode 100644
index 00000000000..94b01c804db
--- /dev/null
+++ b/versioned_docs/version-4.x/table-design/data-model/merge-on-write.md
@@ -0,0 +1,99 @@
+---
+{
+ "title": "Merge-on-Write",
+ "language": "en",
+ "description": "How the Doris Unique Key Model guarantees Key uniqueness:
the merge-on-write and merge-on-read implementations, their performance
trade-offs, and how to choose between them."
+}
+---
+
+**Merge-on-write** and **merge-on-read** are the two storage implementations
of the [Unique Key Model](./unique). Both keep Key columns unique and return
the same query results, but they resolve duplicate Keys at different times, and
that affects read and write performance. Merge-on-write is the default and
works best for most workloads.
+
+<!-- Knowledge type: Concept -->
+<!-- Applicable scenarios: Choosing a Unique Key implementation -->
+
+## Comparison
+
+| Implementation | Merge timing | Write performance | Query performance |
Applicable scenarios |
+| --- | --- | --- | --- | --- |
+| Merge-on-write (default) | At write time | Moderate | High | Most scenarios,
balancing query and write performance |
+| Merge-on-read | At query or compaction time | High | Lower | Write-heavy,
read-light scenarios |
+
+## How Merge-on-Write Works
+
+With merge-on-write, Doris resolves duplicate Keys as data is written. For
each incoming row, it checks whether the Key already exists and marks any
previous version as deleted in a delete bitmap. Queries use the bitmap to skip
the marked rows, so they read only the latest version. Compaction later removes
those rows from disk.
+
+Because duplicates are resolved at write time:
+
+- Queries read a single version per Key, with no read-time merge.
+- Doris can push filters down to the storage layer, so scans skip data that
can't match.
+- Read performance does not degrade as past updates accumulate.
+
+The cost is on the write side. Each upsert looks up the primary key to find
and mark the previous version, which adds some write overhead compared with
merge-on-read. For most workloads this trade-off is worth it, so merge-on-write
is the default.
+
+## How Merge-on-Read Works
+
+With merge-on-read, writes only append data. Doris keeps every version of a
Key and merges them at query time or during compaction, returning the latest
version.
+
+Because duplicates are resolved at read time:
+
+- Writes are lightweight, because they skip the primary-key lookup.
+- Every query has to merge the versions of each Key, and Doris can't push
filters down, so queries are slower.
+- Query latency grows as versions accumulate between compactions.
+
+It fits write-heavy, read-light pipelines, where write throughput matters more
than query speed.
+
+## Choosing Between Them
+
+- **Use merge-on-write (default)** for most workloads, including real-time
updates, dimension synchronization, and any case where query performance
matters. It also enables features such as partial column updates.
+- **Use merge-on-read** only when you have far more writes than reads and you
need to minimize write overhead.
+
+The implementation is fixed at table creation and **cannot be changed later
through schema change**, so decide before you create the table.
+
+## Enabling Each Implementation
+
+The `enable_unique_key_merge_on_write` table property controls the
implementation.
+
+Merge-on-write (default):
+
+```sql
+CREATE TABLE IF NOT EXISTS example_tbl_unique
+(
+ user_id LARGEINT NOT NULL,
+ user_name VARCHAR(50) NOT NULL,
+ city VARCHAR(20),
+ age SMALLINT,
+ sex TINYINT
+)
+UNIQUE KEY(user_id, user_name)
+DISTRIBUTED BY HASH(user_id) BUCKETS 10
+PROPERTIES (
+ "enable_unique_key_merge_on_write" = "true"
+);
+```
+
+Merge-on-read (set the property to `false`):
+
+```sql
+CREATE TABLE IF NOT EXISTS example_tbl_unique
+(
+ user_id LARGEINT NOT NULL,
+ user_name VARCHAR(50) NOT NULL,
+ city VARCHAR(20),
+ age SMALLINT,
+ sex TINYINT
+)
+UNIQUE KEY(user_id, user_name)
+DISTRIBUTED BY HASH(user_id) BUCKETS 10
+PROPERTIES (
+ "enable_unique_key_merge_on_write" = "false"
+);
+```
+
+## Capabilities That Require Merge-on-Write
+
+[Partial column update](../../data-operate/update/update-of-unique-model)
requires merge-on-write: it updates a subset of columns without rewriting the
whole row.
+
+## Related
+
+- [Unique Key Model](./unique)
+- [Partial Column Update](../../data-operate/update/update-of-unique-model)
diff --git a/versioned_docs/version-4.x/table-design/data-model/unique.md
b/versioned_docs/version-4.x/table-design/data-model/unique.md
index 06f6169902e..acd107cb77f 100644
--- a/versioned_docs/version-4.x/table-design/data-model/unique.md
+++ b/versioned_docs/version-4.x/table-design/data-model/unique.md
@@ -6,62 +6,22 @@
}
---
-The Unique Key Model is designed for business scenarios that require data
updates. This model guarantees the uniqueness of Key columns: when data is
inserted or updated, new data overwrites old data with the same Key, ensuring
that records always reflect the latest version.
+The Unique Key Model keeps Key columns unique: when you insert or update a
row, the new data overwrites any existing row with the same Key, so the table
always holds the latest version. Use it when your data is updated by primary
key.
<!-- Knowledge type: Data model -->
<!-- Applicable scenarios: Data update / High-frequency writes / Primary key
deduplication -->
-## Applicable Scenarios
+## When to Use
-The Unique Key Model is mainly suitable for the following three categories of
business scenarios:
+The Unique Key Model fits three main scenarios:
1. **High-frequency data updates**: Real-time synchronization of dimension
tables from upstream OLTP databases, which requires efficient UPSERT operations.
-2. **Efficient data deduplication**: In ad delivery, customer relationship
management (CRM), and similar systems, deduplication is performed efficiently
based on user IDs.
-3. **Partial column updates**: For profile tagging scenarios where dynamic
tags change frequently, or for order consumption scenarios where the
transaction status changes, the partial column update capability of the Unique
Key Model can be used.
+2. **Efficient data deduplication**: In ad delivery, customer relationship
management (CRM), and similar systems, records are deduplicated by user ID.
+3. **Partial column updates**: For profile tagging where dynamic tags change
frequently, or order scenarios where the transaction status changes, you can
update only the affected columns.
-## Core Features
+## Create a Unique Key Table
-The Unique Key Model provides the following core features:
-
-| Feature | Description |
-| --- | --- |
-| UPSERT based on primary key | Records with duplicate primary keys are
updated; records whose primary keys do not exist are inserted |
-| Deduplication based on primary key | Key columns are unique, and data is
deduplicated by the primary key columns |
-| High-frequency data updates | Supports high-frequency update scenarios while
balancing update performance and query performance |
-
-## How It Works
-
-The Doris Unique Key Model provides two implementations, compared as follows:
-
-| Implementation | Default since version | Merge timing | Query performance |
Predicate pushdown | Applicable scenarios |
-| --- | --- | --- | --- | --- | --- |
-| Merge-on-write | Default since 2.1 (introduced in 1.2) | Merged immediately
at write time | High | Supported | Most scenarios, balancing query and write
performance |
-| Merge-on-read | Default before 2.1 | Merged at query or compaction time |
Lower | Not supported | Write-heavy, read-light scenarios |
-
-- **Merge-on-write**: Records with the same Key are merged immediately at
write time, ensuring that the storage always holds the latest data. This mode
balances query and write performance, avoids merging data across multiple
versions, and supports predicate pushdown to the storage layer. **This mode is
recommended for most scenarios.**
-- **Merge-on-read**: Data is not merged at write time but is appended
incrementally, with multiple versions retained inside Doris. At query or
compaction time, versions with the same Key are merged. This mode suits
write-heavy, read-light scenarios, but queries must merge multiple versions and
predicates cannot be pushed down, which may affect query speed.
-
-## Update Semantics
-
-The Doris Unique Key Model supports two update semantics:
-
-| Update semantics | Description | Implementation requirement |
-| --- | --- | --- |
-| Whole-row update | The default UPSERT semantics of the Unique Key Model: if
the Key exists, the entire row is updated; otherwise, a new row is inserted |
Supported by default |
-| Partial column update | Only the specified columns are updated, and the
original values of unspecified columns are kept | Must use merge-on-write and
be enabled through a parameter |
-
-Notes:
-
-- Under whole-row `UPSERT` semantics, even if `INSERT INTO` specifies only
some columns, Doris fills the unspecified columns with NULL or default values
in the planner.
-- For how to use partial column updates, see [Partial Column
Update](../../data-operate/update/update-of-unique-model).
-
-## Table Creation Examples
-
-Use the `UNIQUE KEY` keyword to define a Unique Key table at table creation
time, and use the `enable_unique_key_merge_on_write` property to control the
implementation.
-
-### Merge-on-write
-
-Since Doris 2.1, merge-on-write is enabled by default:
+Declare the primary key with the `UNIQUE KEY` keyword. Merge-on-write is on by
default and is right for almost all workloads, so you don't need any extra
property:
```sql
CREATE TABLE IF NOT EXISTS example_tbl_unique
@@ -73,39 +33,18 @@ CREATE TABLE IF NOT EXISTS example_tbl_unique
sex TINYINT
)
UNIQUE KEY(user_id, user_name)
-DISTRIBUTED BY HASH(user_id) BUCKETS 10
-PROPERTIES (
- "enable_unique_key_merge_on_write" = "true"
-);
+DISTRIBUTED BY HASH(user_id) BUCKETS 10;
```
-### Merge-on-read
+Each row is identified by `(user_id, user_name)`. Writing a row whose Key
already exists overwrites it; a row with a new Key is inserted.
-Before Doris 2.1, merge-on-read was enabled by default. Starting from 2.1, you
must explicitly disable the `enable_unique_key_merge_on_write` property to use
it:
+## Upsert Data
-```sql
-CREATE TABLE IF NOT EXISTS example_tbl_unique
-(
- user_id LARGEINT NOT NULL,
- user_name VARCHAR(50) NOT NULL,
- city VARCHAR(20),
- age SMALLINT,
- sex TINYINT
-)
-UNIQUE KEY(user_id, user_name)
-DISTRIBUTED BY HASH(user_id) BUCKETS 10
-PROPERTIES (
- "enable_unique_key_merge_on_write" = "false"
-);
-```
-
-## Data Insertion and Storage
-
-In a Unique Key table, the Key columns are used not only for sorting but also
for deduplication. When data is inserted, records with the same Key are
overwritten.
+Doris upserts by Key no matter which load method you use: rows with an
existing Key are updated, and rows with a new Key are inserted. Doris uses the
Key columns for both sorting and deduplication. The example below uses
`INSERT`; Stream Load, Broker Load, and Routine Load work the same way.

-In the following example, the original table has 4 rows. After inserting 2
rows, the new data updates the table based on the primary key:
+In the following example, the original table has 4 rows. Re-inserting 2 rows
with existing Keys updates them in place:
```sql
-- Insert original data
@@ -115,7 +54,7 @@ INSERT INTO example_tbl_unique VALUES
(103, 'Juice', 'SH', 20, 2),
(104, 'Olivia', 'SZ', 22, 2);
--- Update based on Key
+-- Re-insert the same Keys with new values
INSERT INTO example_tbl_unique VALUES
(101, 'Tom', 'BJ', 27, 1),
(102, 'Jason', 'SH', 28, 1);
@@ -132,11 +71,23 @@ SELECT * FROM example_tbl_unique;
+---------+-----------+------+------+------+
```
+With a whole-row upsert, if your `INSERT INTO` lists only some columns, Doris
fills the rest with NULL or default values.
+
+## Update Only Some Columns
+
+To change a few fields without rewriting the whole row, use partial column
update. It needs merge-on-write (the default) and is turned on with a
parameter. See [Partial Column
Update](../../data-operate/update/update-of-unique-model).
+
+## Choose an Implementation
+
+The Unique Key Model has two storage implementations: **merge-on-write** (the
default, recommended for most workloads) and **merge-on-read** (suited to
write-heavy, read-light pipelines). The implementation is fixed at table
creation and cannot be changed later through schema change.
+
+For how each one works, their performance trade-offs, and how to enable
merge-on-read, see [Merge-on-Write](./merge-on-write).
+
## Notes
When using the Unique Key Model, note the following limitations:
-1. **The implementation cannot be changed**: The implementation of a Unique
table (merge-on-write / merge-on-read) can only be specified at table creation
time and cannot be modified through schema change.
-2. **Whole-row UPSERT fills in default values**: Under whole-row `UPSERT`
semantics, even if `INSERT INTO` specifies only some columns, Doris fills the
unspecified columns with NULL or default values in the planner.
-3. **Partial column updates require merge-on-write**: To update only some
columns, you must use the merge-on-write implementation and enable partial
column update support through a specific parameter. For details, see [Partial
Column Update](../../data-operate/update/update-of-unique-model).
-4. **Partition keys must be a subset of Key columns**: To guarantee data
uniqueness, partition keys must be a subset of the Key columns.
+1. **The implementation cannot be changed**: merge-on-write or merge-on-read
can only be set at table creation and cannot be modified through schema change.
+2. **Whole-row UPSERT fills in default values**: even if `INSERT INTO`
specifies only some columns, Doris fills the unspecified columns with NULL or
default values.
+3. **Partial column updates require merge-on-write**: to update only some
columns, use merge-on-write and turn on partial column update with a parameter.
See [Partial Column Update](../../data-operate/update/update-of-unique-model).
+4. **Partition keys must be a subset of Key columns**: Doris requires this to
guarantee data uniqueness.
diff --git a/versioned_sidebars/version-4.x-sidebars.json
b/versioned_sidebars/version-4.x-sidebars.json
index 1963c0873aa..089ec16ed64 100644
--- a/versioned_sidebars/version-4.x-sidebars.json
+++ b/versioned_sidebars/version-4.x-sidebars.json
@@ -235,7 +235,17 @@
},
"items": [
"table-design/data-model/duplicate",
- "table-design/data-model/unique",
+ {
+ "type": "category",
+ "label": "Unique Key Model",
+ "link": {
+ "type": "doc",
+ "id": "table-design/data-model/unique"
+ },
+ "items": [
+ "table-design/data-model/merge-on-write"
+ ]
+ },
"table-design/data-model/aggregate",
"table-design/data-model/tips"
]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]