This is an automated email from the ASF dual-hosted git repository.
dataroaring pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new ca2599c0bc5 [fix] auto inc opt to 3.0 and 2.1 (#1595)
ca2599c0bc5 is described below
commit ca2599c0bc5f22fce91c3a91ad97e55df06ece21
Author: Yongqiang YANG <[email protected]>
AuthorDate: Wed Dec 25 17:22:37 2024 +0800
[fix] auto inc opt to 3.0 and 2.1 (#1595)
## Versions
- [x] dev
- [x] 3.0
- [x] 2.1
- [ ] 2.0
## Languages
- [x] Chinese
- [x] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---------
Co-authored-by: Yongqiang YANG <[email protected]>
Co-authored-by: KassieZ <[email protected]>
---
docs/table-design/auto-increment.md | 40 ++++---
.../current/table-design/auto-increment.md | 36 +++++--
.../version-2.1/table-design/auto-increment.md | 38 +++++--
.../version-3.0/table-design/auto-increment.md | 38 +++++--
.../version-2.1/table-design/auto-increment.md | 115 ++++++++++++--------
.../version-3.0/table-design/auto-increment.md | 116 +++++++++++++--------
6 files changed, 255 insertions(+), 128 deletions(-)
diff --git a/docs/table-design/auto-increment.md b/docs/table-design/auto-increment.md
index 487c47b89f0..b1f726d1631 100644
--- a/docs/table-design/auto-increment.md
+++ b/docs/table-design/auto-increment.md
@@ -24,22 +24,27 @@ specific language governing permissions and limitations
under the License.
-->
-When importing data, Doris automatically assigns unique values to rows that do
not have specified values in the **auto-increment column**. This feature
simplifies data import workflows while maintaining flexibility.
+When writing data, Doris automatically assigns unique values to rows that do
not have specified values in the **auto-increment column**.
---
## Functionality
-For tables with an auto-increment column, Doris processes data imports as
follows:
+For tables with an auto-increment column, Doris processes data writes as
follows:
- **Auto-Population (Column Excluded)**:
- If the imported data does not include the auto-increment column, Doris
generates and populates unique values for this column.
+ If the written data does not include the auto-increment column, Doris
generates and populates unique values for this column.
- **Partial Specification (Column Included)**:
- - **Null Values**: Doris replaces null values in the imported data with
system-generated unique values.
- - **Non-Null Values**: User-provided values remain unchanged.
- > **Important**: User-provided non-null values can disrupt the uniqueness of
the auto-increment column.
+ - **Null Values**: Doris replaces null values in the written data with
system-generated unique values.
+
+ - **Non-Null Values**: User-provided values remain unchanged.
+
+ :::caution Attention
+ User-provided non-null values can disrupt the uniqueness of the
auto-increment column.
+ :::
+
---
### Uniqueness
@@ -56,9 +61,12 @@ Doris guarantees **table-wide uniqueness** for values it generated in the auto-i
Auto-increment values generated by Doris are generally **dense** but with some
considerations:
- **Potential Gaps**: Gaps may appear due to performance optimizations. Each
backend node (BE) pre-allocates a block of unique values for efficiency, and
these blocks do not overlap between nodes.
-- **Non-Chronological Values**: Doris does not guarantee that values generated
in later imports are larger than those from earlier imports.
- > **Note**: Auto-increment values cannot be used to infer the chronological
order of imports.
+- **Non-Chronological Values**: Doris does not guarantee that values generated
in later writes are larger than those from earlier writes.
+ :::info Note
+ Auto-increment values cannot be used to infer the chronological order of
writes.
+ :::
+
---
## Syntax
@@ -150,7 +158,7 @@ To use auto-increment columns, you need to add the `AUTO_INCREMENT` attribute to
## Usage
-### Import
+### Loading
Consider the table below:
@@ -168,7 +176,7 @@ PROPERTIES (
);
```
-When using the insert into statement to import data without including the
auto-increment column `id`, Doris automatically generates and fills unique
values for the column.
+When using the insert into statement to write data without including the
auto-increment column `id`, Doris automatically generates and fills unique
values for the column.
```sql
mysql> insert into tbl(name, value) values("Bob", 10), ("Alice", 20), ("Jack",
30);
@@ -186,7 +194,7 @@ mysql> select * from tbl order by id;
3 rows in set (0.05 sec)
```
-Similarly, when using stream load to import the file `test.csv` without
specifying the auto-increment column `id`, Doris will automatically populate
the `id` column with generated values.
+Similarly, when using stream load to load the file `test.csv` without
specifying the auto-increment column `id`, Doris will automatically populate
the `id` column with generated values.
test.csv:
```
@@ -211,7 +219,7 @@ mysql> select * from tbl order by id;
+------+-------+-------+
5 rows in set (0.04 sec)
```
-When importing data using the `INSERT INTO` statement and specifying the
auto-increment column `id`, any null values in the imported data for that
column will be replaced with generated values.
+When writing data using the `INSERT INTO` statement and specifying the
auto-increment column `id`, any null values in the written data for that column
will be replaced with generated values.
```sql
mysql> insert into tbl(id, name, value) values(null, "Doris", 60), (null,
"Nereids", 70);
@@ -237,7 +245,7 @@ mysql> select * from tbl order by id;
When performing a partial update on a merge-on-write Unique table with an
auto-increment column:
-If the auto-increment column is a key column, users must explicitly specify it
during partial updates. As a result, the target columns for partial updates
must include the auto-increment column. In this case, the import behavior
aligns with that of standard partial updates.
+If the auto-increment column is a key column, users must explicitly specify it
during partial updates. As a result, the target columns for partial updates
must include the auto-increment column. In this case, the behavior aligns with
that of standard partial updates.
```sql
mysql> CREATE TABLE `demo`.`tbl2` (
@@ -289,7 +297,7 @@ mysql> select * from tbl2 order by id;
4 rows in set (0.04 sec)
```
-When the auto-increment column is a non-key column and no value is provided,
its value will be derived from existing rows in the table. If a value is
specified for the auto-increment column, null values in the imported data will
be replaced with generated values, while non-null values will remain unchanged.
These records will then be processed according to the semantics of partial
updates.
+When the auto-increment column is a non-key column and no value is provided,
its value will be derived from existing rows in the table. If a value is
specified for the auto-increment column, null values in the written data will
be replaced with generated values, while non-null values will remain unchanged.
These records will then be processed according to the semantics of partial
updates.
```sql
mysql> CREATE TABLE `demo`.`tbl3` (
@@ -396,14 +404,14 @@ PROPERTIES (
);
```
-Import the `user_id` values from existing data into the dictionary table to
map `user_id` to corresponding integer values:
+Write the `user_id` values from existing data into the dictionary table to map
`user_id` to corresponding integer values:
```sql
insert into dictionary_tbl(user_id)
select user_id from dwd_dup_tbl group by user_id;
```
-Alternatively, import only the `user_id` values from incremental data into the
dictionary table.
+Alternatively, write only the `user_id` values from incremental data into the
dictionary table.
```sql
insert into dictionary_tbl(user_id)
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/auto-increment.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/auto-increment.md
index fb348308995..6d40aa2bc07 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/auto-increment.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/auto-increment.md
@@ -28,20 +28,44 @@ under the License.
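When importing data, Doris assigns a table-unique value to rows that do not
have a specified value in the auto-increment column.
-## Feature Description
+---
+
+## Functionality
+
+For tables with an auto-increment column, Doris processes data writes as
follows:
-For tables containing an auto-increment column, when users import data:
-- If the target columns of the import do not include the auto-increment
column, the auto-increment column is populated with values generated by Doris.
-- If the target columns of the import include the auto-increment column, null
values in that column of the imported data are replaced with values generated
by Doris, while non-null values remain unchanged. Note that **non-null values
can break the uniqueness of the auto-increment column values**.
+- **Auto-Population (Column Excluded)**:
+ If the written data does not include the auto-increment column, Doris
generates and populates unique values for this column.
+
+- **Partial Specification (Column Included)**:
+ - **Null Values**: Doris replaces null values in the written data with
system-generated unique values.
+ - **Non-Null Values**: User-provided values remain unchanged.
+
+ :::caution Important
+ User-provided non-null values can disrupt the uniqueness of the
auto-increment column.
+ :::
+
+---
### Uniqueness
-Doris guarantees that values generated in the auto-increment column are
**unique within the table**. Note, however, that **this guarantee covers only
values automatically filled by Doris and does not consider user-provided
values**. If users also insert their own values into the table by explicitly
specifying the auto-increment column, uniqueness cannot be guaranteed.
+Doris guarantees **table-wide uniqueness** for values it generates in the
auto-increment column. However:
+
+- **Guaranteed Uniqueness**: This applies only to system-generated values.
+- **User-Provided Values**: Doris does not validate or enforce uniqueness for
values specified by users in the auto-increment column. This may result in
duplicate entries.
+
+---
### Density
-Doris guarantees that automatically generated values in the auto-increment
column are dense, but it **cannot guarantee** that the values filled in during
a single import are fully contiguous, so the values generated within one import
may show some jumps. This is because, for performance reasons, each BE caches a
portion of pre-allocated auto-increment values, and the values cached on
different BEs do not overlap. In addition, because of this caching, Doris
cannot guarantee that values generated by a physically later import are larger
than those from an earlier import. Therefore, the magnitude of the assigned
values cannot be used to determine the chronological order of imports.
+Auto-increment values generated by Doris are generally **dense**, but with
some considerations:
+
+- **Potential Gaps**: Gaps may appear due to performance optimizations. Each
backend node (BE) pre-allocates a block of unique values for efficiency, and
these blocks do not overlap between nodes.
+- **Non-Chronological Values**: Doris does not guarantee that values generated
in later writes are larger than those from earlier writes.
+ :::info Note
+ Auto-increment values cannot be used to infer the chronological order of
writes.
+ :::
## Syntax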
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/auto-increment.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/auto-increment.md
index df87f4056e6..6d40aa2bc07 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/auto-increment.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/auto-increment.md
@@ -28,24 +28,48 @@ under the License.
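When importing data, Doris assigns a table-unique value to rows that do not
have a specified value in the auto-increment column.
-## Feature Description
+---
+
+## Functionality
+
+For tables with an auto-increment column, Doris processes data writes as
follows:
-For tables containing an auto-increment column, when users import data:
-- If the target columns of the import do not include the auto-increment
column, the auto-increment column is populated with values generated by Doris.
-- If the target columns of the import include the auto-increment column, null
values in that column of the imported data are replaced with values generated
by Doris, while non-null values remain unchanged. Note that **non-null values
can break the uniqueness of the auto-increment column values**.
+- **Auto-Population (Column Excluded)**:
+ If the written data does not include the auto-increment column, Doris
generates and populates unique values for this column.
+
+- **Partial Specification (Column Included)**:
+ - **Null Values**: Doris replaces null values in the written data with
system-generated unique values.
+ - **Non-Null Values**: User-provided values remain unchanged.
+
+ :::caution Important
+ User-provided non-null values can disrupt the uniqueness of the
auto-increment column.
+ :::
+
+---
### Uniqueness
-Doris guarantees that values generated in the auto-increment column are
**unique within the table**. Note, however, that **this guarantee covers only
values automatically filled by Doris and does not consider user-provided
values**. If users also insert their own values into the table by explicitly
specifying the auto-increment column, uniqueness cannot be guaranteed.
+Doris guarantees **table-wide uniqueness** for values it generates in the
auto-increment column. However:
+
+- **Guaranteed Uniqueness**: This applies only to system-generated values.
+- **User-Provided Values**: Doris does not validate or enforce uniqueness for
values specified by users in the auto-increment column. This may result in
duplicate entries.
+
+---
### Density
-Doris guarantees that automatically generated values in the auto-increment
column are dense, but it **cannot guarantee** that the values filled in during
a single import are fully contiguous, so the values generated within one import
may show some jumps. This is because, for performance reasons, each BE caches a
portion of pre-allocated auto-increment values, and the values cached on
different BEs do not overlap. In addition, because of this caching, Doris
cannot guarantee that values generated by a physically later import are larger
than those from an earlier import. Therefore, the magnitude of the assigned
values cannot be used to determine the chronological order of imports.
+Auto-increment values generated by Doris are generally **dense**, but with
some considerations:
+
+- **Potential Gaps**: Gaps may appear due to performance optimizations. Each
backend node (BE) pre-allocates a block of unique values for efficiency, and
these blocks do not overlap between nodes.
+- **Non-Chronological Values**: Doris does not guarantee that values generated
in later writes are larger than those from earlier writes.
+ :::info Note
+ Auto-increment values cannot be used to infer the chronological order of
writes.
+ :::
## Syntax
-To use an auto-increment column, add the `AUTO_INCREMENT` attribute to the
corresponding column when creating the table
([CREATE-TABLE](../sql-manual/sql-statements/table-and-view/table/CREATE-TABLE)).
To manually specify the starting value of the auto-increment column, use
`AUTO_INCREMENT(start_value)` in the table creation statement. If not
specified, the default starting value is 1.
+To use an auto-increment column, add the `AUTO_INCREMENT` attribute to the
corresponding column when creating the table
([CREATE-TABLE](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE)).
To manually specify the starting value of the auto-increment column, use
`AUTO_INCREMENT(start_value)` in the table creation statement. If not
specified, the default starting value is 1.
### Examples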
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/auto-increment.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/auto-increment.md
index df87f4056e6..407f94857f3 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/auto-increment.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/auto-increment.md
@@ -28,24 +28,48 @@ under the License.
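When importing data, Doris assigns a table-unique value to rows that do not
have a specified value in the auto-increment column.
-## Feature Description
+---
+
+## Functionality
+
+For tables with an auto-increment column, Doris processes data writes as
follows:
-For tables containing an auto-increment column, when users import data:
-- If the target columns of the import do not include the auto-increment
column, the auto-increment column is populated with values generated by Doris.
-- If the target columns of the import include the auto-increment column, null
values in that column of the imported data are replaced with values generated
by Doris, while non-null values remain unchanged. Note that **non-null values
can break the uniqueness of the auto-increment column values**.
+- **Auto-Population (Column Excluded)**:
+ If the written data does not include the auto-increment column, Doris
generates and populates unique values for this column.
+
+- **Partial Specification (Column Included)**:
+ - **Null Values**: Doris replaces null values in the written data with
system-generated unique values.
+ - **Non-Null Values**: User-provided values remain unchanged.
+
+ :::caution Important
+ User-provided non-null values can disrupt the uniqueness of the
auto-increment column.
+ :::
+
+---
### Uniqueness
-Doris guarantees that values generated in the auto-increment column are
**unique within the table**. Note, however, that **this guarantee covers only
values automatically filled by Doris and does not consider user-provided
values**. If users also insert their own values into the table by explicitly
specifying the auto-increment column, uniqueness cannot be guaranteed.
+Doris guarantees **table-wide uniqueness** for values it generates in the
auto-increment column. However:
+
+- **Guaranteed Uniqueness**: This applies only to system-generated values.
+- **User-Provided Values**: Doris does not validate or enforce uniqueness for
values specified by users in the auto-increment column. This may result in
duplicate entries.
+
+---
### Density
-Doris guarantees that automatically generated values in the auto-increment
column are dense, but it **cannot guarantee** that the values filled in during
a single import are fully contiguous, so the values generated within one import
may show some jumps. This is because, for performance reasons, each BE caches a
portion of pre-allocated auto-increment values, and the values cached on
different BEs do not overlap. In addition, because of this caching, Doris
cannot guarantee that values generated by a physically later import are larger
than those from an earlier import. Therefore, the magnitude of the assigned
values cannot be used to determine the chronological order of imports.
+Auto-increment values generated by Doris are generally **dense**, but with
some considerations:
+- **Potential Gaps**: Gaps may appear due to performance optimizations. Each
backend node (BE) pre-allocates a block of unique values for efficiency, and
these blocks do not overlap between nodes.
+- **Non-Chronological Values**: Doris does not guarantee that values generated
in later writes are larger than those from earlier writes.
+ :::info Note
+ Auto-increment values cannot be used to infer the chronological order of
writes.
+ :::
+
## Syntax
-To use an auto-increment column, add the `AUTO_INCREMENT` attribute to the
corresponding column when creating the table
([CREATE-TABLE](../sql-manual/sql-statements/table-and-view/table/CREATE-TABLE)).
To manually specify the starting value of the auto-increment column, use
`AUTO_INCREMENT(start_value)` in the table creation statement. If not
specified, the default starting value is 1.
+To use an auto-increment column, add the `AUTO_INCREMENT` attribute to the
corresponding column when creating the table
([CREATE-TABLE](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE)).
To manually specify the starting value of the auto-increment column, use
`AUTO_INCREMENT(start_value)` in the table creation statement. If not
specified, the default starting value is 1.
### Examples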
diff --git a/versioned_docs/version-2.1/table-design/auto-increment.md b/versioned_docs/version-2.1/table-design/auto-increment.md
index 3a79974454d..49eb1b4eb60 100644
--- a/versioned_docs/version-2.1/table-design/auto-increment.md
+++ b/versioned_docs/version-2.1/table-design/auto-increment.md
@@ -1,6 +1,6 @@
---
{
- "title": "Using AUTO_INCREMENT",
+ "title": "Auto-Increment Column",
"language": "en"
}
---
@@ -24,31 +24,56 @@ specific language governing permissions and limitations
under the License.
-->
-# AUTO_INCREMENT Column
+When writing data, Doris automatically assigns unique values to rows that do
not have specified values in the **auto-increment column**.
-When importing data, Doris assigns a table-unique value to rows that do not
have specified values in the auto-increment column.
+---
## Functionality
-For tables containing an auto-increment column, during data import:
-- If the target columns don't include the auto-increment column, Doris will
populate the auto-increment column with generated values.
-- If the target columns include the auto-increment column, null values in the
imported data for that column will be replaced by values generated by Doris,
while non-null values will remain unchanged. Note that **non-null values can
disrupt the uniqueness of the auto-increment column values**.
+For tables with an auto-increment column, Doris processes data writes as
follows:
+
+- **Auto-Population (Column Excluded)**:
+ If the written data does not include the auto-increment column, Doris
generates and populates unique values for this column.
+
+- **Partial Specification (Column Included)**:
+ - **Null Values**: Doris replaces null values in the written data with
system-generated unique values.
+ - **Non-Null Values**: User-provided values remain unchanged.
+
+ :::caution Attention
+ User-provided non-null values can disrupt the uniqueness of the
auto-increment column.
+ :::
+
+---
### Uniqueness
-Doris ensures that values generated on the auto-increment column have
**table-wide uniqueness**. However, it's important to note that **the
uniqueness of the auto-increment column only guarantees uniqueness for values
automatically filled by Doris and does not consider values provided by users**.
If a user explicitly inserts user-provided values for this table by specifying
the auto-increment column, this uniqueness cannot be guaranteed.
+Doris guarantees **table-wide uniqueness** for values it generates in the
auto-increment column. However:
+
+- **Guaranteed Uniqueness**: This applies only to system-generated values.
+- **User-Provided Values**: Doris does not validate or enforce uniqueness for
values specified by users in the auto-increment column. This may result in
duplicate entries.
+
+---
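Because Doris does not enforce uniqueness for user-provided values, it can help to check for duplicates after writes that specify the auto-increment column explicitly. The query below is a minimal sketch, not part of the commit: it assumes the `tbl` table and its `id` auto-increment column from the Usage section, and that `tbl` is a Duplicate-model table; the alias `cnt` is illustrative.

```sql
-- List auto-increment values that occur more than once.
-- Assumption: `tbl` is a Duplicate-model table with auto-increment column `id`.
select id, count(*) as cnt
from tbl
group by id
having count(*) > 1
order by id;
```

An empty result indicates that no duplicate `id` values are currently stored in the table.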
### Density
-Doris ensures that the values generated on the auto-increment column are
dense, but it **cannot guarantee** that the values automatically generated in
the auto-increment column during an import will be entirely contiguous. Thus,
there might be some jumps in the values generated by the auto-increment column
during an import. This is because, for performance consideration, each BE
caches a portion of pre-allocated auto-increment column values, and these
cached values do not intersect betwe [...]
+Auto-increment values generated by Doris are generally **dense** but with some
considerations:
+
+- **Potential Gaps**: Gaps may appear due to performance optimizations. Each
backend node (BE) pre-allocates a block of unique values for efficiency, and
these blocks do not overlap between nodes.
+- **Non-Chronological Values**: Doris does not guarantee that values generated
in later writes are larger than those from earlier writes.
+
+:::info Note
+Auto-increment values cannot be used to infer the chronological order of
writes.
+:::
+
+---
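Since auto-increment values do not reflect write order, a workload that needs chronological ordering can store an explicit timestamp and sort on it instead. The following is a sketch under stated assumptions, not part of the commit: the table `events_tbl` and the columns `written_at` and `payload` are hypothetical names, and the table properties mirror the other examples in this document.

```sql
-- Hypothetical table: `id` is generated by Doris, `written_at` records write time.
CREATE TABLE `demo`.`events_tbl` (
    `id` BIGINT NOT NULL AUTO_INCREMENT,
    `written_at` DATETIME NOT NULL,
    `payload` VARCHAR(65533) NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(`id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES (
"replication_allocation" = "tag.location.default: 3"
);

-- The auto-increment column is omitted, so Doris fills it in.
insert into events_tbl(written_at, payload) values (now(), "first write");
insert into events_tbl(written_at, payload) values (now(), "second write");

-- Order by the timestamp, not by `id`, to recover the write order.
select id, written_at, payload from events_tbl order by written_at;
```

Rows written within the same second share a `written_at` value, so their relative order remains undetermined in this sketch.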
## Syntax
-To use auto-increment columns, you need to add the `AUTO_INCREMENT` attribute
to the corresponding column during table creation
([CREATE-TABLE](../sql-manual/sql-statements/table-and-view/table/CREATE-TABLE)).
To manually specify the starting value for an auto-increment column, you can
do so by using the `AUTO_INCREMENT(start_value)` statement when creating the
table. If not specified, the default starting value is 1.
+To use auto-increment columns, you need to add the `AUTO_INCREMENT` attribute
to the corresponding column during table creation
([CREATE-TABLE](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE)).
To manually specify the starting value for an auto-increment column, you can
do so by using the `AUTO_INCREMENT(start_value)` statement when creating the
table. If not specified, the default starting value is 1.
### Examples
-1. Creating a Duplicate table with one key column as an auto-increment column:
+1. Creating a duplicate table with an auto-increment column as the key column.
```sql
CREATE TABLE `demo`.`tbl` (
@@ -61,7 +86,7 @@ To use auto-increment columns, you need to add the `AUTO_INCREMENT` attribute to
"replication_allocation" = "tag.location.default: 3"
);
-2. Creating a Duplicate table with one key column as an auto-increment column,
and set start value is 100:
+2. Creating a duplicate table with an auto-increment column as the key column,
and setting the starting value to 100.
```sql
CREATE TABLE `demo`.`tbl` (
@@ -75,7 +100,7 @@ To use auto-increment columns, you need to add the `AUTO_INCREMENT` attribute to
);
```
-3. Creating a Duplicate table with one value column as an auto-increment
column:
+3. Creating a duplicate table with an auto-increment column as one of the
value columns.
```sql
CREATE TABLE `demo`.`tbl` (
@@ -91,7 +116,7 @@ To use auto-increment columns, you need to add the `AUTO_INCREMENT` attribute to
);
```
-4. Creating a Unique tbl table with one key column as an auto-increment column:
+4. Creating a unique table with an auto-increment column as the key column.
```sql
CREATE TABLE `demo`.`tbl` (
@@ -107,7 +132,7 @@ To use auto-increment columns, you need to add the `AUTO_INCREMENT` attribute to
);
```
-5. Creating a Unique tbl table with one value column as an auto-increment
column:
+5. Creating a unique table with an auto-increment column as one of the value
columns.
```sql
CREATE TABLE `demo`.`tbl` (
@@ -124,16 +149,16 @@ To use auto-increment columns, you need to add the `AUTO_INCREMENT` attribute to
### Constraints and Limitations
-- Only Duplicate model tables and Unique model tables can contain
auto-increment columns.
-- A table can contain at most one auto-increment column.
-- The type of the auto-increment column must be BIGINT and must be NOT NULL.
-- The manually specified starting value for an auto-increment column must be
greater than or equal to 0.
+- Auto-increment columns can only be used in Duplicate or Unique model tables.
+- A table can have only one auto-increment column.
+- The auto-increment column must be of type `BIGINT` and cannot be `NULL`.
+- The manually specified starting value for an auto-increment column must be 0
or greater.
## Usage
-### Import
+### Loading
-Consider the following table:
+Consider the table below:
```sql
CREATE TABLE `demo`.`tbl` (
@@ -149,7 +174,7 @@ PROPERTIES (
);
```
-When using the insert into statement to import data without specifying the
auto-increment column `id`, the `id` column will automatically be filled with
generated values.
+When using the insert into statement to write data without including the
auto-increment column `id`, Doris automatically generates and fills unique
values for the column.
```sql
mysql> insert into tbl(name, value) values("Bob", 10), ("Alice", 20), ("Jack",
30);
@@ -167,7 +192,7 @@ mysql> select * from tbl order by id;
3 rows in set (0.05 sec)
```
-Similarly, using stream load to import the file test.csv without specifying
the auto-increment column `id` will result in the id column being automatically
filled with generated values.
+Similarly, when using stream load to load the file `test.csv` without
specifying the auto-increment column `id`, Doris will automatically populate
the `id` column with generated values.
test.csv:
```
@@ -192,8 +217,7 @@ mysql> select * from tbl order by id;
+------+-------+-------+
5 rows in set (0.04 sec)
```
-
-When importing using insert into statement while specifying the auto-increment
column `id`, null values in the imported data for that column will be replaced
by generated values.
+When writing data using the `INSERT INTO` statement and specifying the
auto-increment column `id`, any null values in the written data for that column
will be replaced with generated values.
```sql
mysql> insert into tbl(id, name, value) values(null, "Doris", 60), (null,
"Nereids", 70);
@@ -217,9 +241,9 @@ mysql> select * from tbl order by id;
### Partial Update
-When performing a partial update on a merge-on-write Unique table containing
an auto-increment column:
+When performing a partial update on a merge-on-write Unique table with an
auto-increment column:
-If the auto-increment column is a key column, during partial updates, as users
must explicitly specify the key column, the target columns for partial column
updates must include the auto-increment column. In this scenario, the import
behavior is similar to regular partial updates.
+If the auto-increment column is a key column, users must explicitly specify it
during partial updates. As a result, the target columns for partial updates
must include the auto-increment column. In this case, the behavior aligns with
that of standard partial updates.
```sql
mysql> CREATE TABLE `demo`.`tbl2` (
@@ -271,7 +295,7 @@ mysql> select * from tbl2 order by id;
4 rows in set (0.04 sec)
```
-When the auto-increment column is a non-key column and users haven't specified
the value for the auto-increment column, the value will be filled from existing
data rows in the table. If users specify the auto-increment column, null values
in the imported data for that column will be replaced by generated values,
while non-null values will remain unchanged, and then these data will be loaded
with the semantics of partial updates.
+When the auto-increment column is a non-key column and no value is provided,
its value will be derived from existing rows in the table. If a value is
specified for the auto-increment column, null values in the written data will
be replaced with generated values, while non-null values will remain unchanged.
These records will then be processed according to the semantics of partial
updates.
```sql
mysql> CREATE TABLE `demo`.`tbl3` (
@@ -341,9 +365,10 @@ mysql> select * from tbl3 order by id;
### Dictionary Encoding
-Using bitmaps for audience analysis in user profile requires building a user
dictionary where each user corresponds to a unique integer dictionary value.
Aggregating these dictionary values can improve the performance of bitmap.
+Using bitmaps for audience analysis in user profiling involves creating a user
dictionary, where each user is assigned a unique integer as their dictionary
value. Aggregating these dictionary values can improve the performance of
bitmap operations.
+
+For example, in an offline UV (Unique Visitors) and PV (Page Views) analysis
scenario, consider a detailed user behavior table:
-Taking the offline UV and PV analysis scenario as an example, assuming there's
a detailed user behavior table:
```sql
CREATE TABLE `demo`.`dwd_dup_tbl` (
@@ -362,8 +387,7 @@ PROPERTIES (
);
```
-Using the auto-incrementa column to create the following dictionary table:
-
+Using the auto-increment column to create the following dictionary table:
```sql
CREATE TABLE `demo`.`dictionary_tbl` (
@@ -378,15 +402,14 @@ PROPERTIES (
);
```
-Import the value of `user_id` from existing data into the dictionary table,
establishing the mapping of `user_id` to integer values:
+Write the `user_id` values from existing data into the dictionary table to map
`user_id` to corresponding integer values:
```sql
insert into dictionary_tbl(user_id)
select user_id from dwd_dup_tbl group by user_id;
```
-Or import only the value of `user_id` in incrementa data into the dictionary
table alternatively:
-
+Alternatively, write only the `user_id` values from incremental data into the
dictionary table.
```sql
insert into dictionary_tbl(user_id)
@@ -394,9 +417,9 @@ select dwd_dup_tbl.user_id from dwd_dup_tbl left join dictionary_tbl
on dwd_dup_tbl.user_id = dictionary_tbl.user_id where dwd_dup_tbl.visit_time >
'2023-12-10' and dictionary_tbl.user_id is NULL;
```
-In real-world scenarios, Flink connectors can also be employed to write data
into Doris.
+In practical applications, Flink connectors can be used to write data into
Doris.
-Assuming `dim1`, `dim3`, `dim5` represent statistical dimensions of interest
to us, create the following table to store aggregated results:
+To store aggregated results for the statistical dimensions `dim1`, `dim3`, and
`dim5`, create the following table:
```sql
CREATE TABLE `demo`.`dws_agg_tbl` (
@@ -413,7 +436,7 @@ PROPERTIES (
);
```
-Store the result of the data aggregation operations into the aggregation
result table:
+Save the aggregated data into the results table.
```sql
insert into dws_agg_tbl
@@ -421,7 +444,7 @@ select dwd_dup_tbl.dim1, dwd_dup_tbl.dim3, dwd_dup_tbl.dim5, BITMAP_UNION(TO_BIT
from dwd_dup_tbl INNER JOIN dictionary_tbl on dwd_dup_tbl.user_id =
dictionary_tbl.user_id;
```
-Perform UV and PV queries using the following statement:
+Execute UV and PV queries with the following statement:
```sql
select dim1, dim3, dim5, user_id_bitmap as uv, pv from dws_agg_tbl;
@@ -429,7 +452,7 @@ select dim1, dim3, dim5, user_id_bitmap as uv, pv from dws_agg_tbl;
### Efficient Pagination
-When displaying data on a page, pagination is often necessary. Traditional
pagination typically involves using `limit`, `offset`, and `order by` in SQL
queries. For instance, consider the following business table intended for
display:
+Pagination is often required when displaying data on a page. Traditional
pagination usually involves using `LIMIT`, `OFFSET`, and `ORDER BY` in SQL
queries. For example, consider the following business table designed for
display:
```sql
CREATE TABLE `demo`.`records_tbl` (
@@ -448,21 +471,21 @@ PROPERTIES (
);
```
-Assuming 100 records are displayed per page in pagination. To fetch the first
page's data, the following SQL query can be used:
+Assuming 100 records are displayed per page, the following SQL query can be
used to fetch data for the first page:
```sql
select * from records_tbl order by `key`, `name` limit 100;
```
-Fetching the data for the second page can be accomplished by:
+To fetch data for the second page, you can use the following query:
```sql
select * from records_tbl order by `key`, `name` limit 100 offset 100;
```
-However, when performing deep pagination queries (with large offsets), even if
the actual required data rows are few, this method still reads all data into
memory for full sorting before subsequent processing, which is quite
inefficient. Using an auto-incrementa column assigns a unique value to each
row, allowing the use of where `unique_value` > x limit y to filter a
significant amount of data beforehand, making pagination more efficient.
+However, when performing deep pagination queries (with large offsets), this
method can be inefficient, as it reads all data into memory for sorting before
processing, even if only a small number of rows are needed. By using an
auto-increment column, each row is assigned a unique value, enabling the use of
a query like `WHERE unique_value > x LIMIT y` to filter out a large portion of
the data in advance, making pagination more efficient.
-Continuing with the aforementioned business table, an auto-increment column is
added to the table to give each row a unique identifier:
+To illustrate this, an auto-increment column is added to the business table,
giving each row a unique identifier:
```sql
CREATE TABLE `demo`.`records_tbl2` (
@@ -482,19 +505,19 @@ PROPERTIES (
);
```
-For pagination displaying 100 records per page, to fetch the first page's
data, the following SQL query can be used:
+For pagination with 100 records per page, the following SQL query can be used
to fetch the data for the first page:
```sql
select * from records_tbl2 order by unique_value limit 100;
```
-By recording the maximum value of unique_value in the returned results, let's
assume it's 99. The following query can then fetch data for the second page:
+By recording the maximum value of `unique_value` from the returned results,
let's assume it is 99. The following query can then be used to fetch data for
the second page:
```sql
select * from records_tbl2 where unique_value > 99 order by unique_value limit
100;
```
-If directly querying contents from a later page and it's inconvenient to
directly obtain the maximum value of `unique_value` from the preceding page's
data (for instance, directly obtaining contents from the 101st page), the
following query can be used:
+If directly querying data from a later page and it's inconvenient to retrieve
the maximum value of `unique_value` from the previous page's results (for
example, when fetching data starting from the 101st page), the following query
can be used:
```sql
select key, name, address, city, nation, region, phone, mktsegment
diff --git a/versioned_docs/version-3.0/table-design/auto-increment.md b/versioned_docs/version-3.0/table-design/auto-increment.md
index 50422aa825b..f6a7204e506 100644
--- a/versioned_docs/version-3.0/table-design/auto-increment.md
+++ b/versioned_docs/version-3.0/table-design/auto-increment.md
@@ -1,6 +1,6 @@
---
{
- "title": "Using AUTO_INCREMENT",
+ "title": "Auto-Increment Column",
"language": "en"
}
---
@@ -24,30 +24,56 @@ specific language governing permissions and limitations
under the License.
-->
+When writing data, Doris automatically assigns unique values to rows that do
not have specified values in the **auto-increment column**.
-When importing data, Doris assigns a table-unique value to rows that do not
have specified values in the auto-increment column.
+---
## Functionality
-For tables containing an auto-increment column, during data import:
-- If the target columns don't include the auto-increment column, Doris will
populate the auto-increment column with generated values.
-- If the target columns include the auto-increment column, null values in the
imported data for that column will be replaced by values generated by Doris,
while non-null values will remain unchanged. Note that **non-null values can
disrupt the uniqueness of the auto-increment column values**.
+For tables with an auto-increment column, Doris processes data writes as
follows:
+
+- **Auto-Population (Column Excluded)**:
+ If the written data does not include the auto-increment column, Doris
generates and populates unique values for this column.
+
+- **Partial Specification (Column Included)**:
+ - **Null Values**: Doris replaces null values in the written data with
system-generated unique values.
+ - **Non-Null Values**: User-provided values remain unchanged.
+
+ :::caution Attention
+ User-provided non-null values can disrupt the uniqueness of the
auto-increment column.
+ :::
+
+---
### Uniqueness
-Doris ensures that values generated on the auto-increment column have
**table-wide uniqueness**. However, it's important to note that **the
uniqueness of the auto-increment column only guarantees uniqueness for values
automatically filled by Doris and does not consider values provided by users**.
If a user explicitly inserts user-provided values for this table by specifying
the auto-increment column, this uniqueness cannot be guaranteed.
+Doris guarantees **table-wide uniqueness** for values it generates in the
auto-increment column. However:
+
+- **Guaranteed Uniqueness**: This applies only to system-generated values.
+- **User-Provided Values**: Doris does not validate or enforce uniqueness for
values specified by users in the auto-increment column. This may result in
duplicate entries.
+
+---
### Density
-Doris ensures that the values generated on the auto-increment column are
dense, but it **cannot guarantee** that the values automatically generated in
the auto-increment column during an import will be entirely contiguous. Thus,
there might be some jumps in the values generated by the auto-increment column
during an import. This is because, for performance consideration, each BE
caches a portion of pre-allocated auto-increment column values, and these
cached values do not intersect betwe [...]
+Auto-increment values generated by Doris are generally **dense** but with some
considerations:
+
+- **Potential Gaps**: Gaps may appear due to performance optimizations. Each
backend node (BE) pre-allocates a block of unique values for efficiency, and
these blocks do not overlap between nodes.
+- **Non-Chronological Values**: Doris does not guarantee that values generated
in later writes are larger than those from earlier writes.
+
+ :::info Note
+ Auto-increment values cannot be used to infer the chronological order of
writes.
+ :::
+
+---
## Syntax
-To use auto-increment columns, you need to add the `AUTO_INCREMENT` attribute
to the corresponding column during table creation
([CREATE-TABLE](../sql-manual/sql-statements/table-and-view/table/CREATE-TABLE)).
To manually specify the starting value for an auto-increment column, you can
do so by using the `AUTO_INCREMENT(start_value)` statement when creating the
table. If not specified, the default starting value is 1.
+To use auto-increment columns, you need to add the `AUTO_INCREMENT` attribute
to the corresponding column during table creation
([CREATE-TABLE](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE)).
To manually specify the starting value for an auto-increment column, you can
do so by using the `AUTO_INCREMENT(start_value)` statement when creating the
table. If not specified, the default starting value is 1.
-## Examples
+### Examples
-1. Creating a Duplicate table with one key column as an auto-increment column:
+1. Creating a duplicate table with an auto-increment column as the key column.
```sql
CREATE TABLE `demo`.`tbl` (
@@ -60,7 +86,7 @@ To use auto-increment columns, you need to add the `AUTO_INCREMENT` attribute to
"replication_allocation" = "tag.location.default: 3"
);
-2. Creating a Duplicate table with one key column as an auto-increment column,
and set start value is 100:
+2. Creating a duplicate table with an auto-increment column as the key column,
and setting the starting value to 100.
```sql
CREATE TABLE `demo`.`tbl` (
@@ -74,7 +100,7 @@ To use auto-increment columns, you need to add the `AUTO_INCREMENT` attribute to
);
```
-3. Creating a Duplicate table with one value column as an auto-increment
column:
+3. Creating a duplicate table with an auto-increment column as one of the
value columns.
```sql
CREATE TABLE `demo`.`tbl` (
@@ -90,7 +116,7 @@ To use auto-increment columns, you need to add the `AUTO_INCREMENT` attribute to
);
```
-4. Creating a Unique tbl table with one key column as an auto-increment column:
+4. Creating a unique table with an auto-increment column as the key column.
```sql
CREATE TABLE `demo`.`tbl` (
@@ -106,7 +132,7 @@ To use auto-increment columns, you need to add the `AUTO_INCREMENT` attribute to
);
```
-5. Creating a Unique tbl table with one value column as an auto-increment
column:
+5. Creating a unique table with an auto-increment column as one of the value
columns.
```sql
CREATE TABLE `demo`.`tbl` (
@@ -123,16 +149,16 @@ To use auto-increment columns, you need to add the `AUTO_INCREMENT` attribute to
### Constraints and Limitations
-- Only Duplicate model tables and Unique model tables can contain
auto-increment columns.
-- A table can contain at most one auto-increment column.
-- The type of the auto-increment column must be BIGINT and must be NOT NULL.
-- The manually specified starting value for an auto-increment column must be
greater than or equal to 0.
+- Auto-increment columns can only be used in Duplicate or Unique model tables.
+- A table can have only one auto-increment column.
+- The auto-increment column must be of type `BIGINT` and cannot be `NULL`.
+- The manually specified starting value for an auto-increment column must be 0
or greater.
## Usage
-### Import
+### Loading
-Consider the following table:
+Consider the table below:
```sql
CREATE TABLE `demo`.`tbl` (
@@ -148,7 +174,7 @@ PROPERTIES (
);
```
-When using the insert into statement to import data without specifying the
auto-increment column `id`, the `id` column will automatically be filled with
generated values.
+When using the insert into statement to write data without including the
auto-increment column `id`, Doris automatically generates and fills unique
values for the column.
```sql
mysql> insert into tbl(name, value) values("Bob", 10), ("Alice", 20), ("Jack",
30);
@@ -166,7 +192,7 @@ mysql> select * from tbl order by id;
3 rows in set (0.05 sec)
```
-Similarly, using stream load to import the file test.csv without specifying
the auto-increment column `id` will result in the id column being automatically
filled with generated values.
+Similarly, when using stream load to load the file `test.csv` without
specifying the auto-increment column `id`, Doris will automatically populate
the `id` column with generated values.
test.csv:
```
@@ -191,8 +217,7 @@ mysql> select * from tbl order by id;
+------+-------+-------+
5 rows in set (0.04 sec)
```
-
-When importing using insert into statement while specifying the auto-increment
column `id`, null values in the imported data for that column will be replaced
by generated values.
+When writing data using the `INSERT INTO` statement and specifying the
auto-increment column `id`, any null values in the written data for that column
will be replaced with generated values.
```sql
mysql> insert into tbl(id, name, value) values(null, "Doris", 60), (null,
"Nereids", 70);
@@ -216,9 +241,9 @@ mysql> select * from tbl order by id;
### Partial Update
-When performing a partial update on a merge-on-write Unique table containing
an auto-increment column:
+When performing a partial update on a merge-on-write Unique table with an
auto-increment column:
-If the auto-increment column is a key column, during partial updates, as users
must explicitly specify the key column, the target columns for partial column
updates must include the auto-increment column. In this scenario, the import
behavior is similar to regular partial updates.
+If the auto-increment column is a key column, users must explicitly specify it
during partial updates. As a result, the target columns for partial updates
must include the auto-increment column. In this case, the behavior aligns with
that of standard partial updates.
```sql
mysql> CREATE TABLE `demo`.`tbl2` (
@@ -270,7 +295,7 @@ mysql> select * from tbl2 order by id;
4 rows in set (0.04 sec)
```
-When the auto-increment column is a non-key column and users haven't specified
the value for the auto-increment column, the value will be filled from existing
data rows in the table. If users specify the auto-increment column, null values
in the imported data for that column will be replaced by generated values,
while non-null values will remain unchanged, and then these data will be loaded
with the semantics of partial updates.
+When the auto-increment column is a non-key column and no value is provided,
its value will be derived from existing rows in the table. If a value is
specified for the auto-increment column, null values in the written data will
be replaced with generated values, while non-null values will remain unchanged.
These records will then be processed according to the semantics of partial
updates.
```sql
mysql> CREATE TABLE `demo`.`tbl3` (
@@ -340,9 +365,10 @@ mysql> select * from tbl3 order by id;
### Dictionary Encoding
-Using bitmaps for audience analysis in user profile requires building a user
dictionary where each user corresponds to a unique integer dictionary value.
Aggregating these dictionary values can improve the performance of bitmap.
+Using bitmaps for audience analysis in user profiling involves creating a user
dictionary, where each user is assigned a unique integer as their dictionary
value. Aggregating these dictionary values can improve the performance of
bitmap operations.
+
+For example, in an offline UV (Unique Visitors) and PV (Page Views) analysis
scenario, consider a detailed user behavior table:
-Taking the offline UV and PV analysis scenario as an example, assuming there's
a detailed user behavior table:
```sql
CREATE TABLE `demo`.`dwd_dup_tbl` (
@@ -361,8 +387,7 @@ PROPERTIES (
);
```
-Using the auto-incrementa column to create the following dictionary table:
-
+Using the auto-increment column to create the following dictionary table:
```sql
CREATE TABLE `demo`.`dictionary_tbl` (
@@ -377,15 +402,14 @@ PROPERTIES (
);
```
-Import the value of `user_id` from existing data into the dictionary table,
establishing the mapping of `user_id` to integer values:
+Write the `user_id` values from existing data into the dictionary table to map
`user_id` to corresponding integer values:
```sql
insert into dictionary_tbl(user_id)
select user_id from dwd_dup_tbl group by user_id;
```
-Or import only the value of `user_id` in incrementa data into the dictionary
table alternatively:
-
+Alternatively, write only the `user_id` values from incremental data into the
dictionary table.
```sql
insert into dictionary_tbl(user_id)
@@ -393,9 +417,9 @@ select dwd_dup_tbl.user_id from dwd_dup_tbl left join dictionary_tbl
on dwd_dup_tbl.user_id = dictionary_tbl.user_id where dwd_dup_tbl.visit_time >
'2023-12-10' and dictionary_tbl.user_id is NULL;
```
-In real-world scenarios, Flink connectors can also be employed to write data
into Doris.
+In practical applications, Flink connectors can be used to write data into
Doris.
-Assuming `dim1`, `dim3`, `dim5` represent statistical dimensions of interest
to us, create the following table to store aggregated results:
+To store aggregated results for the statistical dimensions `dim1`, `dim3`, and
`dim5`, create the following table:
```sql
CREATE TABLE `demo`.`dws_agg_tbl` (
@@ -412,7 +436,7 @@ PROPERTIES (
);
```
-Store the result of the data aggregation operations into the aggregation
result table:
+Save the aggregated data into the results table.
```sql
insert into dws_agg_tbl
@@ -420,7 +444,7 @@ select dwd_dup_tbl.dim1, dwd_dup_tbl.dim3, dwd_dup_tbl.dim5, BITMAP_UNION(TO_BIT
from dwd_dup_tbl INNER JOIN dictionary_tbl on dwd_dup_tbl.user_id =
dictionary_tbl.user_id;
```
-Perform UV and PV queries using the following statement:
+Execute UV and PV queries with the following statement:
```sql
select dim1, dim3, dim5, user_id_bitmap as uv, pv from dws_agg_tbl;
@@ -428,7 +452,7 @@ select dim1, dim3, dim5, user_id_bitmap as uv, pv from dws_agg_tbl;
### Efficient Pagination
-When displaying data on a page, pagination is often necessary. Traditional
pagination typically involves using `limit`, `offset`, and `order by` in SQL
queries. For instance, consider the following business table intended for
display:
+Pagination is often required when displaying data on a page. Traditional
pagination usually involves using `LIMIT`, `OFFSET`, and `ORDER BY` in SQL
queries. For example, consider the following business table designed for
display:
```sql
CREATE TABLE `demo`.`records_tbl` (
@@ -447,21 +471,21 @@ PROPERTIES (
);
```
-Assuming 100 records are displayed per page in pagination. To fetch the first
page's data, the following SQL query can be used:
+Assuming 100 records are displayed per page, the following SQL query can be
used to fetch data for the first page:
```sql
select * from records_tbl order by `key`, `name` limit 100;
```
-Fetching the data for the second page can be accomplished by:
+To fetch data for the second page, you can use the following query:
```sql
select * from records_tbl order by `key`, `name` limit 100 offset 100;
```
-However, when performing deep pagination queries (with large offsets), even if
the actual required data rows are few, this method still reads all data into
memory for full sorting before subsequent processing, which is quite
inefficient. Using an auto-incrementa column assigns a unique value to each
row, allowing the use of where `unique_value` > x limit y to filter a
significant amount of data beforehand, making pagination more efficient.
+However, when performing deep pagination queries (with large offsets), this
method can be inefficient, as it reads all data into memory for sorting before
processing, even if only a small number of rows are needed. By using an
auto-increment column, each row is assigned a unique value, enabling the use of
a query like `WHERE unique_value > x LIMIT y` to filter out a large portion of
the data in advance, making pagination more efficient.
-Continuing with the aforementioned business table, an auto-increment column is
added to the table to give each row a unique identifier:
+To illustrate this, an auto-increment column is added to the business table,
giving each row a unique identifier:
```sql
CREATE TABLE `demo`.`records_tbl2` (
@@ -481,19 +505,19 @@ PROPERTIES (
);
```
-For pagination displaying 100 records per page, to fetch the first page's
data, the following SQL query can be used:
+For pagination with 100 records per page, the following SQL query can be used
to fetch the data for the first page:
```sql
select * from records_tbl2 order by unique_value limit 100;
```
-By recording the maximum value of unique_value in the returned results, let's
assume it's 99. The following query can then fetch data for the second page:
+By recording the maximum value of `unique_value` from the returned results,
let's assume it is 99. The following query can then be used to fetch data for
the second page:
```sql
select * from records_tbl2 where unique_value > 99 order by unique_value limit
100;
```
-If directly querying contents from a later page and it's inconvenient to
directly obtain the maximum value of `unique_value` from the preceding page's
data (for instance, directly obtaining contents from the 101st page), the
following query can be used:
+If directly querying data from a later page and it's inconvenient to retrieve
the maximum value of `unique_value` from the previous page's results (for
example, when fetching data starting from the 101st page), the following query
can be used:
```sql
select key, name, address, city, nation, region, phone, mktsegment