This is an automated email from the ASF dual-hosted git repository.

dataroaring pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 8fe06a9571 [opt](load) Optimize the loading overview and adjust the import way directory (#903)
8fe06a9571 is described below

commit 8fe06a95715dfd9ba0faf04358e8ba823d2c9324
Author: Xin Liao <[email protected]>
AuthorDate: Wed Jul 24 14:21:55 2024 +0800

    [opt](load) Optimize the loading overview and adjust the import way directory (#903)
---
 .../import/{ => import-way}/broker-load-manual.md  |   0
 .../import/{ => import-way}/group-commit-manual.md |   0
 .../import/{ => import-way}/insert-into-manual.md  |   0
 .../import/{ => import-way}/mysql-load-manual.md   |   0
 .../import/{ => import-way}/routine-load-manual.md |   0
 .../import/{ => import-way}/stream-load-manual.md  |   0
 docs/data-operate/import/load-manual.md            | 111 +++++++++++----------
 .../import/{ => import-way}/broker-load-manual.md  |   0
 .../import/{ => import-way}/group-commit-manual.md |   0
 .../import/{ => import-way}/insert-into-manual.md  |   0
 .../import/{ => import-way}/mysql-load-manual.md   |   0
 .../import/{ => import-way}/routine-load-manual.md |   0
 .../import/{ => import-way}/stream-load-manual.md  |   0
 .../current/data-operate/import/load-manual.md     | 111 +++++++++++----------
 sidebars.json                                      |  18 ++--
 15 files changed, 126 insertions(+), 114 deletions(-)

diff --git a/docs/data-operate/import/broker-load-manual.md b/docs/data-operate/import/import-way/broker-load-manual.md
similarity index 100%
rename from docs/data-operate/import/broker-load-manual.md
rename to docs/data-operate/import/import-way/broker-load-manual.md
diff --git a/docs/data-operate/import/group-commit-manual.md b/docs/data-operate/import/import-way/group-commit-manual.md
similarity index 100%
rename from docs/data-operate/import/group-commit-manual.md
rename to docs/data-operate/import/import-way/group-commit-manual.md
diff --git a/docs/data-operate/import/insert-into-manual.md b/docs/data-operate/import/import-way/insert-into-manual.md
similarity index 100%
rename from docs/data-operate/import/insert-into-manual.md
rename to docs/data-operate/import/import-way/insert-into-manual.md
diff --git a/docs/data-operate/import/mysql-load-manual.md b/docs/data-operate/import/import-way/mysql-load-manual.md
similarity index 100%
rename from docs/data-operate/import/mysql-load-manual.md
rename to docs/data-operate/import/import-way/mysql-load-manual.md
diff --git a/docs/data-operate/import/routine-load-manual.md b/docs/data-operate/import/import-way/routine-load-manual.md
similarity index 100%
rename from docs/data-operate/import/routine-load-manual.md
rename to docs/data-operate/import/import-way/routine-load-manual.md
diff --git a/docs/data-operate/import/stream-load-manual.md b/docs/data-operate/import/import-way/stream-load-manual.md
similarity index 100%
rename from docs/data-operate/import/stream-load-manual.md
rename to docs/data-operate/import/import-way/stream-load-manual.md
diff --git a/docs/data-operate/import/load-manual.md b/docs/data-operate/import/load-manual.md
index ac119a97ae..edb1cc2b01 100644
--- a/docs/data-operate/import/load-manual.md
+++ b/docs/data-operate/import/load-manual.md
@@ -24,84 +24,87 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+## Introduction to Import Solutions
 
+This section provides an overview of the available import solutions, to help users choose the most suitable one based on data source, file format, and data volume.
 
-## Supported Data Sources
+Doris supports various import methods, including Stream Load, Broker Load, Insert Into, Routine Load, and MySQL Load. In addition to these native import methods, Doris also provides a range of ecosystem tools to assist users with data import, including Spark Doris Connector, Flink Doris Connector, Doris Kafka Connector, DataX Doriswriter, and Doris Streamloader.
 
-Doris provides a variety of data import solutions, and you can choose different data import methods for different data sources.
+For high-frequency small-import scenarios, Doris also provides the Group Commit feature. Group Commit is not a new import method but an extension of `INSERT INTO VALUES`, Stream Load, and Http Stream that batches small imports on the server side.
 
-### By Scene
+Each import method and ecosystem tool has different use cases and supports different data sources and file formats.
 
-| Data Source                          | Loading Method                                         |
-| ------------------------------------ |--------------------------------------------------------|
-| Object Storage (s3), HDFS            | [Loading data using Broker](./broker-load-manual)      |
-| Local file                           | [Loading local data](./stream-load-manual)             |
-| Kafka                                | [Subscribing to Kafka data](./routine-load-manual)     |
-| MySQL, PostgreSQL, Oracle, SQLServer | [Sync data via external table](./mysql-load-manual)    |
-| Loading via JDBC                     | [Sync data using JDBC](../../lakehouse/database/jdbc)  |
-| Loading JSON format data             | [JSON format data Loading](./load-json-format)         |
-| AutoMQ                               | [AutoMQ Load](../../ecosystem/automq-load.md)          |
+### Import Methods
+| Import Method                                             | Use Case                                                                       | Supported File Formats  | Single Import Volume                | Import Mode  |
+| :-------------------------------------------------------- | :----------------------------------------------------------------------------- | ----------------------- | ----------------------------------- | ------------ |
+| [Stream Load](./import-way/stream-load-manual)            | Import from local data                                                         | csv, json, parquet, orc | Less than 10GB                      | Synchronous  |
+| [Broker Load](./import-way/broker-load-manual.md)         | Import from object storage, HDFS, etc.                                         | csv, json, parquet, orc | Tens of GB to hundreds of GB        | Asynchronous |
+| [INSERT INTO VALUES](./import-way/insert-into-manual.md)  | <p>Import single or small batch data</p><p>Import via JDBC, etc.</p>           | SQL                     | Simple testing                      | Synchronous  |
+| [INSERT INTO SELECT](./import-way/insert-into-manual.md)  | <p>Import data between Doris internal tables</p><p>Import external tables</p>  | SQL                     | Depending on memory size            | Synchronous  |
+| [Routine Load](./import-way/routine-load-manual.md)       | Real-time import from Kafka                                                    | csv, json               | Micro-batch import, MB to GB        | Asynchronous |
+| [MySQL Load](./import-way/mysql-load-manual.md)           | Import from local data                                                         | csv                     | Less than 10GB                      | Synchronous  |
+| [Group Commit](./import-way/group-commit-manual.md)       | High-frequency small batch import                                              | Depending on the import method used | Micro-batch import, KB  | -            |
 
-### By Loading Method
 
-| Loading method name | Use method                                                   |
-| ------------------- | ------------------------------------------------------------ |
-| Broker Load         | [Import external storage data via Broker](./broker-load-manual) |
-| Stream Load         | [Stream import data (local file and memory data)](./stream-load-manual) |
-| Routine Load        | [Import Kafka data](./routine-load-manual)   |
-| Insert Into         | [External table imports data through INSERT](./insert-into-manual) |
-| S3 Load             | [Object storage data import of S3 protocol](./broker-load-manual) |
-| MySQL Load          | [Local data import of MySql protocol](./mysql-load-manual) |
+### Ecosystem Tools
 
-## Supported Data Formats
+| Ecosystem Tool                                                     | Use Case                                                     |
+| ------------------------------------------------------------------ | ------------------------------------------------------------ |
+| [Spark Doris Connector](../../ecosystem/spark-doris-connector.md)  | Batch import of data from Spark                               |
+| [Flink Doris Connector](../../ecosystem/flink-doris-connector.md)  | Real-time import of data from Flink                           |
+| [Doris Kafka Connector](../../ecosystem/doris-kafka-connector.md)  | Real-time import of data from Kafka                           |
+| [DataX Doriswriter](../../ecosystem/datax.md)                      | Synchronize data from MySQL, Oracle, SQL Server, PostgreSQL, Hive, ADS, etc. |
+| [Doris Streamloader](../../ecosystem/doris-streamloader.md)        | Implements concurrent import for Stream Load, allowing multiple files and directories to be imported at once |
+| [X2Doris](./migrate-data-from-other-olap.md)                       | Migrate data from other AP databases to Doris                 |
 
-Different import methods support slightly different data formats.
+### File Formats
 
-| Import Methods | Supported Formats       |
-| -------------- | ----------------------- |
-| Broker Load    | parquet, orc, csv, gzip |
-| Stream Load    | csv, json, parquet, orc |
-| Routine Load   | csv, json               |
-| MySql Load     | csv                     |
+| File Format | Supported Import Methods              | Supported Compression Formats              |
+| ----------- | ------------------------------------- | ------------------------------------------ |
+| csv         | Stream Load, Broker Load, MySQL Load  | gz, lzo, bz2, lz4, LZ4FRAME, lzop, deflate |
+| json        | Stream Load, Broker Load              | Not supported                              |
+| parquet     | Stream Load, Broker Load              | Not supported                              |
+| orc         | Stream Load, Broker Load              | Not supported                              |
 
-## Import Instructions
+### Data Sources
 
-The data import implementation of Apache Doris has the following common features, which are introduced here to help you better use the data import function
+| Data Source                                                    | Supported Import Methods                                                             |
+| --------------------------------------------------------------- | ------------------------------------------------------------------------------------ |
+| Local data                                                     | <p>Stream Load</p> <p>Doris Streamloader</p> <p>MySQL Load</p>                        |
+| Object storage                                                 | <p>Broker Load</p> <p>INSERT INTO SELECT FROM S3 TVF</p>                              |
+| HDFS                                                           | <p>Broker Load</p> <p>INSERT INTO SELECT FROM HDFS TVF</p>                            |
+| Kafka                                                          | <p>Routine Load</p> <p>Doris Kafka Connector</p>                                      |
+| Flink                                                          | Flink Doris Connector                                                                 |
+| Spark                                                          | Spark Doris Connector                                                                 |
+| MySQL, PostgreSQL, Oracle, SQL Server, and other TP databases  | <p>Import via external tables</p> <p>Flink Doris Connector</p>                        |
+| Other AP databases                                             | <p>X2Doris</p> <p>Import via external tables</p> <p>Spark/Flink Doris Connector</p>   |
 
-## Import Atomicity Guarantees
+## Concept Introduction
 
-Each import job of Doris, whether it is batch import using Broker Load or single import using INSERT statement, is a complete transaction operation. The import transaction can ensure that the data in a batch takes effect atomically, and there will be no partial data writing.
+This section introduces some concepts related to import to help users make better use of the data import feature.
 
-At the same time, an import job will have a Label. This Label is unique under a database (Database) and is used to uniquely identify an import job. Label can be specified by the user, and some import functions will also be automatically generated by the system.
+### Atomicity
 
-Label is used to ensure that the corresponding import job can only be successfully imported once. A successfully imported Label, when used again, will be rejected with the error `Label already used`. Through this mechanism, `At-Most-Once` semantics can be implemented in Doris. If combined with the `At-Least-Once` semantics of the upstream system, the `Exactly-Once` semantics of imported data can be achieved.
+All import tasks in Doris are atomic: an import job either succeeds completely or fails completely, and partially imported data will never occur within a single import task, even when that task writes to multiple tables. For simple import tasks, users do not need any additional configuration or operations. For materialized views attached to a table, atomicity and consistency with the base table are likewise guaranteed.
 
-For best practices on atomicity guarantees, see Importing Transactions and Atomicity.
+### Label Mechanism
 
-## Synchronous and Asynchronous Imports
+Import jobs in Doris can be assigned a label. This label is usually a user-defined string with some business meaning; if the user does not specify one, the system generates one automatically. The main purpose of the label is to uniquely identify an import task and to ensure that the same label is imported successfully only once.
 
-Import methods are divided into synchronous and asynchronous. For the synchronous import method, the returned result indicates whether the import succeeds or fails. For the asynchronous import method, a successful return only means that the job was submitted successfully, not that the data was imported successfully. You need to use the corresponding command to check the running status of the import job.
+The label ensures that the corresponding import job is successfully imported at most once. If a label that has already been successfully imported is used again, the new job is rejected with the error `Label already used`. With this mechanism, Doris provides `At-Most-Once` semantics. Combined with the `At-Least-Once` semantics of an upstream system, `Exactly-Once` semantics for imported data can be achieved.
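The label mechanism described above can be illustrated with a short, hedged sketch (database, table, and label names below are hypothetical; `INSERT INTO ... WITH LABEL` is the statement-level way to attach a label):

```sql
-- Hypothetical table; the label is unique within the database.
INSERT INTO example_db.orders WITH LABEL label_orders_20240724_001
VALUES (1, "item_a", 10);

-- Submitting any job again under the same label is rejected with the
-- error `Label already used`, so a retried upstream delivery cannot
-- produce duplicate data.
```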
 
-## Import the Data of Array Types
+### Import Mode
 
-For example, in the following import, you need to cast columns b14 and a13 into `array<string>` type, and then use the `array_union` function.
+The import mode is either synchronous or asynchronous. For synchronous import methods, the returned result indicates whether the import succeeded or failed. For asynchronous import methods, a successful return only indicates that the job was submitted successfully, not that the data was imported successfully; users need to check the running status of the import job with the corresponding command.
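For asynchronous methods such as Broker Load, the status check mentioned above is typically done with `SHOW LOAD` (the label below is hypothetical):

```sql
-- Poll the state of an asynchronous import job by its label.
SHOW LOAD WHERE LABEL = "label_orders_20240724_001";
-- The State column progresses from PENDING/LOADING to FINISHED on
-- success, or CANCELLED on failure.
```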
 
-```sql
-LOAD LABEL label_03_14_49_34_898986_19090452100 ( 
-  DATA INFILE("hdfs://test.hdfs.com:9000/user/test/data/sys/load/array_test.data") 
-  INTO TABLE `test_array_table` 
-  COLUMNS TERMINATED BY "|" (`k1`, `a1`, `a2`, `a3`, `a4`, `a5`, `a6`, `a7`, `a8`, `a9`, `a10`, `a11`, `a12`, `a13`, `b14`) 
-  SET(a14=array_union(cast(b14 as array<string>), cast(a13 as array<string>))) WHERE size(a2) > 270) 
-  WITH BROKER "hdfs" ("username"="test_array", "password"="") 
-  PROPERTIES( "max_filter_ratio"="0.8" );
-```
+### Data Transformation
 
-## Execution Engine Selected
+When importing data into a table, the table's contents may not be exactly the same as the source data, and the data needs to be transformed. Doris supports performing certain transformations on the source data during the import process, specifically: mapping, conversion, pre-filtering, and post-filtering.
 
-The Pipeline engine is turned off by default on import, and is enabled by the following two variables:
+### Error Data Handling
 
-1. `enable_pipeline_load` in [FE CONFIG](../../admin-manual/config/fe-config). When enabled, import tasks such as Streamload will try to use the Pipeline engine.
+During import, the data types of the source columns and the target columns may not be fully consistent, and values in such columns are converted during import. Conversion can fail, for example due to a field type mismatch or a field exceeding its length limit. Strict mode controls whether rows with such conversion failures are filtered out during import.
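As a hedged sketch of the strict-mode switches (property and variable names as commonly documented for Doris; verify against the strict-mode docs for your version):

```sql
-- Session-level: make INSERT statements treat conversion failures strictly.
SET enable_insert_strict = true;

-- Job-level (e.g. in a Broker Load statement): enable strict mode and
-- tolerate at most 10% filtered rows.
-- PROPERTIES ("strict_mode" = "true", "max_filter_ratio" = "0.1")
```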
 
-2. `enable_nereids_dml_with_pipeline` in Session Variable to enable insert into to try to use the Pipeline engine.
+### Minimum Write Replica Number
 
-When the above variables are turned on, whether and which set of Pipeline engine is used still depends on the settings of the other two Session Variables `enable_pipeline_engine` and `enable_pipeline_x_engine`. When both are enabled, PipelineX is selected in preference to the Pipeline Engine. If neither is enabled, the import will not be executed using the Pipeline engine even if the above variables are set to `true`.
+By default, a data import requires a majority of replicas to be written successfully for the import to be considered successful. However, this is not always flexible and can be inconvenient in some scenarios, so Doris allows users to set a minimum write replica number (Min Load Replica Num). For an import task, when the number of successfully written replicas is greater than or equal to the minimum write replica number, the import is considered successful.
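Assuming the `min_load_replica_num` table property described in the min-load-replica-num doc referenced in the sidebar (table name below is hypothetical), the threshold can be lowered per table:

```sql
-- Consider an import successful once a single replica of a 3-replica
-- table is written, instead of the default majority (2 of 3).
ALTER TABLE example_db.orders SET ("min_load_replica_num" = "1");
```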
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/broker-load-manual.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/broker-load-manual.md
similarity index 100%
rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/broker-load-manual.md
rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/broker-load-manual.md
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/group-commit-manual.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/group-commit-manual.md
similarity index 100%
rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/group-commit-manual.md
rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/group-commit-manual.md
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/insert-into-manual.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/insert-into-manual.md
similarity index 100%
rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/insert-into-manual.md
rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/insert-into-manual.md
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/mysql-load-manual.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/mysql-load-manual.md
similarity index 100%
rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/mysql-load-manual.md
rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/mysql-load-manual.md
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/routine-load-manual.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/routine-load-manual.md
similarity index 100%
rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/routine-load-manual.md
rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/routine-load-manual.md
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/stream-load-manual.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/stream-load-manual.md
similarity index 100%
rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/stream-load-manual.md
rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/stream-load-manual.md
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/load-manual.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/load-manual.md
index 292970bfc0..4bd5841c9c 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/load-manual.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/load-manual.md
@@ -24,83 +24,86 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+## 导入方案介绍
 
+本节对导入方案做一个总体介绍,以便大家根据数据源、文件格式、数据量等选择最合适的导入方案。
 
-## 支持的数据源
+Doris 支持的导入方式包括 Stream Load、Broker Load、Insert Into、Routine Load、MySQL Load。除了直接使用 Doris 原生的导入方式进行导入,Doris 还提供了一系列的生态工具帮助用户进行数据导入,包括 Spark Doris Connector、Flink Doris Connector、Doris Kafka Connector、DataX Doriswriter、Doris Streamloader 等。
 
-Doris 提供多种数据导入方案,可以针对不同的数据源进行选择不同的数据导入方式。
+针对高频小导入场景,Doris 还提供了 Group Commit 功能。Group Commit 不是一种新的导入方式,而是对 `INSERT INTO VALUES、Stream Load、Http Stream` 的扩展,对小导入在服务端进行攒批。
 
-### 按场景划分
+每种导入方式和生态工具适用的场景不一样,支持的数据源、文件格式也有差异。
 
-| 数据源                               | 导入方式                                                     |
-| ------------------------------------ | ------------------------------------------------------------ |
-| 对象存储(s3),HDFS                  | [使用 Broker 导入数据](./broker-load-manual) |
-| 本地文件                             | [Stream Load](./stream-load-manual), [MySQL Load](./mysql-load-manual) |
-| Kafka                                | [订阅 Kafka 数据](./routine-load-manual) |
-| Mysql、PostgreSQL,Oracle,SQLServer | [通过外部表同步数据](./insert-into-manual) |
-| 通过 JDBC 导入                       | [使用 JDBC 同步数据](../../lakehouse/database/jdbc) |
-| 导入 JSON 格式数据                   | [JSON 格式数据导入](./load-json-format) |
+### 导入方式
+| 导入方式                                                 | 使用场景                                               | 支持的文件格式          | 单次导入数据量    | 导入模式 |
+| :-------------------------------------------------------- | :----------------------------------------------------- | ----------------------- | ----------------- | -------- |
+| [Stream Load](./import-way/stream-load-manual)           | 从本地数据导入                                         | csv、json、parquet、orc | 小于 10GB         | 同步     |
+| [Broker Load](./import-way/broker-load-manual.md)        | 从对象存储、HDFS 等导入                                | csv、json、parquet、orc | 数十 GB 到数百 GB | 异步     |
+| [INSERT INTO VALUES](./import-way/insert-into-manual.md) | <p>单条或小批量数据导入</p><p>通过 JDBC 等接口导入</p> | SQL                     | 简单测试用        | 同步     |
+| [INSERT INTO SELECT](./import-way/insert-into-manual.md) | <p>Doris 内部表之间数据导入</p><p>外部表导入</p>       | SQL                     | 根据内存大小而定  | 同步     |
+| [Routine Load](./import-way/routine-load-manual.md)      | 从 Kafka 实时导入                                      | csv、json               | 微批导入 MB 到 GB | 异步     |
+| [MySQL Load](./import-way/mysql-load-manual.md)          | 从本地数据导入                                         | csv                     | 小于 10GB         | 同步     |
+| [Group Commit](./import-way/group-commit-manual.md)      | 高频小批量导入                                         | 根据使用的导入方式而定  | 微批导入 KB       | -        |
 
-### 按导入方式划分
+### 生态工具
 
-| 导入方式名称 | 使用方式                                                     |
-| ------------ | ------------------------------------------------------------ |
-| Broker Load  | [通过 Broker 导入外部存储数据](./broker-load-manual) |
-| Stream Load  | [流式导入数据 (本地文件及内存数据)](./stream-load-manual) |
-| Routine Load | [导入 Kafka 数据](./routine-load-manual) |
-| Insert Into  | [外部表通过 INSERT 方式导入数据](./insert-into-manual) |
-| S3 Load      | [S3 协议的对象存储数据导入](./broker-load-manual#s3-load) |
-| MySQL Load   | [MySQL 客户端导入本地数据](./mysql-load-manual) |
+| 生态工具                                                           | 使用场景                                                     |
+| ------------------------------------------------------------------ | ------------------------------------------------------------ |
+| [Spark Doris Connector](../../ecosystem/spark-doris-connector.md)  | 从 Spark 批量导入数据                                        |
+| [Flink Doris Connector](../../ecosystem/flink-doris-connector.md)  | 从 Flink 实时导入数据                                        |
+| [Doris Kafka Connector](../../ecosystem/doris-kafka-connector.md)  | 从 Kafka 实时导入数据                                        |
+| [DataX Doriswriter](../../ecosystem/datax.md)                      | 从 MySQL、Oracle、SQL Server、PostgreSQL、Hive、ADS 等同步数据 |
+| [Doris Streamloader](../../ecosystem/doris-streamloader.md)        | 实现了 Stream Load 的多并发导入,一次导入可以同时导入多个文件及目录 |
+| [X2Doris](./migrate-data-from-other-olap.md)                       | 从其他 AP 数据库迁移数据到 Doris                             |
 
-## 支持的数据格式
+### 文件格式
 
-不同的导入方式支持的数据格式略有不同。
+| 文件格式 | 支持的导入方式                       | 支持的压缩格式                             |
+| -------- | ------------------------------------ | ------------------------------------------ |
+| csv      | Stream Load、Broker Load、MySQL Load | gz、lzo、bz2、lz4、LZ4FRAME、lzop、deflate |
+| json     | Stream Load、Broker Load             | 不支持                                     |
+| parquet  | Stream Load、Broker Load             | 不支持                                     |
+| orc      | Stream Load、Broker Load             | 不支持                                     |
 
-| 导入方式     | 支持的格式              |
-| ------------ | ----------------------- |
-| Broker Load  | parquet、orc、csv、gzip |
-| Stream Load  | csv、json、parquet、orc |
-| Routine Load | csv、json               |
-| MySQL Load   | csv                     |
+### 数据源
 
-## 导入说明
+| 数据源                                              | 支持的导入方式                                                  |
+| ---------------------------------------------------- | ---------------------------------------------------------------- |
+| 本地数据                                            | <p>Stream Load</p> <p>Doris Streamloader</p> <p>MySQL Load</p>    |
+| 对象存储                                            | <p>Broker Load</p> <p>INSERT INTO SELECT FROM S3 TVF</p>          |
+| HDFS                                                | <p>Broker Load</p> <p>INSERT INTO SELECT FROM HDFS TVF</p>        |
+| Kafka                                               | <p>Routine Load</p> <p>Doris Kafka Connector</p>                  |
+| Flink                                               | Flink Doris Connector                                             |
+| Spark                                               | Spark Doris Connector                                             |
+| MySQL、PostgreSQL、Oracle、SQL Server 等 TP 数据库  | <p>通过外表导入</p> <p>Flink Doris Connector</p>                  |
+| 其他 AP 数据库                                      | <p>X2Doris</p> <p>通过外表导入</p> <p>Spark/Flink Doris Connector</p> |
 
-Apache Doris 的数据导入实现有以下共性特征,这里分别介绍,以帮助大家更好的使用数据导入功能
+## 概念介绍
 
-## 导入的原子性保证
+本节主要对导入相关的一些概念进行介绍,以帮助大家更好地使用数据导入功能。
 
-Doris 的每一个导入作业,不论是使用 Broker Load 进行批量导入,还是使用 INSERT 语句进行单条导入,都是一个完整的事务操作。导入事务可以保证一批次内的数据原子生效,不会出现部分数据写入的情况。
+### 原子性
 
-同时,一个导入作业都会有一个 Label。这个 Label 是在一个数据库(Database)下唯一的,用于唯一标识一个导入作业。Label 可以由用户指定,部分导入功能也会由系统自动生成。
+Doris 中所有导入任务都是原子性的,即一个导入作业要么全部成功,要么全部失败,不会出现仅部分数据导入成功的情况,并且在同一个导入任务中对多张表的导入也能够保证原子性。对于简单的导入任务,用户无需做额外配置或操作。对于表所附属的物化视图,也同时保证和基表的原子性和一致性。
 
-Label 是用于保证对应的导入作业,仅能成功导入一次。一个被成功导入的 Label,再次使用时,会被拒绝并报错 `Label already used`。通过这个机制,可以在 Doris 侧做到 `At-Most-Once` 语义。如果结合上游系统的 `At-Least-Once` 语义,则可以实现导入数据的 `Exactly-Once` 语义。
-
-关于原子性保证的最佳实践,可以参阅 导入事务和原子性。
+### 标签机制
 
-## 同步及异步导入
+Doris 的导入作业都可以设置一个 Label。这个 Label 通常是用户自定义的、具有一定业务逻辑属性的字符串,如果用户不指定,系统也会自动生成一个。Label 的主要作用是唯一标识一个导入任务,并且能够保证相同的 Label 仅会被成功导入一次。
 
-导入方式分为同步和异步。对于同步导入方式,返回结果即表示导入成功还是失败。而对于异步导入方式,返回成功仅代表作业提交成功,不代表数据导入成功,需要使用对应的命令查看导入作业的运行状态。
+Label 是用于保证对应的导入作业,仅能成功导入一次。一个被成功导入的 Label,再次使用时,会被拒绝并报错 `Label already used`。通过这个机制,可以在 Doris 侧做到 `At-Most-Once` 语义。如果结合上游系统的 `At-Least-Once` 语义,则可以实现导入数据的 `Exactly-Once` 语义。
 
-## 导入 Array 类型
+### 导入模式
 
-例如以下导入,需要先将列 b14 和列 a13 先 cast 成 `array<string>` 类型,再运用 `array_union` 函数。
+导入模式分为同步导入和异步导入。对于同步导入方式,返回结果即表示导入成功还是失败。而对于异步导入方式,返回成功仅代表作业提交成功,不代表数据导入成功,需要使用对应的命令查看导入作业的运行状态。
 
-```sql
-LOAD LABEL label_03_14_49_34_898986_19090452100 ( 
-  DATA INFILE("hdfs://test.hdfs.com:9000/user/test/data/sys/load/array_test.data") 
-  INTO TABLE `test_array_table` 
-  COLUMNS TERMINATED BY "|" (`k1`, `a1`, `a2`, `a3`, `a4`, `a5`, `a6`, `a7`, `a8`, `a9`, `a10`, `a11`, `a12`, `a13`, `b14`) 
-  SET(a14=array_union(cast(b14 as array<string>), cast(a13 as array<string>))) WHERE size(a2) > 270) 
-  WITH BROKER "hdfs" ("username"="test_array", "password"="") 
-  PROPERTIES( "max_filter_ratio"="0.8" );
-```
+### 数据转化
 
-## 使用的执行引擎
+在向表中导入数据时,有时候表中的内容与源数据文件中的内容不完全一致,需要对数据进行变换。Doris 支持在导入过程中直接对源数据进行一些变换,具体有:映射、转换、前置过滤和后置过滤。
 
-导入时默认关闭 Pipeline 引擎,通过以下两个变量开启:
+### 错误数据处理
 
-1. [FE CONFIG](../../admin-manual/config/fe-config) 中的 `enable_pipeline_load`,开启后 Streamload 等导入任务将尝试使用 Pipeline 引擎执行。
+在导入过程中,原始列跟目标列的数据类型可能不完全一致,导入时会对数据类型不一致的原始列值进行转换。转换过程中可能会发生字段类型不匹配、字段超长等转换失败的情况。严格模式用于控制导入过程中是否会对这些转换失败的错误数据行进行过滤。
 
-2. Session Variable 中的 `enable_nereids_dml_with_pipeline`,开启后 insert into 将尝试使用 Pipeline 引擎执行。
+### 最小写入副本数
 
-以上变量开启后,具体是否使用 Pipeline 引擎,仍然取决于 Session Variables `enable_pipeline_engine`。如果该值为 `false`,即使以上变量被设置为 `true`,导入依然不会使用 Pipeline 引擎执行。
+默认情况下,数据导入要求至少有超过半数的副本写入成功,导入才算成功。然而,这种方式不够灵活,在某些场景会带来不便。Doris 允许用户设置最小写入副本数(Min Load Replica Num)。对导入数据任务,当它成功写入的副本数大于或等于最小写入副本数时,导入即成功。
diff --git a/sidebars.json b/sidebars.json
index f0f23d893d..2b49d18a5f 100644
--- a/sidebars.json
+++ b/sidebars.json
@@ -114,14 +114,20 @@
                     "label": "Loading Data",
                     "items": [
                         "data-operate/import/load-manual",
-                        "data-operate/import/stream-load-manual",
-                        "data-operate/import/broker-load-manual",
-                        "data-operate/import/routine-load-manual",
-                        "data-operate/import/insert-into-manual",
-                        "data-operate/import/mysql-load-manual",
+                        {
+                            "type": "category",
+                            "label": "Import Way",
+                            "items": [
+                                "data-operate/import/import-way/stream-load-manual",
+                                "data-operate/import/import-way/broker-load-manual",
+                                "data-operate/import/import-way/routine-load-manual",
+                                "data-operate/import/import-way/insert-into-manual",
+                                "data-operate/import/import-way/mysql-load-manual",
+                                "data-operate/import/import-way/group-commit-manual"
+                            ]
+                        },
                         "data-operate/import/load-json-format",
                         "data-operate/import/migrate-data-from-other-olap",
-                        "data-operate/import/group-commit-manual",
                         "data-operate/import/load-atomicity",
                         "data-operate/import/load-data-convert",
                         "data-operate/import/min-load-replica-num",


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

