This is an automated email from the ASF dual-hosted git repository.
diwu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new aa2cb776daa [doc](load) fix snowflake datasource style (#2340)
aa2cb776daa is described below
commit aa2cb776daa52de412c6c1d3a6685b6bdbfd3f95
Author: wudi <[email protected]>
AuthorDate: Mon Apr 28 18:21:46 2025 +0800
[doc](load) fix snowflake datasource style (#2340)
## Versions
- [x] dev
- [x] 3.0
- [x] 2.1
- [ ] 2.0
## Languages
- [x] Chinese
- [x] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---
docs/data-operate/import/data-source/snowflake.md | 4 +-
.../data-operate/import/data-source/snowflake.md | 186 ++++++++++-----------
.../data-operate/import/data-source/snowflake.md | 186 ++++++++++-----------
.../data-operate/import/data-source/snowflake.md | 186 ++++++++++-----------
.../data-operate/import/data-source/snowflake.md | 4 +-
.../data-operate/import/data-source/snowflake.md | 4 +-
6 files changed, 282 insertions(+), 288 deletions(-)
diff --git a/docs/data-operate/import/data-source/snowflake.md b/docs/data-operate/import/data-source/snowflake.md
index 845e70f9ba1..b728bb75920 100644
--- a/docs/data-operate/import/data-source/snowflake.md
+++ b/docs/data-operate/import/data-source/snowflake.md
@@ -107,7 +107,7 @@ PROPERTIES (
2.1. **Export to S3 Parquet Files via COPY INTO**
- Snowflake supports exporting to [AWS S3](https://docs.snowflake.com/en/user-guide/data-unload-s3), [GCS](https://docs.snowflake.com/en/user-guide/data-unload-gcs), and [Azure](https://docs.snowflake.com/en/user-guide/data-unload-azure). When exporting, it is recommended to **partition the data by Doris' partition fields**. Example for AWS S3:
+ Snowflake supports exporting to [AWS S3](https://docs.snowflake.com/en/user-guide/data-unload-s3), [GCS](https://docs.snowflake.com/en/user-guide/data-unload-gcs), and [Azure](https://docs.snowflake.com/en/user-guide/data-unload-azure). When exporting, it is recommended to **partition the data by Doris' partition fields**. Example for AWS S3:
```sql
CREATE FILE FORMAT my_parquet_format TYPE = parquet;
@@ -122,7 +122,7 @@ PROPERTIES (
2.2. **Verify Exported Files on S3**
- Exported files are organized into **subdirectories by partition** on S3:
+ Exported files are organized into **subdirectories by partition** on S3:

diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/data-source/snowflake.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/data-source/snowflake.md
index 72d41d87dae..4096912b091 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/data-source/snowflake.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/data-source/snowflake.md
@@ -106,22 +106,20 @@ PROPERTIES (
2.1. **Export to S3 Parquet Files via COPY INTO**
- Snowflake supports exporting to [AWS S3](https://docs.snowflake.com/en/user-guide/data-unload-s3), [GCS](https://docs.snowflake.com/en/user-guide/data-unload-gcs), and [Azure](https://docs.snowflake.com/en/user-guide/data-unload-azure). When exporting, it is recommended to partition the data by **Doris' partition fields**. Below is an example of exporting to AWS S3:
+ Snowflake supports exporting to [AWS S3](https://docs.snowflake.com/en/user-guide/data-unload-s3), [GCS](https://docs.snowflake.com/en/user-guide/data-unload-gcs), and [Azure](https://docs.snowflake.com/en/user-guide/data-unload-azure). When exporting, it is recommended to partition the data by **Doris' partition fields**. Below is an example of exporting to AWS S3:
- ```sql
- CREATE FILE FORMAT my_parquet_format TYPE = parquet;
-
- CREATE OR REPLACE STAGE external_stage
- URL='s3://mybucket/sales_data'
- CREDENTIALS=(AWS_KEY_ID='<ak>' AWS_SECRET_KEY='<sk>')
- FILE_FORMAT = my_parquet_format;
-
- COPY INTO @external_stage from sales_data PARTITION BY (CAST(order_date AS
VARCHAR)) header=true;
- ```
+ ```sql
+ CREATE FILE FORMAT my_parquet_format TYPE = parquet;
+ CREATE OR REPLACE STAGE external_stage
+ URL='s3://mybucket/sales_data'
+ CREDENTIALS=(AWS_KEY_ID='<ak>' AWS_SECRET_KEY='<sk>')
+ FILE_FORMAT = my_parquet_format;
+ COPY INTO @external_stage from sales_data PARTITION BY (CAST(order_date AS
VARCHAR)) header=true;
+ ```
2.2. **Check the Exported Files on S3**
- After the export, the files on S3 are organized into **subdirectories by partition**, each directory corresponding to one partition, as shown below:
+ After the export, the files on S3 are organized into **subdirectories by partition**, each directory corresponding to one partition, as shown below:
@@ -137,95 +135,95 @@ PROPERTIES (
3.1. **Import Data for a Single Partition**
- ```sql
- LOAD LABEL sales_data_2025_04_08
- (
- DATA INFILE("s3://mybucket/sales_data/2025_04_08/*")
- INTO TABLE sales_data
- FORMAT AS "parquet"
- (order_id, order_date, customer_name, amount, country)
- )
- WITH S3
- (
- "provider" = "S3",
- "s3.endpoint" = "s3.ap-southeast-1.amazonaws.com",
- "s3.access_key" = "<ak>",
- "s3.secret_key" = "<sk>",
- "s3.region" = "ap-southeast-1"
- );
- ```
+ ```sql
+ LOAD LABEL sales_data_2025_04_08
+ (
+ DATA INFILE("s3://mybucket/sales_data/2025_04_08/*")
+ INTO TABLE sales_data
+ FORMAT AS "parquet"
+ (order_id, order_date, customer_name, amount, country)
+ )
+ WITH S3
+ (
+ "provider" = "S3",
+ "s3.endpoint" = "s3.ap-southeast-1.amazonaws.com",
+ "s3.access_key" = "<ak>",
+ "s3.secret_key" = "<sk>",
+ "s3.region" = "ap-southeast-1"
+ );
+ ```
3.2. **Check the Job Status via SHOW LOAD**
- Since S3 Load jobs are submitted asynchronously, use SHOW LOAD to check the import status of the specified label:
-
- ```yaml
- mysql> show load where label = "label_sales_data_2025_04_08"\G
- *************************** 1. row ***************************
- JobId: 17956078
- Label: label_sales_data_2025_04_08
- State: FINISHED
- Progress: 100.00% (1/1)
- Type: BROKER
- EtlInfo: unselected.rows=0; dpp.abnorm.ALL=0; dpp.norm.ALL=2
- TaskInfo: cluster:s3.ap-southeast-1.amazonaws.com; timeout(s):3600;
max_filter_ratio:0.0; priority:NORMAL
- ErrorMsg: NULL
- CreateTime: 2025-04-10 17:50:53
- EtlStartTime: 2025-04-10 17:50:54
- EtlFinishTime: 2025-04-10 17:50:54
- LoadStartTime: 2025-04-10 17:50:54
- LoadFinishTime: 2025-04-10 17:50:54
- URL: NULL
- JobDetails: {"Unfinished
backends":{"5eec1be8612d4872-91040ff1e7208a4f":[]},"ScannedRows":2,"TaskNumber":1,"LoadBytes":91,"All
backends":{"5eec1be8612d4872-91040ff1e7208a4f":[10022]},"FileNumber":1,"FileSize":1620}
- TransactionId: 766228
- ErrorTablets: {}
- User: root
- Comment:
- 1 row in set (0.00 sec)
- ```
+ Since S3 Load jobs are submitted asynchronously, use SHOW LOAD to check the import status of the specified label:
+
+ ```yaml
+ mysql> show load where label = "label_sales_data_2025_04_08"\G
+ *************************** 1. row ***************************
+ JobId: 17956078
+ Label: label_sales_data_2025_04_08
+ State: FINISHED
+ Progress: 100.00% (1/1)
+ Type: BROKER
+ EtlInfo: unselected.rows=0; dpp.abnorm.ALL=0; dpp.norm.ALL=2
+ TaskInfo: cluster:s3.ap-southeast-1.amazonaws.com; timeout(s):3600;
max_filter_ratio:0.0; priority:NORMAL
+ ErrorMsg: NULL
+ CreateTime: 2025-04-10 17:50:53
+ EtlStartTime: 2025-04-10 17:50:54
+ EtlFinishTime: 2025-04-10 17:50:54
+ LoadStartTime: 2025-04-10 17:50:54
+ LoadFinishTime: 2025-04-10 17:50:54
+ URL: NULL
+ JobDetails: {"Unfinished
backends":{"5eec1be8612d4872-91040ff1e7208a4f":[]},"ScannedRows":2,"TaskNumber":1,"LoadBytes":91,"All
backends":{"5eec1be8612d4872-91040ff1e7208a4f":[10022]},"FileNumber":1,"FileSize":1620}
+ TransactionId: 766228
+ ErrorTablets: {}
+ User: root
+ Comment:
+ 1 row in set (0.00 sec)
+ ```
3.3. **Handle Errors During the Import**
- When there are multiple import jobs, the following statement can be used to query the dates and reasons of failed imports.
-
- ```SQL
- mysql> show load where state='CANCELLED' and label like "label_test%"\G
- *************************** 1. row ***************************
- JobId: 18312384
- Label: label_test123
- State: CANCELLED
- Progress: 100.00% (3/3)
- Type: BROKER
- EtlInfo: unselected.rows=0; dpp.abnorm.ALL=4; dpp.norm.ALL=0
- TaskInfo: cluster:s3.ap-southeast-1.amazonaws.com; timeout(s):14400;
max_filter_ratio:0.0; priority:NORMAL
- ErrorMsg: type:ETL_QUALITY_UNSATISFIED; msg:quality not good enough to
cancel
- CreateTime: 2025-04-15 17:32:59
- EtlStartTime: 2025-04-15 17:33:02
- EtlFinishTime: 2025-04-15 17:33:02
- LoadStartTime: 2025-04-15 17:33:02
- LoadFinishTime: 2025-04-15 17:33:02
- URL:
http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342
- JobDetails: {"Unfinished
backends":{"7602ccd7c3a4854-95307efca7bfe341":[]},"ScannedRows":4,"TaskNumber":1,"LoadBytes":188,"All
backends":{"7602ccd7c3a4854-95307efca7bfe341":[10022]},"FileNumber":3,"FileSize":4839}
- TransactionId: 769213
- ErrorTablets: {}
- User: root
- Comment:
- ```
-
- The example above is a **data quality error** (ETL_QUALITY_UNSATISFIED); the details can be checked by visiting the returned URL. In this case, the data exceeds the defined length of the country column in the table schema:
-
- ```python
- [root@VM-10-6-centos ~]$ curl
"http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342"
- Reason: column_name[country], the length of input is too long than schema.
first 32 bytes of input str: [USA] schema length: 1; actual length: 3; . src
line [];
- Reason: column_name[country], the length of input is too long than schema.
first 32 bytes of input str: [Canada] schema length: 1; actual length: 6; . src
line [];
- Reason: column_name[country], the length of input is too long than schema.
first 32 bytes of input str: [UK] schema length: 1; actual length: 2; . src
line [];
- Reason: column_name[country], the length of input is too long than schema.
first 32 bytes of input str: [Australia] schema length: 1; actual length: 9; .
src line [];
- ```
-
- For data quality errors, if erroneous rows are allowed to be skipped, a fault tolerance ratio can be set in the Properties of the S3 Load job; see [Import Configuration Parameters](../../import/import-way/broker-load-manual.md#related-configurations) for details.
+ When there are multiple import jobs, the following statement can be used to query the dates and reasons of failed imports.
+
+ ```SQL
+ mysql> show load where state='CANCELLED' and label like "label_test%"\G
+ *************************** 1. row ***************************
+ JobId: 18312384
+ Label: label_test123
+ State: CANCELLED
+ Progress: 100.00% (3/3)
+ Type: BROKER
+ EtlInfo: unselected.rows=0; dpp.abnorm.ALL=4; dpp.norm.ALL=0
+ TaskInfo: cluster:s3.ap-southeast-1.amazonaws.com; timeout(s):14400;
max_filter_ratio:0.0; priority:NORMAL
+ ErrorMsg: type:ETL_QUALITY_UNSATISFIED; msg:quality not good enough to
cancel
+ CreateTime: 2025-04-15 17:32:59
+ EtlStartTime: 2025-04-15 17:33:02
+ EtlFinishTime: 2025-04-15 17:33:02
+ LoadStartTime: 2025-04-15 17:33:02
+ LoadFinishTime: 2025-04-15 17:33:02
+ URL: http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342
+ JobDetails: {"Unfinished
backends":{"7602ccd7c3a4854-95307efca7bfe341":[]},"ScannedRows":4,"TaskNumber":1,"LoadBytes":188,"All
backends":{"7602ccd7c3a4854-95307efca7bfe341":[10022]},"FileNumber":3,"FileSize":4839}
+ TransactionId: 769213
+ ErrorTablets: {}
+ User: root
+ Comment:
+ ```
+
+ The example above is a **data quality error** (ETL_QUALITY_UNSATISFIED); the details can be checked by visiting the returned URL. In this case, the data exceeds the defined length of the country column in the table schema:
+
+ ```python
+ [root@VM-10-6-centos ~]$ curl "http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342"
+ Reason: column_name[country], the length of input is too long than schema.
first 32 bytes of input str: [USA] schema length: 1; actual length: 3; . src
line [];
+ Reason: column_name[country], the length of input is too long than schema.
first 32 bytes of input str: [Canada] schema length: 1; actual length: 6; .
src line [];
+ Reason: column_name[country], the length of input is too long than schema.
first 32 bytes of input str: [UK] schema length: 1; actual length: 2; . src
line [];
+ Reason: column_name[country], the length of input is too long than schema.
first 32 bytes of input str: [Australia] schema length: 1; actual length: 9;
. src line [];
+ ```
+
+ For data quality errors, if erroneous rows are allowed to be skipped, a fault tolerance ratio can be set in the Properties of the S3 Load job; see [Import Configuration Parameters](../../import/import-way/broker-load-manual.md#related-configurations) for details.
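The fault tolerance ratio corresponds to the `max_filter_ratio` load property. As an illustrative sketch only (reusing the hypothetical bucket, table, and credential placeholders from the example above), it can be set in the PROPERTIES clause of the S3 Load statement:

```sql
LOAD LABEL sales_data_2025_04_08_retry
(
    DATA INFILE("s3://mybucket/sales_data/2025_04_08/*")
    INTO TABLE sales_data
    FORMAT AS "parquet"
    (order_id, order_date, customer_name, amount, country)
)
WITH S3
(
    "provider" = "S3",
    "s3.endpoint" = "s3.ap-southeast-1.amazonaws.com",
    "s3.access_key" = "<ak>",
    "s3.secret_key" = "<sk>",
    "s3.region" = "ap-southeast-1"
)
PROPERTIES
(
    -- allow up to 10% of rows to be filtered out by data quality errors
    "max_filter_ratio" = "0.1"
);
```

With this setting, rows such as the over-long country values above are skipped instead of cancelling the whole job, as long as the filtered ratio does not exceed the threshold.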
3.4. **Import Data for Multiple Partitions**
- When migrating a large amount of existing data, a batch import strategy is recommended. Each batch should correspond to one or a few Doris partitions and preferably contain no more than 100 GB of data, which reduces system pressure and lowers the retry cost if an import fails.
+ When migrating a large amount of existing data, a batch import strategy is recommended. Each batch should correspond to one or a few Doris partitions and preferably contain no more than 100 GB of data, which reduces system pressure and lowers the retry cost if an import fails.
- Refer to the script [s3_load_demo.sh](https://github.com/apache/doris/blob/master/samples/load/shell/s3_load_demo.sh), which polls the partition directories on S3 and submits S3 Load jobs to Doris to achieve batch import.
\ No newline at end of file
+ Refer to the script [s3_load_demo.sh](https://github.com/apache/doris/blob/master/samples/load/shell/s3_load_demo.sh), which polls the partition directories on S3 and submits S3 Load jobs to Doris to achieve batch import.
\ No newline at end of file
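As an illustrative sketch of the batch strategy above (assuming, as in Broker Load, that DATA INFILE accepts a list of paths), a single load job can cover a few partition directories at a time:

```sql
-- one label per batch of partition directories (hypothetical paths)
LOAD LABEL sales_data_batch_2025_04_08_10
(
    DATA INFILE(
        "s3://mybucket/sales_data/2025_04_08/*",
        "s3://mybucket/sales_data/2025_04_09/*",
        "s3://mybucket/sales_data/2025_04_10/*"
    )
    INTO TABLE sales_data
    FORMAT AS "parquet"
    (order_id, order_date, customer_name, amount, country)
)
WITH S3
(
    "provider" = "S3",
    "s3.endpoint" = "s3.ap-southeast-1.amazonaws.com",
    "s3.access_key" = "<ak>",
    "s3.secret_key" = "<sk>",
    "s3.region" = "ap-southeast-1"
);
```

Keeping each job to a few partitions keeps the batch size manageable and lets a failed batch be retried on its own.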
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/data-source/snowflake.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/data-source/snowflake.md
index 72d41d87dae..4096912b091 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/data-source/snowflake.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/data-source/snowflake.md
@@ -106,22 +106,20 @@ PROPERTIES (
2.1. **Export to S3 Parquet Files via COPY INTO**
- Snowflake supports exporting to [AWS S3](https://docs.snowflake.com/en/user-guide/data-unload-s3), [GCS](https://docs.snowflake.com/en/user-guide/data-unload-gcs), and [Azure](https://docs.snowflake.com/en/user-guide/data-unload-azure). When exporting, it is recommended to partition the data by **Doris' partition fields**. Below is an example of exporting to AWS S3:
+ Snowflake supports exporting to [AWS S3](https://docs.snowflake.com/en/user-guide/data-unload-s3), [GCS](https://docs.snowflake.com/en/user-guide/data-unload-gcs), and [Azure](https://docs.snowflake.com/en/user-guide/data-unload-azure). When exporting, it is recommended to partition the data by **Doris' partition fields**. Below is an example of exporting to AWS S3:
- ```sql
- CREATE FILE FORMAT my_parquet_format TYPE = parquet;
-
- CREATE OR REPLACE STAGE external_stage
- URL='s3://mybucket/sales_data'
- CREDENTIALS=(AWS_KEY_ID='<ak>' AWS_SECRET_KEY='<sk>')
- FILE_FORMAT = my_parquet_format;
-
- COPY INTO @external_stage from sales_data PARTITION BY (CAST(order_date AS
VARCHAR)) header=true;
- ```
+ ```sql
+ CREATE FILE FORMAT my_parquet_format TYPE = parquet;
+ CREATE OR REPLACE STAGE external_stage
+ URL='s3://mybucket/sales_data'
+ CREDENTIALS=(AWS_KEY_ID='<ak>' AWS_SECRET_KEY='<sk>')
+ FILE_FORMAT = my_parquet_format;
+ COPY INTO @external_stage from sales_data PARTITION BY (CAST(order_date AS
VARCHAR)) header=true;
+ ```
2.2. **Check the Exported Files on S3**
- After the export, the files on S3 are organized into **subdirectories by partition**, each directory corresponding to one partition, as shown below:
+ After the export, the files on S3 are organized into **subdirectories by partition**, each directory corresponding to one partition, as shown below:
@@ -137,95 +135,95 @@ PROPERTIES (
3.1. **Import Data for a Single Partition**
- ```sql
- LOAD LABEL sales_data_2025_04_08
- (
- DATA INFILE("s3://mybucket/sales_data/2025_04_08/*")
- INTO TABLE sales_data
- FORMAT AS "parquet"
- (order_id, order_date, customer_name, amount, country)
- )
- WITH S3
- (
- "provider" = "S3",
- "s3.endpoint" = "s3.ap-southeast-1.amazonaws.com",
- "s3.access_key" = "<ak>",
- "s3.secret_key" = "<sk>",
- "s3.region" = "ap-southeast-1"
- );
- ```
+ ```sql
+ LOAD LABEL sales_data_2025_04_08
+ (
+ DATA INFILE("s3://mybucket/sales_data/2025_04_08/*")
+ INTO TABLE sales_data
+ FORMAT AS "parquet"
+ (order_id, order_date, customer_name, amount, country)
+ )
+ WITH S3
+ (
+ "provider" = "S3",
+ "s3.endpoint" = "s3.ap-southeast-1.amazonaws.com",
+ "s3.access_key" = "<ak>",
+ "s3.secret_key" = "<sk>",
+ "s3.region" = "ap-southeast-1"
+ );
+ ```
3.2. **Check the Job Status via SHOW LOAD**
- Since S3 Load jobs are submitted asynchronously, use SHOW LOAD to check the import status of the specified label:
-
- ```yaml
- mysql> show load where label = "label_sales_data_2025_04_08"\G
- *************************** 1. row ***************************
- JobId: 17956078
- Label: label_sales_data_2025_04_08
- State: FINISHED
- Progress: 100.00% (1/1)
- Type: BROKER
- EtlInfo: unselected.rows=0; dpp.abnorm.ALL=0; dpp.norm.ALL=2
- TaskInfo: cluster:s3.ap-southeast-1.amazonaws.com; timeout(s):3600;
max_filter_ratio:0.0; priority:NORMAL
- ErrorMsg: NULL
- CreateTime: 2025-04-10 17:50:53
- EtlStartTime: 2025-04-10 17:50:54
- EtlFinishTime: 2025-04-10 17:50:54
- LoadStartTime: 2025-04-10 17:50:54
- LoadFinishTime: 2025-04-10 17:50:54
- URL: NULL
- JobDetails: {"Unfinished
backends":{"5eec1be8612d4872-91040ff1e7208a4f":[]},"ScannedRows":2,"TaskNumber":1,"LoadBytes":91,"All
backends":{"5eec1be8612d4872-91040ff1e7208a4f":[10022]},"FileNumber":1,"FileSize":1620}
- TransactionId: 766228
- ErrorTablets: {}
- User: root
- Comment:
- 1 row in set (0.00 sec)
- ```
+ Since S3 Load jobs are submitted asynchronously, use SHOW LOAD to check the import status of the specified label:
+
+ ```yaml
+ mysql> show load where label = "label_sales_data_2025_04_08"\G
+ *************************** 1. row ***************************
+ JobId: 17956078
+ Label: label_sales_data_2025_04_08
+ State: FINISHED
+ Progress: 100.00% (1/1)
+ Type: BROKER
+ EtlInfo: unselected.rows=0; dpp.abnorm.ALL=0; dpp.norm.ALL=2
+ TaskInfo: cluster:s3.ap-southeast-1.amazonaws.com; timeout(s):3600;
max_filter_ratio:0.0; priority:NORMAL
+ ErrorMsg: NULL
+ CreateTime: 2025-04-10 17:50:53
+ EtlStartTime: 2025-04-10 17:50:54
+ EtlFinishTime: 2025-04-10 17:50:54
+ LoadStartTime: 2025-04-10 17:50:54
+ LoadFinishTime: 2025-04-10 17:50:54
+ URL: NULL
+ JobDetails: {"Unfinished
backends":{"5eec1be8612d4872-91040ff1e7208a4f":[]},"ScannedRows":2,"TaskNumber":1,"LoadBytes":91,"All
backends":{"5eec1be8612d4872-91040ff1e7208a4f":[10022]},"FileNumber":1,"FileSize":1620}
+ TransactionId: 766228
+ ErrorTablets: {}
+ User: root
+ Comment:
+ 1 row in set (0.00 sec)
+ ```
3.3. **Handle Errors During the Import**
- When there are multiple import jobs, the following statement can be used to query the dates and reasons of failed imports.
-
- ```SQL
- mysql> show load where state='CANCELLED' and label like "label_test%"\G
- *************************** 1. row ***************************
- JobId: 18312384
- Label: label_test123
- State: CANCELLED
- Progress: 100.00% (3/3)
- Type: BROKER
- EtlInfo: unselected.rows=0; dpp.abnorm.ALL=4; dpp.norm.ALL=0
- TaskInfo: cluster:s3.ap-southeast-1.amazonaws.com; timeout(s):14400;
max_filter_ratio:0.0; priority:NORMAL
- ErrorMsg: type:ETL_QUALITY_UNSATISFIED; msg:quality not good enough to
cancel
- CreateTime: 2025-04-15 17:32:59
- EtlStartTime: 2025-04-15 17:33:02
- EtlFinishTime: 2025-04-15 17:33:02
- LoadStartTime: 2025-04-15 17:33:02
- LoadFinishTime: 2025-04-15 17:33:02
- URL:
http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342
- JobDetails: {"Unfinished
backends":{"7602ccd7c3a4854-95307efca7bfe341":[]},"ScannedRows":4,"TaskNumber":1,"LoadBytes":188,"All
backends":{"7602ccd7c3a4854-95307efca7bfe341":[10022]},"FileNumber":3,"FileSize":4839}
- TransactionId: 769213
- ErrorTablets: {}
- User: root
- Comment:
- ```
-
- The example above is a **data quality error** (ETL_QUALITY_UNSATISFIED); the details can be checked by visiting the returned URL. In this case, the data exceeds the defined length of the country column in the table schema:
-
- ```python
- [root@VM-10-6-centos ~]$ curl
"http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342"
- Reason: column_name[country], the length of input is too long than schema.
first 32 bytes of input str: [USA] schema length: 1; actual length: 3; . src
line [];
- Reason: column_name[country], the length of input is too long than schema.
first 32 bytes of input str: [Canada] schema length: 1; actual length: 6; . src
line [];
- Reason: column_name[country], the length of input is too long than schema.
first 32 bytes of input str: [UK] schema length: 1; actual length: 2; . src
line [];
- Reason: column_name[country], the length of input is too long than schema.
first 32 bytes of input str: [Australia] schema length: 1; actual length: 9; .
src line [];
- ```
-
- For data quality errors, if erroneous rows are allowed to be skipped, a fault tolerance ratio can be set in the Properties of the S3 Load job; see [Import Configuration Parameters](../../import/import-way/broker-load-manual.md#related-configurations) for details.
+ When there are multiple import jobs, the following statement can be used to query the dates and reasons of failed imports.
+
+ ```SQL
+ mysql> show load where state='CANCELLED' and label like "label_test%"\G
+ *************************** 1. row ***************************
+ JobId: 18312384
+ Label: label_test123
+ State: CANCELLED
+ Progress: 100.00% (3/3)
+ Type: BROKER
+ EtlInfo: unselected.rows=0; dpp.abnorm.ALL=4; dpp.norm.ALL=0
+ TaskInfo: cluster:s3.ap-southeast-1.amazonaws.com; timeout(s):14400;
max_filter_ratio:0.0; priority:NORMAL
+ ErrorMsg: type:ETL_QUALITY_UNSATISFIED; msg:quality not good enough to
cancel
+ CreateTime: 2025-04-15 17:32:59
+ EtlStartTime: 2025-04-15 17:33:02
+ EtlFinishTime: 2025-04-15 17:33:02
+ LoadStartTime: 2025-04-15 17:33:02
+ LoadFinishTime: 2025-04-15 17:33:02
+ URL: http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342
+ JobDetails: {"Unfinished
backends":{"7602ccd7c3a4854-95307efca7bfe341":[]},"ScannedRows":4,"TaskNumber":1,"LoadBytes":188,"All
backends":{"7602ccd7c3a4854-95307efca7bfe341":[10022]},"FileNumber":3,"FileSize":4839}
+ TransactionId: 769213
+ ErrorTablets: {}
+ User: root
+ Comment:
+ ```
+
+ The example above is a **data quality error** (ETL_QUALITY_UNSATISFIED); the details can be checked by visiting the returned URL. In this case, the data exceeds the defined length of the country column in the table schema:
+
+ ```python
+ [root@VM-10-6-centos ~]$ curl "http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342"
+ Reason: column_name[country], the length of input is too long than schema.
first 32 bytes of input str: [USA] schema length: 1; actual length: 3; . src
line [];
+ Reason: column_name[country], the length of input is too long than schema.
first 32 bytes of input str: [Canada] schema length: 1; actual length: 6; .
src line [];
+ Reason: column_name[country], the length of input is too long than schema.
first 32 bytes of input str: [UK] schema length: 1; actual length: 2; . src
line [];
+ Reason: column_name[country], the length of input is too long than schema.
first 32 bytes of input str: [Australia] schema length: 1; actual length: 9;
. src line [];
+ ```
+
+ For data quality errors, if erroneous rows are allowed to be skipped, a fault tolerance ratio can be set in the Properties of the S3 Load job; see [Import Configuration Parameters](../../import/import-way/broker-load-manual.md#related-configurations) for details.
3.4. **Import Data for Multiple Partitions**
- When migrating a large amount of existing data, a batch import strategy is recommended. Each batch should correspond to one or a few Doris partitions and preferably contain no more than 100 GB of data, which reduces system pressure and lowers the retry cost if an import fails.
+ When migrating a large amount of existing data, a batch import strategy is recommended. Each batch should correspond to one or a few Doris partitions and preferably contain no more than 100 GB of data, which reduces system pressure and lowers the retry cost if an import fails.
- Refer to the script [s3_load_demo.sh](https://github.com/apache/doris/blob/master/samples/load/shell/s3_load_demo.sh), which polls the partition directories on S3 and submits S3 Load jobs to Doris to achieve batch import.
\ No newline at end of file
+ Refer to the script [s3_load_demo.sh](https://github.com/apache/doris/blob/master/samples/load/shell/s3_load_demo.sh), which polls the partition directories on S3 and submits S3 Load jobs to Doris to achieve batch import.
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/data-source/snowflake.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/data-source/snowflake.md
index 72d41d87dae..4096912b091 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/data-source/snowflake.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/data-source/snowflake.md
@@ -106,22 +106,20 @@ PROPERTIES (
2.1. **Export to S3 Parquet Files via COPY INTO**
- Snowflake supports exporting to [AWS S3](https://docs.snowflake.com/en/user-guide/data-unload-s3), [GCS](https://docs.snowflake.com/en/user-guide/data-unload-gcs), and [Azure](https://docs.snowflake.com/en/user-guide/data-unload-azure). When exporting, it is recommended to partition the data by **Doris' partition fields**. Below is an example of exporting to AWS S3:
+ Snowflake supports exporting to [AWS S3](https://docs.snowflake.com/en/user-guide/data-unload-s3), [GCS](https://docs.snowflake.com/en/user-guide/data-unload-gcs), and [Azure](https://docs.snowflake.com/en/user-guide/data-unload-azure). When exporting, it is recommended to partition the data by **Doris' partition fields**. Below is an example of exporting to AWS S3:
- ```sql
- CREATE FILE FORMAT my_parquet_format TYPE = parquet;
-
- CREATE OR REPLACE STAGE external_stage
- URL='s3://mybucket/sales_data'
- CREDENTIALS=(AWS_KEY_ID='<ak>' AWS_SECRET_KEY='<sk>')
- FILE_FORMAT = my_parquet_format;
-
- COPY INTO @external_stage from sales_data PARTITION BY (CAST(order_date AS
VARCHAR)) header=true;
- ```
+ ```sql
+ CREATE FILE FORMAT my_parquet_format TYPE = parquet;
+ CREATE OR REPLACE STAGE external_stage
+ URL='s3://mybucket/sales_data'
+ CREDENTIALS=(AWS_KEY_ID='<ak>' AWS_SECRET_KEY='<sk>')
+ FILE_FORMAT = my_parquet_format;
+ COPY INTO @external_stage from sales_data PARTITION BY (CAST(order_date AS
VARCHAR)) header=true;
+ ```
2.2. **Check the Exported Files on S3**
- After the export, the files on S3 are organized into **subdirectories by partition**, each directory corresponding to one partition, as shown below:
+ After the export, the files on S3 are organized into **subdirectories by partition**, each directory corresponding to one partition, as shown below:
@@ -137,95 +135,95 @@ PROPERTIES (
3.1. **Import Data for a Single Partition**
- ```sql
- LOAD LABEL sales_data_2025_04_08
- (
- DATA INFILE("s3://mybucket/sales_data/2025_04_08/*")
- INTO TABLE sales_data
- FORMAT AS "parquet"
- (order_id, order_date, customer_name, amount, country)
- )
- WITH S3
- (
- "provider" = "S3",
- "s3.endpoint" = "s3.ap-southeast-1.amazonaws.com",
- "s3.access_key" = "<ak>",
- "s3.secret_key" = "<sk>",
- "s3.region" = "ap-southeast-1"
- );
- ```
+ ```sql
+ LOAD LABEL sales_data_2025_04_08
+ (
+ DATA INFILE("s3://mybucket/sales_data/2025_04_08/*")
+ INTO TABLE sales_data
+ FORMAT AS "parquet"
+ (order_id, order_date, customer_name, amount, country)
+ )
+ WITH S3
+ (
+ "provider" = "S3",
+ "s3.endpoint" = "s3.ap-southeast-1.amazonaws.com",
+ "s3.access_key" = "<ak>",
+ "s3.secret_key" = "<sk>",
+ "s3.region" = "ap-southeast-1"
+ );
+ ```
3.2. **Check the Job Status via SHOW LOAD**
- Since S3 Load jobs are submitted asynchronously, use SHOW LOAD to check the import status of the specified label:
-
- ```yaml
- mysql> show load where label = "label_sales_data_2025_04_08"\G
- *************************** 1. row ***************************
- JobId: 17956078
- Label: label_sales_data_2025_04_08
- State: FINISHED
- Progress: 100.00% (1/1)
- Type: BROKER
- EtlInfo: unselected.rows=0; dpp.abnorm.ALL=0; dpp.norm.ALL=2
- TaskInfo: cluster:s3.ap-southeast-1.amazonaws.com; timeout(s):3600;
max_filter_ratio:0.0; priority:NORMAL
- ErrorMsg: NULL
- CreateTime: 2025-04-10 17:50:53
- EtlStartTime: 2025-04-10 17:50:54
- EtlFinishTime: 2025-04-10 17:50:54
- LoadStartTime: 2025-04-10 17:50:54
- LoadFinishTime: 2025-04-10 17:50:54
- URL: NULL
- JobDetails: {"Unfinished
backends":{"5eec1be8612d4872-91040ff1e7208a4f":[]},"ScannedRows":2,"TaskNumber":1,"LoadBytes":91,"All
backends":{"5eec1be8612d4872-91040ff1e7208a4f":[10022]},"FileNumber":1,"FileSize":1620}
- TransactionId: 766228
- ErrorTablets: {}
- User: root
- Comment:
- 1 row in set (0.00 sec)
- ```
+ Since S3 Load jobs are submitted asynchronously, use SHOW LOAD to check the import status of the specified label:
+
+ ```yaml
+ mysql> show load where label = "label_sales_data_2025_04_08"\G
+ *************************** 1. row ***************************
+ JobId: 17956078
+ Label: label_sales_data_2025_04_08
+ State: FINISHED
+ Progress: 100.00% (1/1)
+ Type: BROKER
+ EtlInfo: unselected.rows=0; dpp.abnorm.ALL=0; dpp.norm.ALL=2
+ TaskInfo: cluster:s3.ap-southeast-1.amazonaws.com; timeout(s):3600;
max_filter_ratio:0.0; priority:NORMAL
+ ErrorMsg: NULL
+ CreateTime: 2025-04-10 17:50:53
+ EtlStartTime: 2025-04-10 17:50:54
+ EtlFinishTime: 2025-04-10 17:50:54
+ LoadStartTime: 2025-04-10 17:50:54
+ LoadFinishTime: 2025-04-10 17:50:54
+ URL: NULL
+ JobDetails: {"Unfinished
backends":{"5eec1be8612d4872-91040ff1e7208a4f":[]},"ScannedRows":2,"TaskNumber":1,"LoadBytes":91,"All
backends":{"5eec1be8612d4872-91040ff1e7208a4f":[10022]},"FileNumber":1,"FileSize":1620}
+ TransactionId: 766228
+ ErrorTablets: {}
+ User: root
+ Comment:
+ 1 row in set (0.00 sec)
+ ```
3.3. **Handle Errors During the Import**
- When there are multiple import jobs, the following statement can be used to query the dates and reasons of failed imports.
-
- ```SQL
- mysql> show load where state='CANCELLED' and label like "label_test%"\G
- *************************** 1. row ***************************
- JobId: 18312384
- Label: label_test123
- State: CANCELLED
- Progress: 100.00% (3/3)
- Type: BROKER
- EtlInfo: unselected.rows=0; dpp.abnorm.ALL=4; dpp.norm.ALL=0
- TaskInfo: cluster:s3.ap-southeast-1.amazonaws.com; timeout(s):14400;
max_filter_ratio:0.0; priority:NORMAL
- ErrorMsg: type:ETL_QUALITY_UNSATISFIED; msg:quality not good enough to
cancel
- CreateTime: 2025-04-15 17:32:59
- EtlStartTime: 2025-04-15 17:33:02
- EtlFinishTime: 2025-04-15 17:33:02
- LoadStartTime: 2025-04-15 17:33:02
- LoadFinishTime: 2025-04-15 17:33:02
- URL:
http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342
- JobDetails: {"Unfinished
backends":{"7602ccd7c3a4854-95307efca7bfe341":[]},"ScannedRows":4,"TaskNumber":1,"LoadBytes":188,"All
backends":{"7602ccd7c3a4854-95307efca7bfe341":[10022]},"FileNumber":3,"FileSize":4839}
- TransactionId: 769213
- ErrorTablets: {}
- User: root
- Comment:
- ```
-
- The example above is a **data quality error** (ETL_QUALITY_UNSATISFIED); the details can be checked by visiting the returned URL. In this case, the data exceeds the defined length of the country column in the table schema:
-
- ```python
- [root@VM-10-6-centos ~]$ curl
"http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342"
- Reason: column_name[country], the length of input is too long than schema.
first 32 bytes of input str: [USA] schema length: 1; actual length: 3; . src
line [];
- Reason: column_name[country], the length of input is too long than schema.
first 32 bytes of input str: [Canada] schema length: 1; actual length: 6; . src
line [];
- Reason: column_name[country], the length of input is too long than schema.
first 32 bytes of input str: [UK] schema length: 1; actual length: 2; . src
line [];
- Reason: column_name[country], the length of input is too long than schema.
first 32 bytes of input str: [Australia] schema length: 1; actual length: 9; .
src line [];
- ```
-
- For data quality errors, if erroneous rows are allowed to be skipped, a fault tolerance ratio can be set in the Properties of the S3 Load job; see [Import Configuration Parameters](../../import/import-way/broker-load-manual.md#related-configurations) for details.
+ When there are multiple import jobs, the following statement can be used to query the dates and reasons of failed imports.
+
+ ```SQL
+ mysql> show load where state='CANCELLED' and label like "label_test%"\G
+ *************************** 1. row ***************************
+ JobId: 18312384
+ Label: label_test123
+ State: CANCELLED
+ Progress: 100.00% (3/3)
+ Type: BROKER
+ EtlInfo: unselected.rows=0; dpp.abnorm.ALL=4; dpp.norm.ALL=0
+ TaskInfo: cluster:s3.ap-southeast-1.amazonaws.com; timeout(s):14400;
max_filter_ratio:0.0; priority:NORMAL
+ ErrorMsg: type:ETL_QUALITY_UNSATISFIED; msg:quality not good enough to
cancel
+ CreateTime: 2025-04-15 17:32:59
+ EtlStartTime: 2025-04-15 17:33:02
+ EtlFinishTime: 2025-04-15 17:33:02
+ LoadStartTime: 2025-04-15 17:33:02
+ LoadFinishTime: 2025-04-15 17:33:02
+ URL: http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342
+ JobDetails: {"Unfinished
backends":{"7602ccd7c3a4854-95307efca7bfe341":[]},"ScannedRows":4,"TaskNumber":1,"LoadBytes":188,"All
backends":{"7602ccd7c3a4854-95307efca7bfe341":[10022]},"FileNumber":3,"FileSize":4839}
+ TransactionId: 769213
+ ErrorTablets: {}
+ User: root
+ Comment:
+ ```
+
+ The example above is a **data quality error** (ETL_QUALITY_UNSATISFIED); the details can be checked by visiting the returned URL. In this case, the data exceeds the defined length of the country column in the table schema:
+
+ ```python
+ [root@VM-10-6-centos ~]$ curl "http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342"
+ Reason: column_name[country], the length of input is too long than schema.
first 32 bytes of input str: [USA] schema length: 1; actual length: 3; . src
line [];
+ Reason: column_name[country], the length of input is too long than schema.
first 32 bytes of input str: [Canada] schema length: 1; actual length: 6; .
src line [];
+ Reason: column_name[country], the length of input is too long than schema.
first 32 bytes of input str: [UK] schema length: 1; actual length: 2; . src
line [];
+ Reason: column_name[country], the length of input is too long than schema.
first 32 bytes of input str: [Australia] schema length: 1; actual length: 9;
. src line [];
+ ```
+
+ For data quality errors, if erroneous rows are allowed to be skipped, a fault tolerance ratio can be set in the Properties of the S3 Load job; see [Import Configuration Parameters](../../import/import-way/broker-load-manual.md#related-configurations) for details.
3.4. **Import Data for Multiple Partitions**
- When migrating a large amount of existing data, a batch import strategy is recommended. Each batch should correspond to one or a few Doris partitions and preferably contain no more than 100 GB of data, which reduces system pressure and lowers the retry cost if an import fails.
+ When migrating a large amount of existing data, a batch import strategy is recommended. Each batch should correspond to one or a few Doris partitions and preferably contain no more than 100 GB of data, which reduces system pressure and lowers the retry cost if an import fails.
- Refer to the script [s3_load_demo.sh](https://github.com/apache/doris/blob/master/samples/load/shell/s3_load_demo.sh), which polls the partition directories on S3 and submits S3 Load jobs to Doris to achieve batch import.
\ No newline at end of file
+ Refer to the script [s3_load_demo.sh](https://github.com/apache/doris/blob/master/samples/load/shell/s3_load_demo.sh), which polls the partition directories on S3 and submits S3 Load jobs to Doris to achieve batch import.
\ No newline at end of file
diff --git a/versioned_docs/version-2.1/data-operate/import/data-source/snowflake.md b/versioned_docs/version-2.1/data-operate/import/data-source/snowflake.md
index 845e70f9ba1..b728bb75920 100644
--- a/versioned_docs/version-2.1/data-operate/import/data-source/snowflake.md
+++ b/versioned_docs/version-2.1/data-operate/import/data-source/snowflake.md
@@ -107,7 +107,7 @@ PROPERTIES (
2.1. **Export to S3 Parquet Files via COPY INTO**
- Snowflake supports exporting to [AWS S3](https://docs.snowflake.com/en/user-guide/data-unload-s3), [GCS](https://docs.snowflake.com/en/user-guide/data-unload-gcs), and [Azure](https://docs.snowflake.com/en/user-guide/data-unload-azure). When exporting, it is recommended to **partition the data by Doris' partition fields**. Example for AWS S3:
+ Snowflake supports exporting to [AWS S3](https://docs.snowflake.com/en/user-guide/data-unload-s3), [GCS](https://docs.snowflake.com/en/user-guide/data-unload-gcs), and [Azure](https://docs.snowflake.com/en/user-guide/data-unload-azure). When exporting, it is recommended to **partition the data by Doris' partition fields**. Example for AWS S3:
```sql
CREATE FILE FORMAT my_parquet_format TYPE = parquet;
@@ -122,7 +122,7 @@ PROPERTIES (
2.2. **Verify Exported Files on S3**
- Exported files are organized into **subdirectories by partition** on S3:
+ Exported files are organized into **subdirectories by partition** on S3:

diff --git a/versioned_docs/version-3.0/data-operate/import/data-source/snowflake.md b/versioned_docs/version-3.0/data-operate/import/data-source/snowflake.md
index 845e70f9ba1..b728bb75920 100644
--- a/versioned_docs/version-3.0/data-operate/import/data-source/snowflake.md
+++ b/versioned_docs/version-3.0/data-operate/import/data-source/snowflake.md
@@ -107,7 +107,7 @@ PROPERTIES (
2.1. **Export to S3 Parquet Files via COPY INTO**
- Snowflake supports exporting to [AWS S3](https://docs.snowflake.com/en/user-guide/data-unload-s3), [GCS](https://docs.snowflake.com/en/user-guide/data-unload-gcs), and [Azure](https://docs.snowflake.com/en/user-guide/data-unload-azure). When exporting, it is recommended to **partition the data by Doris' partition fields**. Example for AWS S3:
+ Snowflake supports exporting to [AWS S3](https://docs.snowflake.com/en/user-guide/data-unload-s3), [GCS](https://docs.snowflake.com/en/user-guide/data-unload-gcs), and [Azure](https://docs.snowflake.com/en/user-guide/data-unload-azure). When exporting, it is recommended to **partition the data by Doris' partition fields**. Example for AWS S3:
```sql
CREATE FILE FORMAT my_parquet_format TYPE = parquet;
@@ -122,7 +122,7 @@ PROPERTIES (
2.2. **Verify Exported Files on S3**
- Exported files are organized into **subdirectories by partition** on S3:
+ Exported files are organized into **subdirectories by partition** on S3:

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]