This is an automated email from the ASF dual-hosted git repository.
lidongdai pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/seatunnel.git
The following commit(s) were added to refs/heads/dev by this push:
new 81834f26a7 [Improve][Doc][connector-file] Document SaveMode options for HdfsFile/LocalFile (#10283)
81834f26a7 is described below
commit 81834f26a7ac2653810b7d07ac512dbfe909b2e2
Author: yzeng1618 <[email protected]>
AuthorDate: Wed Jan 7 22:15:19 2026 +0800
[Improve][Doc][connector-file] Document SaveMode options for HdfsFile/LocalFile (#10283)
Co-authored-by: zengyi <[email protected]>
---
docs/en/connector-v2/sink/HdfsFile.md | 17 +++++++++++++++++
docs/zh/connector-v2/sink/HdfsFile.md | 17 +++++++++++++++++
docs/zh/connector-v2/sink/LocalFile.md | 21 +++++++++++++++++++--
.../file/hdfs/sink/HdfsFileSinkFactory.java | 2 ++
4 files changed, 55 insertions(+), 2 deletions(-)
diff --git a/docs/en/connector-v2/sink/HdfsFile.md b/docs/en/connector-v2/sink/HdfsFile.md
index 073a56e345..a6c4ef429b 100644
--- a/docs/en/connector-v2/sink/HdfsFile.md
+++ b/docs/en/connector-v2/sink/HdfsFile.md
@@ -88,12 +88,29 @@ Output data to hdfs file
| enable_header_write | boolean | no | false | Only used when file_format_type is text,csv.<br/> false: don't write header, true: write header. |
[...]
| encoding | string | no | "UTF-8" | Only used when file_format_type is json,text,csv,xml. |
[...]
| remote_user | string | no | - | The remote user name of hdfs. |
[...]
+| schema_save_mode | string | no | CREATE_SCHEMA_WHEN_NOT_EXIST | Existing dir processing method |
[...]
+| data_save_mode | string | no | APPEND_DATA | Existing data processing method |
[...]
| merge_update_event | boolean | no | false | Only used when file_format_type is canal_json, debezium_json, or maxwell_json. When the value is true, the UPDATE_AFTER and UPDATE_BEFORE events will be merged into one UPDATE event. |
### Tips
> If you use Spark/Flink, you must ensure that your Spark/Flink cluster has
> already integrated Hadoop in order to use this connector. The tested Hadoop
> version is 2.x. If you use SeaTunnel Engine, the Hadoop jar is integrated
> automatically when you download and install SeaTunnel Engine. You can check
> the jar packages under ${SEATUNNEL_HOME}/lib to confirm this.
+### schema_save_mode [string]
+
+How to handle an existing directory.
+- RECREATE_SCHEMA: create the dir when it does not exist; delete and recreate it when it exists
+- CREATE_SCHEMA_WHEN_NOT_EXIST: create the dir when it does not exist; skip when it exists
+- ERROR_WHEN_SCHEMA_NOT_EXIST: report an error when the dir does not exist
+- IGNORE: ignore the handling of the table
+
+### data_save_mode [string]
+
+How to handle existing data.
+- DROP_DATA: preserve the dir and delete the data files
+- APPEND_DATA: preserve the dir and keep the data files
+- ERROR_WHEN_DATA_EXISTS: report an error when data files exist
+
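For reference, the two options documented above would be combined in a sink configuration roughly like this (a minimal sketch; the HDFS address and output path are placeholders, not taken from this commit):

```hocon
sink {
  HdfsFile {
    fs.defaultFS = "hdfs://namenode:9000"  # placeholder HDFS address
    path = "/tmp/seatunnel/output"         # placeholder output dir
    file_format_type = "text"
    # Create the dir only if it is missing, and keep any existing data files
    schema_save_mode = "CREATE_SCHEMA_WHEN_NOT_EXIST"
    data_save_mode = "APPEND_DATA"
  }
}
```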
### merge_update_event [boolean]
Only used when file_format_type is canal_json, debezium_json, or maxwell_json.
diff --git a/docs/zh/connector-v2/sink/HdfsFile.md b/docs/zh/connector-v2/sink/HdfsFile.md
index c1f7c2eb1c..dff7dfc1cd 100644
--- a/docs/zh/connector-v2/sink/HdfsFile.md
+++ b/docs/zh/connector-v2/sink/HdfsFile.md
@@ -79,6 +79,8 @@ import ChangeLog from '../changelog/connector-file-hadoop.md';
| max_rows_in_memory | int | no | - | Only used when file_format is excel. The maximum number of data items that can be cached in memory when the file format is Excel. |
| sheet_name | string | no | Sheet${Random number} | Only used when file_format is excel. Write the sheet of the workbook with the specified sheet name. |
| remote_user | string | no | - | The remote user name of hdfs. |
+| schema_save_mode | string | no | CREATE_SCHEMA_WHEN_NOT_EXIST | Existing dir processing method |
+| data_save_mode | string | no | APPEND_DATA | Existing data processing method |
| merge_update_event | boolean | no | false | Only used when file_format_type is canal_json, debezium_json, or maxwell_json. |
### Tips
@@ -87,6 +89,21 @@ import ChangeLog from '../changelog/connector-file-hadoop.md';
> 2.x. If you use SeaTunnel Engine, the hadoop jar is integrated automatically
> when you download and install SeaTunnel Engine. You can check the jar
> packages under `${SEATUNNEL_HOME}/lib` to confirm this.
+### schema_save_mode [string]
+
+How to handle an existing directory.
+- RECREATE_SCHEMA: create the dir when it does not exist; delete and recreate it when it exists
+- CREATE_SCHEMA_WHEN_NOT_EXIST: create the dir when it does not exist; skip when it exists
+- ERROR_WHEN_SCHEMA_NOT_EXIST: report an error when the dir does not exist
+- IGNORE: ignore the handling of the table
+
+### data_save_mode [string]
+
+How to handle existing data.
+- DROP_DATA: preserve the dir and delete the data files
+- APPEND_DATA: preserve the dir and keep the data files
+- ERROR_WHEN_DATA_EXISTS: report an error when data files exist
+
### merge_update_event [boolean]
Only used when file_format_type is canal_json, debezium_json, or maxwell_json.
diff --git a/docs/zh/connector-v2/sink/LocalFile.md b/docs/zh/connector-v2/sink/LocalFile.md
index 7bc92a564c..018c196fb5 100644
--- a/docs/zh/connector-v2/sink/LocalFile.md
+++ b/docs/zh/connector-v2/sink/LocalFile.md
@@ -72,8 +72,10 @@ import ChangeLog from '../changelog/connector-file-local.md';
| parquet_avro_write_timestamp_as_int96 | boolean | no | false | Only used when file_format is parquet |
| parquet_avro_write_fixed_as_int96 | array | no | - | Only used when file_format is parquet |
| enable_header_write | boolean | no | false | Only used when file_format_type is text,csv.<br/> false: don't write header, true: write header. |
-| encoding | string | no | "UTF-8" | Only used when file_format_type is json,text,csv,xml |
-| merge_update_event | boolean | no | false | Only used when file_format_type is canal_json, debezium_json, or maxwell_json. |
+| encoding | string | no | "UTF-8" | Only used when file_format_type is json,text,csv,xml |
+| schema_save_mode | string | no | CREATE_SCHEMA_WHEN_NOT_EXIST | Existing dir processing method |
+| data_save_mode | string | no | APPEND_DATA | Existing data processing method |
+| merge_update_event | boolean | no | false | Only used when file_format_type is canal_json, debezium_json, or maxwell_json. |
### path [string]
@@ -226,6 +228,21 @@ _root_tag [string]
Only used when file_format_type is json,text,csv,xml. The encoding of the file to write. This param will be parsed by `Charset.forName(encoding)`.
+### schema_save_mode [string]
+
+How to handle an existing directory.
+- RECREATE_SCHEMA: create the dir when it does not exist; delete and recreate it when it exists
+- CREATE_SCHEMA_WHEN_NOT_EXIST: create the dir when it does not exist; skip when it exists
+- ERROR_WHEN_SCHEMA_NOT_EXIST: report an error when the dir does not exist
+- IGNORE: ignore the handling of the table
+
+### data_save_mode [string]
+
+How to handle existing data.
+- DROP_DATA: preserve the dir and delete the data files
+- APPEND_DATA: preserve the dir and keep the data files
+- ERROR_WHEN_DATA_EXISTS: report an error when data files exist
+
### merge_update_event [boolean]
Only used when file_format_type is canal_json, debezium_json, or maxwell_json.
diff --git a/seatunnel-connectors-v2/connector-file/connector-file-hadoop/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/hdfs/sink/HdfsFileSinkFactory.java b/seatunnel-connectors-v2/connector-file/connector-file-hadoop/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/hdfs/sink/HdfsFileSinkFactory.java
index 600a9dff7c..bd8138ab12 100644
--- a/seatunnel-connectors-v2/connector-file/connector-file-hadoop/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/hdfs/sink/HdfsFileSinkFactory.java
+++ b/seatunnel-connectors-v2/connector-file/connector-file-hadoop/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/hdfs/sink/HdfsFileSinkFactory.java
@@ -129,6 +129,8 @@ public class HdfsFileSinkFactory extends BaseMultipleTableFileSinkFactory {
.optional(FileBaseSinkOptions.CREATE_EMPTY_FILE_WHEN_NO_DATA)
.optional(FileBaseSinkOptions.FILENAME_EXTENSION)
.optional(FileBaseSinkOptions.TMP_PATH)
+ .optional(FileBaseSinkOptions.SCHEMA_SAVE_MODE)
+ .optional(FileBaseSinkOptions.DATA_SAVE_MODE)
.build();
}