RocMarshal commented on a change in pull request #18718:
URL: https://github.com/apache/flink/pull/18718#discussion_r806733589



##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -27,61 +27,58 @@ KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->
+<a name="filesystem"></a>
 
-# FileSystem
+# 文件系统
 
-This connector provides a unified Source and Sink for `BATCH` and `STREAMING` 
that reads or writes (partitioned) files to file systems
-supported by the [Flink `FileSystem` abstraction]({{< ref 
"docs/deployment/filesystems/overview" >}}). This filesystem
-connector provides the same guarantees for both `BATCH` and `STREAMING` and is 
designed to provide exactly-once semantics for `STREAMING` execution.
+连接器提供了 `BATCH` 模式和 `STREAMING` 模式统一的 Source 和 Sink。[Flink `FileSystem` 
abstraction]({{< ref "docs/deployment/filesystems/overview" >}}) 
支持连接器对文件系统进行(分区)文件读写。文件系统连接器为 `BATCH` 和 `STREAMING` 模式提供了相同的保证,而且对 `STREAMING` 
模式执行提供了精确一次(exactly-once)语义保证。
 
-The connector supports reading and writing a set of files from any 
(distributed) file system (e.g. POSIX, S3, HDFS)
-with a [format]({{< ref "docs/connectors/datastream/formats/overview" >}}) 
(e.g., Avro, CSV, Parquet),
-and produces a stream or records.
+连接器支持对任意(分布式的)文件系统(例如,POSIX、 S3、 HDFS)以某种数据格式 [format]({{< ref 
"docs/connectors/datastream/formats/overview" >}}) (例如,Avro、 CSV、 Parquet) 
对文件进行写入,或者读取后生成数据流或一组记录。
+<a name="file-source"></a>
 
 ## File Source
 
-The `File Source` is based on the [Source API]({{< ref 
"docs/dev/datastream/sources" >}}#the-data-source-api),
-a unified data source that reads files - both in batch and in streaming mode.
-It is divided into the following two parts: `SplitEnumerator` and 
`SourceReader`.
+`File Source` 是基于 [Source API]({{< ref "docs/dev/datastream/sources" 
>}}#the-data-source-api) 同时支持批模式和流模式文件读取的统一数据源。
+`File Source` 分为以下两个部分:`SplitEnumerator` 和 `SourceReader`。
 
-* `SplitEnumerator` is responsible for discovering and identifying the files 
to read and assigns them to the `SourceReader`.
-* `SourceReader` requests the files it needs to process and reads the file 
from the filesystem.
+* `SplitEnumerator` 负责发现和识别需要读取的文件,并将这些文件分配给 `SourceReader` 进行读取。
+* `SourceReader` 请求需要处理的文件,并从文件系统中读取该文件。
 
-You will need to combine the File Source with a [format]({{< ref 
"docs/connectors/datastream/formats/overview" >}}), which allows you to
-parse CSV, decode AVRO, or read Parquet columnar files.
+你可能需要指定某种 [format]({{< ref "docs/connectors/datastream/formats/overview" >}}) 
与 `File Source` 联合进行解析 CSV、解码AVRO、或者读取 Parquet 列式文件。
+<a name="bounded-and-unbounded-streams"></a>

Review comment:
       ```suggestion
   
   <a name="bounded-and-unbounded-streams"></a>
   ```
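
For context while reviewing this hunk, a minimal sketch of the combination the paragraph describes: a bounded `File Source` built on top of a text-line `StreamFormat`. The path is illustrative, `env` stands for a `StreamExecutionEnvironment` obtained elsewhere, and the exact format class name differs across Flink versions (`TextLineInputFormat` in recent releases, `TextLineFormat` in older ones).

```java
// Bounded read: the SplitEnumerator lists the directory once and the job
// finishes when all discovered files have been read by the SourceReaders.
final FileSource<String> source =
        FileSource.forRecordStreamFormat(
                        new TextLineInputFormat(),       // any StreamFormat works here
                        new Path("/path/to/input"))      // illustrative path
                .build();

final DataStream<String> stream =
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "file-source");
```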

##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -27,61 +27,58 @@ KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->
+<a name="filesystem"></a>
 
-# FileSystem
+# 文件系统
 
-This connector provides a unified Source and Sink for `BATCH` and `STREAMING` 
that reads or writes (partitioned) files to file systems
-supported by the [Flink `FileSystem` abstraction]({{< ref 
"docs/deployment/filesystems/overview" >}}). This filesystem
-connector provides the same guarantees for both `BATCH` and `STREAMING` and is 
designed to provide exactly-once semantics for `STREAMING` execution.
+连接器提供了 `BATCH` 模式和 `STREAMING` 模式统一的 Source 和 Sink。[Flink `FileSystem` 
abstraction]({{< ref "docs/deployment/filesystems/overview" >}}) 
支持连接器对文件系统进行(分区)文件读写。文件系统连接器为 `BATCH` 和 `STREAMING` 模式提供了相同的保证,而且对 `STREAMING` 
模式执行提供了精确一次(exactly-once)语义保证。
 
-The connector supports reading and writing a set of files from any 
(distributed) file system (e.g. POSIX, S3, HDFS)
-with a [format]({{< ref "docs/connectors/datastream/formats/overview" >}}) 
(e.g., Avro, CSV, Parquet),
-and produces a stream or records.
+连接器支持对任意(分布式的)文件系统(例如,POSIX、 S3、 HDFS)以某种数据格式 [format]({{< ref 
"docs/connectors/datastream/formats/overview" >}}) (例如,Avro、 CSV、 Parquet) 
对文件进行写入,或者读取后生成数据流或一组记录。
+<a name="file-source"></a>

Review comment:
       ```suggestion
   
   <a name="file-source"></a>
   ```

##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -146,59 +142,54 @@ final FileSource<byte[]> source =
 {{< /tab >}}
 {{< /tabs >}}
 
-An example of a `SimpleStreamFormat` is `CsvReaderFormat`. It can be 
initialized like this:
+`CsvReaderFormat` 是一个实现 `SimpleStreamFormat` 接口的例子。可以像这样进行初始化:
 ```java
 CsvReaderFormat<SomePojo> csvFormat = CsvReaderFormat.forPojo(SomePojo.class);
 FileSource<SomePojo> source = 
         FileSource.forRecordStreamFormat(csvFormat, 
Path.fromLocalFile(...)).build();
 ```
 
-The schema for CSV parsing, in this case, is automatically derived based on 
the fields of the `SomePojo` class using the `Jackson` library. (Note: you 
might need to add `@JsonPropertyOrder({field1, field2, ...})` annotation to 
your class definition with the fields order exactly matching those of the CSV 
file columns).
+对于 CSV 格式的解析,在这个例子中,是根据使用 `Jackson` 库的 `SomePojo` 的字段自动生成的。(注意:你可能需要添加 
`@JsonPropertyOrder({field1, field2, ...})` 这个注释到你定义的类上,并且字段顺序与 CSV 文件列的顺序完全匹配)。
 
-If you need more fine-grained control over the CSV schema or the parsing 
options, use the more low-level `forSchema` static factory method of 
`CsvReaderFormat`:
+如果需要对 CSV 模式或解析选项进行更细粒度的控制,可以使用 `CsvReaderFormat` 的更低层次的 `forSchema` 静态工厂方法:
 
 ```java
 CsvReaderFormat<T> forSchema(CsvMapper mapper, 
                              CsvSchema schema, 
                              TypeInformation<T> typeInformation) 
 ```
+<a name="bulk-format"></a>
 
-#### Bulk Format
+#### Bulk 格式
 
-The BulkFormat reads and decodes batches of records at a time. Examples of 
bulk formats
-are formats like ORC or Parquet.
-The outer `BulkFormat` class acts mainly as a configuration holder and factory 
for the
-reader. The actual reading is done by the `BulkFormat.Reader`, which is 
created in the
-`BulkFormat#createReader(Configuration, FileSourceSplit)` method. If a bulk 
reader is
-created based on a checkpoint during checkpointed streaming execution, then 
the reader is
-re-created in the `BulkFormat#restoreReader(Configuration, FileSourceSplit)` 
method.
+Bulk 格式一次读取并解析一批记录。Bulk 格式的示例包括 ORC 或 Parquet 等格式。
+外部的 `BulkFormat` 类主要充当阅读器的配置持有者和工厂。`BulkFormat.Reader` 是在 
`BulkFormat#createReader(Configuration, FileSourceSplit)` 
方法中创建的,由它来完成读取操作。如果在流的 Checkpoint 执行期间基于 Checkpoint 创建 Bulk 阅读器,那么阅读器是在 
`BulkFormat#restoreReader(Configuration, FileSourceSplit)` 方法中重新创建的。
 
-A `SimpleStreamFormat` can be turned into a `BulkFormat` by wrapping it in a 
`StreamFormatAdapter`:
+可以通过将 `SimpleStreamFormat` 包装在 `StreamFormatAdapter` 中,将其转换成 `BulkFormat`:
 ```java
 BulkFormat<SomePojo, FileSourceSplit> bulkFormat = 
         new StreamFormatAdapter<>(CsvReaderFormat.forPojo(SomePojo.class));
 ```
+<a name="customizing-file-enumeration"></a>
 
-### Customizing File Enumeration
+### 自定义文件枚举类
 
 {{< tabs "CustomizingFileEnumeration" >}}
 {{< tab "Java" >}}
 ```java
 /**
- * A FileEnumerator implementation for hive source, which generates splits 
based on 
- * HiveTablePartition.
+ * 针对 Hive 数据源的 FileEnumerator 实现类,基于 HiveTablePartition 生成拆分文件
  */
 public class HiveSourceFileEnumerator implements FileEnumerator {
     
-    // reference constructor
+    // 构造函数

Review comment:
       ```suggestion
       // 构造方法
   ```
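
Since the hunk above only quotes the `forSchema` signature, here is a hedged sketch of what a call could look like when the Jackson schema is built by hand instead of being derived from `SomePojo`; the column names and types are made up for illustration.

```java
// Hand-built Jackson CSV schema instead of the POJO-derived one.
CsvMapper mapper = new CsvMapper();
CsvSchema schema =
        CsvSchema.builder()
                .addColumn("name", CsvSchema.ColumnType.STRING)    // illustrative columns
                .addColumn("count", CsvSchema.ColumnType.NUMBER)
                .setColumnSeparator(',')
                .build();

CsvReaderFormat<SomePojo> csvFormat =
        CsvReaderFormat.forSchema(mapper, schema, TypeInformation.of(SomePojo.class));
```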

##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -146,59 +142,54 @@ final FileSource<byte[]> source =
 {{< /tab >}}
 {{< /tabs >}}
 
-An example of a `SimpleStreamFormat` is `CsvReaderFormat`. It can be 
initialized like this:
+`CsvReaderFormat` 是一个实现 `SimpleStreamFormat` 接口的例子。可以像这样进行初始化:
 ```java
 CsvReaderFormat<SomePojo> csvFormat = CsvReaderFormat.forPojo(SomePojo.class);
 FileSource<SomePojo> source = 
         FileSource.forRecordStreamFormat(csvFormat, 
Path.fromLocalFile(...)).build();
 ```
 
-The schema for CSV parsing, in this case, is automatically derived based on 
the fields of the `SomePojo` class using the `Jackson` library. (Note: you 
might need to add `@JsonPropertyOrder({field1, field2, ...})` annotation to 
your class definition with the fields order exactly matching those of the CSV 
file columns).
+对于 CSV 格式的解析,在这个例子中,是根据使用 `Jackson` 库的 `SomePojo` 的字段自动生成的。(注意:你可能需要添加 
`@JsonPropertyOrder({field1, field2, ...})` 这个注释到你定义的类上,并且字段顺序与 CSV 文件列的顺序完全匹配)。
 
-If you need more fine-grained control over the CSV schema or the parsing 
options, use the more low-level `forSchema` static factory method of 
`CsvReaderFormat`:
+如果需要对 CSV 模式或解析选项进行更细粒度的控制,可以使用 `CsvReaderFormat` 的更低层次的 `forSchema` 静态工厂方法:
 
 ```java
 CsvReaderFormat<T> forSchema(CsvMapper mapper, 
                              CsvSchema schema, 
                              TypeInformation<T> typeInformation) 
 ```
+<a name="bulk-format"></a>
 
-#### Bulk Format
+#### Bulk 格式
 
-The BulkFormat reads and decodes batches of records at a time. Examples of 
bulk formats
-are formats like ORC or Parquet.
-The outer `BulkFormat` class acts mainly as a configuration holder and factory 
for the
-reader. The actual reading is done by the `BulkFormat.Reader`, which is 
created in the
-`BulkFormat#createReader(Configuration, FileSourceSplit)` method. If a bulk 
reader is
-created based on a checkpoint during checkpointed streaming execution, then 
the reader is
-re-created in the `BulkFormat#restoreReader(Configuration, FileSourceSplit)` 
method.
+Bulk 格式一次读取并解析一批记录。Bulk 格式的示例包括 ORC 或 Parquet 等格式。
+外部的 `BulkFormat` 类主要充当阅读器的配置持有者和工厂。`BulkFormat.Reader` 是在 
`BulkFormat#createReader(Configuration, FileSourceSplit)` 
方法中创建的,由它来完成读取操作。如果在流的 Checkpoint 执行期间基于 Checkpoint 创建 Bulk 阅读器,那么阅读器是在 
`BulkFormat#restoreReader(Configuration, FileSourceSplit)` 方法中重新创建的。
 
-A `SimpleStreamFormat` can be turned into a `BulkFormat` by wrapping it in a 
`StreamFormatAdapter`:
+可以通过将 `SimpleStreamFormat` 包装在 `StreamFormatAdapter` 中,将其转换成 `BulkFormat`:
 ```java
 BulkFormat<SomePojo, FileSourceSplit> bulkFormat = 
         new StreamFormatAdapter<>(CsvReaderFormat.forPojo(SomePojo.class));
 ```
+<a name="customizing-file-enumeration"></a>
 
-### Customizing File Enumeration
+### 自定义文件枚举类
 
 {{< tabs "CustomizingFileEnumeration" >}}
 {{< tab "Java" >}}
 ```java
 /**
- * A FileEnumerator implementation for hive source, which generates splits 
based on 
- * HiveTablePartition.
+ * 针对 Hive 数据源的 FileEnumerator 实现类,基于 HiveTablePartition 生成拆分文件
  */
 public class HiveSourceFileEnumerator implements FileEnumerator {
     
-    // reference constructor
+    // 构造函数
     public HiveSourceFileEnumerator(...) {
         ...
     }
 
     /***
-     * Generates all file splits for the relevant files under the given paths. 
The {@code
-     * minDesiredSplits} is an optional hint indicating how many splits would 
be necessary to
-     * exploit parallelism properly.
+     * 拆分给定路径下的相关所有文件。{@code

Review comment:
       ```suggestion
        * 拆分指定路径下的所有相关文件。{@code
   ```

##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -231,72 +222,68 @@ new HiveSource<>(
 ```
 {{< /tab >}}
 {{< /tabs >}}
+<a name="current-limitations"></a>
 
-### Current Limitations
+### 当前限制
 
-Watermarking does not work very well for large backlogs of files. This is 
because watermarks eagerly advance within a file, and the next file might 
contain data later than the watermark.
+对于大量积压的文件, Watermark 效果不佳。这是因为 Watermark 急切地在一个文件中前进,而下一个文件可能包含比 Watermark 
更晚的数据。
 
-For Unbounded File Sources, the enumerator currently remembers paths of all 
already processed files, which is a state that can, in some cases, grow rather 
large.
-There are plans to add a compressed form of tracking already processed files 
in the future (for example, by keeping modification timestamps below 
boundaries).
+对于无界文件源,枚举器会记住当前所有已处理文件的路径,在某些情况下,这种状态可能会变得相当大。
+计划在未来增加一种压缩的方式来跟踪已经处理的文件(例如,将修改时间戳保持在边界以下)。
+<a name="behind-the-scenes"></a>
 
-### Behind the Scenes
+### 后话

Review comment:
       ```
   ### 后记
   ```
   ?

##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -231,72 +222,68 @@ new HiveSource<>(
 ```
 {{< /tab >}}
 {{< /tabs >}}
+<a name="current-limitations"></a>
 
-### Current Limitations
+### 当前限制
 
-Watermarking does not work very well for large backlogs of files. This is 
because watermarks eagerly advance within a file, and the next file might 
contain data later than the watermark.
+对于大量积压的文件, Watermark 效果不佳。这是因为 Watermark 急切地在一个文件中前进,而下一个文件可能包含比 Watermark 
更晚的数据。
 
-For Unbounded File Sources, the enumerator currently remembers paths of all 
already processed files, which is a state that can, in some cases, grow rather 
large.
-There are plans to add a compressed form of tracking already processed files 
in the future (for example, by keeping modification timestamps below 
boundaries).
+对于无界文件源,枚举器会记住当前所有已处理文件的路径,在某些情况下,这种状态可能会变得相当大。
+计划在未来增加一种压缩的方式来跟踪已经处理的文件(例如,将修改时间戳保持在边界以下)。
+<a name="behind-the-scenes"></a>
 
-### Behind the Scenes
+### 后话
 {{< hint info >}}
-If you are interested in how File Source works through the new data source API 
design, you may
-want to read this part as a reference. For details about the new data source 
API, check out the
-[documentation on data sources]({{< ref "docs/dev/datastream/sources.md" >}}) 
and
+如果你对新设计的数据源 API 中的文件源是如何工作的感兴趣,可以阅读本部分作为参考。关于新的数据源 API 的更多细节,请参考
+[documentation on data sources]({{< ref "docs/dev/datastream/sources.md" >}}) 
和在
 <a 
href="https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface";>FLIP-27</a>
-for more descriptive discussions.
+中获取更加具体的讨论详情。
 {{< /hint >}}
+<a name="file-sink"></a>
 
-## File Sink
+## 文件 Sink
 
-The file sink writes incoming data into buckets. Given that the incoming 
streams can be unbounded,
-data in each bucket is organized into part files of finite size. The bucketing 
behaviour is fully configurable
-with a default time-based bucketing where we start writing a new bucket every 
hour. This means that each resulting
-bucket will contain files with records received during 1 hour intervals from 
the stream.
+文件 Sink 将传入的数据写入存储桶中。考虑到输入流可以是无界的,每个桶中的数据被组织成有限大小的 Part 文件。
+往桶中写数据的行为完全可默认配置成基于时间的,比如我们可以设置每个小时的数据写入一个新桶中。这意味着桶中将包含一个小时间隔内接收到的记录。
 
-Data within the bucket directories is split into part files. Each bucket will 
contain at least one part file for
-each subtask of the sink that has received data for that bucket. Additional 
part files will be created according to the configurable
-rolling policy. For `Row-encoded Formats` (see [File Formats](#file-formats)) 
the default policy rolls part files based
-on size, a timeout that specifies the maximum duration for which a file can be 
open, and a maximum inactivity
-timeout after which the file is closed. For `Bulk-encoded Formats` we roll on 
every checkpoint and the user can
-specify additional conditions based on size or time.
+桶目录中的数据被拆分成多个 Part 文件。对于相应的接收数据的桶的 Sink 的每个 Subtask ,每个桶将至少包含一个 Part 
文件。将根据配置的回滚策略来创建其他 Part 文件。
+对于 `Row-encoded Formats` (参考 [Format Types](#sink-format-types)) 默认的策略是根据 Part 
文件大小进行回滚,需要指定文件打开状态最长时间的超时以及文件关闭后的不活动状态的超时。

Review comment:
       nit: 
   ```
   对于 `Row-encoded Formats`(参考 [Format Types](#sink-format-types))默认的策略是根据 Part 
文件大小进行回滚,需要指定文件打开状态最长时间的超时以及文件关闭后的非活动状态的超时时间。
   ```

##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -958,61 +936,47 @@ val sink = FileSink
 ```
 {{< /tab >}}
 {{< /tabs >}}
+<a name="important-considerations"></a>
 
-### Important Considerations
+### 重要提示
+<a name="general"></a>
 
-#### General
+#### 整体提示
 
-<span class="label label-danger">Important Note 1</span>: When using Hadoop < 
2.7, please use
-the `OnCheckpointRollingPolicy` which rolls part files on every checkpoint. 
The reason is that if part files "traverse"
-the checkpoint interval, then, upon recovery from a failure the `FileSink` may 
use the `truncate()` method of the
-filesystem to discard uncommitted data from the in-progress file. This method 
is not supported by pre-2.7 Hadoop versions
-and Flink will throw an exception.
+<span class="label label-danger">重要提示 1</span>: 当使用的 Hadoop 版本 < 2.7 时,
+当每次 Checkpoint 时请使用 `OnCheckpointRollingPolicy` 回滚 Part 文件。原因是:如果 Part 文件 "穿越" 
了 Checkpoint 的时间间隔,

Review comment:
       `回滚`->`滚动` ?
    If you agree, please keep it consistent throughout the rest of the page.
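
For context on the note under discussion, a hedged sketch of the recommended setup: rolling the in-progress part file on every checkpoint so recovery never needs the filesystem's `truncate()`. The output path and encoder are placeholders.

```java
// Roll part files on every successful checkpoint, so no uncommitted data has
// to be truncated after a failure (required when running on Hadoop < 2.7).
final FileSink<String> sink = FileSink
        .forRowFormat(new Path("/path/to/output"), new SimpleStringEncoder<String>("UTF-8"))
        .withRollingPolicy(OnCheckpointRollingPolicy.build())
        .build();
```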

##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -146,59 +142,54 @@ final FileSource<byte[]> source =
 {{< /tab >}}
 {{< /tabs >}}
 
-An example of a `SimpleStreamFormat` is `CsvReaderFormat`. It can be 
initialized like this:
+`CsvReaderFormat` 是一个实现 `SimpleStreamFormat` 接口的例子。可以像这样进行初始化:
 ```java
 CsvReaderFormat<SomePojo> csvFormat = CsvReaderFormat.forPojo(SomePojo.class);
 FileSource<SomePojo> source = 
         FileSource.forRecordStreamFormat(csvFormat, 
Path.fromLocalFile(...)).build();
 ```
 
-The schema for CSV parsing, in this case, is automatically derived based on 
the fields of the `SomePojo` class using the `Jackson` library. (Note: you 
might need to add `@JsonPropertyOrder({field1, field2, ...})` annotation to 
your class definition with the fields order exactly matching those of the CSV 
file columns).
+对于 CSV 格式的解析,在这个例子中,是根据使用 `Jackson` 库的 `SomePojo` 的字段自动生成的。(注意:你可能需要添加 
`@JsonPropertyOrder({field1, field2, ...})` 这个注释到你定义的类上,并且字段顺序与 CSV 文件列的顺序完全匹配)。
 
-If you need more fine-grained control over the CSV schema or the parsing 
options, use the more low-level `forSchema` static factory method of 
`CsvReaderFormat`:
+如果需要对 CSV 模式或解析选项进行更细粒度的控制,可以使用 `CsvReaderFormat` 的更低层次的 `forSchema` 静态工厂方法:
 
 ```java
 CsvReaderFormat<T> forSchema(CsvMapper mapper, 
                              CsvSchema schema, 
                              TypeInformation<T> typeInformation) 
 ```
+<a name="bulk-format"></a>
 
-#### Bulk Format
+#### Bulk 格式

Review comment:
    nit: keep the original content?

##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -231,72 +222,68 @@ new HiveSource<>(
 ```
 {{< /tab >}}
 {{< /tabs >}}
+<a name="current-limitations"></a>

Review comment:
       ```suggestion
   
   <a name="current-limitations"></a>
   ```

##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -146,59 +142,54 @@ final FileSource<byte[]> source =
 {{< /tab >}}
 {{< /tabs >}}
 
-An example of a `SimpleStreamFormat` is `CsvReaderFormat`. It can be 
initialized like this:
+`CsvReaderFormat` 是一个实现 `SimpleStreamFormat` 接口的例子。可以像这样进行初始化:
 ```java
 CsvReaderFormat<SomePojo> csvFormat = CsvReaderFormat.forPojo(SomePojo.class);
 FileSource<SomePojo> source = 
         FileSource.forRecordStreamFormat(csvFormat, 
Path.fromLocalFile(...)).build();
 ```
 
-The schema for CSV parsing, in this case, is automatically derived based on 
the fields of the `SomePojo` class using the `Jackson` library. (Note: you 
might need to add `@JsonPropertyOrder({field1, field2, ...})` annotation to 
your class definition with the fields order exactly matching those of the CSV 
file columns).
+对于 CSV 格式的解析,在这个例子中,是根据使用 `Jackson` 库的 `SomePojo` 的字段自动生成的。(注意:你可能需要添加 
`@JsonPropertyOrder({field1, field2, ...})` 这个注释到你定义的类上,并且字段顺序与 CSV 文件列的顺序完全匹配)。
 
-If you need more fine-grained control over the CSV schema or the parsing 
options, use the more low-level `forSchema` static factory method of 
`CsvReaderFormat`:
+如果需要对 CSV 模式或解析选项进行更细粒度的控制,可以使用 `CsvReaderFormat` 的更低层次的 `forSchema` 静态工厂方法:
 
 ```java
 CsvReaderFormat<T> forSchema(CsvMapper mapper, 
                              CsvSchema schema, 
                              TypeInformation<T> typeInformation) 
 ```
+<a name="bulk-format"></a>
 
-#### Bulk Format
+#### Bulk 格式
 
-The BulkFormat reads and decodes batches of records at a time. Examples of 
bulk formats
-are formats like ORC or Parquet.
-The outer `BulkFormat` class acts mainly as a configuration holder and factory 
for the
-reader. The actual reading is done by the `BulkFormat.Reader`, which is 
created in the
-`BulkFormat#createReader(Configuration, FileSourceSplit)` method. If a bulk 
reader is
-created based on a checkpoint during checkpointed streaming execution, then 
the reader is
-re-created in the `BulkFormat#restoreReader(Configuration, FileSourceSplit)` 
method.
+Bulk 格式一次读取并解析一批记录。Bulk 格式的示例包括 ORC 或 Parquet 等格式。
+外部的 `BulkFormat` 类主要充当阅读器的配置持有者和工厂。`BulkFormat.Reader` 是在 
`BulkFormat#createReader(Configuration, FileSourceSplit)` 
方法中创建的,由它来完成读取操作。如果在流的 Checkpoint 执行期间基于 Checkpoint 创建 Bulk 阅读器,那么阅读器是在 
`BulkFormat#restoreReader(Configuration, FileSourceSplit)` 方法中重新创建的。
 
-A `SimpleStreamFormat` can be turned into a `BulkFormat` by wrapping it in a 
`StreamFormatAdapter`:
+可以通过将 `SimpleStreamFormat` 包装在 `StreamFormatAdapter` 中,将其转换成 `BulkFormat`:

Review comment:
       ```suggestion
   可以通过将 `SimpleStreamFormat` 包装在 `StreamFormatAdapter` 中转换为 `BulkFormat`:
   ```
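
As a hedged follow-up to the adapter shown in the hunk, the wrapped format can then be handed to the bulk entry point of the source builder; `SomePojo` and the path are placeholders carried over from the surrounding example.

```java
// The StreamFormat wrapped as a BulkFormat plugs into the bulk builder method.
BulkFormat<SomePojo, FileSourceSplit> bulkFormat =
        new StreamFormatAdapter<>(CsvReaderFormat.forPojo(SomePojo.class));

FileSource<SomePojo> source =
        FileSource.forBulkFileFormat(bulkFormat, Path.fromLocalFile(new File("/path/to/input")))
                .build();
```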

##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -93,34 +90,33 @@ final FileSource<String> source =
 ```
 {{< /tab >}}
 {{< /tabs >}}
+<a name="source-format-types"></a>
 
 ### Format Types
 
-The reading of each file happens through file readers defined by file formats.
-These define the parsing logic for the contents of the file. There are 
multiple classes that the source supports.
-The interfaces are a tradeoff between simplicity of implementation and 
flexibility/efficiency.
+通过文件格式定义的文件阅读器读取每个文件。
+它们定义了解析和读取文件内容的逻辑。数据源支持多个解析类。
+这些接口是实现简单性和灵活性/效率之间的折衷。
 
-* A `StreamFormat` reads the contents of a file from a file stream. It is the 
simplest format to implement,
-  and provides many features out-of-the-box (like checkpointing logic) but is 
limited in the optimizations it can apply
-  (such as object reuse, batching, etc.).
+*  `StreamFormat` 从文件流中读取文件内容。它是最简单的格式实现,
+   并且提供了许多拆箱即用的特性(如 Checkpoint 逻辑),但是在可应用的优化方面受到限制(例如对象重用,批处理等等)。
 
-* A `BulkFormat` reads batches of records from a file at a time.
-  It is the most "low level" format to implement, but offers the greatest 
flexibility to optimize the implementation.
+* `BulkFormat` 从文件中一次读取一批记录。
+  它虽然是最 "底层" 的格式实现,但是提供了优化实现的最大灵活性。
+<a name="textline-format"></a>

Review comment:
       ```suggestion
   
   <a name="textline-format"></a>
   ```

##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -93,34 +90,33 @@ final FileSource<String> source =
 ```
 {{< /tab >}}
 {{< /tabs >}}
+<a name="source-format-types"></a>

Review comment:
       ```suggestion
   
   <a name="source-format-types"></a>
   ```

##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -231,72 +222,68 @@ new HiveSource<>(
 ```
 {{< /tab >}}
 {{< /tabs >}}
+<a name="current-limitations"></a>
 
-### Current Limitations
+### 当前限制
 
-Watermarking does not work very well for large backlogs of files. This is 
because watermarks eagerly advance within a file, and the next file might 
contain data later than the watermark.
+对于大量积压的文件, Watermark 效果不佳。这是因为 Watermark 急切地在一个文件中前进,而下一个文件可能包含比 Watermark 
更晚的数据。
 
-For Unbounded File Sources, the enumerator currently remembers paths of all 
already processed files, which is a state that can, in some cases, grow rather 
large.
-There are plans to add a compressed form of tracking already processed files 
in the future (for example, by keeping modification timestamps below 
boundaries).
+对于无界文件源,枚举器会记住当前所有已处理文件的路径,在某些情况下,这种状态可能会变得相当大。

Review comment:
       nit: ```
    对于无界 File Sources,枚举器会将当前所有已处理文件的路径记录到 state 中,在某些情况下,这可能会导致状态变得相当大。
   ```

##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -146,59 +142,54 @@ final FileSource<byte[]> source =
 {{< /tab >}}
 {{< /tabs >}}
 
-An example of a `SimpleStreamFormat` is `CsvReaderFormat`. It can be 
initialized like this:
+`CsvReaderFormat` 是一个实现 `SimpleStreamFormat` 接口的例子。可以像这样进行初始化:
 ```java
 CsvReaderFormat<SomePojo> csvFormat = CsvReaderFormat.forPojo(SomePojo.class);
 FileSource<SomePojo> source = 
         FileSource.forRecordStreamFormat(csvFormat, 
Path.fromLocalFile(...)).build();
 ```
 
-The schema for CSV parsing, in this case, is automatically derived based on 
the fields of the `SomePojo` class using the `Jackson` library. (Note: you 
might need to add `@JsonPropertyOrder({field1, field2, ...})` annotation to 
your class definition with the fields order exactly matching those of the CSV 
file columns).
+对于 CSV 格式的解析,在这个例子中,是根据使用 `Jackson` 库的 `SomePojo` 的字段自动生成的。(注意:你可能需要添加 
`@JsonPropertyOrder({field1, field2, ...})` 这个注释到你定义的类上,并且字段顺序与 CSV 文件列的顺序完全匹配)。
 
-If you need more fine-grained control over the CSV schema or the parsing 
options, use the more low-level `forSchema` static factory method of 
`CsvReaderFormat`:
+如果需要对 CSV 模式或解析选项进行更细粒度的控制,可以使用 `CsvReaderFormat` 的更低层次的 `forSchema` 静态工厂方法:
 
 ```java
 CsvReaderFormat<T> forSchema(CsvMapper mapper, 
                              CsvSchema schema, 
                              TypeInformation<T> typeInformation) 
 ```
+<a name="bulk-format"></a>
 
-#### Bulk Format
+#### Bulk 格式
 
-The BulkFormat reads and decodes batches of records at a time. Examples of 
bulk formats
-are formats like ORC or Parquet.
-The outer `BulkFormat` class acts mainly as a configuration holder and factory 
for the
-reader. The actual reading is done by the `BulkFormat.Reader`, which is 
created in the
-`BulkFormat#createReader(Configuration, FileSourceSplit)` method. If a bulk 
reader is
-created based on a checkpoint during checkpointed streaming execution, then 
the reader is
-re-created in the `BulkFormat#restoreReader(Configuration, FileSourceSplit)` 
method.
+Bulk 格式一次读取并解析一批记录。Bulk 格式的示例包括 ORC 或 Parquet 等格式。

Review comment:
       ```suggestion
   BulkFormat 一次读取并解析一批记录。Bulk 格式的实现包括 ORC 或 Parquet 等格式。
   ```
    A minor suggestion. Maybe you could phrase it even better.

##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -231,72 +222,68 @@ new HiveSource<>(
 ```
 {{< /tab >}}
 {{< /tabs >}}
+<a name="current-limitations"></a>
 
-### Current Limitations
+### 当前限制
 
-Watermarking does not work very well for large backlogs of files. This is 
because watermarks eagerly advance within a file, and the next file might 
contain data later than the watermark.
+对于大量积压的文件, Watermark 效果不佳。这是因为 Watermark 急切地在一个文件中前进,而下一个文件可能包含比 Watermark 
更晚的数据。

Review comment:
       nit: 
   ```
   对于大量积压的文件, Watermark 效果不佳。这是因为 Watermark 急于在一个文件中推进,而下一个文件可能包含比 Watermark 
更晚的数据。
   ```

##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -146,59 +142,54 @@ final FileSource<byte[]> source =
 {{< /tab >}}
 {{< /tabs >}}
 
-An example of a `SimpleStreamFormat` is `CsvReaderFormat`. It can be 
initialized like this:
+`CsvReaderFormat` 是一个实现 `SimpleStreamFormat` 接口的例子。可以像这样进行初始化:
 ```java
 CsvReaderFormat<SomePojo> csvFormat = CsvReaderFormat.forPojo(SomePojo.class);
 FileSource<SomePojo> source = 
         FileSource.forRecordStreamFormat(csvFormat, 
Path.fromLocalFile(...)).build();
 ```
 
-The schema for CSV parsing, in this case, is automatically derived based on 
the fields of the `SomePojo` class using the `Jackson` library. (Note: you 
might need to add `@JsonPropertyOrder({field1, field2, ...})` annotation to 
your class definition with the fields order exactly matching those of the CSV 
file columns).
+对于 CSV 格式的解析,在这个例子中,是根据使用 `Jackson` 库的 `SomePojo` 的字段自动生成的。(注意:你可能需要添加 
`@JsonPropertyOrder({field1, field2, ...})` 这个注释到你定义的类上,并且字段顺序与 CSV 文件列的顺序完全匹配)。
 
-If you need more fine-grained control over the CSV schema or the parsing 
options, use the more low-level `forSchema` static factory method of 
`CsvReaderFormat`:
+如果需要对 CSV 模式或解析选项进行更细粒度的控制,可以使用 `CsvReaderFormat` 的更低层次的 `forSchema` 静态工厂方法:
 
 ```java
 CsvReaderFormat<T> forSchema(CsvMapper mapper, 
                              CsvSchema schema, 
                              TypeInformation<T> typeInformation) 
 ```
+<a name="bulk-format"></a>
 
-#### Bulk Format
+#### Bulk 格式
 
-The BulkFormat reads and decodes batches of records at a time. Examples of 
bulk formats
-are formats like ORC or Parquet.
-The outer `BulkFormat` class acts mainly as a configuration holder and factory 
for the
-reader. The actual reading is done by the `BulkFormat.Reader`, which is 
created in the
-`BulkFormat#createReader(Configuration, FileSourceSplit)` method. If a bulk 
reader is
-created based on a checkpoint during checkpointed streaming execution, then 
the reader is
-re-created in the `BulkFormat#restoreReader(Configuration, FileSourceSplit)` 
method.
+Bulk 格式一次读取并解析一批记录。Bulk 格式的示例包括 ORC 或 Parquet 等格式。
+外部的 `BulkFormat` 类主要充当阅读器的配置持有者和工厂。`BulkFormat.Reader` 是在 
`BulkFormat#createReader(Configuration, FileSourceSplit)` 
方法中创建的,由它来完成读取操作。如果在流的 Checkpoint 执行期间基于 Checkpoint 创建 Bulk 阅读器,那么阅读器是在 
`BulkFormat#restoreReader(Configuration, FileSourceSplit)` 方法中重新创建的。

Review comment:
       `主要充当阅读器的配置持有者和工厂` -> `主要充当 reader 的配置持有者和工厂角色。`
   
   如果在流的 Checkpoint 执行期间基于 Checkpoint 创建 Bulk 阅读器,那么阅读器是在 
\`BulkFormat#restoreReader(Configuration, FileSourceSplit)\` 方法中重新创建的。`->`如果在流的 
checkpoint 执行期间基于 checkpoint 创建 Bulk reader,那么 reader 将在 
\`BulkFormat#restoreReader(Configuration, FileSourceSplit) \` 方法中被重新创建。

##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -231,72 +222,68 @@ new HiveSource<>(
 ```
 {{< /tab >}}
 {{< /tabs >}}
+<a name="current-limitations"></a>
 
-### Current Limitations
+### 当前限制
 
-Watermarking does not work very well for large backlogs of files. This is 
because watermarks eagerly advance within a file, and the next file might 
contain data later than the watermark.
+对于大量积压的文件, Watermark 效果不佳。这是因为 Watermark 急切地在一个文件中前进,而下一个文件可能包含比 Watermark 
更晚的数据。
 
-For Unbounded File Sources, the enumerator currently remembers paths of all 
already processed files, which is a state that can, in some cases, grow rather 
large.
-There are plans to add a compressed form of tracking already processed files 
in the future (for example, by keeping modification timestamps below 
boundaries).
+对于无界文件源,枚举器会记住当前所有已处理文件的路径,在某些情况下,这种状态可能会变得相当大。
+计划在未来增加一种压缩的方式来跟踪已经处理的文件(例如,将修改时间戳保持在边界以下)。

Review comment:
       ```
   未来将计划引入一种压缩的方式来跟踪已经处理的文件(例如,将修改时间戳保持在边界以下)。
   ```
    Only a minor comment. Maybe you could translate it better.

##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -27,61 +27,58 @@ KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->
+<a name="filesystem"></a>
 
-# FileSystem
+# 文件系统
 
-This connector provides a unified Source and Sink for `BATCH` and `STREAMING` 
that reads or writes (partitioned) files to file systems
-supported by the [Flink `FileSystem` abstraction]({{< ref 
"docs/deployment/filesystems/overview" >}}). This filesystem
-connector provides the same guarantees for both `BATCH` and `STREAMING` and is 
designed to provide exactly-once semantics for `STREAMING` execution.
+连接器提供了 `BATCH` 模式和 `STREAMING` 模式统一的 Source 和 Sink。[Flink `FileSystem` 
abstraction]({{< ref "docs/deployment/filesystems/overview" >}}) 
支持连接器对文件系统进行(分区)文件读写。文件系统连接器为 `BATCH` 和 `STREAMING` 模式提供了相同的保证,而且对 `STREAMING` 
模式执行提供了精确一次(exactly-once)语义保证。
 
-The connector supports reading and writing a set of files from any 
(distributed) file system (e.g. POSIX, S3, HDFS)
-with a [format]({{< ref "docs/connectors/datastream/formats/overview" >}}) 
(e.g., Avro, CSV, Parquet),
-and produces a stream or records.
+连接器支持对任意(分布式的)文件系统(例如,POSIX、 S3、 HDFS)以某种数据格式 [format]({{< ref 
"docs/connectors/datastream/formats/overview" >}}) (例如,Avro、 CSV、 Parquet) 
对文件进行写入,或者读取后生成数据流或一组记录。
+<a name="file-source"></a>
 
 ## File Source
 
-The `File Source` is based on the [Source API]({{< ref 
"docs/dev/datastream/sources" >}}#the-data-source-api),
-a unified data source that reads files - both in batch and in streaming mode.
-It is divided into the following two parts: `SplitEnumerator` and 
`SourceReader`.
+`File Source` 是基于 [Source API]({{< ref "docs/dev/datastream/sources" 
>}}#the-data-source-api) 同时支持批模式和流模式文件读取的统一数据源。
+`File Source` 分为以下两个部分:`SplitEnumerator` 和 `SourceReader`。
 
-* `SplitEnumerator` is responsible for discovering and identifying the files 
to read and assigns them to the `SourceReader`.
-* `SourceReader` requests the files it needs to process and reads the file 
from the filesystem.
+* `SplitEnumerator` 负责发现和识别需要读取的文件,并将这些文件分配给 `SourceReader` 进行读取。
+* `SourceReader` 请求需要处理的文件,并从文件系统中读取该文件。
 
-You will need to combine the File Source with a [format]({{< ref 
"docs/connectors/datastream/formats/overview" >}}), which allows you to
-parse CSV, decode AVRO, or read Parquet columnar files.
+你可能需要指定某种 [format]({{< ref "docs/connectors/datastream/formats/overview" >}}) 
与 `File Source` 联合进行解析 CSV、解码AVRO、或者读取 Parquet 列式文件。
+<a name="bounded-and-unbounded-streams"></a>
 
-#### Bounded and Unbounded Streams
+#### 有界流和无界流
 
-A bounded `File Source` lists all files (via SplitEnumerator - a recursive 
directory list with filtered-out hidden files) and reads them all.
+有界的 `File Source`(通过 SplitEnumerator)列出所有文件(一个过滤出隐藏文件的递归目录列表)并读取。
 
-An unbounded `File Source` is created when configuring the enumerator for 
periodic file discovery.
-In this case, the `SplitEnumerator` will enumerate like the bounded case but, 
after a certain interval, repeats the enumeration.
-For any repeated enumeration, the `SplitEnumerator` filters out previously 
detected files and only sends new ones to the `SourceReader`.
+无界的 `File Source` 由配置定期扫描文件的 enumerator 创建。
+在无界的情况下,`SplitEnumerator` 将像有界的 `File Source` 
一样列出所有文件,但是不同的是,经过一个时间间隔之后,重复上述操作。
+对于每一次列举操作,`SplitEnumerator` 会过滤掉之前已经检测过的文件,将新扫描到的文件发送给 `SourceReader`。
+<a name="usage"></a>

Review comment:
       ```suggestion
   
   <a name="usage"></a>
   ```
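
For the unbounded case this hunk describes, a hedged sketch of the same builder switched into periodic discovery, so the `SplitEnumerator` re-lists the directory and only forwards newly found files to the `SourceReader`; the interval and path are illustrative, and the text-line format class name varies by Flink version.

```java
// Unbounded read: re-enumerate the input directory every 30 seconds and hand
// only newly discovered files to the SourceReaders.
final FileSource<String> source =
        FileSource.forRecordStreamFormat(new TextLineInputFormat(), new Path("/path/to/input"))
                .monitorContinuously(Duration.ofSeconds(30))
                .build();
```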

##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -231,72 +222,68 @@ new HiveSource<>(
 ```
 {{< /tab >}}
 {{< /tabs >}}
+<a name="current-limitations"></a>
 
-### Current Limitations
+### 当前限制
 
-Watermarking does not work very well for large backlogs of files. This is 
because watermarks eagerly advance within a file, and the next file might 
contain data later than the watermark.
+对于大量积压的文件, Watermark 效果不佳。这是因为 Watermark 急切地在一个文件中前进,而下一个文件可能包含比 Watermark 
更晚的数据。
 
-For Unbounded File Sources, the enumerator currently remembers paths of all 
already processed files, which is a state that can, in some cases, grow rather 
large.
-There are plans to add a compressed form of tracking already processed files 
in the future (for example, by keeping modification timestamps below 
boundaries).
+对于无界文件源,枚举器会记住当前所有已处理文件的路径,在某些情况下,这种状态可能会变得相当大。
+计划在未来增加一种压缩的方式来跟踪已经处理的文件(例如,将修改时间戳保持在边界以下)。
+<a name="behind-the-scenes"></a>
 
-### Behind the Scenes
+### 后话
 {{< hint info >}}
-If you are interested in how File Source works through the new data source API 
design, you may
-want to read this part as a reference. For details about the new data source 
API, check out the
-[documentation on data sources]({{< ref "docs/dev/datastream/sources.md" >}}) 
and
+如果你对新设计的数据源 API 中的文件源是如何工作的感兴趣,可以阅读本部分作为参考。关于新的数据源 API 的更多细节,请参考
+[documentation on data sources]({{< ref "docs/dev/datastream/sources.md" >}}) 
和在
 <a 
href="https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface";>FLIP-27</a>
-for more descriptive discussions.
+中获取更加具体的讨论详情。
 {{< /hint >}}
+<a name="file-sink"></a>
 
-## File Sink
+## 文件 Sink
 
-The file sink writes incoming data into buckets. Given that the incoming 
streams can be unbounded,
-data in each bucket is organized into part files of finite size. The bucketing 
behaviour is fully configurable
-with a default time-based bucketing where we start writing a new bucket every 
hour. This means that each resulting
-bucket will contain files with records received during 1 hour intervals from 
the stream.
+文件 Sink 将传入的数据写入存储桶中。考虑到输入流可以是无界的,每个桶中的数据被组织成有限大小的 Part 文件。
+往桶中写数据的行为完全可默认配置成基于时间的,比如我们可以设置每个小时的数据写入一个新桶中。这意味着桶中将包含一个小时间隔内接收到的记录。

Review comment:
       nit:
   ```
   完全可以配置为基于时间向往桶中写入数据,比如我们可以设置每个小时的数据写入一个新桶中。这意味着桶中将包含一个小时间隔内接收到的记录。
   ```

##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -231,72 +222,68 @@ new HiveSource<>(
 ```
 {{< /tab >}}
 {{< /tabs >}}
+<a name="current-limitations"></a>
 
-### Current Limitations
+### 当前限制
 
-Watermarking does not work very well for large backlogs of files. This is 
because watermarks eagerly advance within a file, and the next file might 
contain data later than the watermark.
+对于大量积压的文件, Watermark 效果不佳。这是因为 Watermark 急切地在一个文件中前进,而下一个文件可能包含比 Watermark 
更晚的数据。
 
-For Unbounded File Sources, the enumerator currently remembers paths of all 
already processed files, which is a state that can, in some cases, grow rather 
large.
-There are plans to add a compressed form of tracking already processed files 
in the future (for example, by keeping modification timestamps below 
boundaries).
+对于无界文件源,枚举器会记住当前所有已处理文件的路径,在某些情况下,这种状态可能会变得相当大。
+计划在未来增加一种压缩的方式来跟踪已经处理的文件(例如,将修改时间戳保持在边界以下)。
+<a name="behind-the-scenes"></a>
 
-### Behind the Scenes
+### 后话
 {{< hint info >}}
-If you are interested in how File Source works through the new data source API 
design, you may
-want to read this part as a reference. For details about the new data source 
API, check out the
-[documentation on data sources]({{< ref "docs/dev/datastream/sources.md" >}}) 
and
+如果你对新设计的数据源 API 中的文件源是如何工作的感兴趣,可以阅读本部分作为参考。关于新的数据源 API 的更多细节,请参考

Review comment:
       ```suggestion
   如果你对新设计的数据源 API 中的 File Sources 是如何工作的感兴趣,可以阅读本部分作为参考。关于新的数据源 API 的更多细节,请参考
   ```
    Please render `文件源` as `File Sources`, `File Source`, or another suitable substitute, but keep it consistent throughout the full page.
    
    The same is true for `File Sink`.
   
   

##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -231,72 +222,68 @@ new HiveSource<>(
 ```
 {{< /tab >}}
 {{< /tabs >}}
+<a name="current-limitations"></a>
 
-### Current Limitations
+### 当前限制
 
-Watermarking does not work very well for large backlogs of files. This is 
because watermarks eagerly advance within a file, and the next file might 
contain data later than the watermark.
+对于大量积压的文件, Watermark 效果不佳。这是因为 Watermark 急切地在一个文件中前进,而下一个文件可能包含比 Watermark 
更晚的数据。
 
-For Unbounded File Sources, the enumerator currently remembers paths of all 
already processed files, which is a state that can, in some cases, grow rather 
large.
-There are plans to add a compressed form of tracking already processed files 
in the future (for example, by keeping modification timestamps below 
boundaries).
+对于无界文件源,枚举器会记住当前所有已处理文件的路径,在某些情况下,这种状态可能会变得相当大。
+计划在未来增加一种压缩的方式来跟踪已经处理的文件(例如,将修改时间戳保持在边界以下)。
+<a name="behind-the-scenes"></a>
 
-### Behind the Scenes
+### 后话
 {{< hint info >}}
-If you are interested in how File Source works through the new data source API 
design, you may
-want to read this part as a reference. For details about the new data source 
API, check out the
-[documentation on data sources]({{< ref "docs/dev/datastream/sources.md" >}}) 
and
+如果你对新设计的数据源 API 中的文件源是如何工作的感兴趣,可以阅读本部分作为参考。关于新的数据源 API 的更多细节,请参考
+[documentation on data sources]({{< ref "docs/dev/datastream/sources.md" >}}) 
和在
 <a 
href="https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface";>FLIP-27</a>
-for more descriptive discussions.
+中获取更加具体的讨论详情。
 {{< /hint >}}
+<a name="file-sink"></a>
 
-## File Sink
+## 文件 Sink
 
-The file sink writes incoming data into buckets. Given that the incoming 
streams can be unbounded,
-data in each bucket is organized into part files of finite size. The bucketing 
behaviour is fully configurable
-with a default time-based bucketing where we start writing a new bucket every 
hour. This means that each resulting
-bucket will contain files with records received during 1 hour intervals from 
the stream.
+文件 Sink 将传入的数据写入存储桶中。考虑到输入流可以是无界的,每个桶中的数据被组织成有限大小的 Part 文件。
+往桶中写数据的行为完全可默认配置成基于时间的,比如我们可以设置每个小时的数据写入一个新桶中。这意味着桶中将包含一个小时间隔内接收到的记录。
 
-Data within the bucket directories is split into part files. Each bucket will 
contain at least one part file for
-each subtask of the sink that has received data for that bucket. Additional 
part files will be created according to the configurable
-rolling policy. For `Row-encoded Formats` (see [File Formats](#file-formats)) 
the default policy rolls part files based
-on size, a timeout that specifies the maximum duration for which a file can be 
open, and a maximum inactivity
-timeout after which the file is closed. For `Bulk-encoded Formats` we roll on 
every checkpoint and the user can
-specify additional conditions based on size or time.
+桶目录中的数据被拆分成多个 Part 文件。对于相应的接收数据的桶的 Sink 的每个 Subtask ,每个桶将至少包含一个 Part 
文件。将根据配置的回滚策略来创建其他 Part 文件。
+对于 `Row-encoded Formats` (参考 [Format Types](#sink-format-types)) 默认的策略是根据 Part 
文件大小进行回滚,需要指定文件打开状态最长时间的超时以及文件关闭后的不活动状态的超时。
+对于 `Bulk-encoded Formats` 我们在每次创建 Checkpoint 时进行回滚,并且用户也可以添加基于大小或者时间等的其他条件。
 
 {{< hint info >}}
 
-**IMPORTANT**: Checkpointing needs to be enabled when using the `FileSink` in 
`STREAMING` mode. Part files
-can only be finalized on successful checkpoints. If checkpointing is disabled, 
part files will forever stay
-in the `in-progress` or the `pending` state, and cannot be safely read by 
downstream systems.
+**重要**: 在 `STREAMING` 模式下使用 `FileSink` 需要开启 Checkpoint 功能。
+文件只在 Checkpoint 成功时生成。如果没有开启 Checkpoint 功能,文件将永远停留在 `in-progress` 或者 `pending` 
的状态,并且下游系统将不能安全读取该文件数据。
 
 {{< /hint >}}
 
 {{< img src="/fig/streamfilesink_bucketing.png"  width="100%" >}}
+<a name="sink-format-types"></a>

Review comment:
       ```suggestion
   
   <a name="sink-format-types"></a>
   ```

##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -231,72 +222,68 @@ new HiveSource<>(
 ```
 {{< /tab >}}
 {{< /tabs >}}
+<a name="current-limitations"></a>
 
-### Current Limitations
+### 当前限制
 
-Watermarking does not work very well for large backlogs of files. This is 
because watermarks eagerly advance within a file, and the next file might 
contain data later than the watermark.
+对于大量积压的文件, Watermark 效果不佳。这是因为 Watermark 急切地在一个文件中前进,而下一个文件可能包含比 Watermark 
更晚的数据。
 
-For Unbounded File Sources, the enumerator currently remembers paths of all 
already processed files, which is a state that can, in some cases, grow rather 
large.
-There are plans to add a compressed form of tracking already processed files 
in the future (for example, by keeping modification timestamps below 
boundaries).
+对于无界文件源,枚举器会记住当前所有已处理文件的路径,在某些情况下,这种状态可能会变得相当大。
+计划在未来增加一种压缩的方式来跟踪已经处理的文件(例如,将修改时间戳保持在边界以下)。
+<a name="behind-the-scenes"></a>
 
-### Behind the Scenes
+### 后话
 {{< hint info >}}
-If you are interested in how File Source works through the new data source API 
design, you may
-want to read this part as a reference. For details about the new data source 
API, check out the
-[documentation on data sources]({{< ref "docs/dev/datastream/sources.md" >}}) 
and
+如果你对新设计的数据源 API 中的文件源是如何工作的感兴趣,可以阅读本部分作为参考。关于新的数据源 API 的更多细节,请参考
+[documentation on data sources]({{< ref "docs/dev/datastream/sources.md" >}}) 
和在
 <a 
href="https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface";>FLIP-27</a>
-for more descriptive discussions.
+中获取更加具体的讨论详情。
 {{< /hint >}}
+<a name="file-sink"></a>
 
-## File Sink
+## 文件 Sink
 
-The file sink writes incoming data into buckets. Given that the incoming 
streams can be unbounded,
-data in each bucket is organized into part files of finite size. The bucketing 
behaviour is fully configurable
-with a default time-based bucketing where we start writing a new bucket every 
hour. This means that each resulting
-bucket will contain files with records received during 1 hour intervals from 
the stream.
+文件 Sink 将传入的数据写入存储桶中。考虑到输入流可以是无界的,每个桶中的数据被组织成有限大小的 Part 文件。
+往桶中写数据的行为完全可默认配置成基于时间的,比如我们可以设置每个小时的数据写入一个新桶中。这意味着桶中将包含一个小时间隔内接收到的记录。
 
-Data within the bucket directories is split into part files. Each bucket will 
contain at least one part file for
-each subtask of the sink that has received data for that bucket. Additional 
part files will be created according to the configurable
-rolling policy. For `Row-encoded Formats` (see [File Formats](#file-formats)) 
the default policy rolls part files based
-on size, a timeout that specifies the maximum duration for which a file can be 
open, and a maximum inactivity
-timeout after which the file is closed. For `Bulk-encoded Formats` we roll on 
every checkpoint and the user can
-specify additional conditions based on size or time.
+桶目录中的数据被拆分成多个 Part 文件。对于相应的接收数据的桶的 Sink 的每个 Subtask ,每个桶将至少包含一个 Part 
文件。将根据配置的回滚策略来创建其他 Part 文件。
+对于 `Row-encoded Formats` (参考 [Format Types](#sink-format-types)) 默认的策略是根据 Part 
文件大小进行回滚,需要指定文件打开状态最长时间的超时以及文件关闭后的不活动状态的超时。
+对于 `Bulk-encoded Formats` 我们在每次创建 Checkpoint 时进行回滚,并且用户也可以添加基于大小或者时间等的其他条件。

Review comment:
       ```suggestion
   对于 `Bulk-encoded Formats` 我们在每次创建 Checkpoint 时进行滚动,并且用户也可以添加基于大小或者时间等的其他条件。
   ```

##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -231,72 +222,68 @@ new HiveSource<>(
 ```
 {{< /tab >}}
 {{< /tabs >}}
+<a name="current-limitations"></a>
 
-### Current Limitations
+### 当前限制
 
-Watermarking does not work very well for large backlogs of files. This is 
because watermarks eagerly advance within a file, and the next file might 
contain data later than the watermark.
+对于大量积压的文件, Watermark 效果不佳。这是因为 Watermark 急切地在一个文件中前进,而下一个文件可能包含比 Watermark 
更晚的数据。
 
-For Unbounded File Sources, the enumerator currently remembers paths of all 
already processed files, which is a state that can, in some cases, grow rather 
large.
-There are plans to add a compressed form of tracking already processed files 
in the future (for example, by keeping modification timestamps below 
boundaries).
+对于无界文件源,枚举器会记住当前所有已处理文件的路径,在某些情况下,这种状态可能会变得相当大。
+计划在未来增加一种压缩的方式来跟踪已经处理的文件(例如,将修改时间戳保持在边界以下)。
+<a name="behind-the-scenes"></a>
 
-### Behind the Scenes
+### 后话
 {{< hint info >}}
-If you are interested in how File Source works through the new data source API 
design, you may
-want to read this part as a reference. For details about the new data source 
API, check out the
-[documentation on data sources]({{< ref "docs/dev/datastream/sources.md" >}}) 
and
+如果你对新设计的数据源 API 中的文件源是如何工作的感兴趣,可以阅读本部分作为参考。关于新的数据源 API 的更多细节,请参考
+[documentation on data sources]({{< ref "docs/dev/datastream/sources.md" >}}) 
和在
 <a 
href="https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface";>FLIP-27</a>
-for more descriptive discussions.
+中获取更加具体的讨论详情。
 {{< /hint >}}
+<a name="file-sink"></a>
 
-## File Sink
+## 文件 Sink
 
-The file sink writes incoming data into buckets. Given that the incoming 
streams can be unbounded,
-data in each bucket is organized into part files of finite size. The bucketing 
behaviour is fully configurable
-with a default time-based bucketing where we start writing a new bucket every 
hour. This means that each resulting
-bucket will contain files with records received during 1 hour intervals from 
the stream.
+文件 Sink 将传入的数据写入存储桶中。考虑到输入流可以是无界的,每个桶中的数据被组织成有限大小的 Part 文件。
+往桶中写数据的行为完全可默认配置成基于时间的,比如我们可以设置每个小时的数据写入一个新桶中。这意味着桶中将包含一个小时间隔内接收到的记录。
 
-Data within the bucket directories is split into part files. Each bucket will 
contain at least one part file for
-each subtask of the sink that has received data for that bucket. Additional 
part files will be created according to the configurable
-rolling policy. For `Row-encoded Formats` (see [File Formats](#file-formats)) 
the default policy rolls part files based
-on size, a timeout that specifies the maximum duration for which a file can be 
open, and a maximum inactivity
-timeout after which the file is closed. For `Bulk-encoded Formats` we roll on 
every checkpoint and the user can
-specify additional conditions based on size or time.
+桶目录中的数据被拆分成多个 Part 文件。对于相应的接收数据的桶的 Sink 的每个 Subtask ,每个桶将至少包含一个 Part 
文件。将根据配置的回滚策略来创建其他 Part 文件。
+对于 `Row-encoded Formats` (参考 [Format Types](#sink-format-types)) 默认的策略是根据 Part 
文件大小进行回滚,需要指定文件打开状态最长时间的超时以及文件关闭后的不活动状态的超时。
+对于 `Bulk-encoded Formats` 我们在每次创建 Checkpoint 时进行回滚,并且用户也可以添加基于大小或者时间等的其他条件。
 
 {{< hint info >}}
 
-**IMPORTANT**: Checkpointing needs to be enabled when using the `FileSink` in 
`STREAMING` mode. Part files
-can only be finalized on successful checkpoints. If checkpointing is disabled, 
part files will forever stay
-in the `in-progress` or the `pending` state, and cannot be safely read by 
downstream systems.
+**重要**: 在 `STREAMING` 模式下使用 `FileSink` 需要开启 Checkpoint 功能。
+文件只在 Checkpoint 成功时生成。如果没有开启 Checkpoint 功能,文件将永远停留在 `in-progress` 或者 `pending` 
的状态,并且下游系统将不能安全读取该文件数据。
 
 {{< /hint >}}
 
 {{< img src="/fig/streamfilesink_bucketing.png"  width="100%" >}}
+<a name="sink-format-types"></a>
 
 ### Format Types
 
-The `FileSink` supports both row-wise and bulk encoding formats, such as 
[Apache Parquet](http://parquet.apache.org).
-These two variants come with their respective builders that can be created 
with the following static methods:
+`FileSink` 不仅支持行编码格式也支持 Bulk 编码格式,例如 [Apache 
Parquet](http://parquet.apache.org)。
+这两种格式可以通过如下的静态方法进行构造:
 
 - Row-encoded sink: `FileSink.forRowFormat(basePath, rowEncoder)`
 - Bulk-encoded sink: `FileSink.forBulkFormat(basePath, bulkWriterFactory)`
 
-When creating either a row or a bulk encoded sink we have to specify the base 
path where the buckets will be
-stored and the encoding logic for our data.
+不论创建行或者 Bulk 格式的 Sink 时,我们都必须指定桶的路径以及对数据进行编码的逻辑。
 
-Please check out the JavaDoc for {{< javadoc 
file="org/apache/flink/connector/file/sink/FileSink.html" name="FileSink">}}
-for all the configuration options and more documentation about the 
implementation of the different data formats.
+请参考 JavaDoc 文档 {{< javadoc 
file="org/apache/flink/connector/file/sink/FileSink.html" name="FileSink">}}
+来获取所有的配置选项以及更多的不同数据格式实现的详细信息。
+<a name="row-encoded-formats"></a>

Review comment:
       ```suggestion
   
   <a name="row-encoded-formats"></a>
   ```

##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -93,34 +90,33 @@ final FileSource<String> source =
 ```
 {{< /tab >}}
 {{< /tabs >}}
+<a name="source-format-types"></a>
 
 ### Format Types
 
-The reading of each file happens through file readers defined by file formats.
-These define the parsing logic for the contents of the file. There are 
multiple classes that the source supports.
-The interfaces are a tradeoff between simplicity of implementation and 
flexibility/efficiency.
+通过文件格式定义的文件阅读器读取每个文件。
+它们定义了解析和读取文件内容的逻辑。数据源支持多个解析类。
+这些接口是实现简单性和灵活性/效率之间的折衷。
 
-* A `StreamFormat` reads the contents of a file from a file stream. It is the 
simplest format to implement,
-  and provides many features out-of-the-box (like checkpointing logic) but is 
limited in the optimizations it can apply
-  (such as object reuse, batching, etc.).
+*  `StreamFormat` 从文件流中读取文件内容。它是最简单的格式实现,
+   并且提供了许多拆箱即用的特性(如 Checkpoint 逻辑),但是在可应用的优化方面受到限制(例如对象重用,批处理等等)。
 
-* A `BulkFormat` reads batches of records from a file at a time.
-  It is the most "low level" format to implement, but offers the greatest 
flexibility to optimize the implementation.
+* `BulkFormat` 从文件中一次读取一批记录。
+  它虽然是最 "底层" 的格式实现,但是提供了优化实现的最大灵活性。
+<a name="textline-format"></a>
 
-#### TextLine Format
+#### TextLine 格式

Review comment:
    keep the original content?

##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -231,72 +222,68 @@ new HiveSource<>(
 ```
 {{< /tab >}}
 {{< /tabs >}}
+<a name="current-limitations"></a>
 
-### Current Limitations
+### 当前限制
 
-Watermarking does not work very well for large backlogs of files. This is 
because watermarks eagerly advance within a file, and the next file might 
contain data later than the watermark.
+对于大量积压的文件, Watermark 效果不佳。这是因为 Watermark 急切地在一个文件中前进,而下一个文件可能包含比 Watermark 
更晚的数据。
 
-For Unbounded File Sources, the enumerator currently remembers paths of all 
already processed files, which is a state that can, in some cases, grow rather 
large.
-There are plans to add a compressed form of tracking already processed files 
in the future (for example, by keeping modification timestamps below 
boundaries).
+对于无界文件源,枚举器会记住当前所有已处理文件的路径,在某些情况下,这种状态可能会变得相当大。
+计划在未来增加一种压缩的方式来跟踪已经处理的文件(例如,将修改时间戳保持在边界以下)。
+<a name="behind-the-scenes"></a>
 
-### Behind the Scenes
+### 后话
 {{< hint info >}}
-If you are interested in how File Source works through the new data source API 
design, you may
-want to read this part as a reference. For details about the new data source 
API, check out the
-[documentation on data sources]({{< ref "docs/dev/datastream/sources.md" >}}) 
and
+如果你对新设计的数据源 API 中的文件源是如何工作的感兴趣,可以阅读本部分作为参考。关于新的数据源 API 的更多细节,请参考
+[documentation on data sources]({{< ref "docs/dev/datastream/sources.md" >}}) 
和在
 <a 
href="https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface";>FLIP-27</a>
-for more descriptive discussions.
+中获取更加具体的讨论详情。
 {{< /hint >}}
+<a name="file-sink"></a>
 
-## File Sink
+## 文件 Sink
 
-The file sink writes incoming data into buckets. Given that the incoming 
streams can be unbounded,
-data in each bucket is organized into part files of finite size. The bucketing 
behaviour is fully configurable
-with a default time-based bucketing where we start writing a new bucket every 
hour. This means that each resulting
-bucket will contain files with records received during 1 hour intervals from 
the stream.
+文件 Sink 将传入的数据写入存储桶中。考虑到输入流可以是无界的,每个桶中的数据被组织成有限大小的 Part 文件。
+往桶中写数据的行为完全可默认配置成基于时间的,比如我们可以设置每个小时的数据写入一个新桶中。这意味着桶中将包含一个小时间隔内接收到的记录。
 
-Data within the bucket directories is split into part files. Each bucket will 
contain at least one part file for
-each subtask of the sink that has received data for that bucket. Additional 
part files will be created according to the configurable
-rolling policy. For `Row-encoded Formats` (see [File Formats](#file-formats)) 
the default policy rolls part files based
-on size, a timeout that specifies the maximum duration for which a file can be 
open, and a maximum inactivity
-timeout after which the file is closed. For `Bulk-encoded Formats` we roll on 
every checkpoint and the user can
-specify additional conditions based on size or time.
+桶目录中的数据被拆分成多个 Part 文件。对于相应的接收数据的桶的 Sink 的每个 Subtask ,每个桶将至少包含一个 Part 
文件。将根据配置的回滚策略来创建其他 Part 文件。
+对于 `Row-encoded Formats` (参考 [Format Types](#sink-format-types)) 默认的策略是根据 Part 
文件大小进行回滚,需要指定文件打开状态最长时间的超时以及文件关闭后的不活动状态的超时。
+对于 `Bulk-encoded Formats` 我们在每次创建 Checkpoint 时进行回滚,并且用户也可以添加基于大小或者时间等的其他条件。
 
 {{< hint info >}}
 
-**IMPORTANT**: Checkpointing needs to be enabled when using the `FileSink` in 
`STREAMING` mode. Part files
-can only be finalized on successful checkpoints. If checkpointing is disabled, 
part files will forever stay
-in the `in-progress` or the `pending` state, and cannot be safely read by 
downstream systems.
+**重要**: 在 `STREAMING` 模式下使用 `FileSink` 需要开启 Checkpoint 功能。
+文件只在 Checkpoint 成功时生成。如果没有开启 Checkpoint 功能,文件将永远停留在 `in-progress` 或者 `pending` 
的状态,并且下游系统将不能安全读取该文件数据。
 
 {{< /hint >}}
 
 {{< img src="/fig/streamfilesink_bucketing.png"  width="100%" >}}
+<a name="sink-format-types"></a>
 
 ### Format Types
 
-The `FileSink` supports both row-wise and bulk encoding formats, such as 
[Apache Parquet](http://parquet.apache.org).
-These two variants come with their respective builders that can be created 
with the following static methods:
+`FileSink` 不仅支持行编码格式也支持 Bulk 编码格式,例如 [Apache 
Parquet](http://parquet.apache.org)。
+这两种格式可以通过如下的静态方法进行构造:
 
 - Row-encoded sink: `FileSink.forRowFormat(basePath, rowEncoder)`
 - Bulk-encoded sink: `FileSink.forBulkFormat(basePath, bulkWriterFactory)`
 
-When creating either a row or a bulk encoded sink we have to specify the base 
path where the buckets will be
-stored and the encoding logic for our data.
+不论创建行或者 Bulk 格式的 Sink 时,我们都必须指定桶的路径以及对数据进行编码的逻辑。
 
-Please check out the JavaDoc for {{< javadoc 
file="org/apache/flink/connector/file/sink/FileSink.html" name="FileSink">}}
-for all the configuration options and more documentation about the 
implementation of the different data formats.
+请参考 JavaDoc 文档 {{< javadoc 
file="org/apache/flink/connector/file/sink/FileSink.html" name="FileSink">}}
+来获取所有的配置选项以及更多的不同数据格式实现的详细信息。
+<a name="row-encoded-formats"></a>
 
-#### Row-encoded Formats
+#### 行编码格式

Review comment:
    keep the original content?

##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -231,72 +222,68 @@ new HiveSource<>(
 ```
 {{< /tab >}}
 {{< /tabs >}}
+<a name="current-limitations"></a>
 
-### Current Limitations
+### 当前限制
 
-Watermarking does not work very well for large backlogs of files. This is 
because watermarks eagerly advance within a file, and the next file might 
contain data later than the watermark.
+对于大量积压的文件, Watermark 效果不佳。这是因为 Watermark 急切地在一个文件中前进,而下一个文件可能包含比 Watermark 
更晚的数据。
 
-For Unbounded File Sources, the enumerator currently remembers paths of all 
already processed files, which is a state that can, in some cases, grow rather 
large.
-There are plans to add a compressed form of tracking already processed files 
in the future (for example, by keeping modification timestamps below 
boundaries).
+对于无界文件源,枚举器会记住当前所有已处理文件的路径,在某些情况下,这种状态可能会变得相当大。
+计划在未来增加一种压缩的方式来跟踪已经处理的文件(例如,将修改时间戳保持在边界以下)。
+<a name="behind-the-scenes"></a>
 
-### Behind the Scenes
+### 后话
 {{< hint info >}}
-If you are interested in how File Source works through the new data source API 
design, you may
-want to read this part as a reference. For details about the new data source 
API, check out the
-[documentation on data sources]({{< ref "docs/dev/datastream/sources.md" >}}) 
and
+如果你对新设计的数据源 API 中的文件源是如何工作的感兴趣,可以阅读本部分作为参考。关于新的数据源 API 的更多细节,请参考
+[documentation on data sources]({{< ref "docs/dev/datastream/sources.md" >}}) 
和在
 <a 
href="https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface";>FLIP-27</a>
-for more descriptive discussions.
+中获取更加具体的讨论详情。
 {{< /hint >}}
+<a name="file-sink"></a>
 
-## File Sink
+## 文件 Sink
 
-The file sink writes incoming data into buckets. Given that the incoming 
streams can be unbounded,
-data in each bucket is organized into part files of finite size. The bucketing 
behaviour is fully configurable
-with a default time-based bucketing where we start writing a new bucket every 
hour. This means that each resulting
-bucket will contain files with records received during 1 hour intervals from 
the stream.
+文件 Sink 将传入的数据写入存储桶中。考虑到输入流可以是无界的,每个桶中的数据被组织成有限大小的 Part 文件。
+往桶中写数据的行为完全可默认配置成基于时间的,比如我们可以设置每个小时的数据写入一个新桶中。这意味着桶中将包含一个小时间隔内接收到的记录。
 
-Data within the bucket directories is split into part files. Each bucket will 
contain at least one part file for
-each subtask of the sink that has received data for that bucket. Additional 
part files will be created according to the configurable
-rolling policy. For `Row-encoded Formats` (see [File Formats](#file-formats)) 
the default policy rolls part files based
-on size, a timeout that specifies the maximum duration for which a file can be 
open, and a maximum inactivity
-timeout after which the file is closed. For `Bulk-encoded Formats` we roll on 
every checkpoint and the user can
-specify additional conditions based on size or time.
+桶目录中的数据被拆分成多个 Part 文件。对于相应的接收数据的桶的 Sink 的每个 Subtask ,每个桶将至少包含一个 Part 
文件。将根据配置的回滚策略来创建其他 Part 文件。
+对于 `Row-encoded Formats` (参考 [Format Types](#sink-format-types)) 默认的策略是根据 Part 
文件大小进行回滚,需要指定文件打开状态最长时间的超时以及文件关闭后的不活动状态的超时。
+对于 `Bulk-encoded Formats` 我们在每次创建 Checkpoint 时进行回滚,并且用户也可以添加基于大小或者时间等的其他条件。
 
 {{< hint info >}}
 
-**IMPORTANT**: Checkpointing needs to be enabled when using the `FileSink` in 
`STREAMING` mode. Part files
-can only be finalized on successful checkpoints. If checkpointing is disabled, 
part files will forever stay
-in the `in-progress` or the `pending` state, and cannot be safely read by 
downstream systems.
+**重要**: 在 `STREAMING` 模式下使用 `FileSink` 需要开启 Checkpoint 功能。
+文件只在 Checkpoint 成功时生成。如果没有开启 Checkpoint 功能,文件将永远停留在 `in-progress` 或者 `pending` 
的状态,并且下游系统将不能安全读取该文件数据。
 
 {{< /hint >}}
 
 {{< img src="/fig/streamfilesink_bucketing.png"  width="100%" >}}
+<a name="sink-format-types"></a>
 
 ### Format Types
 
-The `FileSink` supports both row-wise and bulk encoding formats, such as 
[Apache Parquet](http://parquet.apache.org).
-These two variants come with their respective builders that can be created 
with the following static methods:
+`FileSink` 不仅支持行编码格式也支持 Bulk 编码格式,例如 [Apache 
Parquet](http://parquet.apache.org)。
+这两种格式可以通过如下的静态方法进行构造:
 
 - Row-encoded sink: `FileSink.forRowFormat(basePath, rowEncoder)`
 - Bulk-encoded sink: `FileSink.forBulkFormat(basePath, bulkWriterFactory)`
 
-When creating either a row or a bulk encoded sink we have to specify the base 
path where the buckets will be
-stored and the encoding logic for our data.
+不论创建行或者 Bulk 格式的 Sink 时,我们都必须指定桶的路径以及对数据进行编码的逻辑。
 
-Please check out the JavaDoc for {{< javadoc 
file="org/apache/flink/connector/file/sink/FileSink.html" name="FileSink">}}
-for all the configuration options and more documentation about the 
implementation of the different data formats.
+请参考 JavaDoc 文档 {{< javadoc 
file="org/apache/flink/connector/file/sink/FileSink.html" name="FileSink">}}
+来获取所有的配置选项以及更多的不同数据格式实现的详细信息。
+<a name="row-encoded-formats"></a>
 
-#### Row-encoded Formats
+#### 行编码格式
 
-Row-encoded formats need to specify an `Encoder`
-that is used for serializing individual rows to the `OutputStream` of the 
in-progress part files.
+行编码格式需要指定一个 `Encoder`,在输出数据到文件过程中它被用来将单个行数据序列化为 `OutputStream`。
 
-In addition to the bucket assigner, the RowFormatBuilder allows the user to 
specify:
+除了桶赋值器,允许用户指定 RowFormatBuilder:

Review comment:
       ```suggestion
   除了 bucket assigner,RowFormatBuilder 还允许用户指定 :
   ```
   A minor comment.
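
To make the suggested wording concrete, a hedged sketch of the options the `RowFormatBuilder` exposes besides the bucket assigner; all names and thresholds are illustrative, and the `Duration`/`MemorySize` overloads assume a recent Flink version (older releases take milliseconds and bytes as `long`).

```java
// Row-encoded sink with an explicit bucket assigner, rolling policy and
// output-file configuration; values are illustrative only.
final FileSink<String> sink = FileSink
        .forRowFormat(new Path("/path/to/output"), new SimpleStringEncoder<String>("UTF-8"))
        .withBucketAssigner(new DateTimeBucketAssigner<>())
        .withRollingPolicy(
                DefaultRollingPolicy.builder()
                        .withRolloverInterval(Duration.ofMinutes(15))
                        .withInactivityInterval(Duration.ofMinutes(5))
                        .withMaxPartSize(MemorySize.ofMebiBytes(1024))
                        .build())
        .withOutputFileConfig(
                OutputFileConfig.builder()
                        .withPartPrefix("part")
                        .withPartSuffix(".txt")
                        .build())
        .build();
```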



