wuchong commented on a change in pull request #8897: 
[FLINK-12943][docs]Translate HDFS Connector page into Chinese
URL: https://github.com/apache/flink/pull/8897#discussion_r298799798
 
 

 ##########
 File path: docs/dev/connectors/filesystem_sink.zh.md
 ##########
 @@ -65,40 +59,34 @@ input.addSink(new BucketingSink[String]("/base/path"))
 </div>
 </div>
 
-The only required parameter is the base path where the buckets will be
-stored. The sink can be further configured by specifying a custom bucketer, writer and batch size.
-
-By default the bucketing sink will split by the current system time when elements arrive and will
-use the datetime pattern `"yyyy-MM-dd--HH"` to name the buckets. This pattern is passed to
-`DateTimeFormatter` with the current system time and JVM's default timezone to form a bucket path.
-Users can also specify a timezone for the bucketer to format bucket path. A new bucket will be created
-whenever a new date is encountered. For example, if you have a pattern that contains minutes as the
-finest granularity you will get a new bucket every minute. Each bucket is itself a directory that
-contains several part files: each parallel instance of the sink will create its own part file and
-when part files get too big the sink will also create a new part file next to the others. When a
-bucket becomes inactive, the open part file will be flushed and closed. A bucket is regarded as
-inactive when it hasn't been written to recently. By default, the sink checks for inactive buckets
-every minute, and closes any buckets which haven't been written to for over a minute. This
-behaviour can be configured with `setInactiveBucketCheckInterval()` and
-`setInactiveBucketThreshold()` on a `BucketingSink`.
-
-You can also specify a custom bucketer by using `setBucketer()` on a `BucketingSink`. If desired,
-the bucketer can use a property of the element or tuple to determine the bucket directory.
+The only required parameter at initialization is the base path where the bucket files will be
+stored. The bucketing sink can be further configured by specifying a custom bucketer, writer and
+batch size.
+
+By default, when elements arrive, the bucketing sink splits them by the current system time and
+names each bucket with the datetime pattern `"yyyy-MM-dd--HH"`. This pattern is then passed to
+`DateTimeFormatter` together with the current time and the JVM's default timezone to form the
+bucket path. Users can also specify a custom timezone for formatting the bucket path. A new bucket
+is created whenever a new date is encountered. For example, if the pattern has minutes as the
+finest granularity, a new bucket is created every minute. Each bucket is itself a directory that
+contains several part files: each parallel instance of the sink creates its own part file, and
+when a part file grows too big the sink creates a new part file next to the others. When a bucket
+becomes inactive, the open part file is flushed and closed. A bucket is regarded as inactive when
+it hasn't been written to recently. By default, the sink checks for inactive buckets every minute
+and closes any bucket that hasn't been written to for over a minute. This behaviour can be
+configured with `setInactiveBucketCheckInterval()` and `setInactiveBucketThreshold()` on a
+`BucketingSink`.
+
+You can specify a custom bucketer with `setBucketer()` on a `BucketingSink`. If desired, the
+bucketer can use a property of the element or tuple to determine the bucket directory.
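
For reference, a minimal Scala sketch of the bucketer and inactivity settings described above, assuming the Flink 1.9 `BucketingSink`/`DateTimeBucketer` API that this page documents; the path, pattern, timezone and input stream are illustrative:

```scala
import java.time.ZoneId

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.fs.bucketing.{BucketingSink, DateTimeBucketer}

val input: DataStream[String] = ??? // the stream to write (elided)

val sink = new BucketingSink[String]("/base/path")
// Bucket by minute, formatting the bucket path in an explicit timezone
// instead of the JVM default.
sink.setBucketer(new DateTimeBucketer[String]("yyyy-MM-dd--HHmm", ZoneId.of("America/Los_Angeles")))
// Check for inactive buckets every 60 s and close any bucket that has not
// been written to for over 60 s (both values are in milliseconds).
sink.setInactiveBucketCheckInterval(60 * 1000)
sink.setInactiveBucketThreshold(60 * 1000)

input.addSink(sink)
```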
 
 The default writer is `StringWriter`. This will call `toString()` on the incoming elements
-and write them to part files, separated by newline. To specify a custom writer use `setWriter()`
-on a `BucketingSink`. If you want to write Hadoop SequenceFiles you can use the provided
-`SequenceFileWriter` which can also be configured to use compression.
+The default writer is `StringWriter`. When data arrives, it calls `toString()` on each element
+and writes the result to the part file, separated by newlines. A custom writer can be specified
+with `setWriter()` on a `BucketingSink`. The provided `SequenceFileWriter` supports writing
+Hadoop SequenceFiles and can also be configured to use compression.
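
A sketch of the `SequenceFileWriter` variant with compression enabled; the key/value types and the codec name "Snappy" are illustrative (any codec resolvable by Hadoop's `CompressionCodecFactory` should work):

```scala
import org.apache.hadoop.io.{IntWritable, SequenceFile, Text}

import org.apache.flink.api.java.tuple.Tuple2
import org.apache.flink.streaming.connectors.fs.SequenceFileWriter
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink

val sink = new BucketingSink[Tuple2[IntWritable, Text]]("/base/path")
// Write Hadoop SequenceFiles instead of newline-separated strings,
// using block-level compression with the Snappy codec.
sink.setWriter(new SequenceFileWriter[IntWritable, Text]("Snappy", SequenceFile.CompressionType.BLOCK))
```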
 
 There are two configuration options that specify when a part file should be closed
+There are two configuration options that determine when a part file is closed and a new one started:
 and a new one started:
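
The two options the hunk cuts off here are, in the current English page, a batch size and a batch rollover interval; a sketch with illustrative values:

```scala
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink

val sink = new BucketingSink[String]("/base/path")
// Start a new part file once the current one reaches 400 MB ...
sink.setBatchSize(1024 * 1024 * 400)
// ... or once it has been open for 20 minutes, whichever comes first.
sink.setBatchRolloverInterval(20 * 60 * 1000)
```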
 
 Review comment:
  The original text was not deleted.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services