This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 914c47a  [DOCS] Fix the "Edit this page" config and add 6 cn docs. (#3859)
914c47a is described below

commit 914c47ac1d436c98b24cef2ebced074962ac157e
Author: Laurie Li <11391675+laurieliy...@users.noreply.github.com>
AuthorDate: Thu Dec 16 09:03:25 2021 +0800

    [DOCS] Fix the "Edit this page" config and add 6 cn docs. (#3859)
---
 website/docusaurus.config.js                       |  9 +++-
 .../current/gcs_hoodie.md                          | 24 +++++-----
 .../current/ibm_cos_hoodie.md                      | 28 ++++++------
 .../current/migration_guide.md                     | 52 +++++++++-------------
 .../current/privacy.md                             | 26 +++++------
 .../current/s3_hoodie.md                           | 30 ++++++-------
 6 files changed, 81 insertions(+), 88 deletions(-)

diff --git a/website/docusaurus.config.js b/website/docusaurus.config.js
index 0b387af..772e5f8 100644
--- a/website/docusaurus.config.js
+++ b/website/docusaurus.config.js
@@ -408,8 +408,13 @@ module.exports = {
         docs: {
           sidebarPath: require.resolve('./sidebars.js'),
           // Please change this to your repo.
-          editUrl:
-            'https://github.com/apache/hudi/edit/asf-site/website/docs/',
+          editUrl: ({ version, versionDocsDirPath, docPath, locale }) => {
+            if (locale != this.defaultLocale) {
+              return `https://github.com/apache/hudi/tree/asf-site/website/${versionDocsDirPath}/${docPath}`
+            } else {
+              return `https://github.com/apache/hudi/tree/asf-site/website/i18n/${locale}/docusaurus-plugin-content-${versionDocsDirPath}/${version}/${docPath}`
+            }
+          },
           includeCurrentVersion: true,
           versions: {
             current: {
diff --git a/website/i18n/cn/docusaurus-plugin-content-docs/current/gcs_hoodie.md b/website/i18n/cn/docusaurus-plugin-content-docs/current/gcs_hoodie.md
index fd45664..7906b74 100644
--- a/website/i18n/cn/docusaurus-plugin-content-docs/current/gcs_hoodie.md
+++ b/website/i18n/cn/docusaurus-plugin-content-docs/current/gcs_hoodie.md
@@ -1,22 +1,22 @@
 ---
-title: GCS Filesystem
-keywords: [ hudi, hive, google cloud, storage, spark, presto]
-summary: In this page, we go over how to configure hudi with Google Cloud Storage.
+title: GCS 文件系统
+keywords: [ hudi, hive, google cloud, storage, spark, presto, 存储 ]
+summary: 在本页中,我们探讨如何在 Google Cloud Storage 中配置 Hudi。
 last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
-For Hudi storage on GCS, **regional** buckets provide an DFS API with strong consistency.
+对于存储在 GCS 上的 Hudi , **区域** Bucket 提供了带有强一致性的 DFS API 。
 
-## GCS Configs
+## GCS 配置
 
-There are two configurations required for Hudi GCS compatibility:
+Hudi 的 GCS 适配需要两项配置:
 
-- Adding GCS Credentials for Hudi
-- Adding required jars to classpath
+- 为 Hudi 添加 GCS 凭证
+- 将需要的 jar 包添加到类路径
 
-### GCS Credentials
+### GCS 凭证
 
-Add the required configs in your core-site.xml from where Hudi can fetch them. Replace the `fs.defaultFS` with your GCS bucket name and Hudi should be able to read/write from the bucket.
+在你的 core-site.xml 文件中添加必要的配置,Hudi 将从那里获取这些配置。 用你的 GCS 分区名称替换掉 `fs.defaultFS` ,以便 Hudi 能够在 Bucket 中读取/写入。
 
 ```xml
   <property>
@@ -54,8 +54,8 @@ Add the required configs in your core-site.xml from where Hudi can fetch them. R
   </property>
 ```
 
-### GCS Libs
+### GCS 库
 
-GCS hadoop libraries to add to our classpath
+将 GCS Hadoop 库添加到我们的类路径
 
 - com.google.cloud.bigdataoss:gcs-connector:1.6.0-hadoop2
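
A side note on the GCS credentials section above: the same settings can also be applied programmatically on the Spark job's Hadoop configuration rather than through core-site.xml. A minimal sketch, not part of the patch — the bucket, project id and key-file path are placeholders, and the property names are the ones the gcs-connector is generally documented to read, so verify them against your connector version:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.sql.SparkSession;

public class HudiOnGcsConfigSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("hudi-gcs-config")
        .getOrCreate();

    // Mirrors the core-site.xml entries: default filesystem plus gcs-connector credentials.
    Configuration conf = spark.sparkContext().hadoopConfiguration();
    conf.set("fs.defaultFS", "gs://your-hudi-bucket");                     // placeholder bucket
    conf.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
    conf.set("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS");
    conf.set("fs.gs.project.id", "your-gcp-project");                      // placeholder project id
    conf.set("google.cloud.auth.service.account.enable", "true");
    conf.set("google.cloud.auth.service.account.json.keyfile", "/path/to/keyfile.json");
  }
}
```
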
diff --git a/website/i18n/cn/docusaurus-plugin-content-docs/current/ibm_cos_hoodie.md b/website/i18n/cn/docusaurus-plugin-content-docs/current/ibm_cos_hoodie.md
index d7749e6..b93841e 100644
--- a/website/i18n/cn/docusaurus-plugin-content-docs/current/ibm_cos_hoodie.md
+++ b/website/i18n/cn/docusaurus-plugin-content-docs/current/ibm_cos_hoodie.md
@@ -1,26 +1,26 @@
 ---
-title: IBM Cloud Object Storage Filesystem
+title: IBM Cloud Object Storage 文件系统
 keywords: [ hudi, hive, ibm, cos, spark, presto]
-summary: In this page, we go over how to configure Hudi with IBM Cloud Object Storage filesystem.
+summary: 在本页中,我们讨论在 IBM Cloud Object Storage 文件系统中配置 Hudi 。
 last_modified_at: 2020-10-01T11:38:24-10:00
 language: cn
 ---
-In this page, we explain how to get your Hudi spark job to store into IBM Cloud Object Storage.
+在本页中,我们解释如何将你的 Hudi Spark 作业存储到 IBM Cloud Object Storage 当中。
 
-## IBM COS configs
+## IBM COS 配置
 
-There are two configurations required for Hudi-IBM Cloud Object Storage compatibility:
+Hudi 适配 IBM Cloud Object Storage 需要两项配置:
 
-- Adding IBM COS Credentials for Hudi
-- Adding required Jars to classpath
+- 为 Hudi 添加 IBM COS 凭证
+- 添加需要的 jar 包到类路径
 
-### IBM Cloud Object Storage Credentials
+### IBM Cloud Object Storage 凭证
 
-Simplest way to use Hudi with IBM Cloud Object Storage, is to configure your `SparkSession` or `SparkContext` with IBM Cloud Object Storage credentials using [Stocator](https://github.com/CODAIT/stocator) storage connector for Spark. Hudi will automatically pick this up and talk to IBM Cloud Object Storage.
+在 IBM Cloud Object Storage 上使用 Hudi 的最简单的办法,就是使用 [Stocator](https://github.com/CODAIT/stocator) 的 Spark 存储连接器为 `SparkSession` 或 `SparkContext` 配置 IBM Cloud Object Storage 凭证。 Hudi 将自动拾取配置并告知 IBM Cloud Object Storage 。
 
-Alternatively, add the required configs in your core-site.xml from where Hudi can fetch them. Replace the `fs.defaultFS` with your IBM Cloud Object Storage bucket name and Hudi should be able to read/write from the bucket.
+或者,向你的 core-site.xml 文件中添加必要的配置,Hudi 可以从那里获取这些配置。用你的 IBM Cloud Object Storage 的 Bucket 名称替换 `fs.defaultFS` 以便 Hudi 能够在 Bucket 中读取/写入。
 
-For example, using HMAC keys and service name `myCOS`:
+例如,使用 HMAC 密钥以及服务名 `myCOS` :
 ```xml
   <property>
       <name>fs.defaultFS</name>
@@ -69,10 +69,10 @@ For example, using HMAC keys and service name `myCOS`:
 
 ```
 
-For more options see Stocator [documentation](https://github.com/CODAIT/stocator/blob/master/README.md).
+更多信息请参考 Stocator [文档](https://github.com/CODAIT/stocator/blob/master/README.md) 。
 
-### IBM Cloud Object Storage Libs
+### IBM Cloud Object Storage 库
 
-IBM Cloud Object Storage hadoop libraries to add to our classpath
+将 IBM Cloud Object Storage Hadoop 库添加到我们的类路径中:
 
  - com.ibm.stocator:stocator:1.1.3
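
As a rough illustration of the Stocator-based setup described above, the `myCOS` service could be wired up on the Spark session's Hadoop configuration as sketched below. This is not part of the patch; the property names follow the Stocator README, and the HMAC keys and endpoint are placeholders to replace with your own:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.sql.SparkSession;

public class HudiOnIbmCosConfigSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("hudi-ibm-cos-config")
        .getOrCreate();

    // Register the Stocator COS filesystem and the HMAC credentials for service "myCOS".
    Configuration conf = spark.sparkContext().hadoopConfiguration();
    conf.set("fs.stocator.scheme.list", "cos");
    conf.set("fs.cos.impl", "com.ibm.stocator.fs.ObjectStoreFileSystem");
    conf.set("fs.stocator.cos.impl", "com.ibm.stocator.fs.cos.COSAPIClient");
    conf.set("fs.stocator.cos.scheme", "cos");
    conf.set("fs.cos.myCOS.access.key", "<hmac-access-key>");   // placeholder
    conf.set("fs.cos.myCOS.secret.key", "<hmac-secret-key>");   // placeholder
    conf.set("fs.cos.myCOS.endpoint", "<cos-endpoint-for-your-region>");   // placeholder
  }
}
```
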
diff --git a/website/i18n/cn/docusaurus-plugin-content-docs/current/migration_guide.md b/website/i18n/cn/docusaurus-plugin-content-docs/current/migration_guide.md
index 5df3c18..c7b61ca 100644
--- a/website/i18n/cn/docusaurus-plugin-content-docs/current/migration_guide.md
+++ b/website/i18n/cn/docusaurus-plugin-content-docs/current/migration_guide.md
@@ -1,47 +1,37 @@
 ---
-title: Migration Guide
-keywords: [ hudi, migration, use case]
-summary: In this page, we will discuss some available tools for migrating your existing dataset into a Hudi dataset
+title: 迁移指南
+keywords: [ hudi, migration, use case, 迁移, 用例]
+summary: 在本页中,我们将讨论有效的工具,他们能将你的现有数据集迁移到 Hudi 数据集。
 last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
-Hudi maintains metadata such as commit timeline and indexes to manage a dataset. The commit timelines helps to understand the actions happening on a dataset as well as the current state of a dataset. Indexes are used by Hudi to maintain a record key to file id mapping to efficiently locate a record. At the moment, Hudi supports writing only parquet columnar formats.
-To be able to start using Hudi for your existing dataset, you will need to migrate your existing dataset into a Hudi managed dataset. There are a couple of ways to achieve this.
+Hudi 维护了元数据,包括提交的时间线和索引,来管理一个数据集。提交的时间线帮助理解一个数据集上发生的操作,以及数据集的当前状态。索引则被 Hudi 用来维护记录键到文件 ID 的映射,它能高效地定位一条记录。目前, Hudi 仅支持写 Parquet 列式格式 。
 
+为了在你的现有数据集上开始使用 Hudi ,你需要将你的现有数据集迁移到 Hudi 管理的数据集中。以下有多种方法实现这个目的。
 
-## Approaches
 
+## 方法
 
-### Use Hudi for new partitions alone
 
-Hudi can be used to manage an existing dataset without affecting/altering the historical data already present in the
-dataset. Hudi has been implemented to be compatible with such a mixed dataset with a caveat that either the complete
-Hive partition is Hudi managed or not. Thus the lowest granularity at which Hudi manages a dataset is a Hive
-partition. Start using the datasource API or the WriteClient to write to the dataset and make sure you start writing
-to a new partition or convert your last N partitions into Hudi instead of the entire table. Note, since the historical
- partitions are not managed by HUDI, none of the primitives provided by HUDI work on the data in those partitions. More concretely, one cannot perform upserts or incremental pull on such older partitions not managed by the HUDI dataset.
-Take this approach if your dataset is an append only type of dataset and you do not expect to perform any updates to existing (or non Hudi managed) partitions.
+### 将 Hudi 仅用于新分区
 
+Hudi 可以被用来在不影响/改变数据集历史数据的情况下管理一个现有的数据集。 Hudi 已经实现兼容这样的数据集,需要注意的是,单个 Hive 分区要么完全由 Hudi 管理,要么不由 Hudi 管理。因此, Hudi 管理一个数据集的最低粒度是一个 Hive 分区。使用数据源 API 或 WriteClient 来写入数据集,并确保你开始写入的是一个新分区,或者将过去的 N 个分区而非整张表转换为 Hudi 。需要注意的是,由于历史分区不是由 Hudi 管理的, Hudi 提供的任何操作在那些分区上都不生效。更具体地说,无法在这些非 Hudi 管理的旧分区上进行插入更新或增量拉取。
 
-### Convert existing dataset to Hudi
+如果你的数据集是追加型的数据集,并且你不指望在已经存在的(或者非 Hudi 管理的)分区上进行更新操作,就使用这个方法。
 
-Import your existing dataset into a Hudi managed dataset. Since all the data is Hudi managed, none of the limitations
- of Approach 1 apply here. Updates spanning any partitions can be applied to this dataset and Hudi will efficiently
- make the update available to queries. Note that not only do you get to use all Hudi primitives on this dataset,
- there are other additional advantages of doing this. Hudi automatically manages file sizes of a Hudi managed dataset
- . You can define the desired file size when converting this dataset and Hudi will ensure it writes out files
- adhering to the config. It will also ensure that smaller files later get corrected by routing some new inserts into
- small files rather than writing new small ones thus maintaining the health of your cluster.
+### 将现有的数据集转换为 Hudi
 
-There are a few options when choosing this approach.
+将你的现有数据集导入到一个 Hudi 管理的数据集。由于全部数据都是 Hudi 管理的,方法 1 的任何限制在这里都不适用。跨分区的更新可以被应用到这个数据集,而 Hudi 会高效地让这些更新对查询可用。值得注意的是,你不仅可以在这个数据集上使用所有 Hudi 提供的操作,这样做还有额外的好处。 Hudi 会自动管理受管数据集的文件大小。你可以在转换数据集的时候设置期望的文件大小, Hudi 将确保它写出的文件符合这个配置。Hudi 还会确保小文件在后续被修正,这个过程是通过将新的插入引导到这些小文件而不是写入新的小文件来实现的,这样能维持你的集群的健康度。
 
-**Option 1**
-Use the HDFSParquetImporter tool. As the name suggests, this only works if your existing dataset is in parquet file format.
-This tool essentially starts a Spark Job to read the existing parquet dataset and converts it into a HUDI managed dataset by re-writing all the data.
+选择这个方法后,有几种选择。
 
-**Option 2**
-For huge datasets, this could be as simple as : 
+**选择 1**
+使用 HDFSParquetImporter 工具。正如名字表明的那样,这仅仅适用于你的现有数据集是 Parquet 文件格式的。
+这个工具本质上是启动一个 Spark 作业来读取现有的 Parquet 数据集,并通过重写全部记录的方式将它转换为 HUDI 管理的数据集。
+
+**选择 2**
+对于大数据集,这可以简单地: 
 ```java
 for partition in [list of partitions in source dataset] {
        val inputDF = spark.read.format("any_input_format").load("partition_path")
@@ -49,10 +39,8 @@ for partition in [list of partitions in source dataset] {
 }
 ```  
 
-**Option 3**
-Write your own custom logic of how to load an existing dataset into a Hudi managed one. Please read about the RDD API
- [here](/cn/docs/quick-start-guide). Using the HDFSParquetImporter Tool. Once hudi has been built via `mvn clean install -DskipTests`, the shell can be
-fired by via `cd hudi-cli && ./hudi-cli.sh`.
+**选择 3**
+写下你自定义的逻辑来定义如何将现有数据集加载到一个 Hudi 管理的数据集中。请在 [这里](/cn/docs/quick-start-guide) 阅读 RDD API 的相关资料。使用 HDFSParquetImporter 工具。一旦 Hudi 通过 `mvn clean install -DskipTests` 被构建了, Shell 将被 `cd hudi-cli && ./hudi-cli.sh` 调启。
 
 ```java
 hudi->hdfsparquetimport
diff --git a/website/i18n/cn/docusaurus-plugin-content-docs/current/privacy.md b/website/i18n/cn/docusaurus-plugin-content-docs/current/privacy.md
index a6bde57..afa167d 100644
--- a/website/i18n/cn/docusaurus-plugin-content-docs/current/privacy.md
+++ b/website/i18n/cn/docusaurus-plugin-content-docs/current/privacy.md
@@ -1,23 +1,23 @@
 ---
-title: Privacy Policy
-keywords: [ hudi, privacy]
+title: 隐私协议
+keywords: [ hudi, privacy, 隐私]
 last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
-Information about your use of this website is collected using server access logs and a tracking cookie.
-The collected information consists of the following:
+关于你使用本网站的信息,将通过服务器访问日志和 Cookie 跟踪被收集。
+收集的信息由以下内容构成:
 
-* The IP address from which you access the website;
-* The type of browser and operating system you use to access our site;
-* The date and time you access our site;
-* The pages you visit;
-* The addresses of pages from where you followed a link to our site.
+* 你访问网站使用的 IP 地址;
+* 你访问我们的网站时使用的浏览器和操作系统;
+* 你访问我们的网站的日期和时间;
+* 你浏览的页面;
+* 引导你链接到我们的网站的页面地址;
 
-Part of this information is gathered using a tracking cookie set by the [Google Analytics](http://www.google.com/analytics) service and handled by Google as described in their [privacy policy](http://www.google.com/privacy). See your browser documentation for instructions on how to disable the cookie if you prefer not to share this data with Google.
+这些信息中的一部分将使用由 [Google Analytics](http://www.google.com/analytics) 服务设置的 Cookie 跟踪进行收集,并由 Google 按照在他们的 [隐私协议](http://www.google.com/privacy) 中描述的方式进行处理。如果你不希望与 Google 分享这些数据,请参考你的浏览器文档中关于如何禁用 Cookie 的说明。
 
-We use the gathered information to help us make our site more useful to visitors and to better understand how and when our site is used. We do not track or collect personally identifiable information or associate gathered data with any personally identifying information from other sources.
+我们使用收集的数据来帮助让我们的网站对访问者更有用,并更好地了解我们的网站是如何、在何时被使用的。我们不跟踪也不收集个人隐私信息,同时也不与任何收集包含个人隐私数据的数据源合作。
 
-By using this website, you consent to the collection of this data in the manner and for the purpose described above.
+使用本网站,即代表你许可以上述的方式和目的收集这些数据。
 
-The Hudi development community welcomes your questions or comments regarding this Privacy Policy. Send them to d...@hudi.apache.org
+Hudi 开发者社区欢迎你提出关于本隐私协议的问题或评论。请将他们发送至 d...@hudi.apache.org 。
diff --git a/website/i18n/cn/docusaurus-plugin-content-docs/current/s3_hoodie.md b/website/i18n/cn/docusaurus-plugin-content-docs/current/s3_hoodie.md
index ebf8681..1e0b329 100644
--- a/website/i18n/cn/docusaurus-plugin-content-docs/current/s3_hoodie.md
+++ b/website/i18n/cn/docusaurus-plugin-content-docs/current/s3_hoodie.md
@@ -1,24 +1,25 @@
 ---
-title: S3 Filesystem
+title: S3 文件系统
 keywords: [ hudi, hive, aws, s3, spark, presto]
-summary: In this page, we go over how to configure Hudi with S3 filesystem.
+summary: 在本页中,我们将讨论如何在 S3 文件系统中配置 Hudi 。
 last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
-In this page, we explain how to get your Hudi spark job to store into AWS S3.
+在本页中,我们将解释如何让你的 Hudi Spark 作业存储到 AWS S3 。
 
-## AWS configs
+## AWS 配置
 
-There are two configurations required for Hudi-S3 compatibility:
 
-- Adding AWS Credentials for Hudi
-- Adding required Jars to classpath
+Hudi 与 S3 的适配需要两项配置:
 
-### AWS Credentials
+- 为 Hudi 加 AWS 凭证
+- 将需要的 jar 包添加到类路径
 
-Simplest way to use Hudi with S3, is to configure your `SparkSession` or `SparkContext` with S3 credentials. Hudi will automatically pick this up and talk to S3.
+### AWS 凭证
 
-Alternatively, add the required configs in your core-site.xml from where Hudi can fetch them. Replace the `fs.defaultFS` with your S3 bucket name and Hudi should be able to read/write from the bucket.
+在 S3 上使用 Hudi 的最简单的办法,是为你的 `SparkSession` 或 `SparkContext` 设置 S3 凭证。 Hudi 将自动拾取并通知 S3 。
+
+或者,将需要的配置添加到你的 core-site.xml 文件中, Hudi 可以从那里获取它们。用你的 S3 Bucket 名称替换 `fs.defaultFS` ,之后 Hudi 应该能够从 Bucket 中读取/写入.
 
 ```xml
   <property>
@@ -53,8 +54,7 @@ Alternatively, add the required configs in your core-site.xml from where Hudi ca
 ```
 
 
-Utilities such as hudi-cli or deltastreamer tool, can pick up s3 creds via environmental variable prefixed with `HOODIE_ENV_`. For e.g below is a bash snippet to setup
-such variables and then have cli be able to work on datasets stored in s3
+`hudi-cli` 或 DeltaStreamer 这些工具集能通过 `HOODIE_ENV_` 前缀的环境变量拾取。以下是一个作为示例的基础代码片段,它设置了这些变量并让 CLI 能够在保存在 S3 上的数据集上工作。
 
 ```java
 export HOODIE_ENV_fs_DOT_s3a_DOT_access_DOT_key=$accessKey
@@ -68,14 +68,14 @@ export HOODIE_ENV_fs_DOT_s3n_DOT_impl=org.apache.hadoop.fs.s3a.S3AFileSystem
 
 
 
-### AWS Libs
+### AWS 库
 
-AWS hadoop libraries to add to our classpath
+将 AWS Hadoop 库添加到我们的类路径。
 
  - com.amazonaws:aws-java-sdk:1.10.34
  - org.apache.hadoop:hadoop-aws:2.7.3
 
-AWS glue data libraries are needed if AWS glue data is used
+如果使用了 AWS Glue 的数据,则需要 AWS Glue 库。
 
  - com.amazonaws.glue:aws-glue-datacatalog-hive2-client:1.11.0
  - com.amazonaws:aws-java-sdk-glue:1.11.475
\ No newline at end of file
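
For the S3 doc in this patch, the "configure your `SparkSession` or `SparkContext` with S3 credentials" route might look roughly like the sketch below. This is not part of the patch; the `fs.s3a.*` property names are standard hadoop-aws settings, and the bucket name and key values are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.sql.SparkSession;

public class HudiOnS3ConfigSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("hudi-s3-config")
        .getOrCreate();

    // Hand the s3a credentials to the job so Hudi can read/write the bucket.
    Configuration conf = spark.sparkContext().hadoopConfiguration();
    conf.set("fs.defaultFS", "s3a://your-hudi-bucket");   // placeholder bucket
    conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
    conf.set("fs.s3a.access.key", "<aws-access-key>");    // placeholder
    conf.set("fs.s3a.secret.key", "<aws-secret-key>");    // placeholder
  }
}
```
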
