[hudi] branch asf-site updated: [DOCS] Fix typo in chinese docs (#9465)

danny0405 Fri, 18 Aug 2023 17:53:16 -0700

This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new dd93985ecd7 [DOCS] Fix typo in chinese docs (#9465)
dd93985ecd7 is described below

commit dd93985ecd788b5bb0b80717c59b412aebe74838
Author: Sting <zpen...@connect.ust.hk>
AuthorDate: Sat Aug 19 08:52:04 2023 +0800

    [DOCS] Fix typo in chinese docs (#9465)
---
 content/cn/docs/0.9.0/flink-quick-start-guide/index.html                | 2 +-
 content/cn/docs/next/flink-quick-start-guide/index.html                 | 2 +-
 .../docusaurus-plugin-content-docs/current/flink-quick-start-guide.md   | 2 +-
 .../version-0.9.0/flink-quick-start-guide.md                            | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/content/cn/docs/0.9.0/flink-quick-start-guide/index.html b/content/cn/docs/0.9.0/flink-quick-start-guide/index.html
index b216a6ba269..d81b51455c5 100644
--- a/content/cn/docs/0.9.0/flink-quick-start-guide/index.html
+++ b/content/cn/docs/0.9.0/flink-quick-start-guide/index.html
@@ -30,7 +30,7 @@ Flink SQL Client 是逐行执行 SQL 的。</p><h3 class="anchor 
anchorWithStick
 运行了 <code>2</code> 个 <code>StreamWriteFunction</code>，那每个 write function 能分到 
<code>2GB</code>，尽量预留一些缓存。因为网络缓存，taskManager 上其他类型的 task (比如 
<code>BucketAssignFunction</code>)也会消耗一些内存</li><li>需要关注 compaction 的内存变化。 
<code>compaction.max_memory</code> 控制了每个 compaction task 读 log 
时可以利用的内存大小。<code>compaction.tasks</code> 控制了 compaction task 的并发</li></ol><h3 
class="anchor anchorWithStickyNavbar_y2LR" id="cow">COW<a class="hash-link" 
href="#cow" title="Direct link to heading"></a></h3><ol><li><a [...]
 运行了 <code>2</code> 个 <code>StreamWriteFunction</code>，那每个 write function 能分到 
<code>2GB</code>，尽量预留一些缓存。因为网络缓存，taskManager 上其他类型的 task （比如 
<code>BucketAssignFunction</code>）也会消耗一些内存</li></ol><h2 class="anchor 
anchorWithStickyNavbar_y2LR" id="离线批量导入">离线批量导入<a class="hash-link" 
href="#离线批量导入" title="Direct link to 
heading"></a></h2><p>针对存量数据导入的需求，如果存量数据来源于其他数据源，可以使用离线批量导入功能（<code>bulk_insert</code>），快速将存量数据导入
 Hudi。</p><div class="admonition admonition-note alert alert--secondary"><div 
clas [...]
 避免 file handle 频繁切换导致性能下降。</p></div></div><div class="admonition 
admonition-note alert alert--secondary"><div 
class="admonition-heading"><h5><span class="admonition-icon"><svg 
xmlns="http://www.w3.org/2000/svg" width="14" height="16" viewBox="0 0 14 
16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 
1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 
.52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 
0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.6 [...]
-当然每个 bucket 在写到文件大小上限（parquet 120 MB）的时候会回滚到新的文件句柄，所以最后：写文件数量 &gt;= <a 
href="#%E5%B9%B6%E8%A1%8C%E5%BA%A6"><code>write.bucket_assign.tasks</code></a>。</p></div></div><h3
 class="anchor anchorWithStickyNavbar_y2LR" id="参数">参数<a class="hash-link" 
href="#参数" title="Direct link to 
heading"></a></h3><table><thead><tr><th>名称</th><th>Required</th><th>默认值</th><th>备注</th></tr></thead><tbody><tr><td><code>write.operation</code></td><td><code>true</code></td><td><code>upsert</code></td><td>开启
 <code [...]
+当然每个 bucket 在写到文件大小上限（parquet 120 MB）的时候会回滚到新的文件句柄，所以最后：写文件数量 &gt;= <a 
href="#%E5%B9%B6%E8%A1%8C%E5%BA%A6"><code>write.bucket_assign.tasks</code></a>。</p></div></div><h3
 class="anchor anchorWithStickyNavbar_y2LR" id="参数">参数<a class="hash-link" 
href="#参数" title="Direct link to 
heading"></a></h3><table><thead><tr><th>名称</th><th>Required</th><th>默认值</th><th>备注</th></tr></thead><tbody><tr><td><code>write.operation</code></td><td><code>true</code></td><td><code>upsert</code></td><td>开启
 <code [...]
 通过行存原生支持保留消息的所有变更（format 层面的集成），通过流读 MOR 表可以消费到所有的变更记录。</p><h3 class="anchor 
anchorWithStickyNavbar_y2LR" id="参数-2">参数<a class="hash-link" href="#参数-2" 
title="Direct link to 
heading"></a></h3><table><thead><tr><th>名称</th><th>Required</th><th>默认值</th><th>备注</th></tr></thead><tbody><tr><td><code>changelog.enabled</code></td><td><code>false</code></td><td><code>false</code></td><td>默认是关闭状态，即
 <code>UPSERT</code> 语义，所有的消息仅保证最后一条合并消息，中间的变更可能会被 merge 掉；改成 <code>true</code> 
支持消费所有变更</td></tr></ [...]
 只能读到最后一条记录。当然，通过调整压缩的缓存时间可以预留一定的时间缓冲给 reader，比如调整压缩的两个参数：<a 
href="#compaction"><code>compaction.delta_commits</code></a> and <a 
href="#compaction"><code>compaction.delta_seconds</code></a>。</p></div></div><h2
 class="anchor anchorWithStickyNavbar_y2LR" id="insert-模式">Insert 模式<a 
class="hash-link" href="#insert-模式" title="Direct link to 
heading"></a></h2><p>当前 Hudi 对于 <code>Insert 模式</code> 默认会采用小文件策略：MOR 会追加写 
avro log 文件，COW 会不断合并之前的 parquet 文件（并且增量的数据会去重），这样会导致性能下降。</p><p>如果想关闭文件合并，可以设置 
[...]
 如果想打入 Hive 的依赖，需要显示指定 Profile 为 <code>flink-bundle-shade-hive</code>。执行以下命令打入 
Hive 依赖：</p><div class="codeBlockContainer_J+bg language-bash 
theme-code-block"><div class="codeBlockContent_csEI bash"><pre tabindex="0" 
class="prism-code language-bash codeBlock_rtdJ thin-scrollbar" 
style="color:#F8F8F2;background-color:#282A36"><code 
class="codeBlockLines_1zSZ"><span class="token-line" 
style="color:#F8F8F2"><span class="token comment" style="color:rgb(98, 114, 
164)"># Maven 打包命令</span><span  [...]
diff --git a/content/cn/docs/next/flink-quick-start-guide/index.html b/content/cn/docs/next/flink-quick-start-guide/index.html
index 40b8481fd91..940bfd1caed 100644
--- a/content/cn/docs/next/flink-quick-start-guide/index.html
+++ b/content/cn/docs/next/flink-quick-start-guide/index.html
@@ -30,7 +30,7 @@ Flink SQL Client 是逐行执行 SQL 的。</p><h3 class="anchor 
anchorWithStick
 运行了 <code>2</code> 个 <code>StreamWriteFunction</code>，那每个 write function 能分到 
<code>2GB</code>，尽量预留一些缓存。因为网络缓存，taskManager 上其他类型的 task (比如 
<code>BucketAssignFunction</code>)也会消耗一些内存</li><li>需要关注 compaction 的内存变化。 
<code>compaction.max_memory</code> 控制了每个 compaction task 读 log 
时可以利用的内存大小。<code>compaction.tasks</code> 控制了 compaction task 的并发</li></ol><h3 
class="anchor anchorWithStickyNavbar_y2LR" id="cow">COW<a class="hash-link" 
href="#cow" title="Direct link to heading"></a></h3><ol><li><a [...]
 运行了 <code>2</code> 个 <code>StreamWriteFunction</code>，那每个 write function 能分到 
<code>2GB</code>，尽量预留一些缓存。因为网络缓存，taskManager 上其他类型的 task （比如 
<code>BucketAssignFunction</code>）也会消耗一些内存</li></ol><h2 class="anchor 
anchorWithStickyNavbar_y2LR" id="离线批量导入">离线批量导入<a class="hash-link" 
href="#离线批量导入" title="Direct link to 
heading"></a></h2><p>针对存量数据导入的需求，如果存量数据来源于其他数据源，可以使用离线批量导入功能（<code>bulk_insert</code>），快速将存量数据导入
 Hudi。</p><div class="admonition admonition-note alert alert--secondary"><div 
clas [...]
 避免 file handle 频繁切换导致性能下降。</p></div></div><div class="admonition 
admonition-note alert alert--secondary"><div 
class="admonition-heading"><h5><span class="admonition-icon"><svg 
xmlns="http://www.w3.org/2000/svg" width="14" height="16" viewBox="0 0 14 
16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 
1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 
.52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 
0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.6 [...]
-当然每个 bucket 在写到文件大小上限（parquet 120 MB）的时候会回滚到新的文件句柄，所以最后：写文件数量 &gt;= <a 
href="#%E5%B9%B6%E8%A1%8C%E5%BA%A6"><code>write.bucket_assign.tasks</code></a>。</p></div></div><h3
 class="anchor anchorWithStickyNavbar_y2LR" id="参数">参数<a class="hash-link" 
href="#参数" title="Direct link to 
heading"></a></h3><table><thead><tr><th>名称</th><th>Required</th><th>默认值</th><th>备注</th></tr></thead><tbody><tr><td><code>write.operation</code></td><td><code>true</code></td><td><code>upsert</code></td><td>开启
 <code [...]
+当然每个 bucket 在写到文件大小上限（parquet 120 MB）的时候会回滚到新的文件句柄，所以最后：写文件数量 &gt;= <a 
href="#%E5%B9%B6%E8%A1%8C%E5%BA%A6"><code>write.bucket_assign.tasks</code></a>。</p></div></div><h3
 class="anchor anchorWithStickyNavbar_y2LR" id="参数">参数<a class="hash-link" 
href="#参数" title="Direct link to 
heading"></a></h3><table><thead><tr><th>名称</th><th>Required</th><th>默认值</th><th>备注</th></tr></thead><tbody><tr><td><code>write.operation</code></td><td><code>true</code></td><td><code>upsert</code></td><td>开启
 <code [...]
 通过行存原生支持保留消息的所有变更（format 层面的集成），通过流读 MOR 表可以消费到所有的变更记录。</p><h3 class="anchor 
anchorWithStickyNavbar_y2LR" id="参数-2">参数<a class="hash-link" href="#参数-2" 
title="Direct link to 
heading"></a></h3><table><thead><tr><th>名称</th><th>Required</th><th>默认值</th><th>备注</th></tr></thead><tbody><tr><td><code>changelog.enabled</code></td><td><code>false</code></td><td><code>false</code></td><td>默认是关闭状态，即
 <code>UPSERT</code> 语义，所有的消息仅保证最后一条合并消息，中间的变更可能会被 merge 掉；改成 <code>true</code> 
支持消费所有变更</td></tr></ [...]
 只能读到最后一条记录。当然，通过调整压缩的缓存时间可以预留一定的时间缓冲给 reader，比如调整压缩的两个参数：<a 
href="#compaction"><code>compaction.delta_commits</code></a> and <a 
href="#compaction"><code>compaction.delta_seconds</code></a>。</p></div></div><h2
 class="anchor anchorWithStickyNavbar_y2LR" id="insert-模式">Insert 模式<a 
class="hash-link" href="#insert-模式" title="Direct link to 
heading"></a></h2><p>当前 Hudi 对于 <code>Insert 模式</code> 默认会采用小文件策略：MOR 会追加写 
avro log 文件，COW 会不断合并之前的 parquet 文件（并且增量的数据会去重），这样会导致性能下降。</p><p>如果想关闭文件合并，可以设置 
[...]
 如果想打入 Hive 的依赖，需要显示指定 Profile 为 <code>flink-bundle-shade-hive</code>。执行以下命令打入 
Hive 依赖：</p><div class="codeBlockContainer_J+bg language-bash 
theme-code-block"><div class="codeBlockContent_csEI bash"><pre tabindex="0" 
class="prism-code language-bash codeBlock_rtdJ thin-scrollbar" 
style="color:#F8F8F2;background-color:#282A36"><code 
class="codeBlockLines_1zSZ"><span class="token-line" 
style="color:#F8F8F2"><span class="token comment" style="color:rgb(98, 114, 
164)"># Maven 打包命令</span><span  [...]
diff --git a/website/i18n/cn/docusaurus-plugin-content-docs/current/flink-quick-start-guide.md b/website/i18n/cn/docusaurus-plugin-content-docs/current/flink-quick-start-guide.md
index 79a150d7aff..0a811ed8105 100644
--- a/website/i18n/cn/docusaurus-plugin-content-docs/current/flink-quick-start-guide.md
+++ b/website/i18n/cn/docusaurus-plugin-content-docs/current/flink-quick-start-guide.md
@@ -307,7 +307,7 @@ select * from t1;
 :::note
 1. 索引加载是阻塞式，所以在索引加载过程中 Checkpoint 无法完成
 2. 索引加载由数据流触发，需要确保每个 partition 都至少有1条数据，即上游 source 有数据进来
-3. 索引加载为并发加载，根据数据量大小加载时间不同，可以在log中搜索 `finish loading the index under partition` 和 `Load record form file` 日志内容来观察索引加载的进
+3. 索引加载为并发加载，根据数据量大小加载时间不同，可以在log中搜索 `finish loading the index under partition` 和 `Load record form file` 日志内容来观察索引加载的进度
 4. 第一次 Checkpoint 成功就表示索引已经加载完成，后续从 Checkpoint 恢复时无需再次加载索引
 :::
 
diff --git a/website/i18n/cn/docusaurus-plugin-content-docs/version-0.9.0/flink-quick-start-guide.md b/website/i18n/cn/docusaurus-plugin-content-docs/version-0.9.0/flink-quick-start-guide.md
index 79a150d7aff..0a811ed8105 100644
--- a/website/i18n/cn/docusaurus-plugin-content-docs/version-0.9.0/flink-quick-start-guide.md
+++ b/website/i18n/cn/docusaurus-plugin-content-docs/version-0.9.0/flink-quick-start-guide.md
@@ -307,7 +307,7 @@ select * from t1;
 :::note
 1. 索引加载是阻塞式，所以在索引加载过程中 Checkpoint 无法完成
 2. 索引加载由数据流触发，需要确保每个 partition 都至少有1条数据，即上游 source 有数据进来
-3. 索引加载为并发加载，根据数据量大小加载时间不同，可以在log中搜索 `finish loading the index under partition` 和 `Load record form file` 日志内容来观察索引加载的进
+3. 索引加载为并发加载，根据数据量大小加载时间不同，可以在log中搜索 `finish loading the index under partition` 和 `Load record form file` 日志内容来观察索引加载的进度
 4. 第一次 Checkpoint 成功就表示索引已经加载完成，后续从 Checkpoint 恢复时无需再次加载索引
 :::
