[kylin] branch document updated: KYLIN-4715 Wrong function with kylin document about how to optimize cube build

xxyu Wed, 26 Aug 2020 19:13:49 -0700

This is an automated email from the ASF dual-hosted git repository.

xxyu pushed a commit to branch document
in repository https://gitbox.apache.org/repos/asf/kylin.git



The following commit(s) were added to refs/heads/document by this push:
     new 23a7e3c  KYLIN-4715 Wrong function with kylin document about how to optimize cube build
23a7e3c is described below

commit 23a7e3c9ada4ad5f298302553e9661c36b235e3d
Author: rupengwang <wangrup...@live.cn>
AuthorDate: Tue Aug 25 11:54:39 2020 +0800

    KYLIN-4715 Wrong function with kylin document about how to optimize cube build
---
 website/_docs/howto/howto_optimize_build.cn.md   | 2 +-
 website/_docs/howto/howto_optimize_build.md      | 2 +-
 website/_docs/tutorial/cube_spark.cn.md          | 1 +
 website/_docs/tutorial/cube_spark.md             | 4 ++--
 website/_docs24/howto/howto_optimize_build.cn.md | 2 +-
 website/_docs24/howto/howto_optimize_build.md    | 2 +-
 website/_docs30/howto/howto_optimize_build.cn.md | 2 +-
 website/_docs30/howto/howto_optimize_build.md    | 2 +-
 website/_docs30/tutorial/cube_spark.cn.md        | 3 ++-
 website/_docs30/tutorial/cube_spark.md           | 2 +-
 10 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/website/_docs/howto/howto_optimize_build.cn.md b/website/_docs/howto/howto_optimize_build.cn.md
index 85448c6..72734cd 100644
--- a/website/_docs/howto/howto_optimize_build.cn.md
+++ b/website/_docs/howto/howto_optimize_build.cn.md
@@ -118,7 +118,7 @@ INSERT OVERWRITE TABLE kylin_intermediate_airline_cube_v3610f668a3cdb437e8373c03
 
 有些cuboid可以从一个以上的父cuboid聚合得到，这种情况下，Kylin会选择最小的一个父cuboid。举例,AB可以从ABC(id:1110)和ABD(id:1101)生成，则ABD会被选中，因为它的比ABC要小。在这基础上，如果D的基数较小，聚合运算的成本就会比较低。所以，当设计rowkey序列的时候，请记得将基数较小的维度放在末尾。这样不仅有利于cube构建，而且有助于cube查询，因为预聚合也遵循相同的规则。
 
-通常来说，从N维到(N/2)维的构建比较慢，因为这是cuboid数量爆炸性增长的阶段：N维有1个cuboid，(N-1)维有N个cuboid，(N-2)维有N*(N-1)个cuboid，以此类推。经过(N/2)维构建的步骤，整个构建任务会逐渐变快。
+通常来说，从N维到(N/2)维的构建比较慢，因为这是cuboid数量爆炸性增长的阶段：N维有1个cuboid，(N-1)维有N个cuboid，(N-2)维有N*(N-1)/2个cuboid，以此类推。经过(N/2)维构建的步骤，整个构建任务会逐渐变快。
 
 ## 构建cube
 
diff --git a/website/_docs/howto/howto_optimize_build.md b/website/_docs/howto/howto_optimize_build.md
index 0f68740..d1509be 100644
--- a/website/_docs/howto/howto_optimize_build.md
+++ b/website/_docs/howto/howto_optimize_build.md
@@ -132,7 +132,7 @@ These steps are the "by-layer" cubing process, each step uses the output of prev
 
 Some cuboid can be aggregated from more than 1 parent cubiods, in this case, Kylin will select the minimal parent cuboid. For example, AB can be generated from ABC (id: 1110) and ABD (id: 1101), so ABD will be used as its id is smaller than ABC. Based on this, if D's cardinality is small, the aggregation will be cost-efficient. So, when you design the Cube rowkey sequence, please remember to put low cardinality dimensions to the tail position. This not only benefit the Cube build, but al [...]
 
-Usually from the N-D to (N/2)-D the building is slow, because it is the cuboid explosion process: N-D has 1 Cuboid, (N-1)-D has N cuboids, (N-2)-D has N*(N-1) cuboids, etc. After (N/2)-D step, the building gets faster gradually.
+Usually from the N-D to (N/2)-D the building is slow, because it is the cuboid explosion process: N-D has 1 Cuboid, (N-1)-D has N cuboids, (N-2)-D has N*(N-1)/2 cuboids, etc. After (N/2)-D step, the building gets faster gradually.
 
 
 
diff --git a/website/_docs/tutorial/cube_spark.cn.md b/website/_docs/tutorial/cube_spark.cn.md
index 68d3597..dd91c8e 100644
--- a/website/_docs/tutorial/cube_spark.cn.md
+++ b/website/_docs/tutorial/cube_spark.cn.md
@@ -152,6 +152,7 @@ kylin.engine.livy-conf.livy-arr.jars=hdfs:///path/hbase-client-1.2.0-{$env.versi
 {% highlight Groff markup %}
 kylin.engine.spark-fact-distinct=true
 kylin.engine.spark-dimension-dictionary=true 
+kylin.engine.spark-uhc-dictionary=true
 {% endhighlight %}
 
 ## 疑难解答
diff --git a/website/_docs/tutorial/cube_spark.md b/website/_docs/tutorial/cube_spark.md
index e56c464..c614b75 100644
--- a/website/_docs/tutorial/cube_spark.md
+++ b/website/_docs/tutorial/cube_spark.md
@@ -147,8 +147,8 @@ As we all know, the cubing job includes several steps and the steps 'extract fac
 
 {% highlight Groff markup %}
 kylin.engine.spark-fact-distinct=true
-kylin.engine.spark-dimension-dictionary=true 
-kylin.engine.spark-udc-dictionary=true
+kylin.engine.spark-dimension-dictionary=true
+kylin.engine.spark-uhc-dictionary=true
 {% endhighlight %}
 
 
diff --git a/website/_docs24/howto/howto_optimize_build.cn.md b/website/_docs24/howto/howto_optimize_build.cn.md
index a622c36..76972ad 100644
--- a/website/_docs24/howto/howto_optimize_build.cn.md
+++ b/website/_docs24/howto/howto_optimize_build.cn.md
@@ -118,7 +118,7 @@ INSERT OVERWRITE TABLE kylin_intermediate_airline_cube_v3610f668a3cdb437e8373c03
 
 有些cuboid可以从一个以上的父cuboid聚合得到，这种情况下，Kylin会选择最小的一个父cuboid。举例,AB可以从ABC(id:1110)和ABD(id:1101)生成，则ABD会被选中，因为它的比ABC要小。在这基础上，如果D的基数较小，聚合运算的成本就会比较低。所以，当设计rowkey序列的时候，请记得将基数较小的维度放在末尾。这样不仅有利于cube构建，而且有助于cube查询，因为预聚合也遵循相同的规则。
 
-通常来说，从N维到(N/2)维的构建比较慢，因为这是cuboid数量爆炸性增长的阶段：N维有1个cuboid，(N-1)维有N个cuboid，(N-2)维有N*(N-1)个cuboid，以此类推。经过(N/2)维构建的步骤，整个构建任务会逐渐变快。
+通常来说，从N维到(N/2)维的构建比较慢，因为这是cuboid数量爆炸性增长的阶段：N维有1个cuboid，(N-1)维有N个cuboid，(N-2)维有N*(N-1)/2个cuboid，以此类推。经过(N/2)维构建的步骤，整个构建任务会逐渐变快。
 
 ## 构建cube
 
diff --git a/website/_docs24/howto/howto_optimize_build.md b/website/_docs24/howto/howto_optimize_build.md
index ce5394d..4ebb2e0 100644
--- a/website/_docs24/howto/howto_optimize_build.md
+++ b/website/_docs24/howto/howto_optimize_build.md
@@ -132,7 +132,7 @@ These steps are the "by-layer" cubing process, each step uses the output of prev
 
 Some cuboid can be aggregated from more than 1 parent cubiods, in this case, Kylin will select the minimal parent cuboid. For example, AB can be generated from ABC (id: 1110) and ABD (id: 1101), so ABD will be used as its id is smaller than ABC. Based on this, if D's cardinality is small, the aggregation will be cost-efficient. So, when you design the Cube rowkey sequence, please remember to put low cardinality dimensions to the tail position. This not only benefit the Cube build, but al [...]
 
-Usually from the N-D to (N/2)-D the building is slow, because it is the cuboid explosion process: N-D has 1 Cuboid, (N-1)-D has N cuboids, (N-2)-D has N*(N-1) cuboids, etc. After (N/2)-D step, the building gets faster gradually.
+Usually from the N-D to (N/2)-D the building is slow, because it is the cuboid explosion process: N-D has 1 Cuboid, (N-1)-D has N cuboids, (N-2)-D has N*(N-1)/2 cuboids, etc. After (N/2)-D step, the building gets faster gradually.
 
 
 
diff --git a/website/_docs30/howto/howto_optimize_build.cn.md b/website/_docs30/howto/howto_optimize_build.cn.md
index b027a0e..a358d22 100644
--- a/website/_docs30/howto/howto_optimize_build.cn.md
+++ b/website/_docs30/howto/howto_optimize_build.cn.md
@@ -118,7 +118,7 @@ INSERT OVERWRITE TABLE kylin_intermediate_airline_cube_v3610f668a3cdb437e8373c03
 
 有些cuboid可以从一个以上的父cuboid聚合得到，这种情况下，Kylin会选择最小的一个父cuboid。举例,AB可以从ABC(id:1110)和ABD(id:1101)生成，则ABD会被选中，因为它的比ABC要小。在这基础上，如果D的基数较小，聚合运算的成本就会比较低。所以，当设计rowkey序列的时候，请记得将基数较小的维度放在末尾。这样不仅有利于cube构建，而且有助于cube查询，因为预聚合也遵循相同的规则。
 
-通常来说，从N维到(N/2)维的构建比较慢，因为这是cuboid数量爆炸性增长的阶段：N维有1个cuboid，(N-1)维有N个cuboid，(N-2)维有N*(N-1)个cuboid，以此类推。经过(N/2)维构建的步骤，整个构建任务会逐渐变快。
+通常来说，从N维到(N/2)维的构建比较慢，因为这是cuboid数量爆炸性增长的阶段：N维有1个cuboid，(N-1)维有N个cuboid，(N-2)维有N*(N-1)/2个cuboid，以此类推。经过(N/2)维构建的步骤，整个构建任务会逐渐变快。
 
 ## 构建cube
 
diff --git a/website/_docs30/howto/howto_optimize_build.md b/website/_docs30/howto/howto_optimize_build.md
index 231fe34..4eb4ae1 100644
--- a/website/_docs30/howto/howto_optimize_build.md
+++ b/website/_docs30/howto/howto_optimize_build.md
@@ -132,7 +132,7 @@ These steps are the "by-layer" cubing process, each step uses the output of prev
 
 Some cuboid can be aggregated from more than 1 parent cubiods, in this case, Kylin will select the minimal parent cuboid. For example, AB can be generated from ABC (id: 1110) and ABD (id: 1101), so ABD will be used as its id is smaller than ABC. Based on this, if D's cardinality is small, the aggregation will be cost-efficient. So, when you design the Cube rowkey sequence, please remember to put low cardinality dimensions to the tail position. This not only benefit the Cube build, but al [...]
 
-Usually from the N-D to (N/2)-D the building is slow, because it is the cuboid explosion process: N-D has 1 Cuboid, (N-1)-D has N cuboids, (N-2)-D has N*(N-1) cuboids, etc. After (N/2)-D step, the building gets faster gradually.
+Usually from the N-D to (N/2)-D the building is slow, because it is the cuboid explosion process: N-D has 1 Cuboid, (N-1)-D has N cuboids, (N-2)-D has N*(N-1)/2 cuboids, etc. After (N/2)-D step, the building gets faster gradually.
 
 
 
diff --git a/website/_docs30/tutorial/cube_spark.cn.md b/website/_docs30/tutorial/cube_spark.cn.md
index 4599991..5f87cd5 100644
--- a/website/_docs30/tutorial/cube_spark.cn.md
+++ b/website/_docs30/tutorial/cube_spark.cn.md
@@ -151,7 +151,8 @@ kylin.engine.livy-conf.livy-arr.jars=hdfs:///path/hbase-client-1.2.0-{$env.versi
 
 {% highlight Groff markup %}
 kylin.engine.spark-fact-distinct=true
-kylin.engine.spark-dimension-dictionary=true 
+kylin.engine.spark-dimension-dictionary=true
+kylin.engine.spark-uhc-dictionary=true
 {% endhighlight %}
 
 ## 疑难解答
diff --git a/website/_docs30/tutorial/cube_spark.md b/website/_docs30/tutorial/cube_spark.md
index 6ad3949..06fc7a4 100644
--- a/website/_docs30/tutorial/cube_spark.md
+++ b/website/_docs30/tutorial/cube_spark.md
@@ -148,7 +148,7 @@ As we all know, the cubing job includes several steps and the steps 'extract fac
 {% highlight Groff markup %}
 kylin.engine.spark-fact-distinct=true
 kylin.engine.spark-dimension-dictionary=true 
-kylin.engine.spark-udc-dictionary=true
+kylin.engine.spark-uhc-dictionary=true
 {% endhighlight %}
 
 ## Troubleshooting
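
Note on the corrected count in howto_optimize_build: the number of k-dimension cuboids in a layer is the binomial coefficient C(N, k), which is why the (N-2)-D layer has N*(N-1)/2 cuboids rather than N*(N-1). A minimal Python sketch to check this, assuming a full cube with no aggregation groups or other pruning (the function name and the choice of N = 10 are illustrative only, not taken from Kylin):

```python
from math import comb


def cuboids_per_layer(n):
    """List (dimension_count, cuboid_count) from the N-D layer down to 0-D.

    Assumes a full, unpruned cube: every subset of the n dimensions is a cuboid.
    """
    return [(k, comb(n, k)) for k in range(n, -1, -1)]


n = 10  # hypothetical number of dimensions
layers = cuboids_per_layer(n)

# The first three layers match the corrected sentence: 1, N, N*(N-1)/2.
for dims, count in layers[:3]:
    print(f"{dims}-D: {count} cuboids")

# The layer sizes peak around (N/2)-D and then shrink again.
widest_dims, widest_count = max(layers, key=lambda layer: layer[1])
print(f"widest layer: {widest_dims}-D with {widest_count} cuboids")
```

With N = 10 the (N-2)-D layer has 45 cuboids, and the layer sizes peak at 5-D, which fits the statement that the build speeds up after the (N/2)-D step.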
