This is an automated email from the ASF dual-hosted git repository.

xxyu pushed a commit to branch document
in repository https://gitbox.apache.org/repos/asf/kylin.git

The following commit(s) were added to refs/heads/document by this push:
     new 23a7e3c  KYLIN-4715 Wrong function with kylin document about how to optimize cube build
23a7e3c is described below

commit 23a7e3c9ada4ad5f298302553e9661c36b235e3d
Author: rupengwang <wangrup...@live.cn>
AuthorDate: Tue Aug 25 11:54:39 2020 +0800

    KYLIN-4715 Wrong function with kylin document about how to optimize cube build
---
 website/_docs/howto/howto_optimize_build.cn.md   | 2 +-
 website/_docs/howto/howto_optimize_build.md      | 2 +-
 website/_docs/tutorial/cube_spark.cn.md          | 1 +
 website/_docs/tutorial/cube_spark.md             | 4 ++--
 website/_docs24/howto/howto_optimize_build.cn.md | 2 +-
 website/_docs24/howto/howto_optimize_build.md    | 2 +-
 website/_docs30/howto/howto_optimize_build.cn.md | 2 +-
 website/_docs30/howto/howto_optimize_build.md    | 2 +-
 website/_docs30/tutorial/cube_spark.cn.md        | 3 ++-
 website/_docs30/tutorial/cube_spark.md           | 2 +-
 10 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/website/_docs/howto/howto_optimize_build.cn.md b/website/_docs/howto/howto_optimize_build.cn.md
index 85448c6..72734cd 100644
--- a/website/_docs/howto/howto_optimize_build.cn.md
+++ b/website/_docs/howto/howto_optimize_build.cn.md
@@ -118,7 +118,7 @@ INSERT OVERWRITE TABLE kylin_intermediate_airline_cube_v3610f668a3cdb437e8373c03
 有些cuboid可以从一个以上的父cuboid聚合得到,这种情况下,Kylin会选择最小的一个父cuboid。举例,AB可以从ABC(id:1110)和ABD(id:1101)生成,则ABD会被选中,因为它的比ABC要小。在这基础上,如果D的基数较小,聚合运算的成本就会比较低。所以,当设计rowkey序列的时候,请记得将基数较小的维度放在末尾。这样不仅有利于cube构建,而且有助于cube查询,因为预聚合也遵循相同的规则。
-通常来说,从N维到(N/2)维的构建比较慢,因为这是cuboid数量爆炸性增长的阶段:N维有1个cuboid,(N-1)维有N个cuboid,(N-2)维有N*(N-1)个cuboid,以此类推。经过(N/2)维构建的步骤,整个构建任务会逐渐变快。
+通常来说,从N维到(N/2)维的构建比较慢,因为这是cuboid数量爆炸性增长的阶段:N维有1个cuboid,(N-1)维有N个cuboid,(N-2)维有N*(N-1)/2个cuboid,以此类推。经过(N/2)维构建的步骤,整个构建任务会逐渐变快。
 ## 构建cube
diff --git a/website/_docs/howto/howto_optimize_build.md b/website/_docs/howto/howto_optimize_build.md
index 0f68740..d1509be 100644
--- a/website/_docs/howto/howto_optimize_build.md
+++ b/website/_docs/howto/howto_optimize_build.md
@@ -132,7 +132,7 @@ These steps are the "by-layer" cubing process, each step uses the output of prev
 Some cuboid can be aggregated from more than 1 parent cubiods, in this case, Kylin will select the minimal parent cuboid. For example, AB can be generated from ABC (id: 1110) and ABD (id: 1101), so ABD will be used as its id is smaller than ABC. Based on this, if D's cardinality is small, the aggregation will be cost-efficient. So, when you design the Cube rowkey sequence, please remember to put low cardinality dimensions to the tail position. This not only benefit the Cube build, but al [...]
-Usually from the N-D to (N/2)-D the building is slow, because it is the cuboid explosion process: N-D has 1 Cuboid, (N-1)-D has N cuboids, (N-2)-D has N*(N-1) cuboids, etc. After (N/2)-D step, the building gets faster gradually.
+Usually from the N-D to (N/2)-D the building is slow, because it is the cuboid explosion process: N-D has 1 Cuboid, (N-1)-D has N cuboids, (N-2)-D has N*(N-1)/2 cuboids, etc. After (N/2)-D step, the building gets faster gradually.
diff --git a/website/_docs/tutorial/cube_spark.cn.md b/website/_docs/tutorial/cube_spark.cn.md
index 68d3597..dd91c8e 100644
--- a/website/_docs/tutorial/cube_spark.cn.md
+++ b/website/_docs/tutorial/cube_spark.cn.md
@@ -152,6 +152,7 @@ kylin.engine.livy-conf.livy-arr.jars=hdfs:///path/hbase-client-1.2.0-{$env.versi
 {% highlight Groff markup %}
 kylin.engine.spark-fact-distinct=true
 kylin.engine.spark-dimension-dictionary=true
+kylin.engine.spark-uhc-dictionary=true
 {% endhighlight %}
 ## 疑难解答
diff --git a/website/_docs/tutorial/cube_spark.md b/website/_docs/tutorial/cube_spark.md
index e56c464..c614b75 100644
--- a/website/_docs/tutorial/cube_spark.md
+++ b/website/_docs/tutorial/cube_spark.md
@@ -147,8 +147,8 @@ As we all know, the cubing job includes several steps and the steps 'extract fac
 {% highlight Groff markup %}
 kylin.engine.spark-fact-distinct=true
-kylin.engine.spark-dimension-dictionary=true
-kylin.engine.spark-udc-dictionary=true
+kylin.engine.spark-dimension-dictionary=true
+kylin.engine.spark-uhc-dictionary=true
 {% endhighlight %}
diff --git a/website/_docs24/howto/howto_optimize_build.cn.md b/website/_docs24/howto/howto_optimize_build.cn.md
index a622c36..76972ad 100644
--- a/website/_docs24/howto/howto_optimize_build.cn.md
+++ b/website/_docs24/howto/howto_optimize_build.cn.md
@@ -118,7 +118,7 @@ INSERT OVERWRITE TABLE kylin_intermediate_airline_cube_v3610f668a3cdb437e8373c03
 有些cuboid可以从一个以上的父cuboid聚合得到,这种情况下,Kylin会选择最小的一个父cuboid。举例,AB可以从ABC(id:1110)和ABD(id:1101)生成,则ABD会被选中,因为它的比ABC要小。在这基础上,如果D的基数较小,聚合运算的成本就会比较低。所以,当设计rowkey序列的时候,请记得将基数较小的维度放在末尾。这样不仅有利于cube构建,而且有助于cube查询,因为预聚合也遵循相同的规则。
-通常来说,从N维到(N/2)维的构建比较慢,因为这是cuboid数量爆炸性增长的阶段:N维有1个cuboid,(N-1)维有N个cuboid,(N-2)维有N*(N-1)个cuboid,以此类推。经过(N/2)维构建的步骤,整个构建任务会逐渐变快。
+通常来说,从N维到(N/2)维的构建比较慢,因为这是cuboid数量爆炸性增长的阶段:N维有1个cuboid,(N-1)维有N个cuboid,(N-2)维有N*(N-1)/2个cuboid,以此类推。经过(N/2)维构建的步骤,整个构建任务会逐渐变快。
 ## 构建cube
diff --git a/website/_docs24/howto/howto_optimize_build.md b/website/_docs24/howto/howto_optimize_build.md
index ce5394d..4ebb2e0 100644
--- a/website/_docs24/howto/howto_optimize_build.md
+++ b/website/_docs24/howto/howto_optimize_build.md
@@ -132,7 +132,7 @@ These steps are the "by-layer" cubing process, each step uses the output of prev
 Some cuboid can be aggregated from more than 1 parent cubiods, in this case, Kylin will select the minimal parent cuboid. For example, AB can be generated from ABC (id: 1110) and ABD (id: 1101), so ABD will be used as its id is smaller than ABC. Based on this, if D's cardinality is small, the aggregation will be cost-efficient. So, when you design the Cube rowkey sequence, please remember to put low cardinality dimensions to the tail position. This not only benefit the Cube build, but al [...]
-Usually from the N-D to (N/2)-D the building is slow, because it is the cuboid explosion process: N-D has 1 Cuboid, (N-1)-D has N cuboids, (N-2)-D has N*(N-1) cuboids, etc. After (N/2)-D step, the building gets faster gradually.
+Usually from the N-D to (N/2)-D the building is slow, because it is the cuboid explosion process: N-D has 1 Cuboid, (N-1)-D has N cuboids, (N-2)-D has N*(N-1)/2 cuboids, etc. After (N/2)-D step, the building gets faster gradually.
diff --git a/website/_docs30/howto/howto_optimize_build.cn.md b/website/_docs30/howto/howto_optimize_build.cn.md
index b027a0e..a358d22 100644
--- a/website/_docs30/howto/howto_optimize_build.cn.md
+++ b/website/_docs30/howto/howto_optimize_build.cn.md
@@ -118,7 +118,7 @@ INSERT OVERWRITE TABLE kylin_intermediate_airline_cube_v3610f668a3cdb437e8373c03
 有些cuboid可以从一个以上的父cuboid聚合得到,这种情况下,Kylin会选择最小的一个父cuboid。举例,AB可以从ABC(id:1110)和ABD(id:1101)生成,则ABD会被选中,因为它的比ABC要小。在这基础上,如果D的基数较小,聚合运算的成本就会比较低。所以,当设计rowkey序列的时候,请记得将基数较小的维度放在末尾。这样不仅有利于cube构建,而且有助于cube查询,因为预聚合也遵循相同的规则。
-通常来说,从N维到(N/2)维的构建比较慢,因为这是cuboid数量爆炸性增长的阶段:N维有1个cuboid,(N-1)维有N个cuboid,(N-2)维有N*(N-1)个cuboid,以此类推。经过(N/2)维构建的步骤,整个构建任务会逐渐变快。
+通常来说,从N维到(N/2)维的构建比较慢,因为这是cuboid数量爆炸性增长的阶段:N维有1个cuboid,(N-1)维有N个cuboid,(N-2)维有N*(N-1)/2个cuboid,以此类推。经过(N/2)维构建的步骤,整个构建任务会逐渐变快。
 ## 构建cube
diff --git a/website/_docs30/howto/howto_optimize_build.md b/website/_docs30/howto/howto_optimize_build.md
index 231fe34..4eb4ae1 100644
--- a/website/_docs30/howto/howto_optimize_build.md
+++ b/website/_docs30/howto/howto_optimize_build.md
@@ -132,7 +132,7 @@ These steps are the "by-layer" cubing process, each step uses the output of prev
 Some cuboid can be aggregated from more than 1 parent cubiods, in this case, Kylin will select the minimal parent cuboid. For example, AB can be generated from ABC (id: 1110) and ABD (id: 1101), so ABD will be used as its id is smaller than ABC. Based on this, if D's cardinality is small, the aggregation will be cost-efficient. So, when you design the Cube rowkey sequence, please remember to put low cardinality dimensions to the tail position. This not only benefit the Cube build, but al [...]
-Usually from the N-D to (N/2)-D the building is slow, because it is the cuboid explosion process: N-D has 1 Cuboid, (N-1)-D has N cuboids, (N-2)-D has N*(N-1) cuboids, etc. After (N/2)-D step, the building gets faster gradually.
+Usually from the N-D to (N/2)-D the building is slow, because it is the cuboid explosion process: N-D has 1 Cuboid, (N-1)-D has N cuboids, (N-2)-D has N*(N-1)/2 cuboids, etc. After (N/2)-D step, the building gets faster gradually.
diff --git a/website/_docs30/tutorial/cube_spark.cn.md b/website/_docs30/tutorial/cube_spark.cn.md
index 4599991..5f87cd5 100644
--- a/website/_docs30/tutorial/cube_spark.cn.md
+++ b/website/_docs30/tutorial/cube_spark.cn.md
@@ -151,7 +151,8 @@ kylin.engine.livy-conf.livy-arr.jars=hdfs:///path/hbase-client-1.2.0-{$env.versi
 {% highlight Groff markup %}
 kylin.engine.spark-fact-distinct=true
-kylin.engine.spark-dimension-dictionary=true
+kylin.engine.spark-dimension-dictionary=true
+kylin.engine.spark-uhc-dictionary=true
 {% endhighlight %}
 ## 疑难解答
diff --git a/website/_docs30/tutorial/cube_spark.md b/website/_docs30/tutorial/cube_spark.md
index 6ad3949..06fc7a4 100644
--- a/website/_docs30/tutorial/cube_spark.md
+++ b/website/_docs30/tutorial/cube_spark.md
@@ -148,7 +148,7 @@ As we all know, the cubing job includes several steps and the steps 'extract fac
 {% highlight Groff markup %}
 kylin.engine.spark-fact-distinct=true
 kylin.engine.spark-dimension-dictionary=true
-kylin.engine.spark-udc-dictionary=true
+kylin.engine.spark-uhc-dictionary=true
 {% endhighlight %}
 ## Troubleshooting
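
Note on the corrected cuboid count: the change from N*(N-1) to N*(N-1)/2 in the howto_optimize_build hunks matches the binomial coefficient C(N, 2), since an (N-2)-D cuboid is obtained by choosing which 2 of the N dimensions to drop. The short Python sketch below is only an illustration and is not part of the commit. It assumes a full cube with no pruning (no aggregation groups, mandatory, hierarchy, or joint dimensions), and the helper name `cuboids_per_layer` is hypothetical.

```python
from math import comb  # available in Python 3.8+


def cuboids_per_layer(n):
    """List (dimension_count, cuboid_count) pairs from the N-D layer down to 0-D.

    Assumes an unpruned cube, where the k-D layer holds C(n, k) cuboids.
    """
    return [(k, comb(n, k)) for k in range(n, -1, -1)]


if __name__ == "__main__":
    n = 10  # illustrative dimension count, not taken from the commit
    for dims, count in cuboids_per_layer(n):
        print(f"{dims}-D: {count} cuboids")
```

Under that assumption, the per-layer count grows until the (N/2)-D layer and then shrinks symmetrically, which is consistent with the patched sentence that building is slow from N-D to (N/2)-D and gets faster afterwards.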