Author: lidong
Date: Sat Jul 22 01:40:55 2017
New Revision: 1802649

URL: http://svn.apache.org/viewvc?rev=1802649&view=rev
Log:
Add kaisen's blog

Added:
    kylin/site/2017/
    kylin/site/2017/07/
    kylin/site/2017/07/21/
    kylin/site/2017/07/21/Improving-Spark-Cubing/
    kylin/site/2017/07/21/Improving-Spark-Cubing/index.html
Modified:
    kylin/site/blog/index.html
    kylin/site/docs20/tutorial/cube_spark.html
    kylin/site/feed.xml

Added: kylin/site/2017/07/21/Improving-Spark-Cubing/index.html
URL: 
http://svn.apache.org/viewvc/kylin/site/2017/07/21/Improving-Spark-Cubing/index.html?rev=1802649&view=auto
==============================================================================
--- kylin/site/2017/07/21/Improving-Spark-Cubing/index.html (added)
+++ kylin/site/2017/07/21/Improving-Spark-Cubing/index.html Sat Jul 22 01:40:55 
2017
@@ -0,0 +1,151 @@
+<h1 id="improving-spark-cubing-in-kylin-20">Improving Spark Cubing in Kylin 
2.0</h1>
+
+<p>Author: Kaisen Kang</p>
+
+<hr />
+
+<p>Apache Kylin is an OLAP engine that speeds up queries by Cube precomputation. A Cube is a multi-dimensional dataset that contains all measures precomputed for every dimension combination. Before v2.0, Kylin used MapReduce to build Cubes. To get better performance, Kylin 2.0 introduced Spark Cubing. For the principle behind Spark Cubing, please refer to the article <a href="http://kylin.apache.org/blog/2017/02/23/by-layer-spark-cubing/">By-layer Spark Cubing</a>.</p>
+
+<p>In this blog, I will talk about the following topics:</p>
+
+<ul>
+  <li>How to make Spark Cubing support HBase cluster with Kerberos enabled</li>
+  <li>Spark configurations for Cubing</li>
+  <li>Performance of Spark Cubing</li>
+  <li>Pros and cons of Spark Cubing</li>
+  <li>Applicable scenarios of Spark Cubing</li>
+  <li>Improvement for dictionary loading in Spark Cubing</li>
+</ul>
+
+<p>The current (2.0) version of Spark Cubing doesn’t support HBase clusters with Kerberos enabled, because Spark Cubing needs to get metadata from HBase. To solve this problem, we have two options: one is to make Spark able to connect to HBase with Kerberos; the other is to avoid Spark connecting to HBase at all during Spark Cubing.</p>
+
+<h3 id="make-spark-connect-hbase-with-kerberos-enabled">Make Spark connect HBase with Kerberos enabled</h3>
+<p>If we just want to run Spark Cubing in YARN client mode, we only need to add a few lines of code before <code>new SparkConf()</code> in SparkCubingByLayer:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>// requires imports of HConnection, HConnectionManager, TokenUtil and
+// UserProvider from org.apache.hadoop.hbase.*, plus UserGroupInformation
+Configuration configuration = HBaseConnection.getCurrentHBaseConfiguration();
+HConnection connection = HConnectionManager.createConnection(configuration);
+// Obtain an authentication token for the given user and add it to the user's credentials.
+TokenUtil.obtainAndCacheToken(connection,
+        UserProvider.instantiate(configuration).create(UserGroupInformation.getCurrentUser()));
+</code></pre>
+</div>
+
+<p>As for how to make Spark connect to HBase using Kerberos in YARN cluster mode, please refer to SPARK-6918, SPARK-12279, and HBASE-17040. That approach may work, but it is not elegant, so I tried the second solution.</p>
+
+<h3 id="use-hdfs-metastore-for-spark-cubing">Use HDFS metastore for Spark 
Cubing</h3>
+
+<p>The core idea is to upload the job-related metadata to HDFS and use HDFSResourceStore to manage it.</p>
+
+<p>Before introducing how to use HDFSResourceStore instead of HBaseResourceStore in Spark Cubing, let’s see what Kylin’s metadata format is and how Kylin manages the metadata.</p>
+
+<p>Every concrete piece of metadata for a table, cube, model, or project is a JSON file in Kylin. The whole metadata set is organized as a file directory. The picture below shows the root directory of Kylin metadata:<br />
+<img src="http://static.zybuluo.com/kangkaisen/t1tc6neiaebiyfoir4fdhs11/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7%202017-07-02%20%E4%B8%8B%E5%8D%883.51.43.png" alt="Kylin metadata root directory" /><br />
+The following picture shows the content of the project dir; “learn_kylin” and “kylin_test” are both project names.<br />
+<img src="http://static.zybuluo.com/kangkaisen/4dtiioqnw08w6vtj0r9u5f27/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7%202017-07-02%20%E4%B8%8B%E5%8D%883.54.59.png" alt="Kylin project directory contents" /></p>
+
+<p>Kylin manages its metadata through ResourceStore, an abstract class that defines the CRUD interface for metadata. ResourceStore has three implementation classes:</p>
+
+<ul>
+  <li>FileResourceStore (stores on the local filesystem)</li>
+  <li>HDFSResourceStore</li>
+  <li>HBaseResourceStore</li>
+</ul>
+
+<p>Currently, only HBaseResourceStore can be used in a production environment; FileResourceStore is mainly used for testing. HDFSResourceStore doesn’t support massive concurrent writes, but it is ideal for a read-only scenario like Cubing. Kylin uses the “kylin.metadata.url” config to decide which kind of ResourceStore will be used.</p>
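To make the abstraction concrete, here is a minimal, self-contained sketch of the ResourceStore idea in plain Java. The class and method names are illustrative only, not Kylin's real API, and a single in-memory map stands in for the HBase/HDFS/file backends:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative only: CRUD over metadata paths, backend picked from the metadata URL.
abstract class ResourceStoreSketch {
    abstract void putResource(String path, String json);   // create / update
    abstract String getResource(String path);              // read (null if absent)
    abstract void deleteResource(String path);             // delete

    // Kylin picks File/HDFS/HBaseResourceStore from "kylin.metadata.url";
    // this sketch returns the same in-memory store for every scheme.
    static ResourceStoreSketch getStore(String metadataUrl) {
        return new InMemoryResourceStore();
    }
}

class InMemoryResourceStore extends ResourceStoreSketch {
    private final ConcurrentMap<String, String> byPath = new ConcurrentHashMap<>();

    void putResource(String path, String json) { byPath.put(path, json); }
    String getResource(String path) { return byPath.get(path); }
    void deleteResource(String path) { byPath.remove(path); }
}
```

The point of the abstraction is that Cubing code only ever talks to the base class, so swapping HBase for HDFS is a configuration change, not a code change.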
+
+<p>Now, let’s see how to use HDFSResourceStore instead of HBaseResourceStore in Spark Cubing:</p>
+
+<ol>
+  <li>Determine the metadata necessary for the Spark Cubing job.</li>
+  <li>Dump that metadata from HBase to a local directory.</li>
+  <li>Update kylin.metadata.url, then write all Kylin config to a “kylin.properties” file in the local metadata dir.</li>
+  <li>Use ResourceTool to upload the local metadata to HDFS.</li>
+  <li>Construct the HDFSResourceStore from the HDFS “kylin.properties” file in the Spark executor.</li>
+</ol>
+
+<p>Of course, we need to delete the HDFS metadata dir once the job completes. I’m working on a patch for this; please watch KYLIN-2653 for updates.</p>
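The workflow above can be sketched end to end in plain Java. This is a minimal sketch under stated assumptions: ordinary local directories stand in for the HBase metastore and for HDFS, the cube name and helper methods are hypothetical, and it handles just one metadata entry instead of Kylin's real ResourceTool:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper: dump metadata locally, rewrite the metadata URL,
// then "upload" to an HDFS stand-in by copying files.
class MetadataDumpSketch {
    static void prepare(Path localDir, Path hdfsDir, String cubeJson) throws IOException {
        // steps 1-2: dump the needed metadata (here, one cube descriptor) to local
        Files.createDirectories(localDir.resolve("cube"));
        Files.write(localDir.resolve("cube").resolve("test_cube.json"),
                cubeJson.getBytes(StandardCharsets.UTF_8));

        // step 3: point kylin.metadata.url at the job's dir and write kylin.properties
        Properties props = new Properties();
        props.setProperty("kylin.metadata.url", hdfsDir.toUri().toString());
        try (OutputStream out = Files.newOutputStream(localDir.resolve("kylin.properties"))) {
            props.store(out, "metadata for one Spark Cubing job");
        }

        // step 4: "upload" by copying the metadata files, preserving relative paths
        Files.createDirectories(hdfsDir.resolve("cube"));
        Files.copy(localDir.resolve("kylin.properties"), hdfsDir.resolve("kylin.properties"));
        Files.copy(localDir.resolve("cube").resolve("test_cube.json"),
                hdfsDir.resolve("cube").resolve("test_cube.json"));
    }

    // step 5: what the executor would do -- read kylin.properties from "HDFS"
    // and learn which metadata URL to build its ResourceStore from
    static String loadMetadataUrl(Path hdfsDir) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(hdfsDir.resolve("kylin.properties"))) {
            props.load(in);
        }
        return props.getProperty("kylin.metadata.url");
    }
}
```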
+
+<h3 id="spark-configurations-for-cubing">Spark configurations for Cubing</h3>
+
+<p>Following is the Spark configuration I used in our environment. It enables Spark dynamic resource allocation, with the goal of letting our users set fewer Spark configurations.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code># run in yarn-cluster mode
+kylin.engine.spark-conf.spark.master=yarn
+kylin.engine.spark-conf.spark.submit.deployMode=cluster
+
+# enable dynamic allocation so users don't have to set the number of executors explicitly
+kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
+kylin.engine.spark-conf.spark.dynamicAllocation.minExecutors=10
+kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors=1024
+kylin.engine.spark-conf.spark.dynamicAllocation.executorIdleTimeout=300
+kylin.engine.spark-conf.spark.shuffle.service.enabled=true
+kylin.engine.spark-conf.spark.shuffle.service.port=7337
+
+# memory config; enlarge executor.memory when the cube dict is huge,
+# because Kylin needs to load the cube dict in the executor
+kylin.engine.spark-conf.spark.driver.memory=4G
+kylin.engine.spark-conf.spark.executor.memory=4G
+kylin.engine.spark-conf.spark.executor.cores=1
+
+# enlarge the network timeout
+kylin.engine.spark-conf.spark.network.timeout=600
+
+kylin.engine.spark-conf.spark.yarn.queue=root.hadoop.test
+
+kylin.engine.spark.rdd-partition-cut-mb=100
+</code></pre>
+</div>
+
+<h3 id="performance-test-of-spark-cubing">Performance test of Spark Cubing</h3>
+
+<p>For source data ranging from millions to hundreds of millions of rows, my test results are consistent with the blog <a href="http://kylin.apache.org/blog/2017/02/23/by-layer-spark-cubing/">By-layer Spark Cubing</a>: the improvement is remarkable. Moreover, I also specifically tested builds with billions of source rows and a huge dictionary.</p>
+
+<p>The test Cube1 has 2.7 billion rows of source data, 9 dimensions, and one precise count-distinct measure with 70 million cardinality (which means the dict also has 70 million cardinality).</p>
+
+<p>The test Cube2 has 2.4 billion rows of source data, 13 dimensions, and 38 measures (including 9 precise count-distinct measures).</p>
+
+<p>The test result is shown in the picture below; the unit of time is minutes.<br />
+<img src="http://static.zybuluo.com/kangkaisen/1urzfkal8od52fodi1l6u0y5/image.png" alt="Build time comparison between Spark and MR cubing" /></p>
+
+<p>In one word, <strong>Spark Cubing is much faster than MR cubing in most 
scenes</strong>.</p>
+
+<h3 id="pros-and-cons-of-spark-cubing">Pros and Cons of Spark Cubing</h3>
+<p>In my opinion, the advantages of Spark Cubing include:</p>
+
+<ol>
+  <li>Because of the RDD cache, Spark Cubing could take full advantage of 
memory to avoid disk I/O.</li>
+  <li>When we have enough memory resource, Spark Cubing could use more memory 
resource to get better build performance.</li>
+</ol>
+
+<p>On the other hand, the drawbacks of Spark Cubing include:</p>
+
+<ol>
+  <li>Spark Cubing couldn’t handle huge dictionary well (hundreds of 
millions of cardinality);</li>
+  <li>Spark Cubing isn’t stable enough for very large scale data.</li>
+</ol>
+
+<h3 id="applicable-scenarios-of-spark-cubing">Applicable scenarios of Spark Cubing</h3>
+<p>In my opinion, except for the huge-dictionary scenario, we could use Spark Cubing to replace MR Cubing, especially in the following scenarios:</p>
+
+<ol>
+  <li>Many dimensions</li>
+  <li>Normal dictionaries (e.g., cardinality &lt; 100 million)</li>
+  <li>Normal scale data (e.g., fewer than 10 billion rows to build at once)</li>
+</ol>
+
+<h3 id="improvement-for-dictionary-loading-in-spark-cubing">Improvement for 
dictionary loading in Spark Cubing</h3>
+
+<p>As we all know, a big difference between MR and Spark is that each MR task runs in its own process, while Spark tasks run as threads within an executor process. So in MR Cubing the Cube dict is loaded only once per task, but in Spark Cubing the dict would be loaded many times within one executor, which causes frequent GC.</p>
+
+<p>So I made two improvements:</p>
+
+<ol>
+  <li>Load the dict only once per executor.</li>
+  <li>Add a maximumSize to the LoadingCache in AppendTrieDictionary so the dict can be evicted as early as possible.</li>
+</ol>
+
+<p>These two improvements have been contributed back to the Kylin repository.</p>
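Both improvements can be illustrated with a small, self-contained Java sketch. This is not Kylin's AppendTrieDictionary code: a string stands in for the loaded dictionary, one cache instance per JVM stands in for the per-executor singleton, and an access-ordered LinkedHashMap plays the role of Guava's LoadingCache with a maximumSize:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: load each dict at most once while cached; evict LRU beyond maximumSize.
class BoundedDictCache {
    private final int maximumSize;
    private final LinkedHashMap<String, String> cache;
    private int loadCount = 0;

    BoundedDictCache(int maximumSize) {
        this.maximumSize = maximumSize;
        // access-order LinkedHashMap evicts the least recently used entry
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > BoundedDictCache.this.maximumSize;
            }
        };
    }

    // All task threads of one executor share this method, so the expensive
    // dictionary load happens once per path while cached, not once per task.
    synchronized String getDictionary(String resPath) {
        String dict = cache.get(resPath);
        if (dict == null) {
            loadCount++;                   // stands in for reading the real dict
            dict = "dict-for:" + resPath;
            cache.put(resPath, dict);
        }
        return dict;
    }

    synchronized int loadCount() { return loadCount; }
    synchronized int cachedSize() { return cache.size(); }
}
```

With three tasks asking for the same path, the dict loads once; once more than maximumSize dicts are touched, the least recently used one is dropped, bounding executor memory.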
+
+<h3 id="summary">Summary</h3>
+<p>Spark Cubing is a great feature for Kylin 2.0; thanks to the Kylin community. We will apply Spark Cubing to real scenarios in our company, and I believe Spark Cubing will become more robust and efficient in future releases.</p>
+

Modified: kylin/site/blog/index.html
URL: 
http://svn.apache.org/viewvc/kylin/site/blog/index.html?rev=1802649&r1=1802648&r2=1802649&view=diff
==============================================================================
--- kylin/site/blog/index.html (original)
+++ kylin/site/blog/index.html Sat Jul 22 01:40:55 2017
@@ -283,25 +283,25 @@
     
             <li>
         <h2 align="left" style="margin:0px">
-          <a class="post-link" href="/blog/2016/05/26/release-v1.5.2/">Apache 
Kylin v1.5.2 Release Announcement</a></h2><div align="left" 
class="post-meta">posted: May 26, 2016</div>
+          <a class="post-link" 
href="/cn/blog/2016/05/26/release-v1.5.2/">Apache Kylin v1.5.2 
正式发布</a></h2><div align="left" class="post-meta">posted: May 26, 
2016</div>
         
       </li>
     
             <li>
         <h2 align="left" style="margin:0px">
-          <a class="post-link" 
href="/cn/blog/2016/05/26/release-v1.5.2/">Apache Kylin v1.5.2 
正式发布</a></h2><div align="left" class="post-meta">posted: May 26, 
2016</div>
+          <a class="post-link" href="/blog/2016/05/26/release-v1.5.2/">Apache 
Kylin v1.5.2 Release Announcement</a></h2><div align="left" 
class="post-meta">posted: May 26, 2016</div>
         
       </li>
     
             <li>
         <h2 align="left" style="margin:0px">
-          <a class="post-link" 
href="/cn/blog/2016/04/12/release-v1.5.1/">Apache Kylin v1.5.1 
正式发布</a></h2><div align="left" class="post-meta">posted: Apr 12, 
2016</div>
+          <a class="post-link" href="/blog/2016/04/12/release-v1.5.1/">Apache 
Kylin v1.5.1 Release Announcement</a></h2><div align="left" 
class="post-meta">posted: Apr 12, 2016</div>
         
       </li>
     
             <li>
         <h2 align="left" style="margin:0px">
-          <a class="post-link" href="/blog/2016/04/12/release-v1.5.1/">Apache 
Kylin v1.5.1 Release Announcement</a></h2><div align="left" 
class="post-meta">posted: Apr 12, 2016</div>
+          <a class="post-link" 
href="/cn/blog/2016/04/12/release-v1.5.1/">Apache Kylin v1.5.1 
正式发布</a></h2><div align="left" class="post-meta">posted: Apr 12, 
2016</div>
         
       </li>
     
@@ -361,13 +361,13 @@
     
             <li>
         <h2 align="left" style="margin:0px">
-          <a class="post-link" href="/blog/2015/12/23/release-v1.2/">Apache 
Kylin v1.2 Release Announcement</a></h2><div align="left" 
class="post-meta">posted: Dec 23, 2015</div>
+          <a class="post-link" href="/cn/blog/2015/12/23/release-v1.2/">Apache 
Kylin v1.2 正式发布</a></h2><div align="left" class="post-meta">posted: Dec 
23, 2015</div>
         
       </li>
     
             <li>
         <h2 align="left" style="margin:0px">
-          <a class="post-link" href="/cn/blog/2015/12/23/release-v1.2/">Apache 
Kylin v1.2 正式发布</a></h2><div align="left" class="post-meta">posted: Dec 
23, 2015</div>
+          <a class="post-link" href="/blog/2015/12/23/release-v1.2/">Apache 
Kylin v1.2 Release Announcement</a></h2><div align="left" 
class="post-meta">posted: Dec 23, 2015</div>
         
       </li>
     

Modified: kylin/site/docs20/tutorial/cube_spark.html
URL: 
http://svn.apache.org/viewvc/kylin/site/docs20/tutorial/cube_spark.html?rev=1802649&r1=1802648&r2=1802649&view=diff
==============================================================================
--- kylin/site/docs20/tutorial/cube_spark.html (original)
+++ kylin/site/docs20/tutorial/cube_spark.html Sat Jul 22 01:40:55 2017
@@ -2827,7 +2827,7 @@ $KYLIN_HOME/bin/kylin.sh start</code></p
 
 <h2 id="go-further">Go further</h2>
 
-<p>If you’re a Kylin administrator but new to Spark, suggest you go through 
<a href="https://spark.apache.org/docs/1.6.3/";>Spark documents</a>, and don’t 
forget to update the configurations accordingly. Spark’s performance relies 
on Cluster’s memory and CPU resource, while Kylin’s Cube build is a heavy 
task when having a complex data model and a huge dataset to build at one time. 
If your cluster resource couldn’t fulfill, errors like “OutOfMemorry” 
will be thrown in Spark executors, so please use it properly. For Cube which 
has UHC dimension, many combinations (e.g, a full cube with more than 12 
dimensions), or memory hungry measures (Count Distinct, Top-N), suggest to use 
the MapReduce engine. If your Cube model is simple, all measures are 
SUM/MIN/MAX/COUNT, source data is small to medium scale, Spark engine would be 
a good choice. Besides, Streaming build isn’t supported in this engine so far 
(KYLIN-2484).</p>
+<p>If you’re a Kylin administrator but new to Spark, suggest you go through 
<a href="https://spark.apache.org/docs/1.6.3/";>Spark documents</a>, and don’t 
forget to update the configurations accordingly. You can enable Spark <a 
href="https://spark.apache.org/docs/1.6.1/configuration.html#dynamic-allocation";>Dynamic
 Resource Allocation</a> so that it can auto scale/shrink for different work 
load. Spark’s performance relies on Cluster’s memory and CPU resource, 
while Kylin’s Cube build is a heavy task when having a complex data model and 
a huge dataset to build at one time. If your cluster resource couldn’t 
fulfill, errors like “OutOfMemorry” will be thrown in Spark executors, so 
please use it properly. For Cube which has UHC dimension, many combinations 
(e.g, a full cube with more than 12 dimensions), or memory hungry measures 
(Count Distinct, Top-N), suggest to use the MapReduce engine. If your Cube 
model is simple, all measures are SUM/MIN/MAX
 /COUNT, source data is small to medium scale, Spark engine would be a good 
choice. Besides, Streaming build isn’t supported in this engine so far 
(KYLIN-2484).</p>
 
 <p>Now the Spark engine is in public beta; If you have any question, comment, 
or bug fix, welcome to discuss in [email protected].</p>
 

Modified: kylin/site/feed.xml
URL: 
http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1802649&r1=1802648&r2=1802649&view=diff
==============================================================================
--- kylin/site/feed.xml (original)
+++ kylin/site/feed.xml Sat Jul 22 01:40:55 2017
@@ -19,11 +19,172 @@
     <description>Apache Kylin Home</description>
     <link>http://kylin.apache.org/</link>
     <atom:link href="http://kylin.apache.org/feed.xml"; rel="self" 
type="application/rss+xml"/>
-    <pubDate>Tue, 27 Jun 2017 02:35:13 -0700</pubDate>
-    <lastBuildDate>Tue, 27 Jun 2017 02:35:13 -0700</lastBuildDate>
+    <pubDate>Fri, 21 Jul 2017 18:39:35 -0700</pubDate>
+    <lastBuildDate>Fri, 21 Jul 2017 18:39:35 -0700</lastBuildDate>
     <generator>Jekyll v2.5.3</generator>
     
       <item>
+        <title>Improving Spark Cubing</title>
+        <description>&lt;h1 
id=&quot;improving-spark-cubing-in-kylin-20&quot;&gt;Improving Spark Cubing in 
Kylin 2.0&lt;/h1&gt;
+
+&lt;p&gt;Author: Kaisen Kang&lt;/p&gt;
+
+&lt;hr /&gt;
+
+&lt;p&gt;Apache Kylin is a OALP Engine that speeding up query by Cube 
precomputation. The Cube is multi-dimensional dataset which contain precomputed 
all measures in all dimension combinations. Before v2.0, Kylin uses MapReduce 
to build Cube. In order to get better performance, Kylin 2.0 introduced the 
Spark Cubing. About the principle of Spark Cubing, please refer to the article 
&lt;a 
href=&quot;http://kylin.apache.org/blog/2017/02/23/by-layer-spark-cubing/&quot;&gt;By-layer
 Spark Cubing&lt;/a&gt;.&lt;/p&gt;
+
+&lt;p&gt;In this blog, I will talk about the following topics:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;How to make Spark Cubing support HBase cluster with Kerberos 
enabled&lt;/li&gt;
+  &lt;li&gt;Spark configurations for Cubing&lt;/li&gt;
+  &lt;li&gt;Performance of Spark Cubing&lt;/li&gt;
+  &lt;li&gt;Pros and cons of Spark Cubing&lt;/li&gt;
+  &lt;li&gt;Applicable scenarios of Spark Cubing&lt;/li&gt;
+  &lt;li&gt;Improvement for dictionary loading in Spark Cubing&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;In currently Spark Cubing(2.0) version, it doesn’t support HBase 
cluster using Kerberos bacause Spark Cubing need to get matadata from HBase. To 
solve this problem, we have two solutions: one is to make Spark could connect 
HBase with Kerberos, the other is to avoid Spark connect to HBase in Spark 
Cubing.&lt;/p&gt;
+
+&lt;h3 id=&quot;make-spark-connect-hbase-with-kerberos-enabled&quot;&gt;Make 
Spark connect HBase with Kerberos enabled&lt;/h3&gt;
+&lt;p&gt;If just want to run Spark Cubing in Yarn client mode, we only need to 
add three line code before new SparkConf() in SparkCubingByLayer:&lt;/p&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;        Configuration configuration 
= HBaseConnection.getCurrentHBaseConfiguration();        
+        HConnection connection = 
HConnectionManager.createConnection(configuration);
+        //Obtain an authentication token for the given user and add it to the 
user&#39;s credentials.
+        TokenUtil.obtainAndCacheToken(connection, 
UserProvider.instantiate(configuration).create(UserGroupInformation.getCurrentUser()));
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;p&gt;As for How to make Spark connect HBase using Kerberos in Yarn cluster 
mode, please refer to SPARK-6918, SPARK-12279, and HBASE-17040. The solution 
may work, but not elegant. So I tried the sencond solution.&lt;/p&gt;
+
+&lt;h3 id=&quot;use-hdfs-metastore-for-spark-cubing&quot;&gt;Use HDFS 
metastore for Spark Cubing&lt;/h3&gt;
+
+&lt;p&gt;The core idea here is uploading the necessary metadata job related to 
HDFS and using HDFSResourceStore manage the metadata.&lt;/p&gt;
+
+&lt;p&gt;Before introducing how to use HDFSResourceStore instead of 
HBaseResourceStore in Spark Cubing. Let’s see what’s Kylin metadata format 
and how Kylin manages the metadata.&lt;/p&gt;
+
+&lt;p&gt;Every concrete metadata for table, cube, model and project is a JSON 
file in Kylin. The whole metadata is organized by file directory. The picture 
below is the root directory for Kylin metadata,&lt;br /&gt;
+&lt;img 
src=&quot;http://static.zybuluo.com/kangkaisen/t1tc6neiaebiyfoir4fdhs11/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7%202017-07-02%20%E4%B8%8B%E5%8D%883.51.43.png&quot;
 alt=&quot;屏幕快照 2017-07-02 下午3.51.43.png-20.7kB&quot; /&gt;&lt;br 
/&gt;
+This following picture shows the content of project dir, the “learn_kylin” 
and “kylin_test” are both project names.&lt;br /&gt;
+&lt;img 
src=&quot;http://static.zybuluo.com/kangkaisen/4dtiioqnw08w6vtj0r9u5f27/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7%202017-07-02%20%E4%B8%8B%E5%8D%883.54.59.png&quot;
 alt=&quot;屏幕快照 2017-07-02 下午3.54.59.png-11.8kB&quot; 
/&gt;&lt;/p&gt;
+
+&lt;p&gt;Kylin manage the metadata using ResourceStore, ResourceStore is a 
abstract class, which abstract the CRUD Interface for metadata. ResourceStore 
has three implementation classes:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;FileResourceStore  (store with Local FileSystem)&lt;/li&gt;
+  &lt;li&gt;HDFSResourceStore&lt;/li&gt;
+  &lt;li&gt;HBaseResourceStore&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;Currently, only HBaseResourceStore could use in production env. 
FileResourceStore mainly used for testing. HDFSResourceStore doesn’t support 
massive concurrent write, but it is ideal to use for read only scenario like 
Cubing. Kylin use the “kylin.metadata.url” config to decide which kind of 
ResourceStore will be used.&lt;/p&gt;
+
+&lt;p&gt;Now, Let’s see How to use HDFSResourceStore instead of 
HBaseResourceStore in Spark Cubing.&lt;/p&gt;
+
+&lt;ol&gt;
+  &lt;li&gt;Determine the necessary metadata for Spark Cubing job&lt;/li&gt;
+  &lt;li&gt;Dump the necessary metadata from HBase to local&lt;/li&gt;
+  &lt;li&gt;Update the kylin.metadata.url and then write all Kylin config to 
“kylin.properties” file in local metadata dir.&lt;/li&gt;
+  &lt;li&gt;Use ResourceTool upload the local metadata to HDFS.&lt;/li&gt;
+  &lt;li&gt;Construct the HDFSResourceStore from the HDFS 
“kylin.properties” file in Spark executor.&lt;/li&gt;
+&lt;/ol&gt;
+
+&lt;p&gt;Of course, We need to delete the HDFS metadata dir on complete. I’m 
working on a patch for this, please watch KYLIN-2653 for update.&lt;/p&gt;
+
+&lt;h3 id=&quot;spark-configurations-for-cubing&quot;&gt;Spark configurations 
for Cubing&lt;/h3&gt;
+
+&lt;p&gt;Following is the Spark configuration I used in our environment. It 
enables Spark dynamic resource allocation; the goal is to let our user set less 
Spark configurations.&lt;/p&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;//running in yarn-cluster mode
+kylin.engine.spark-conf.spark.master=yarn
+kylin.engine.spark-conf.spark.submit.deployMode=cluster 
+
+//enable the dynamic allocation for Spark to avoid user set the number of 
executors explicitly
+kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
+kylin.engine.spark-conf.spark.dynamicAllocation.minExecutors=10
+kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors=1024
+kylin.engine.spark-conf.spark.dynamicAllocation.executorIdleTimeout=300
+kylin.engine.spark-conf.spark.shuffle.service.enabled=true
+kylin.engine.spark-conf.spark.shuffle.service.port=7337
+
+//the memory config
+kylin.engine.spark-conf.spark.driver.memory=4G
+//should enlarge the executor.memory when the cube dict is huge
+kylin.engine.spark-conf.spark.executor.memory=4G 
+//because kylin need to load the cube dict in executor
+kylin.engine.spark-conf.spark.executor.cores=1
+
+//enlarge the timeout
+kylin.engine.spark-conf.spark.network.timeout=600
+
+kylin.engine.spark-conf.spark.yarn.queue=root.hadoop.test
+
+kylin.engine.spark.rdd-partition-cut-mb=100
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;h3 id=&quot;performance-test-of-spark-cubing&quot;&gt;Performance test of 
Spark Cubing&lt;/h3&gt;
+
+&lt;p&gt;For the source data scale from millions to hundreds of millions, my 
test result is consistent with the blog &lt;a 
href=&quot;http://kylin.apache.org/blog/2017/02/23/by-layer-spark-cubing/&quot;&gt;By-layer
 Spark Cubing&lt;/a&gt;. The improvement is remarkable. Moreover, I also tested 
with billions of source data and having huge dictionary specially.&lt;/p&gt;
+
+&lt;p&gt;The test Cube1 has 2.7 billion source data, 9 dimensions, one precise 
distinct count measure having 70 million cardinality (which means the dict also 
has 70 million cardinality).&lt;/p&gt;
+
+&lt;p&gt;Test test Cube2 has 2.4 billion source data, 13 dimensions, 38 
measures(contains 9 precise distinct count measures).&lt;/p&gt;
+
+&lt;p&gt;The test result is shown in below picture, the unit of time is 
minute.&lt;br /&gt;
+&lt;img 
src=&quot;http://static.zybuluo.com/kangkaisen/1urzfkal8od52fodi1l6u0y5/image.png&quot;
 alt=&quot;image.png-38.1kB&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;In one word, &lt;strong&gt;Spark Cubing is much faster than MR cubing 
in most scenes&lt;/strong&gt;.&lt;/p&gt;
+
+&lt;h3 id=&quot;pros-and-cons-of-spark-cubing&quot;&gt;Pros and Cons of Spark 
Cubing&lt;/h3&gt;
+&lt;p&gt;In my opinion, the advantage for Spark Cubing includes:&lt;/p&gt;
+
+&lt;ol&gt;
+  &lt;li&gt;Because of the RDD cache, Spark Cubing could take full advantage 
of memory to avoid disk I/O.&lt;/li&gt;
+  &lt;li&gt;When we have enough memory resource, Spark Cubing could use more 
memory resource to get better build performance.&lt;/li&gt;
+&lt;/ol&gt;
+
+&lt;p&gt;On the contrary,the drawback for Spark Cubing includes:&lt;/p&gt;
+
+&lt;ol&gt;
+  &lt;li&gt;Spark Cubing couldn’t handle huge dictionary well (hundreds of 
millions of cardinality);&lt;/li&gt;
+  &lt;li&gt;Spark Cubing isn’t stable enough for very large scale 
data.&lt;/li&gt;
+&lt;/ol&gt;
+
+&lt;h3 id=&quot;applicable-scenarios-of-spark-cubing&quot;&gt;Applicable 
scenarios of Spark Cubing&lt;/h3&gt;
+&lt;p&gt;In my opinion, except the huge dictionary scenario, we all could use 
Spark Cubing to replace MR Cubing, especially under the following 
scenarios:&lt;/p&gt;
+
+&lt;ol&gt;
+  &lt;li&gt;Many dimensions&lt;/li&gt;
+  &lt;li&gt;Normal dictionaries (e.g, cardinality &amp;lt; 1 hundred 
millions)&lt;/li&gt;
+  &lt;li&gt;Normal scale data (e.g, less than 10 billion rows to build at 
once).&lt;/li&gt;
+&lt;/ol&gt;
+
+&lt;h3 
id=&quot;improvement-for-dictionary-loading-in-spark-cubing&quot;&gt;Improvement
 for dictionary loading in Spark Cubing&lt;/h3&gt;
+
+&lt;p&gt;As we all known, a big difference for MR and Spark is, the task for 
MR is running in process, but the task for Spark is running in thread. So, in 
MR Cubing, the dict of Cube only load once, but in Spark Cubing, the dict will 
be loaded many times in one executor, which will cause frequent GC.&lt;/p&gt;
+
+&lt;p&gt;So, I made the two improvements:&lt;/p&gt;
+
+&lt;ol&gt;
+  &lt;li&gt;Only load the dict once in one executor.&lt;/li&gt;
+  &lt;li&gt;Add maximumSize for LoadingCache in the AppendTrieDictionary to 
make the dict removed as early as possible.&lt;/li&gt;
+&lt;/ol&gt;
+
+&lt;p&gt;These two improvements have been contributed into Kylin 
repository.&lt;/p&gt;
+
+&lt;h3 id=&quot;summary&quot;&gt;Summary&lt;/h3&gt;
+&lt;p&gt;Spark Cubing is a great feature for Kylin 2.0, Thanks Kylin 
community. We will apply Spark Cubing in real scenarios in our company. I 
believe Spark Cubing will be more robust and efficient in the future 
releases.&lt;/p&gt;
+
+</description>
+        <pubDate>Fri, 21 Jul 2017 00:00:00 -0700</pubDate>
+        <link>http://kylin.apache.org/2017/07/21/Improving-Spark-Cubing/</link>
+        <guid 
isPermaLink="true">http://kylin.apache.org/2017/07/21/Improving-Spark-Cubing/</guid>
+        
+        
+      </item>
+    
+      <item>
         <title>A new measure for Percentile precalculation</title>
         <description>&lt;h2 
id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;
 
@@ -65,54 +226,54 @@
       </item>
     
       <item>
-        <title>Apache Kylin v2.0.0 Beta Announcement</title>
-        <description>&lt;p&gt;The Apache Kylin community is pleased to 
announce the &lt;a href=&quot;http://kylin.apache.org/download/&quot;&gt;v2.0.0 
beta package&lt;/a&gt; is ready for download and test.&lt;/p&gt;
+        <title>Apache Kylin v2.0.0 beta 发布</title>
+        <description>&lt;p&gt;Apache Kylin社区非常高兴地宣布 &lt;a 
href=&quot;http://kylin.apache.org/cn/download/&quot;&gt;v2.0.0 beta 
package&lt;/a&gt; 已经可以下载并测试了。&lt;/p&gt;
 
 &lt;ul&gt;
-  &lt;li&gt;Download link: &lt;a 
href=&quot;http://kylin.apache.org/download/&quot;&gt;http://kylin.apache.org/download/&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Source code: 
https://github.com/apache/kylin/tree/kylin-2.0.0-beta&lt;/li&gt;
+  &lt;li&gt;下载链接: &lt;a 
href=&quot;http://kylin.apache.org/cn/download/&quot;&gt;http://kylin.apache.org/cn/download/&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;源代码: 
https://github.com/apache/kylin/tree/kylin-2.0.0-beta&lt;/li&gt;
 &lt;/ul&gt;
 
-&lt;p&gt;It has been more than 2 month since the v1.6.0 release. The community 
has been working hard to deliver some long wanted features, hoping to move 
Apache Kylin to the next level.&lt;/p&gt;
+&lt;p&gt;自从v1.6.0版本发布已经2个多月了。这段时间里,整个社区协力开发完成了一系列重大的功能,希望能将Apache
 Kylin提升到一个新的高度。&lt;/p&gt;
 
 &lt;ul&gt;
-  &lt;li&gt;Support snowflake data model (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/KYLIN-1875&quot;&gt;KYLIN-1875&lt;/a&gt;)&lt;/li&gt;
-  &lt;li&gt;Support TPC-H queries (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/KYLIN-2467&quot;&gt;KYLIN-2467&lt;/a&gt;)&lt;/li&gt;
-  &lt;li&gt;Spark cubing engine (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/KYLIN-2331&quot;&gt;KYLIN-2331&lt;/a&gt;)&lt;/li&gt;
-  &lt;li&gt;Job engine HA (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/KYLIN-2006&quot;&gt;KYLIN-2006&lt;/a&gt;)&lt;/li&gt;
-  &lt;li&gt;Percentile measure (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/KYLIN-2396&quot;&gt;KYLIN-2396&lt;/a&gt;)&lt;/li&gt;
-  &lt;li&gt;Cloud tested (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/KYLIN-2351&quot;&gt;KYLIN-2351&lt;/a&gt;)&lt;/li&gt;
+  &lt;li&gt;支持雪花模型 (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/KYLIN-1875&quot;&gt;KYLIN-1875&lt;/a&gt;)&lt;/li&gt;
+  &lt;li&gt;支持 TPC-H 查询 (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/KYLIN-2467&quot;&gt;KYLIN-2467&lt;/a&gt;)&lt;/li&gt;
+  &lt;li&gt;Spark 构建引擎 (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/KYLIN-2331&quot;&gt;KYLIN-2331&lt;/a&gt;)&lt;/li&gt;
+  &lt;li&gt;Job Engine 高可用性 (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/KYLIN-2006&quot;&gt;KYLIN-2006&lt;/a&gt;)&lt;/li&gt;
+  &lt;li&gt;Percentile 度量 (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/KYLIN-2396&quot;&gt;KYLIN-2396&lt;/a&gt;)&lt;/li&gt;
+  &lt;li&gt;在 Cloud 上通过测试 (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/KYLIN-2351&quot;&gt;KYLIN-2351&lt;/a&gt;)&lt;/li&gt;
 &lt;/ul&gt;
 
-&lt;p&gt;You are very welcome to give the v2.0.0 beta a try, and please do 
send feedbacks to &lt;a 
href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&lt;/a&gt;.&lt;/p&gt;
+&lt;p&gt;非常欢迎大家下载并测试 v2.0.0 
beta。您的反馈对我们非常重要,请发邮件到 &lt;a 
href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&lt;/a&gt;。&lt;/p&gt;
 
 &lt;hr /&gt;
 
-&lt;h2 id=&quot;install&quot;&gt;Install&lt;/h2&gt;
+&lt;h2 id=&quot;section&quot;&gt;安装&lt;/h2&gt;
 
-&lt;p&gt;The v2.0.0 beta requires a refresh install at the moment. It cannot 
be upgraded from v1.6.0 due to the incompatible metadata. However the 
underlying cube is backward compatible. We are working on an upgrade tool to 
transform the metadata, so that a smooth upgrade will be possible.&lt;/p&gt;
+&lt;p&gt;暂时 v2.0.0 beta 无法从 v1.6.0 直接升级,必须全新安装。这是由于新版本的元数据并不向前兼容。好在 Cube 
数据是向前兼容的,因此只需要开发一个元数据转换工具，就能在不久
的将来实现平滑升级。我们正在为此努力。&lt;/p&gt;
 
 &lt;hr /&gt;
 
-&lt;h2 id=&quot;run-tpc-h-benchmark&quot;&gt;Run TPC-H Benchmark&lt;/h2&gt;
+&lt;h2 id=&quot;tpc-h-&quot;&gt;运行 TPC-H 基准测试&lt;/h2&gt;
 
-&lt;p&gt;Steps to run TPC-H benchmark on Apache Kylin can be found here: &lt;a 
href=&quot;https://github.com/Kyligence/kylin-tpch&quot;&gt;https://github.com/Kyligence/kylin-tpch&lt;/a&gt;&lt;/p&gt;
+&lt;p&gt;在 Apache Kylin 上运行 TPC-H 的具体步骤: &lt;a 
href=&quot;https://github.com/Kyligence/kylin-tpch&quot;&gt;https://github.com/Kyligence/kylin-tpch&lt;/a&gt;&lt;/p&gt;
 
 &lt;hr /&gt;
 
-&lt;h2 id=&quot;spark-cubing-engine&quot;&gt;Spark Cubing Engine&lt;/h2&gt;
+&lt;h2 id=&quot;spark-&quot;&gt;Spark 构建引擎&lt;/h2&gt;
 
-&lt;p&gt;Apache Kylin v2.0.0 introduced a new cubing engine based on Apache 
Spark that can be selected to replace the original MR engine. Initial tests 
showed that the spark engine could cut the build time to 50% in most 
cases.&lt;/p&gt;
+&lt;p&gt;Apache Kylin v2.0.0 引入了一个全新的基于 Apache Spark 
的构建引擎。它可用于替换原有的 MapReduce 
构建引擎。初步测试显示 Cube 的构建时间一般能缩短到原先的 50% 左右。&lt;/p&gt;
 
-&lt;p&gt;To enable the Spark cubing engine, check &lt;a 
href=&quot;/docs16/tutorial/cube_spark.html&quot;&gt;this 
tutorial&lt;/a&gt;.&lt;/p&gt;
+&lt;p&gt;启用 Spark 构建引擎,请参考&lt;a 
href=&quot;/docs16/tutorial/cube_spark.html&quot;&gt;这篇文档&lt;/a&gt;。&lt;/p&gt;
 
 &lt;hr /&gt;
 
-&lt;p&gt;&lt;em&gt;Great thanks to everyone who 
contributed!&lt;/em&gt;&lt;/p&gt;
+&lt;p&gt;&lt;em&gt;感谢每一位朋友的参与和贡献!&lt;/em&gt;&lt;/p&gt;
 </description>
         <pubDate>Sat, 25 Feb 2017 12:00:00 -0800</pubDate>
-        <link>http://kylin.apache.org/blog/2017/02/25/v2.0.0-beta-ready/</link>
-        <guid 
isPermaLink="true">http://kylin.apache.org/blog/2017/02/25/v2.0.0-beta-ready/</guid>
+        
<link>http://kylin.apache.org/cn/blog/2017/02/25/v2.0.0-beta-ready/</link>
+        <guid 
isPermaLink="true">http://kylin.apache.org/cn/blog/2017/02/25/v2.0.0-beta-ready/</guid>
         
         
         <category>blog</category>
@@ -120,54 +281,54 @@
       </item>
     
       <item>
-        <title>Apache Kylin v2.0.0 beta 发布</title>
-        <description>&lt;p&gt;Apache Kylin社区非常高兴地宣布 &lt;a 
href=&quot;http://kylin.apache.org/cn/download/&quot;&gt;v2.0.0 beta 
package&lt;/a&gt; 已经可以下载并测试了。&lt;/p&gt;
+        <title>Apache Kylin v2.0.0 Beta Announcement</title>
        <description>&lt;p&gt;The Apache Kylin community is pleased to 
announce that the &lt;a href=&quot;http://kylin.apache.org/download/&quot;&gt;v2.0.0 
beta package&lt;/a&gt; is ready for download and testing.&lt;/p&gt;
 
 &lt;ul&gt;
-  &lt;li&gt;下载链接: &lt;a 
href=&quot;http://kylin.apache.org/cn/download/&quot;&gt;http://kylin.apache.org/cn/download/&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;源代码: 
https://github.com/apache/kylin/tree/kylin-2.0.0-beta&lt;/li&gt;
+  &lt;li&gt;Download link: &lt;a 
href=&quot;http://kylin.apache.org/download/&quot;&gt;http://kylin.apache.org/download/&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Source code: 
https://github.com/apache/kylin/tree/kylin-2.0.0-beta&lt;/li&gt;
 &lt;/ul&gt;
 
-&lt;p&gt;自从v1.6.0版本发布已经2个多月了。这段时间里,整个社区协力开发完成了一系列重大的功能,希望能将Apache
 Kylin提升到一个新的高度。&lt;/p&gt;
+&lt;p&gt;It has been more than 2 months since the v1.6.0 release. The community 
has been working hard to deliver some long-wanted features, hoping to move 
Apache Kylin to the next level.&lt;/p&gt;
 
 &lt;ul&gt;
-  &lt;li&gt;支持雪花模型 (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/KYLIN-1875&quot;&gt;KYLIN-1875&lt;/a&gt;)&lt;/li&gt;
-  &lt;li&gt;支持 TPC-H 查询 (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/KYLIN-2467&quot;&gt;KYLIN-2467&lt;/a&gt;)&lt;/li&gt;
-  &lt;li&gt;Spark 构建引擎 (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/KYLIN-2331&quot;&gt;KYLIN-2331&lt;/a&gt;)&lt;/li&gt;
-  &lt;li&gt;Job Engine 高可用性 (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/KYLIN-2006&quot;&gt;KYLIN-2006&lt;/a&gt;)&lt;/li&gt;
-  &lt;li&gt;Percentile 度量 (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/KYLIN-2396&quot;&gt;KYLIN-2396&lt;/a&gt;)&lt;/li&gt;
-  &lt;li&gt;在 Cloud 上通过测试 (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/KYLIN-2351&quot;&gt;KYLIN-2351&lt;/a&gt;)&lt;/li&gt;
+  &lt;li&gt;Support snowflake data model (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/KYLIN-1875&quot;&gt;KYLIN-1875&lt;/a&gt;)&lt;/li&gt;
+  &lt;li&gt;Support TPC-H queries (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/KYLIN-2467&quot;&gt;KYLIN-2467&lt;/a&gt;)&lt;/li&gt;
+  &lt;li&gt;Spark cubing engine (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/KYLIN-2331&quot;&gt;KYLIN-2331&lt;/a&gt;)&lt;/li&gt;
+  &lt;li&gt;Job engine HA (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/KYLIN-2006&quot;&gt;KYLIN-2006&lt;/a&gt;)&lt;/li&gt;
+  &lt;li&gt;Percentile measure (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/KYLIN-2396&quot;&gt;KYLIN-2396&lt;/a&gt;)&lt;/li&gt;
+  &lt;li&gt;Cloud tested (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/KYLIN-2351&quot;&gt;KYLIN-2351&lt;/a&gt;)&lt;/li&gt;
 &lt;/ul&gt;
 
-&lt;p&gt;非常欢迎大家下载并测试 v2.0.0 
beta。您的反馈对我们非常重要,请发邮件到 &lt;a 
href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&lt;/a&gt;。&lt;/p&gt;
+&lt;p&gt;You are very welcome to give the v2.0.0 beta a try, and please do 
send feedback to &lt;a 
href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&lt;/a&gt;.&lt;/p&gt;
 
 &lt;hr /&gt;
 
-&lt;h2 id=&quot;section&quot;&gt;安装&lt;/h2&gt;
+&lt;h2 id=&quot;install&quot;&gt;Install&lt;/h2&gt;
 
-&lt;p&gt;暂时 v2.0.0 beta 无法从 v1.6.0 直接升级,必需全新安装。这是由于新版本的元数据并不向前兼容。好在 Cube 
数据是向前兼容的,因此只需要开发一个元数据转换工具，就能在不久
的将来实现平滑升级。我们正在为此努力。&lt;/p&gt;
+&lt;p&gt;The v2.0.0 beta requires a fresh install at the moment. It cannot 
be upgraded from v1.6.0 due to incompatible metadata. However, the 
underlying cube data is backward compatible. We are working on an upgrade tool 
to transform the metadata, so that a smooth upgrade will be possible.&lt;/p&gt;
 
 &lt;hr /&gt;
 
-&lt;h2 id=&quot;tpc-h-&quot;&gt;运行 TPC-H 基准测试&lt;/h2&gt;
+&lt;h2 id=&quot;run-tpc-h-benchmark&quot;&gt;Run TPC-H Benchmark&lt;/h2&gt;
 
-&lt;p&gt;在 Apache Kylin 上运行 TPC-H 的具体步骤: &lt;a 
href=&quot;https://github.com/Kyligence/kylin-tpch&quot;&gt;https://github.com/Kyligence/kylin-tpch&lt;/a&gt;&lt;/p&gt;
+&lt;p&gt;Steps to run TPC-H benchmark on Apache Kylin can be found here: &lt;a 
href=&quot;https://github.com/Kyligence/kylin-tpch&quot;&gt;https://github.com/Kyligence/kylin-tpch&lt;/a&gt;&lt;/p&gt;
 
 &lt;hr /&gt;
 
-&lt;h2 id=&quot;spark-&quot;&gt;Spark 构建引擎&lt;/h2&gt;
+&lt;h2 id=&quot;spark-cubing-engine&quot;&gt;Spark Cubing Engine&lt;/h2&gt;
 
-&lt;p&gt;Apache Kylin v2.0.0 引入了一个全新的基于 Apache Spark 
的构建引擎。它可用于替换原有的 MapReduce 
构建引擎。初步测试显示 Cube 的构建时间一般能缩短到原先的 50% 左右。&lt;/p&gt;
+&lt;p&gt;Apache Kylin v2.0.0 introduced a new cubing engine based on Apache 
Spark that can be selected to replace the original MapReduce engine. Initial 
tests showed that the Spark engine could cut the build time to about 50% of 
the original in most cases.&lt;/p&gt;
 
-&lt;p&gt;启用 Spark 构建引擎,请参考&lt;a 
href=&quot;/docs16/tutorial/cube_spark.html&quot;&gt;这篇文档&lt;/a&gt;.&lt;/p&gt;
+&lt;p&gt;To enable the Spark cubing engine, check &lt;a 
href=&quot;/docs16/tutorial/cube_spark.html&quot;&gt;this 
tutorial&lt;/a&gt;.&lt;/p&gt;
 
 &lt;hr /&gt;
 
-&lt;p&gt;&lt;em&gt;感谢每一位朋友的参与和贡献!&lt;/em&gt;&lt;/p&gt;
+&lt;p&gt;&lt;em&gt;Great thanks to everyone who 
contributed!&lt;/em&gt;&lt;/p&gt;
 </description>
         <pubDate>Sat, 25 Feb 2017 12:00:00 -0800</pubDate>
-        
<link>http://kylin.apache.org/cn/blog/2017/02/25/v2.0.0-beta-ready/</link>
-        <guid 
isPermaLink="true">http://kylin.apache.org/cn/blog/2017/02/25/v2.0.0-beta-ready/</guid>
+        <link>http://kylin.apache.org/blog/2017/02/25/v2.0.0-beta-ready/</link>
+        <guid 
isPermaLink="true">http://kylin.apache.org/blog/2017/02/25/v2.0.0-beta-ready/</guid>
         
         
         <category>blog</category>
@@ -590,56 +751,6 @@ group by grouping sets((dim1, dim2), (di
         
         
         <category>blog</category>
-        
-      </item>
-    
-      <item>
-        <title>Query Metrics in Apache Kylin</title>
-        <description>&lt;p&gt;Apache Kylin support query metrics since 1.5.4. 
This blog will introduce why Kylin need query metrics, the concrete contents 
and meaning of query metrics, the daily function of query metrics and how to 
collect query metrics.&lt;/p&gt;
-
-&lt;h2 id=&quot;background&quot;&gt;Background&lt;/h2&gt;
-&lt;p&gt;When Kylin become an enterprise application, you must ensure Kylin 
query service is high availability and high performance, besides, you need to 
provide commitment of the SLA of query service to users, Which need Kylin to 
support query metrics.&lt;/p&gt;
-
-&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;
-&lt;p&gt;The query metrics have Server, Project, Cube three levels.&lt;/p&gt;
-
-&lt;p&gt;For example, &lt;code 
class=&quot;highlighter-rouge&quot;&gt;QueryCount&lt;/code&gt; will have three 
kinds of metrics:&lt;br /&gt;
-```&lt;br /&gt;
-Hadoop:name=Server_Total,service=Kylin.QueryCount&lt;br /&gt;
-Hadoop:name=learn_kylin,service=Kylin.QueryCount&lt;br /&gt;
-Hadoop:name=learn_kylin,service=Kylin,sub=kylin_sales_cube.QueryCount&lt;/p&gt;
-
-&lt;p&gt;Server_Total is represent for a query server node,&lt;br /&gt;
-learn_kylin is a project name,&lt;br /&gt;
-kylin_sales_cube is a cube name.&lt;br /&gt;
-```&lt;br /&gt;
-### The Key Query Metrics&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;&lt;code 
class=&quot;highlighter-rouge&quot;&gt;QueryCount&lt;/code&gt;: the total of 
query count.&lt;/li&gt;
-  &lt;li&gt;&lt;code 
class=&quot;highlighter-rouge&quot;&gt;QueryFailCount&lt;/code&gt;: the total 
of failed query count.&lt;/li&gt;
-  &lt;li&gt;&lt;code 
class=&quot;highlighter-rouge&quot;&gt;QuerySuccessCount&lt;/code&gt;: the 
total of successful query count.&lt;/li&gt;
-  &lt;li&gt;&lt;code 
class=&quot;highlighter-rouge&quot;&gt;CacheHitCount&lt;/code&gt;: the total of 
query cache hit count.&lt;/li&gt;
-  &lt;li&gt;&lt;code 
class=&quot;highlighter-rouge&quot;&gt;QueryLatency60s99thPercentile&lt;/code&gt;:
 the 99th percentile of query latency in the 60s.(there are 99th, 95th, 90th, 
75th, 50th five percentiles and 60s, 360s, 3600s three time intervals in Kylin 
query metrics. the time intervals could set by &lt;code 
class=&quot;highlighter-rouge&quot;&gt;kylin.query.metrics.percentiles.intervals&lt;/code&gt;,
 which default value is &lt;code class=&quot;highlighter-rouge&quot;&gt;60, 
360, 3600&lt;/code&gt;)&lt;/li&gt;
-  &lt;li&gt;&lt;code 
class=&quot;highlighter-rouge&quot;&gt;QueryLatencyAvgTime&lt;/code&gt;,&lt;code
 
class=&quot;highlighter-rouge&quot;&gt;QueryLatencyIMaxTime&lt;/code&gt;,&lt;code
 class=&quot;highlighter-rouge&quot;&gt;QueryLatencyIMinTime&lt;/code&gt;: the 
average, max, min of query latency.&lt;/li&gt;
-  &lt;li&gt;&lt;code 
class=&quot;highlighter-rouge&quot;&gt;ScanRowCount&lt;/code&gt;: the rows 
count of scan HBase, it’s like &lt;code 
class=&quot;highlighter-rouge&quot;&gt;QueryLatency&lt;/code&gt;.&lt;/li&gt;
-  &lt;li&gt;&lt;code 
class=&quot;highlighter-rouge&quot;&gt;ResultRowCount&lt;/code&gt;: the result 
count of query, it’s like &lt;code 
class=&quot;highlighter-rouge&quot;&gt;QueryLatency&lt;/code&gt;.&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;h2 id=&quot;daily-function&quot;&gt;Daily Function&lt;/h2&gt;
-&lt;p&gt;Besides providing SLA of query service to users, in the daily 
operation and maintenance, you could make Kylin query daily and Kylin query 
dashboard by query metrics. Which will help you know the rules, performance of 
Kylin query and analyze the Kylin query accident case.&lt;/p&gt;
-
-&lt;h2 id=&quot;how-to-use&quot;&gt;How To Use&lt;/h2&gt;
-&lt;p&gt;Firstly, you should set config &lt;code 
class=&quot;highlighter-rouge&quot;&gt;kylin.query.metrics.enabled&lt;/code&gt; 
as true to collect query metrics to JMX.&lt;/p&gt;
-
-&lt;p&gt;Secondly, you could use arbitrary JMX collection tool to collect the 
query metrics to your monitor system. Notice that, The query metrics have 
Server, Project, Cube three levels,  which was implemented by dynamic &lt;code 
class=&quot;highlighter-rouge&quot;&gt;ObjectName&lt;/code&gt;, so you should 
get &lt;code class=&quot;highlighter-rouge&quot;&gt;ObjectName&lt;/code&gt; by 
regular expression.&lt;/p&gt;
-</description>
-        <pubDate>Sat, 27 Aug 2016 10:30:00 -0700</pubDate>
-        
<link>http://kylin.apache.org/blog/2016/08/27/query-metrics-in-kylin/</link>
-        <guid 
isPermaLink="true">http://kylin.apache.org/blog/2016/08/27/query-metrics-in-kylin/</guid>
-        
-        
-        <category>blog</category>
         
       </item>
     

