Author: lidong
Date: Sun Jan  5 14:10:51 2020
New Revision: 1872352

URL: http://svn.apache.org/viewvc?rev=1872352&view=rev
Log:
Modify AWS EMR doc and AWS Glue doc

Modified:
    kylin/site/cn/docs31/install/kylin_aws_emr.html
    kylin/site/docs31/install/kylin_aws_emr.html
    kylin/site/feed.xml

Modified: kylin/site/cn/docs31/install/kylin_aws_emr.html
URL: 
http://svn.apache.org/viewvc/kylin/site/cn/docs31/install/kylin_aws_emr.html?rev=1872352&r1=1872351&r2=1872352&view=diff
==============================================================================
--- kylin/site/cn/docs31/install/kylin_aws_emr.html (original)
+++ kylin/site/cn/docs31/install/kylin_aws_emr.html Sun Jan  5 14:10:51 2020
@@ -182,8 +182,8 @@ var _hmt = _hmt || [];
 
 <h3 id="section">Recommended Version</h3>
 <ul>
-  <li>AWS EMR 5.7 (for EMR 5.8 and above, see <a href="https://issues.apache.org/jira/browse/KYLIN-3129">KYLIN-3129</a>)</li>
-  <li>Apache Kylin v2.2.0 or above for HBase 1.x</li>
+  <li>AWS EMR 5.27</li>
+  <li>Apache Kylin v3.0.0 or above for HBase 1.x</li>
 </ul>
 
 <h3 id="emr-">Start EMR Cluster</h3>
@@ -194,23 +194,32 @@ var _hmt = _hmt || [];
 
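 <p>If you use S3 as the storage for HBase, you need to customize <code class="highlighter-rouge">hbase.rpc.timeout</code>: a bulk load to S3 is a copy operation, so when the data is large the HBase region servers wait much longer for it to finish than they would on HDFS.</p>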
 
-<div class="highlighter-rouge"><pre class="highlight"><code>[  {
-    "Classification": "hbase-site",
-    "Properties": {
-      "hbase.rpc.timeout": "3600000",
-      "hbase.rootdir": "s3://yourbucket/EMRROOT"
-    }
-  },
-  {
-    "Classification": "hbase",
-    "Properties": {
-      "hbase.emr.storageMode": "s3"
-    }
-  }
-]
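+<p>If you want Hive on EMR to use an external metastore, consider RDS or AWS Glue. That lets you build a stateless OLAP service in the cloud.</p>
+
+<p>Let's create an EMR cluster with the AWS CLI, with the following enabled (each item is optional):<br />
+1. S3 as HBase data storage<br />
+2. AWS Glue as the Hive metastore<br />
+3. S3 metadata consistency (EMRFS consistent view), to avoid losing data files</p>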
+
+<div class="highlighter-rouge"><pre class="highlight"><code>aws emr create-cluster --applications Name=Hadoop Name=Hive Name=Pig Name=HBase Name=Spark Name=Sqoop Name=Tez  Name=ZooKeeper \
+       --release-label emr-5.28.0 \
+       --instance-groups '[{"InstanceCount":2,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":50,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"CORE","InstanceType":"m5.xlarge","Name":"Worker Node"},{"InstanceCount":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":100,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"MASTER","InstanceType":"m5.xlarge","Name":"Master Node"}]' \
+       --configurations '[{"Classification":"hbase","Properties":{"hbase.emr.storageMode":"s3"}},{"Classification":"hbase-site","Properties":{"hbase.rootdir":"s3://{S3_BUCKET}/hbase/data","hbase.rpc.timeout":"3600000"}},{"Classification":"hive-site","Properties":{"hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"}}]' \
+       --name 'Kylin3.0Cluster_Original' \
+       --emrfs Consistent=true \
+       --region cn-northwest-1
 </code></pre>
 </div>
 
+<h3 id="aws-gluehive">Support AWS Glue as the Hive metastore</h3>
+
+<p>To use Glue as the Hive metastore, build the client from <code class="highlighter-rouge">https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore</code>. You need the following jars:</p>
+
+<ol>
+  <li>aws-glue-datacatalog-client-common-xxx.jar</li>
+  <li>aws-glue-datacatalog-hive2-client-xxx.jar</li>
+</ol>
+
 <h3 id="kylin">Install Kylin</h3>
 
 <p>When the EMR cluster is in the “Waiting” state, SSH into the master node, download Kylin and unpack the tarball:</p>
@@ -218,8 +227,8 @@ var _hmt = _hmt || [];
 <div class="highlighter-rouge"><pre class="highlight"><code>sudo mkdir /usr/local/kylin
 sudo chown hadoop /usr/local/kylin
 <span class="nb">cd</span> /usr/local/kylin
-wget http://mirror.bit.edu.cn/apache/kylin/apache-kylin-2.5.0/apache-kylin-2.5.0-bin-hbase1x.tar.gz
-tar -zxvf apache-kylin-2.5.0-bin-hbase1x.tar.gz
+wget http://mirror.bit.edu.cn/apache/kylin/apache-kylin-3.0.0/apache-kylin-3.0.0-bin-hbase1x.tar.gz
+tar -zxvf apache-kylin-3.0.0-bin-hbase1x.tar.gz
 </code></pre>
 </div>
 
@@ -314,35 +323,88 @@ tar -zxvf apache-kylin-2.5.0-bin-hbase1x
 </code></pre>
 </div>
 
-<h3 id="kylin-2">Start Kylin</h3>
+<h3 id="section-1">Resolve jar conflicts</h3>
 
-<p>Starting Kylin is the same as on a regular Hadoop cluster:</p>
+<ul>
+  <li>Add the following to ~/.bashrc</li>
+</ul>
 
-<div class="highlighter-rouge"><pre class="highlight"><code><span 
class="nb">export </span><span class="nv">KYLIN_HOME</span><span 
class="o">=</span>/usr/local/kylin/apache-kylin-2.2.0-bin
-<span class="nv">$KYLIN_HOME</span>/bin/sample.sh
-<span class="nv">$KYLIN_HOME</span>/bin/kylin.sh start
+<div class="highlighter-rouge"><pre class="highlight"><code><span 
class="nb">export </span><span class="nv">HIVE_HOME</span><span 
class="o">=</span>/usr/lib/hive
+<span class="nb">export </span><span class="nv">HADOOP_HOME</span><span 
class="o">=</span>/usr/lib/hadoop
+<span class="nb">export </span><span class="nv">HBASE_HOME</span><span 
class="o">=</span>/usr/lib/hbase
+<span class="nb">export </span><span class="nv">SPARK_HOME</span><span 
class="o">=</span>/usr/lib/spark
+
+<span class="nb">export </span><span class="nv">KYLIN_HOME</span><span 
class="o">=</span>/home/ec2-user/apache-kylin-3.0.0-SNAPSHOT-bin
+<span class="nb">export </span><span class="nv">HCAT_HOME</span><span 
class="o">=</span>/usr/lib/hive-hcatalog
+<span class="nb">export </span><span class="nv">KYLIN_CONF_HOME</span><span 
class="o">=</span><span class="nv">$KYLIN_HOME</span>/conf
+<span class="nb">export </span><span class="nv">tomcat_root</span><span 
class="o">=</span><span class="nv">$KYLIN_HOME</span>/tomcat
+<span class="nb">export </span><span class="nv">hive_dependency</span><span 
class="o">=</span><span class="nv">$HIVE_HOME</span>/conf:<span 
class="nv">$HIVE_HOME</span>/lib/:<span 
class="nv">$HIVE_HOME</span>/lib/hive-hcatalog-core.jar:<span 
class="nv">$SPARK_HOME</span>/jars/
+<span class="nb">export </span><span class="nv">PATH</span><span 
class="o">=</span><span class="nv">$KYLIN_HOME</span>/bin:<span 
class="nv">$PATH</span>
+
+<span class="nb">export </span><span class="nv">hive_dependency</span><span 
class="o">=</span><span class="nv">$HIVE_HOME</span>/conf:<span 
class="nv">$HIVE_HOME</span>/lib/<span class="k">*</span>:<span 
class="nv">$HIVE_HOME</span>/lib/hive-hcatalog-core.jar:/usr/share/aws/hmclient/lib/<span
 class="k">*</span>:<span class="nv">$SPARK_HOME</span>/jars/<span 
class="k">*</span>:<span class="nv">$HBASE_HOME</span>/lib/<span 
class="k">*</span>.jar:<span class="nv">$HBASE_HOME</span>/<span 
class="k">*</span>.jar
 </code></pre>
 </div>
 
-<p>Don't forget to open port 7070 in the security group of the EMR master (“ElasticMapReduce-master”), or use an SSH tunnel to the master node; you can then reach the Kylin Web GUI at <code class="highlighter-rouge">http://&lt;master-dns&gt;:7070/kylin</code>.</p>
+<ul>
+  <li>Temporarily move the joda jar aside</li>
+</ul>
 
-<p>Build the sample Cube, and run queries once the Cube is ready. You can browse S3 to check that the data has been safely persisted.</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>mv <span 
class="nv">$HIVE_HOME</span>/lib/jackson-datatype-joda-2.4.6.jar <span 
class="nv">$HIVE_HOME</span>/lib/jackson-datatype-joda-2.4.6.jar.backup
+</code></pre>
+</div>
+
+<ul>
+  <li>Modify bin/kylin.sh</li>
+</ul>
+
+<p>Add the following at the top of bin/kylin.sh:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span 
class="nb">export </span><span class="nv">HBASE_CLASSPATH_PREFIX</span><span 
class="o">=</span><span class="k">${</span><span 
class="nv">tomcat_root</span><span class="k">}</span>/bin/bootstrap.jar:<span 
class="k">${</span><span class="nv">tomcat_root</span><span 
class="k">}</span>/bin/tomcat-juli.jar:<span class="k">${</span><span 
class="nv">tomcat_root</span><span class="k">}</span>/lib/<span 
class="k">*</span>:<span class="nv">$hive_dependency</span>:<span 
class="nv">$HBASE_CLASSPATH_PREFIX</span>
+</code></pre>
+</div>
+
+<h3 id="gluehive">Enable Glue as the Hive metadata source (optional)</h3>
+<ul>
+  <li>Put <code class="highlighter-rouge">aws-glue-datacatalog-client-common-xxx.jar</code> and <code class="highlighter-rouge">aws-glue-datacatalog-hive2-client-xxx.jar</code> in the <code class="highlighter-rouge">$KYLIN_HOME/lib</code> directory</li>
+  <li>Set <code class="highlighter-rouge">kylin.source.hive.metadata-type=gluecatalog</code> in <code class="highlighter-rouge">kylin.properties</code></li>
+</ul>
+
+<h3 id="spark">Configure Spark</h3>
 
-<h3 id="spark-">Spark Configuration</h3>
+<ul>
+  <li>Package the Spark jars</li>
+</ul>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>rm -rf <span 
class="nv">$KYLIN_HOME</span>/spark_jars
+mkdir <span class="nv">$KYLIN_HOME</span>/spark_jars
+cp /usr/lib/spark/jars/<span class="k">*</span>.jar <span 
class="nv">$KYLIN_HOME</span>/spark_jars
+cp -f /usr/lib/hbase/lib/<span class="k">*</span>.jar <span 
class="nv">$KYLIN_HOME</span>/spark_jars
+
+rm -f <span class="nv">$KYLIN_HOME</span>/spark_jars/netty-3.9.9.Final.jar
+rm -f <span class="nv">$KYLIN_HOME</span>/spark_jars/netty-all-4.1.8.Final.jar
+
+jar cv0f spark-libs.jar -C <span class="nv">$KYLIN_HOME</span>/spark_jars .
+aws s3 cp spark-libs.jar s3://<span class="o">{</span>YOUR_BUCKET<span class="o">}</span>/kylin/package/  <span class="c"># if S3 is your working-dir</span>
+hadoop fs -put spark-libs.jar hdfs://kylin/package/  <span class="c"># if HDFS is your working-dir</span>
+</code></pre>
+</div>
 
-<p>EMR's Spark version is likely to differ from the one Kylin was compiled against, so you usually cannot use EMR's bundled Spark directly for Kylin jobs. Before starting Kylin, set the “SPARK_HOME” environment variable to Kylin's Spark subdirectory (KYLIN_HOME/spark). In addition, to access files on S3 or EMRFS from Spark, copy EMR's extension classes from the EMR directories into Kylin's Spark.</p>
+<ul>
+  <li>Set <code class="highlighter-rouge">kylin.engine.spark-conf.spark.yarn.archive=PATH_TO_SPARK_LIB</code> in <code class="highlighter-rouge">kylin.properties</code></li>
+</ul>
 
-<div class="highlighter-rouge"><pre class="highlight"><code><span 
class="nb">export </span><span class="nv">SPARK_HOME</span><span 
class="o">=</span><span class="nv">$KYLIN_HOME</span>/spark
+<h3 id="kylin-2">Start Kylin</h3>
 
-cp /usr/lib/hadoop-lzo/lib/<span class="k">*</span>.jar <span 
class="nv">$KYLIN_HOME</span>/spark/jars/
-cp /usr/share/aws/emr/emrfs/lib/emrfs-hadoop-assembly-<span 
class="k">*</span>.jar <span class="nv">$KYLIN_HOME</span>/spark/jars/
-cp /usr/lib/hadoop/hadoop-common<span class="k">*</span>-amzn-<span 
class="k">*</span>.jar <span class="nv">$KYLIN_HOME</span>/spark/jars/
+<p>Starting Kylin is the same as on a regular Hadoop cluster:</p>
 
+<div class="highlighter-rouge"><pre class="highlight"><code><span 
class="nv">$KYLIN_HOME</span>/bin/sample.sh
 <span class="nv">$KYLIN_HOME</span>/bin/kylin.sh start
 </code></pre>
 </div>
 
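-<p>You can also refer to EMR Spark's spark-defaults when setting Kylin's Spark configuration, to make better use of the cluster's resources.</p>
+<p>Don't forget to open port 7070 in the security group of the EMR master (“ElasticMapReduce-master”), or use an SSH tunnel to the master node; you can then reach the Kylin Web GUI at <code class="highlighter-rouge">http://&lt;master-dns&gt;:7070/kylin</code>.</p>
+
+<p>Build the sample Cube, and run queries once the Cube is ready. You can browse S3 to check that the data has been safely persisted.</p>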
 
 <h3 id="emr--1">Shut down EMR Cluster</h3>
 
@@ -356,10 +418,21 @@ cp /usr/lib/hadoop/hadoop-common<span cl
 
 <p>To restart a cluster with the same HBase data, specify the same Amazon S3 location as the previous cluster in the AWS Management Console, or use the <code class="highlighter-rouge">hbase.rootdir</code> configuration property. For more about HBase on EMR, see <a href="https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hbase-s3.html">HBase on Amazon S3</a>.</p>
 
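-<h2 id="ec2--kylin">Deploy Kylin on a dedicated EC2 instance</h2>
+<h3 id="ec2--kylin">Deploy Kylin on a dedicated EC2 instance</h3>
 
 <p>Running Kylin on a dedicated client node (rather than a master, core or task node) is recommended. Start a separate EC2 instance in the same VPC and subnet as your EMR cluster, copy the Hadoop clients from the master node to it, and install Kylin there. This improves the stability of both Kylin and the services on the master node.</p>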
 
+<h3 id="section-2">Other issues</h3>
+
+<p>If you set S3 as your working-dir and hit a “Wrong FS” exception, try modifying <code class="highlighter-rouge">$KYLIN_HOME/conf/kylin_hive_conf.xml</code>, <code class="highlighter-rouge">/etc/hive/conf/hive-site.xml</code> and <code class="highlighter-rouge">/etc/hadoop/conf/core-site.xml</code>.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>  <span 
class="nt">&lt;property&gt;</span>
+    <span class="nt">&lt;name&gt;</span>fs.defaultFS<span 
class="nt">&lt;/name&gt;</span>
+    <span class="nt">&lt;value&gt;</span>s3://{YOUR_BUCKET}<span 
class="nt">&lt;/value&gt;</span>
+    <span 
class="c">&lt;!--&lt;value&gt;hdfs://ip-172-31-6-58.cn-northwest-1.compute.internal:8020&lt;/value&gt;--&gt;</span>
+  <span class="nt">&lt;/property&gt;</span>
+</code></pre>
+</div>
 
                                                        </article>
                                                </div>

Modified: kylin/site/docs31/install/kylin_aws_emr.html
URL: 
http://svn.apache.org/viewvc/kylin/site/docs31/install/kylin_aws_emr.html?rev=1872352&r1=1872351&r2=1872352&view=diff
==============================================================================
--- kylin/site/docs31/install/kylin_aws_emr.html (original)
+++ kylin/site/docs31/install/kylin_aws_emr.html Sun Jan  5 14:10:51 2020
@@ -7050,35 +7050,42 @@ var _hmt = _hmt || [];
 <h3 id="recommended-version">Recommended Version</h3>
 
 <ul>
-  <li>AWS EMR 5.7 (for EMR 5.8 and above, please refer to <a href="https://issues.apache.org/jira/browse/KYLIN-3129">KYLIN-3129</a>)</li>
-  <li>Apache Kylin v2.2.0 or above for HBase 1.x</li>
+  <li>AWS EMR 5.27 or later</li>
+  <li>Apache Kylin v3.0.0 or above for HBase 1.x</li>
 </ul>
 
 <h3 id="start-emr-cluster">Start EMR cluster</h3>
 
 <p>Launch an EMR cluster with the AWS web console, command line or API. Select <em>HBase</em> among the applications, as Kylin needs the HBase service.</p>
 
-<p>You can select “HDFS” or “S3” as the storage for HBase, depending 
on whether you need Cube data be persisted after shutting down the cluster. EMR 
HDFS uses the local disk of EC2 instances, which will erase the data when 
cluster is stopped, then Kylin metadata and Cube data can be lost.</p>
+<p>You can choose “HDFS” or “S3” as the storage for HBase, depending on whether Cube data must persist after the cluster is shut down. EMR HDFS uses the local disks of the EC2 instances, which are erased when the cluster is stopped, so Kylin metadata and Cube data will be lost.<br />
+If you use S3 as HBase’s storage, you need to customize <code class="highlighter-rouge">hbase.rpc.timeout</code>: a bulk load to S3 is a copy operation, so when the data is large the HBase region servers wait much longer for it to finish than on HDFS.<br />
+If you want Hive metadata to persist outside the EMR cluster, you can use AWS Glue or RDS as the Hive metastore. This lets you build a stateless OLAP service with Kylin in the cloud.</p>
 
-<p>If you use S3 as HBase’s storage, you need customize its configuration 
for <code class="highlighter-rouge">hbase.rpc.timeout</code>, because the bulk 
load to S3 is a copy operation, when data size is huge, HBase region server 
need wait much longer to finish than on HDFS.</p>
+<p>Let’s create a demo EMR cluster via the AWS CLI, with:<br />
+1. S3 as HBase storage (optional)<br />
+2. Glue as the Hive metastore (optional)<br />
+3. Consistent S3 metadata (EMRFS consistent view) enabled, so that data files are not lost (optional)</p>
 
-<div class="highlighter-rouge"><pre class="highlight"><code>[  {
-    "Classification": "hbase-site",
-    "Properties": {
-      "hbase.rpc.timeout": "3600000",
-      "hbase.rootdir": "s3://yourbucket/EMRROOT"
-    }
-  },
-  {
-    "Classification": "hbase",
-    "Properties": {
-      "hbase.emr.storageMode": "s3"
-    }
-  }
-]
+<div class="highlighter-rouge"><pre class="highlight"><code>aws emr create-cluster --applications Name=Hadoop Name=Hive Name=Pig Name=HBase Name=Spark Name=Sqoop Name=Tez  Name=ZooKeeper \
+       --release-label emr-5.28.0 \
+       --instance-groups '[{"InstanceCount":2,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":50,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"CORE","InstanceType":"m5.xlarge","Name":"Worker Node"},{"InstanceCount":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":100,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"MASTER","InstanceType":"m5.xlarge","Name":"Master Node"}]' \
+       --configurations '[{"Classification":"hbase","Properties":{"hbase.emr.storageMode":"s3"}},{"Classification":"hbase-site","Properties":{"hbase.rootdir":"s3://{S3_BUCKET}/hbase/data","hbase.rpc.timeout":"3600000"}},{"Classification":"hive-site","Properties":{"hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"}}]' \
+       --name 'Kylin3.0Cluster_Original' \
+       --emrfs Consistent=true \
+       --region cn-northwest-1
 </code></pre>
 </div>
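The `--configurations` argument above is one long JSON string, which is easy to break when editing by hand. A minimal sketch, assuming a hypothetical helper named `emr_configurations` (not part of Kylin or the AWS CLI) that takes the bucket name and prints the same three classifications:

```shell
# Hypothetical helper, for illustration only: prints the JSON used for
# --configurations, with the bucket name substituted into hbase.rootdir.
emr_configurations() {
  local bucket="$1"
  cat <<EOF
[{"Classification":"hbase","Properties":{"hbase.emr.storageMode":"s3"}},
 {"Classification":"hbase-site","Properties":{"hbase.rootdir":"s3://${bucket}/hbase/data","hbase.rpc.timeout":"3600000"}},
 {"Classification":"hive-site","Properties":{"hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"}}]
EOF
}
```

The output could then be passed as `--configurations "$(emr_configurations my-bucket)"`, where `my-bucket` is a placeholder for your own bucket.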
 
+<h3 id="support-glue-as-metadata-of-hive">Support Glue as the Hive metastore</h3>
+
+<p>To read Hive metadata from Glue, refer to <code class="highlighter-rouge">https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore</code> and build these two jars:</p>
+
+<ol>
+  <li>aws-glue-datacatalog-client-common-xxx.jar</li>
+  <li>aws-glue-datacatalog-hive2-client-xxx.jar</li>
+</ol>
+
 <h3 id="install-kylin">Install Kylin</h3>
 
 <p>When EMR cluster is in “Waiting” status, you can SSH into its master  
node, download Kylin and then uncompress the tar-ball file:</p>
@@ -7086,8 +7093,8 @@ var _hmt = _hmt || [];
 <div class="highlighter-rouge"><pre class="highlight"><code>sudo mkdir /usr/local/kylin
 sudo chown hadoop /usr/local/kylin
 <span class="nb">cd</span> /usr/local/kylin
-wget http://mirror.bit.edu.cn/apache/kylin/apache-kylin-2.5.0/apache-kylin-2.5.0-bin-hbase1x.tar.gz
-tar -zxvf apache-kylin-2.5.0-bin-hbase1x.tar.gz
+wget http://mirror.bit.edu.cn/apache/kylin/apache-kylin-3.0.0/apache-kylin-3.0.0-bin-hbase1x.tar.gz
+tar -zxvf apache-kylin-3.0.0-bin-hbase1x.tar.gz
 </code></pre>
 </div>
 
@@ -7181,35 +7188,85 @@ tar -zxvf apache-kylin-2.5.0-bin-hbase1x
 </code></pre>
 </div>
 
-<h3 id="start-kylin">Start Kylin</h3>
+<h3 id="solve-jar-conflict">Resolve jar conflicts</h3>
+<ul>
+  <li>Add the following environment variables to ~/.bashrc</li>
+</ul>
 
-<p>The start is the same as on normal Hadoop:</p>
+<div class="highlighter-rouge"><pre class="highlight"><code><span 
class="nb">export </span><span class="nv">HIVE_HOME</span><span 
class="o">=</span>/usr/lib/hive
+<span class="nb">export </span><span class="nv">HADOOP_HOME</span><span 
class="o">=</span>/usr/lib/hadoop
+<span class="nb">export </span><span class="nv">HBASE_HOME</span><span 
class="o">=</span>/usr/lib/hbase
+<span class="nb">export </span><span class="nv">SPARK_HOME</span><span 
class="o">=</span>/usr/lib/spark
+
+<span class="nb">export </span><span class="nv">KYLIN_HOME</span><span 
class="o">=</span>/home/ec2-user/apache-kylin-3.0.0-SNAPSHOT-bin
+<span class="nb">export </span><span class="nv">HCAT_HOME</span><span 
class="o">=</span>/usr/lib/hive-hcatalog
+<span class="nb">export </span><span class="nv">KYLIN_CONF_HOME</span><span 
class="o">=</span><span class="nv">$KYLIN_HOME</span>/conf
+<span class="nb">export </span><span class="nv">tomcat_root</span><span 
class="o">=</span><span class="nv">$KYLIN_HOME</span>/tomcat
+<span class="nb">export </span><span class="nv">hive_dependency</span><span 
class="o">=</span><span class="nv">$HIVE_HOME</span>/conf:<span 
class="nv">$HIVE_HOME</span>/lib/:<span 
class="nv">$HIVE_HOME</span>/lib/hive-hcatalog-core.jar:<span 
class="nv">$SPARK_HOME</span>/jars/
+<span class="nb">export </span><span class="nv">PATH</span><span 
class="o">=</span><span class="nv">$KYLIN_HOME</span>/bin:<span 
class="nv">$PATH</span>
 
-<div class="highlighter-rouge"><pre class="highlight"><code>export 
KYLIN_HOME=/usr/local/kylin/apache-kylin-2.2.0-bin
-$KYLIN_HOME/bin/sample.sh
-$KYLIN_HOME/bin/kylin.sh start
+<span class="nb">export </span><span class="nv">hive_dependency</span><span 
class="o">=</span><span class="nv">$HIVE_HOME</span>/conf:<span 
class="nv">$HIVE_HOME</span>/lib/<span class="k">*</span>:<span 
class="nv">$HIVE_HOME</span>/lib/hive-hcatalog-core.jar:/usr/share/aws/hmclient/lib/<span
 class="k">*</span>:<span class="nv">$SPARK_HOME</span>/jars/<span 
class="k">*</span>:<span class="nv">$HBASE_HOME</span>/lib/<span 
class="k">*</span>.jar:<span class="nv">$HBASE_HOME</span>/<span 
class="k">*</span>.jar
 </code></pre>
 </div>
 
-<p>Don’t forget to enable the 7070 port access in the security group for EMR 
master - “ElasticMapReduce-master”, or with SSH tunnel to the master node, 
then you can access Kylin Web GUI at http://&lt;master-dns&gt;:7070/kylin</p>
+<ul>
+  <li>Move the joda jar aside (rename it so that it is no longer loaded)</li>
+</ul>
 
-<p>Build the sample Cube, and then run queries when the Cube is ready. You can 
browse S3 to see whether the data is safely persisted.</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>mv <span 
class="nv">$HIVE_HOME</span>/lib/jackson-datatype-joda-2.4.6.jar <span 
class="nv">$HIVE_HOME</span>/lib/jackson-datatype-joda-2.4.6.jar.backup
+</code></pre>
+</div>
+
+<ul>
+  <li>Modify bin/kylin.sh<br />
+Add the following at the top of bin/kylin.sh</li>
+</ul>
 
-<h3 id="spark-configuration">Spark Configuration</h3>
+<div class="highlighter-rouge"><pre class="highlight"><code><span 
class="nb">export </span><span class="nv">HBASE_CLASSPATH_PREFIX</span><span 
class="o">=</span><span class="k">${</span><span 
class="nv">tomcat_root</span><span class="k">}</span>/bin/bootstrap.jar:<span 
class="k">${</span><span class="nv">tomcat_root</span><span 
class="k">}</span>/bin/tomcat-juli.jar:<span class="k">${</span><span 
class="nv">tomcat_root</span><span class="k">}</span>/lib/<span 
class="k">*</span>:<span class="nv">$hive_dependency</span>:<span 
class="nv">$HBASE_CLASSPATH_PREFIX</span>
+</code></pre>
+</div>
 
-<p>EMR’s Spark version may be incompatible with Kylin, so you couldn’t 
directly use EMR’s Spark. You need to set “SPARK_HOME” environment 
variable to Kylin’s Spark folder (KYLIN_HOME/spark) before start Kylin. To 
access files on S3 or EMRFS, we need to copy EMR’s implementation jars to 
Spark.</p>
+<h3 id="enable-glue-as-metadata-for-hiveoptional">Enable Glue as the Hive metastore (optional)</h3>
+<ol>
+  <li>Put <code class="highlighter-rouge">aws-glue-datacatalog-client-common-xxx.jar</code> and <code class="highlighter-rouge">aws-glue-datacatalog-hive2-client-xxx.jar</code> under $KYLIN_HOME/lib.</li>
+  <li>Set <code class="highlighter-rouge">kylin.source.hive.metadata-type=gluecatalog</code> in <code class="highlighter-rouge">kylin.properties</code>.</li>
+</ol>
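Setting a key in `kylin.properties` by hand can leave duplicate entries behind. A minimal sketch, assuming a hypothetical helper `set_prop` (not a Kylin tool) that drops any existing line starting with `KEY=` and appends the new value; it treats the file as plain `key=value` lines and does not touch commented-out entries:

```shell
# Hypothetical helper, for illustration only: set_prop FILE KEY VALUE
# Removes lines that start with "KEY=" and appends "KEY=VALUE".
set_prop() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)" || return 1
  # Keep every line that does not begin with the literal "KEY=".
  awk -v k="${key}=" 'index($0, k) != 1' "$file" > "$tmp" || return 1
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back in place so the original file keeps its permissions.
  cat "$tmp" > "$file" && rm -f "$tmp"
}
```

For the step above this would be called as `set_prop "$KYLIN_HOME/conf/kylin.properties" kylin.source.hive.metadata-type gluecatalog`, assuming the file lives under `$KYLIN_HOME/conf`.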
 
-<div class="highlighter-rouge"><pre class="highlight"><code><span 
class="nb">export </span><span class="nv">SPARK_HOME</span><span 
class="o">=</span><span class="nv">$KYLIN_HOME</span>/spark
+<h3 id="configure-spark">Configure Spark</h3>
 
-cp /usr/lib/hadoop-lzo/lib/<span class="k">*</span>.jar <span 
class="nv">$KYLIN_HOME</span>/spark/jars/
-cp /usr/share/aws/emr/emrfs/lib/emrfs-hadoop-assembly-<span 
class="k">*</span>.jar <span class="nv">$KYLIN_HOME</span>/spark/jars/
-cp /usr/lib/hadoop/hadoop-common<span class="k">*</span>-amzn-<span 
class="k">*</span>.jar <span class="nv">$KYLIN_HOME</span>/spark/jars/
+<ul>
+  <li>Build a flat jar of Spark’s libraries</li>
+</ul>
 
+<div class="highlighter-rouge"><pre class="highlight"><code>rm -rf <span 
class="nv">$KYLIN_HOME</span>/spark_jars
+mkdir <span class="nv">$KYLIN_HOME</span>/spark_jars
+cp /usr/lib/spark/jars/<span class="k">*</span>.jar <span 
class="nv">$KYLIN_HOME</span>/spark_jars
+cp -f /usr/lib/hbase/lib/<span class="k">*</span>.jar <span 
class="nv">$KYLIN_HOME</span>/spark_jars
+
+rm -f <span class="nv">$KYLIN_HOME</span>/spark_jars/netty-3.9.9.Final.jar
+rm -f <span class="nv">$KYLIN_HOME</span>/spark_jars/netty-all-4.1.8.Final.jar
+
+jar cv0f spark-libs.jar -C <span class="nv">$KYLIN_HOME</span>/spark_jars .
+aws s3 cp spark-libs.jar s3://<span class="o">{</span>YOUR_BUCKET<span class="o">}</span>/kylin/package/  <span class="c"># if S3 is your working-dir</span>
+hadoop fs -put spark-libs.jar hdfs://kylin/package/  <span class="c"># if HDFS is your working-dir</span>
+</code></pre>
+</div>
+<ul>
+  <li>Set <code 
class="highlighter-rouge">kylin.engine.spark-conf.spark.yarn.archive=PATH_TO_SPARK_LIB</code>
 in <code class="highlighter-rouge">kylin.properties</code></li>
+</ul>
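The staging step for the Spark archive can be sketched as a small function. This is only an illustration: `stage_jars` is a hypothetical name, and the sketch assumes the intent is to remove the two netty jars from the staging directory itself before it is packed, whatever the current working directory is.

```shell
# Hypothetical helper, for illustration only: stage_jars DEST SRC_DIR...
# Recreates DEST, copies every *.jar from each SRC_DIR into it, then
# deletes the two netty jars named in the packaging steps.
stage_jars() {
  local dest="$1" src
  shift
  rm -rf "$dest" && mkdir -p "$dest" || return 1
  for src in "$@"; do
    # Later sources overwrite earlier ones, like "cp -f" in the steps above.
    cp -f "$src"/*.jar "$dest"/ || return 1
  done
  rm -f "$dest/netty-3.9.9.Final.jar" "$dest/netty-all-4.1.8.Final.jar"
}
```

With the paths used in this guide it would be called as `stage_jars "$KYLIN_HOME/spark_jars" /usr/lib/spark/jars /usr/lib/hbase/lib`, followed by the `jar cv0f` command.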
+
+<h3 id="start-kylin">Start Kylin</h3>
+
+<p>Starting Kylin is the same as on a regular Hadoop cluster:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span 
class="nv">$KYLIN_HOME</span>/bin/sample.sh
 <span class="nv">$KYLIN_HOME</span>/bin/kylin.sh start
 </code></pre>
 </div>
 
-<p>You can also copy EMR’s spark-defaults configuration to Kylin’s spark 
for a better utilization of the cluster resources.</p>
+<p>Don’t forget to open port 7070 in the security group for the EMR master (“ElasticMapReduce-master”), or set up an SSH tunnel to the master node; you can then access the Kylin Web GUI at http://&lt;master-dns&gt;:7070/kylin</p>
+
+<p>Build the sample Cube, and then run queries when the Cube is ready. You can browse S3 to check that the data is safely persisted.</p>
 
 <h3 id="shut-down-emr-cluster">Shut down EMR Cluster</h3>
 
@@ -7223,10 +7280,23 @@ cp /usr/lib/hadoop/hadoop-common<span cl
 
 <p>To restart a cluster with the same HBase data, specify the same Amazon S3 location as the previous cluster either in the AWS Management Console or using the “hbase.rootdir” configuration property. For more information about EMR HBase, refer to <a href="https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hbase-s3.html">HBase on Amazon S3</a></p>
 
-<h2 id="deploy-kylin-in-a-dedicated-ec2">Deploy Kylin in a dedicated EC2</h2>
+<h3 id="deploy-kylin-in-a-dedicated-ec2">Deploy Kylin in a dedicated EC2</h3>
 
 <p>Running Kylin in a dedicated client node (not master, core or task) is 
recommended. You can start a separate EC2 instance within the same VPC and 
subnet as your EMR, copy the Hadoop clients from master node to it, and then 
install Kylin in it. This can improve the stability of services in master node 
as well as Kylin itself.</p>
 
+<h3 id="trouble-shotting">Troubleshooting</h3>
+
+<ul>
+  <li>If you set S3 as your working dir and find a “Wrong FS” exception in kylin.log (if you have enabled the shrunken dictionary), try modifying $KYLIN_HOME/conf/kylin_hive_conf.xml, /etc/hive/conf/hive-site.xml and /etc/hadoop/conf/core-site.xml.</li>
+</ul>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>  <span 
class="nt">&lt;property&gt;</span>
+    <span class="nt">&lt;name&gt;</span>fs.defaultFS<span 
class="nt">&lt;/name&gt;</span>
+    <span class="nt">&lt;value&gt;</span>s3://{YOUR_BUCKET}<span 
class="nt">&lt;/value&gt;</span>
+    <span 
class="c">&lt;!--&lt;value&gt;hdfs://ip-172-31-6-58.cn-northwest-1.compute.internal:8020&lt;/value&gt;--&gt;</span>
+  <span class="nt">&lt;/property&gt;</span>
+</code></pre>
+</div>
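To see which value of `fs.defaultFS` each of those files currently carries, a quick check can help. The sketch below is an assumption-laden illustration: `default_fs` is a hypothetical helper, and it relies on the layout shown above, with `<name>` and `<value>` on separate lines and any commented-out value on a line of its own. It is not a real XML parser.

```shell
# Hypothetical helper, for illustration only: default_fs FILE
# Prints the first uncommented <value> that follows <name>fs.defaultFS</name>.
default_fs() {
  awk '
    /<name>fs\.defaultFS<\/name>/ { found = 1; next }
    found && /<!--/ { next }                      # skip commented-out values
    found && match($0, /<value>[^<]*<\/value>/) {
      # Strip the 7-character <value> and 8-character </value> tags.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$1"
}
```

Running `default_fs /etc/hadoop/conf/core-site.xml` before and after the change gives a simple way to confirm the edit took effect.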
 
                                                        </article>
                                                </div>

Modified: kylin/site/feed.xml
URL: 
http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1872352&r1=1872351&r2=1872352&view=diff
==============================================================================
--- kylin/site/feed.xml (original)
+++ kylin/site/feed.xml Sun Jan  5 14:10:51 2020
@@ -19,8 +19,8 @@
     <description>Apache Kylin Home</description>
     <link>http://kylin.apache.org/</link>
     <atom:link href="http://kylin.apache.org/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Fri, 03 Jan 2020 05:59:19 -0800</pubDate>
-    <lastBuildDate>Fri, 03 Jan 2020 05:59:19 -0800</lastBuildDate>
+    <pubDate>Sun, 05 Jan 2020 05:59:15 -0800</pubDate>
+    <lastBuildDate>Sun, 05 Jan 2020 05:59:15 -0800</lastBuildDate>
     <generator>Jekyll v2.5.3</generator>
     
       <item>

