Author: lidong
Date: Sun Jan 5 14:10:51 2020
New Revision: 1872352

URL: http://svn.apache.org/viewvc?rev=1872352&view=rev
Log:
Modify AWS EMR doc and AWS Glue doc

Modified:
    kylin/site/cn/docs31/install/kylin_aws_emr.html
    kylin/site/docs31/install/kylin_aws_emr.html
    kylin/site/feed.xml

Modified: kylin/site/cn/docs31/install/kylin_aws_emr.html
URL: http://svn.apache.org/viewvc/kylin/site/cn/docs31/install/kylin_aws_emr.html?rev=1872352&r1=1872351&r2=1872352&view=diff
==============================================================================
--- kylin/site/cn/docs31/install/kylin_aws_emr.html (original)
+++ kylin/site/cn/docs31/install/kylin_aws_emr.html Sun Jan 5 14:10:51 2020
@@ -182,8 +182,8 @@ var _hmt = _hmt || [];
 <h3 id="section">Recommended Version</h3>
 <ul>
-  <li>AWS EMR 5.7 (for EMR 5.8 and above, see <a href="https://issues.apache.org/jira/browse/KYLIN-3129">KYLIN-3129</a>)</li>
-  <li>Apache Kylin v2.2.0 or above for HBase 1.x</li>
+  <li>AWS EMR 5.27</li>
+  <li>Apache Kylin v3.0.0 or above for HBase 1.x</li>
 </ul>
 <h3 id="emr-">Start EMR cluster</h3>
@@ -194,23 +194,32 @@ var _hmt = _hmt || [];
 <p>If you use S3 as the storage for HBase, you need to customize the <code class="highlighter-rouge">hbase.rpc.timeout</code> setting: a bulk load to S3 is a copy operation, so when the data volume is large the HBase region server spends much more time waiting for it to finish than it would on HDFS.</p>
-<div class="highlighter-rouge"><pre class="highlight"><code>[ {
-    "Classification": "hbase-site",
-    "Properties": {
-      "hbase.rpc.timeout": "3600000",
-      "hbase.rootdir": "s3://yourbucket/EMRROOT"
-    }
-  },
-  {
-    "Classification": "hbase",
-    "Properties": {
-      "hbase.emr.storageMode": "s3"
-    }
-  }
-]
+<p>If you want Hive on EMR to use an external metastore, consider RDS or AWS Glue. That way you can build a stateless OLAP service in the cloud.</p>
+
+<p>Let's create an EMR cluster via the AWS CLI with the following enabled (each item is optional):<br />
+1. S3 as the HBase data storage<br />
+2. AWS Glue as the Hive metastore<br />
+3.
S3 metadata consistency enabled, to prevent data files from being lost</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>aws emr create-cluster --applications Name=Hadoop Name=Hive Name=Pig Name=HBase Name=Spark Name=Sqoop Name=Tez Name=ZooKeeper \
+  --release-label emr-5.28.0 \
+  --instance-groups '[{"InstanceCount":2,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":50,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"CORE","InstanceType":"m5.xlarge","Name":"Worker Node"},{"InstanceCount":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":100,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"MASTER","InstanceType":"m5.xlarge","Name":"Master Node"}]' \
+  --configurations '[{"Classification":"hbase","Properties":{"hbase.emr.storageMode":"s3"}},{"Classification":"hbase-site","Properties":{"hbase.rootdir":"s3://{S3_BUCKET}/hbase/data","hbase.rpc.timeout": "3600000"}},{"Classification":"hive-site","Properties":{"hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"}}]' \
+  --name 'Kylin3.0Cluster_Original' \
+  --emrfs Consistent=true \
+  --region cn-northwest-1
 </code></pre>
 </div>
+<h3 id="aws-gluehive">Support AWS Glue as the Hive metastore</h3>
+
+<p>If you need to enable Glue as the Hive metastore, refer to <code class="highlighter-rouge">https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore</code> to build the package. You need the following jars:</p>
+
+<ol>
+  <li>aws-glue-datacatalog-client-common-xxx.jar</li>
+  <li>aws-glue-datacatalog-hive2-client-xxx.jar</li>
+</ol>
+
 <h3 id="kylin">Install Kylin</h3>
 <p>When the EMR cluster is in the “Waiting” state, you can SSH to the master node, download Kylin, and then extract the tar package:</p>
@@ -218,8 +227,8 @@ var _hmt = _hmt || [];
 <div class="highlighter-rouge"><pre class="highlight"><code>sudo mkdir /usr/local/kylin
 sudo chown hadoop /usr/local/kylin
 <span class="nb">cd</span> /usr/local/kylin
-wget
http://mirror.bit.edu.cn/apache/kylin/apache-kylin-2.5.0/apache-kylin-2.5.0-bin-hbase1x.tar.gz
-tar -zxvf apache-kylin-2.5.0-bin-hbase1x.tar.gz
+wget http://mirror.bit.edu.cn/apache/kylin/apache-kylin-3.0.0/apache-kylin-3.0.0-bin-hbase1x.tar.gz
+tar -zxvf apache-kylin-3.0.0-bin-hbase1x.tar.gz
 </code></pre>
 </div>
@@ -314,35 +323,88 @@ tar -zxvf apache-kylin-2.5.0-bin-hbase1x
 </code></pre>
 </div>
-<h3 id="kylin-2">Start Kylin</h3>
+<h3 id="section-1">Resolve jar conflicts</h3>
-<p>Starting Kylin is the same as on a regular Hadoop cluster:</p>
+<ul>
+  <li>Add the following to ~/.bashrc</li>
+</ul>
-<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">export </span><span class="nv">KYLIN_HOME</span><span class="o">=</span>/usr/local/kylin/apache-kylin-2.2.0-bin
-<span class="nv">$KYLIN_HOME</span>/bin/sample.sh
-<span class="nv">$KYLIN_HOME</span>/bin/kylin.sh start
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">export </span><span class="nv">HIVE_HOME</span><span class="o">=</span>/usr/lib/hive
+<span class="nb">export </span><span class="nv">HADOOP_HOME</span><span class="o">=</span>/usr/lib/hadoop
+<span class="nb">export </span><span class="nv">HBASE_HOME</span><span class="o">=</span>/usr/lib/hbase
+<span class="nb">export </span><span class="nv">SPARK_HOME</span><span class="o">=</span>/usr/lib/spark
+
+<span class="nb">export </span><span class="nv">KYLIN_HOME</span><span class="o">=</span>/home/ec2-user/apache-kylin-3.0.0-SNAPSHOT-bin
+<span class="nb">export </span><span class="nv">HCAT_HOME</span><span class="o">=</span>/usr/lib/hive-hcatalog
+<span class="nb">export </span><span class="nv">KYLIN_CONF_HOME</span><span class="o">=</span><span class="nv">$KYLIN_HOME</span>/conf
+<span class="nb">export </span><span class="nv">tomcat_root</span><span class="o">=</span><span class="nv">$KYLIN_HOME</span>/tomcat
+<span class="nb">export </span><span class="nv">hive_dependency</span><span class="o">=</span><span class="nv">$HIVE_HOME</span>/conf:<span
class="nv">$HIVE_HOME</span>/lib/:<span class="nv">$HIVE_HOME</span>/lib/hive-hcatalog-core.jar:<span class="nv">$SPARK_HOME</span>/jars/
+<span class="nb">export </span><span class="nv">PATH</span><span class="o">=</span><span class="nv">$KYLIN_HOME</span>/bin:<span class="nv">$PATH</span>
+
+<span class="nb">export </span><span class="nv">hive_dependency</span><span class="o">=</span><span class="nv">$HIVE_HOME</span>/conf:<span class="nv">$HIVE_HOME</span>/lib/<span class="k">*</span>:<span class="nv">$HIVE_HOME</span>/lib/hive-hcatalog-core.jar:/usr/share/aws/hmclient/lib/<span class="k">*</span>:<span class="nv">$SPARK_HOME</span>/jars/<span class="k">*</span>:<span class="nv">$HBASE_HOME</span>/lib/<span class="k">*</span>.jar:<span class="nv">$HBASE_HOME</span>/<span class="k">*</span>.jar
 </code></pre>
 </div>
-<p>Don't forget to enable access to port 7070 in the security group of the EMR master, “ElasticMapReduce-master”, or connect to the master node over SSH; you can then open the Kylin Web GUI at <code class="highlighter-rouge">http://<master-dns>:7070/kylin</code>.</p>
+<ul>
+  <li>Temporarily remove joda.jar</li>
+</ul>
-<p>Build the sample Cube, and run queries once the Cube is ready. You can browse S3 to check whether the data has been safely persisted.</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>mv <span class="nv">$HIVE_HOME</span>/lib/jackson-datatype-joda-2.4.6.jar <span class="nv">$HIVE_HOME</span>/lib/jackson-datatype-joda-2.4.6.jar.backup
+</code></pre>
+</div>
+
+<ul>
+  <li>Modify bin/kylin.sh</li>
+</ul>
+
+<p>Add the following at the beginning of bin/kylin.sh</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">export </span><span class="nv">HBASE_CLASSPATH_PREFIX</span><span class="o">=</span><span class="k">${</span><span class="nv">tomcat_root</span><span class="k">}</span>/bin/bootstrap.jar:<span class="k">${</span><span class="nv">tomcat_root</span><span class="k">}</span>/bin/tomcat-juli.jar:<span class="k">${</span><span class="nv">tomcat_root</span><span class="k">}</span>/lib/<span
class="k">*</span>:<span class="nv">$hive_dependency</span>:<span class="nv">$HBASE_CLASSPATH_PREFIX</span>
+</code></pre>
+</div>
+
+<h3 id="gluehive">Enable Glue as the Hive data source (optional)</h3>
+<ul>
+  <li>Put <code class="highlighter-rouge">aws-glue-datacatalog-client-common-xxx.jar</code> and <code class="highlighter-rouge">aws-glue-datacatalog-hive2-client-xxx.jar</code> into the <code class="highlighter-rouge">$KYLIN_HOME/lib</code> directory</li>
+  <li>In <code class="highlighter-rouge">kylin.properties</code>, set <code class="highlighter-rouge">kylin.source.hive.metadata-type=gluecatalog</code></li>
+</ul>
+
+<h3 id="spark">Configure Spark</h3>
-<h3 id="spark-">Spark configuration</h3>
+<ul>
+  <li>Package Spark</li>
+</ul>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>rm -rf <span class="nv">$KYLIN_HOME</span>/spark_jars
+mkdir <span class="nv">$KYLIN_HOME</span>/spark_jars
+cp /usr/lib/spark/jars/<span class="k">*</span>.jar <span class="nv">$KYLIN_HOME</span>/spark_jars
+cp -f /usr/lib/hbase/lib/<span class="k">*</span>.jar <span class="nv">$KYLIN_HOME</span>/spark_jars
+
+rm -f netty-3.9.9.Final.jar
+rm -f netty-all-4.1.8.Final.jar
+
+jar cv0f spark-libs.jar -C <span class="nv">$KYLIN_HOME</span>/spark_jars .
+aws s3 cp spark-libs.jar s3://<span class="o">{</span>YOUR_BUCKET<span class="o">}</span>/kylin/package/ <span class="c"># if S3 is your working-dir</span>
+hadoop fs -put spark-libs.jar hdfs://kylin/package/ <span class="c"># if HDFS is your working-dir</span>
+</code></pre>
+</div>
-<p>The Spark version on EMR is likely to differ from the version Kylin was compiled against, so you usually cannot use the Spark packaged with EMR directly for Kylin jobs. Before starting Kylin, set the “SPARK_HOME” environment variable to point to Kylin's Spark subdirectory (KYLIN_HOME/spark). In addition, to access files on S3 or EMRFS from Spark, you need to copy EMR's extension classes from the EMR directories into Kylin's Spark.</p>
+<ul>
+  <li>In <code class="highlighter-rouge">kylin.properties</code>, set <code class="highlighter-rouge">kylin.engine.spark-conf.spark.yarn.archive=PATH_TO_SPARK_LIB</code></li>
+</ul>
-<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">export </span><span class="nv">SPARK_HOME</span><span class="o">=</span><span class="nv">$KYLIN_HOME</span>/spark
+<h3 id="kylin-2">Start Kylin</h3>
-cp /usr/lib/hadoop-lzo/lib/<span class="k">*</span>.jar <span class="nv">$KYLIN_HOME</span>/spark/jars/
-cp /usr/share/aws/emr/emrfs/lib/emrfs-hadoop-assembly-<span class="k">*</span>.jar <span class="nv">$KYLIN_HOME</span>/spark/jars/
-cp /usr/lib/hadoop/hadoop-common<span class="k">*</span>-amzn-<span class="k">*</span>.jar <span class="nv">$KYLIN_HOME</span>/spark/jars/
+<p>Starting Kylin is the same as on a regular Hadoop cluster:</p>
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nv">$KYLIN_HOME</span>/bin/sample.sh
 <span class="nv">$KYLIN_HOME</span>/bin/kylin.sh start
 </code></pre>
 </div>
-<p>You can also refer to the spark-defaults of EMR's Spark when configuring Kylin's Spark, to better match the cluster's resources.</p>
+<p>Don't forget to enable access to port 7070 in the security group of the EMR master, “ElasticMapReduce-master”, or connect to the master node over SSH; you can then open the Kylin Web GUI at <code class="highlighter-rouge">http://<master-dns>:7070/kylin</code>.</p>
+
+<p>Build the sample Cube; when the Cube
is ready, run queries. You can browse S3 to check whether the data has been safely persisted.</p>
 <h3 id="emr--1">Shut down the EMR cluster</h3>
@@ -356,10 +418,21 @@ cp /usr/lib/hadoop/hadoop-common<span cl
 <p>To restart a cluster with the same HBase data, specify the same Amazon S3 location as the previous cluster, either in the AWS Management Console or with the <code class="highlighter-rouge">hbase.rootdir</code> configuration property. For more information about HBase on EMR, see <a href="https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hbase-s3.html">HBase on Amazon S3</a></p>
-<h2 id="ec2--kylin">Deploy Kylin on a dedicated EC2 instance</h2>
+<h3 id="ec2--kylin">Deploy Kylin on a dedicated EC2 instance</h3>
 <p>Running Kylin on a dedicated client node (rather than a master, core, or task node) is recommended. Start a separate EC2 instance in the same VPC and subnet as your EMR cluster, copy the Hadoop clients from the master node to that instance, and then install Kylin on it. This improves the stability of both Kylin itself and the services on the master node.</p>
+<h3 id="section-2">Other issues</h3>
+
+<p>If you configured S3 as your working-dir and encounter a “Wrong FS” exception, try modifying <code class="highlighter-rouge">$KYLIN_HOME/conf/kylin_hive_conf.xml</code>, <code class="highlighter-rouge">/etc/hive/conf/hive-site.xml</code>, and <code class="highlighter-rouge">/etc/hadoop/conf/core-site.xml</code>.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>  <span class="nt"><property></span>
+  <span class="nt"><name></span>fs.defaultFS<span class="nt"></name></span>
+  <span class="nt"><value></span>s3://{YOUR_BUCKET}<span class="nt"></value></span>
+  <span class="c"><!--<value>hdfs://ip-172-31-6-58.cn-northwest-1.compute.internal:8020</value>--></span>
+  <span class="nt"></property></span>
+</code></pre>
+</div>
 </article>
 </div>

Modified: kylin/site/docs31/install/kylin_aws_emr.html
URL: http://svn.apache.org/viewvc/kylin/site/docs31/install/kylin_aws_emr.html?rev=1872352&r1=1872351&r2=1872352&view=diff
==============================================================================
--- kylin/site/docs31/install/kylin_aws_emr.html (original)
+++ kylin/site/docs31/install/kylin_aws_emr.html Sun Jan 5 14:10:51 2020
@@ -7050,35 +7050,42 @@ var _hmt = _hmt || [];
 <h3
id="recommended-version">Recommended Version</h3>
 <ul>
-  <li>AWS EMR 5.7 (for EMR 5.8 and above, please refer to <a href="https://issues.apache.org/jira/browse/KYLIN-3129">KYLIN-3129</a>)</li>
-  <li>Apache Kylin v2.2.0 or above for HBase 1.x</li>
+  <li>AWS EMR 5.27 or later</li>
+  <li>Apache Kylin v3.0.0 or above for HBase 1.x</li>
 </ul>
 <h3 id="start-emr-cluster">Start EMR cluster</h3>
 <p>Launch an EMR cluster with the AWS web console, command line, or API. Select <em>HBase</em> in the applications, as Kylin needs the HBase service.</p>
-<p>You can select “HDFS” or “S3” as the storage for HBase, depending on whether you need Cube data be persisted after shutting down the cluster. EMR HDFS uses the local disk of EC2 instances, which will erase the data when cluster is stopped, then Kylin metadata and Cube data can be lost.</p>
+<p>You can choose “HDFS” or “S3” as the storage for HBase, depending on whether you need Cube data to persist after the cluster is shut down. EMR HDFS uses the local disks of the EC2 instances, which are erased when the cluster is stopped, so Kylin metadata and Cube data will be lost.<br />
+If you use S3 as HBase’s storage, you need to customize <code class="highlighter-rouge">hbase.rpc.timeout</code>: a bulk load to S3 is a copy operation, so when the data size is huge the HBase region server waits much longer for it to finish than it would on HDFS.<br />
+If you want Hive’s metadata to persist outside the EMR cluster, you can choose AWS Glue or RDS as the Hive metastore. You can then build a stateless OLAP service with Kylin in the cloud.</p>
-<p>If you use S3 as HBase’s storage, you need customize its configuration for <code class="highlighter-rouge">hbase.rpc.timeout</code>, because the bulk load to S3 is a copy operation, when data size is huge, HBase region server need wait much longer to finish than on HDFS.</p>
+<p>Let’s create a demo EMR cluster via the AWS CLI, with <br />
+1. S3 as HBase storage (optional)<br />
+2.
Glue as Hive metadata (optional)<br />
+3. Consistent S3 metadata enabled, to make sure data is not lost (optional)</p>
-<div class="highlighter-rouge"><pre class="highlight"><code>[ {
-    "Classification": "hbase-site",
-    "Properties": {
-      "hbase.rpc.timeout": "3600000",
-      "hbase.rootdir": "s3://yourbucket/EMRROOT"
-    }
-  },
-  {
-    "Classification": "hbase",
-    "Properties": {
-      "hbase.emr.storageMode": "s3"
-    }
-  }
-]
+<div class="highlighter-rouge"><pre class="highlight"><code>aws emr create-cluster --applications Name=Hadoop Name=Hive Name=Pig Name=HBase Name=Spark Name=Sqoop Name=Tez Name=ZooKeeper \
+  --release-label emr-5.28.0 \
+  --instance-groups '[{"InstanceCount":2,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":50,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"CORE","InstanceType":"m5.xlarge","Name":"Worker Node"},{"InstanceCount":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":100,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"MASTER","InstanceType":"m5.xlarge","Name":"Master Node"}]' \
+  --configurations '[{"Classification":"hbase","Properties":{"hbase.emr.storageMode":"s3"}},{"Classification":"hbase-site","Properties":{"hbase.rootdir":"s3://{S3_BUCKET}/hbase/data","hbase.rpc.timeout": "3600000"}},{"Classification":"hive-site","Properties":{"hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"}}]' \
+  --name 'Kylin3.0Cluster_Original' \
+  --emrfs Consistent=true \
+  --region cn-northwest-1
 </code></pre>
 </div>
+<h3 id="support-glue-as-metadata-of-hive">Support Glue as the Hive metadata store</h3>
+
+<p>If you want to read metadata from Glue, refer to <code class="highlighter-rouge">https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore</code> and build these two jars:</p>
+
+<ol>
+  <li>aws-glue-datacatalog-client-common-xxx.jar</li>
+
<li>aws-glue-datacatalog-hive2-client-xxx.jar</li>
+</ol>
+
 <h3 id="install-kylin">Install Kylin</h3>
 <p>When the EMR cluster is in the “Waiting” status, you can SSH into its master node, download Kylin, and then extract the tarball:</p>
@@ -7086,8 +7093,8 @@ var _hmt = _hmt || [];
 <div class="highlighter-rouge"><pre class="highlight"><code>sudo mkdir /usr/local/kylin
 sudo chown hadoop /usr/local/kylin
 <span class="nb">cd</span> /usr/local/kylin
-wget http://mirror.bit.edu.cn/apache/kylin/apache-kylin-2.5.0/apache-kylin-2.5.0-bin-hbase1x.tar.gz
-tar -zxvf apache-kylin-2.5.0-bin-hbase1x.tar.gz
+wget http://mirror.bit.edu.cn/apache/kylin/apache-kylin-3.0.0/apache-kylin-3.0.0-bin-hbase1x.tar.gz
+tar -zxvf apache-kylin-3.0.0-bin-hbase1x.tar.gz
 </code></pre>
 </div>
@@ -7181,35 +7188,85 @@ tar -zxvf apache-kylin-2.5.0-bin-hbase1x
 </code></pre>
 </div>
-<h3 id="start-kylin">Start Kylin</h3>
+<h3 id="solve-jar-conflict">Resolve jar conflicts</h3>
+<ul>
+  <li>Add the following environment variables to ~/.bashrc</li>
+</ul>
-<p>The start is the same as on normal Hadoop:</p>
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">export </span><span class="nv">HIVE_HOME</span><span class="o">=</span>/usr/lib/hive
+<span class="nb">export </span><span class="nv">HADOOP_HOME</span><span class="o">=</span>/usr/lib/hadoop
+<span class="nb">export </span><span class="nv">HBASE_HOME</span><span class="o">=</span>/usr/lib/hbase
+<span class="nb">export </span><span class="nv">SPARK_HOME</span><span class="o">=</span>/usr/lib/spark
+
+<span class="nb">export </span><span class="nv">KYLIN_HOME</span><span class="o">=</span>/home/ec2-user/apache-kylin-3.0.0-SNAPSHOT-bin
+<span class="nb">export </span><span class="nv">HCAT_HOME</span><span class="o">=</span>/usr/lib/hive-hcatalog
+<span class="nb">export </span><span class="nv">KYLIN_CONF_HOME</span><span class="o">=</span><span class="nv">$KYLIN_HOME</span>/conf
+<span class="nb">export </span><span
class="nv">tomcat_root</span><span class="o">=</span><span class="nv">$KYLIN_HOME</span>/tomcat
+<span class="nb">export </span><span class="nv">hive_dependency</span><span class="o">=</span><span class="nv">$HIVE_HOME</span>/conf:<span class="nv">$HIVE_HOME</span>/lib/:<span class="nv">$HIVE_HOME</span>/lib/hive-hcatalog-core.jar:<span class="nv">$SPARK_HOME</span>/jars/
+<span class="nb">export </span><span class="nv">PATH</span><span class="o">=</span><span class="nv">$KYLIN_HOME</span>/bin:<span class="nv">$PATH</span>
-<div class="highlighter-rouge"><pre class="highlight"><code>export KYLIN_HOME=/usr/local/kylin/apache-kylin-2.2.0-bin
-$KYLIN_HOME/bin/sample.sh
-$KYLIN_HOME/bin/kylin.sh start
+<span class="nb">export </span><span class="nv">hive_dependency</span><span class="o">=</span><span class="nv">$HIVE_HOME</span>/conf:<span class="nv">$HIVE_HOME</span>/lib/<span class="k">*</span>:<span class="nv">$HIVE_HOME</span>/lib/hive-hcatalog-core.jar:/usr/share/aws/hmclient/lib/<span class="k">*</span>:<span class="nv">$SPARK_HOME</span>/jars/<span class="k">*</span>:<span class="nv">$HBASE_HOME</span>/lib/<span class="k">*</span>.jar:<span class="nv">$HBASE_HOME</span>/<span class="k">*</span>.jar
 </code></pre>
 </div>
-<p>Don’t forget to enable the 7070 port access in the security group for EMR master - “ElasticMapReduce-master”, or with SSH tunnel to the master node, then you can access Kylin Web GUI at http://<master-dns>:7070/kylin</p>
+<ul>
+  <li>Remove joda.jar</li>
+</ul>
-<p>Build the sample Cube, and then run queries when the Cube is ready.
You can browse S3 to see whether the data is safely persisted.</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>mv <span class="nv">$HIVE_HOME</span>/lib/jackson-datatype-joda-2.4.6.jar <span class="nv">$HIVE_HOME</span>/lib/jackson-datatype-joda-2.4.6.jar.backup
+</code></pre>
+</div>
+
+<ul>
+  <li>Modify bin/kylin.sh<br />
+Add the following at the top of bin/kylin.sh</li>
+</ul>
-<h3 id="spark-configuration">Spark Configuration</h3>
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">export </span><span class="nv">HBASE_CLASSPATH_PREFIX</span><span class="o">=</span><span class="k">${</span><span class="nv">tomcat_root</span><span class="k">}</span>/bin/bootstrap.jar:<span class="k">${</span><span class="nv">tomcat_root</span><span class="k">}</span>/bin/tomcat-juli.jar:<span class="k">${</span><span class="nv">tomcat_root</span><span class="k">}</span>/lib/<span class="k">*</span>:<span class="nv">$hive_dependency</span>:<span class="nv">$HBASE_CLASSPATH_PREFIX</span>
+</code></pre>
+</div>
-<p>EMR’s Spark version may be incompatible with Kylin, so you couldn’t directly use EMR’s Spark. You need to set “SPARK_HOME” environment variable to Kylin’s Spark folder (KYLIN_HOME/spark) before start Kylin.
To access files on S3 or EMRFS, we need to copy EMR’s implementation jars to Spark.</p>
+<h3 id="enable-glue-as-metadata-for-hiveoptional">Enable Glue as metadata for Hive (Optional)</h3>
+<ol>
+  <li>Put <code class="highlighter-rouge">aws-glue-datacatalog-client-common-xxx.jar</code> and <code class="highlighter-rouge">aws-glue-datacatalog-hive2-client-xxx.jar</code> under $KYLIN_HOME/lib.</li>
+  <li>Set <code class="highlighter-rouge">kylin.source.hive.metadata-type=gluecatalog</code> in <code class="highlighter-rouge">kylin.properties</code></li>
+</ol>
-<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">export </span><span class="nv">SPARK_HOME</span><span class="o">=</span><span class="nv">$KYLIN_HOME</span>/spark
+<h3 id="configure-spark">Configure Spark</h3>
-cp /usr/lib/hadoop-lzo/lib/<span class="k">*</span>.jar <span class="nv">$KYLIN_HOME</span>/spark/jars/
-cp /usr/share/aws/emr/emrfs/lib/emrfs-hadoop-assembly-<span class="k">*</span>.jar <span class="nv">$KYLIN_HOME</span>/spark/jars/
-cp /usr/lib/hadoop/hadoop-common<span class="k">*</span>-amzn-<span class="k">*</span>.jar <span class="nv">$KYLIN_HOME</span>/spark/jars/
+<ul>
+  <li>Build a flat jar for Spark</li>
+</ul>
+<div class="highlighter-rouge"><pre class="highlight"><code>rm -rf <span class="nv">$KYLIN_HOME</span>/spark_jars
+mkdir <span class="nv">$KYLIN_HOME</span>/spark_jars
+cp /usr/lib/spark/jars/<span class="k">*</span>.jar <span class="nv">$KYLIN_HOME</span>/spark_jars
+cp -f /usr/lib/hbase/lib/<span class="k">*</span>.jar <span class="nv">$KYLIN_HOME</span>/spark_jars
+
+rm -f netty-3.9.9.Final.jar
+rm -f netty-all-4.1.8.Final.jar
+
+jar cv0f spark-libs.jar -C <span class="nv">$KYLIN_HOME</span>/spark_jars .
+aws s3 cp spark-libs.jar s3://<span class="o">{</span>YOUR_BUCKET<span class="o">}</span>/kylin/package/ <span class="c"># if S3 is your working-dir</span>
+hadoop fs -put spark-libs.jar hdfs://kylin/package/ <span class="c"># if HDFS is your working-dir</span>
+</code></pre>
+</div>
+<ul>
+  <li>Set <code class="highlighter-rouge">kylin.engine.spark-conf.spark.yarn.archive=PATH_TO_SPARK_LIB</code> in <code class="highlighter-rouge">kylin.properties</code></li>
+</ul>
+
+<h3 id="start-kylin">Start Kylin</h3>
+
+<p>Starting Kylin is the same as on a normal Hadoop cluster:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nv">$KYLIN_HOME</span>/bin/sample.sh
 <span class="nv">$KYLIN_HOME</span>/bin/kylin.sh start
 </code></pre>
 </div>
-<p>You can also copy EMR’s spark-defaults configuration to Kylin’s spark for a better utilization of the cluster resources.</p>
+<p>Don’t forget to enable access to port 7070 in the security group for the EMR master, “ElasticMapReduce-master”, or set up an SSH tunnel to the master node; you can then access the Kylin Web GUI at http://<master-dns>:7070/kylin</p>
+
+<p>Build the sample Cube, and then run queries when the Cube is ready. You can browse S3 to see whether the data is safely persisted.</p>
 <h3 id="shut-down-emr-cluster">Shut down EMR Cluster</h3>
@@ -7223,10 +7280,23 @@ cp /usr/lib/hadoop/hadoop-common<span cl
 <p>To restart a cluster with the same HBase data, specify the same Amazon S3 location as the previous cluster either in the AWS Management Console or using the “hbase.rootdir” configuration property. For more information about EMR HBase, refer to <a href="https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hbase-s3.html">HBase on Amazon S3</a></p>
-<h2 id="deploy-kylin-in-a-dedicated-ec2">Deploy Kylin in a dedicated EC2</h2>
+<h3 id="deploy-kylin-in-a-dedicated-ec2">Deploy Kylin in a dedicated EC2</h3>
 <p>Running Kylin on a dedicated client node (not a master, core, or task node) is recommended.
You can start a separate EC2 instance within the same VPC and subnet as your EMR cluster, copy the Hadoop clients from the master node to it, and then install Kylin on it. This can improve the stability of the services on the master node as well as of Kylin itself.</p>
+<h3 id="trouble-shotting">Troubleshooting</h3>
+
+<ul>
+  <li>If you set S3 as your working dir and find a “Wrong FS” exception in kylin.log (if you enabled the shrunken dictionary), try modifying $KYLIN_HOME/conf/kylin_hive_conf.xml, /etc/hive/conf/hive-site.xml, and /etc/hadoop/conf/core-site.xml.</li>
+</ul>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>  <span class="nt"><property></span>
+  <span class="nt"><name></span>fs.defaultFS<span class="nt"></name></span>
+  <span class="nt"><value></span>s3://{YOUR_BUCKET}<span class="nt"></value></span>
+  <span class="c"><!--<value>hdfs://ip-172-31-6-58.cn-northwest-1.compute.internal:8020</value>--></span>
+  <span class="nt"></property></span>
+</code></pre>
+</div>
 </article>
 </div>

Modified: kylin/site/feed.xml
URL: http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1872352&r1=1872351&r2=1872352&view=diff
==============================================================================
--- kylin/site/feed.xml (original)
+++ kylin/site/feed.xml Sun Jan 5 14:10:51 2020
@@ -19,8 +19,8 @@
 <description>Apache Kylin Home</description>
 <link>http://kylin.apache.org/</link>
 <atom:link href="http://kylin.apache.org/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Fri, 03 Jan 2020 05:59:19 -0800</pubDate>
-    <lastBuildDate>Fri, 03 Jan 2020 05:59:19 -0800</lastBuildDate>
+    <pubDate>Sun, 05 Jan 2020 05:59:15 -0800</pubDate>
+    <lastBuildDate>Sun, 05 Jan 2020 05:59:15 -0800</lastBuildDate>
 <generator>Jekyll v2.5.3</generator>
 <item>