Re:Kylin interacting with AWS EMR

Xiaoxiang Yu Mon, 08 Jul 2019 23:28:42 -0700

Dear friend,
   I am sorry to hear that you have run into this trouble. I have deployed 
Kylin on a CDH Hadoop cluster, and although I know less about AWS EMR, I will 
share what I know.
   First question: how to deploy Kylin outside the Hadoop cluster? As far as I 
can see, you should deploy Kylin on a router/client node of the Hadoop 
cluster. A router node is a node that has the Hadoop binaries (such as 
Hive/HDFS) and configuration files deployed, but runs no DataNode/NodeManager 
(so it carries no heavy workload). The router/client node gives you full 
access to the Hive CLI, HBase CLI and HDFS CLI, which makes it suitable for 
deploying Kylin. 
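   As a rough way to check whether a machine qualifies as such a client node, 
the small Python sketch below tests whether the three CLIs mentioned above are 
on the PATH. The command names "hive", "hbase" and "hdfs" and the helper 
missing_clis are assumptions for illustration only; they are not part of Kylin, 
and finding the commands does not prove that the configuration files point at 
the right cluster.

```python
import shutil

# Assumed command names for the Hive, HBase and HDFS CLIs on a client node.
REQUIRED_CLIS = ["hive", "hbase", "hdfs"]


def missing_clis(required=REQUIRED_CLIS, which=shutil.which):
    """Return the names in `required` that cannot be found on PATH."""
    return [name for name in required if which(name) is None]


if __name__ == "__main__":
    missing = missing_clis()
    if missing:
        print("Missing CLIs on this node:", ", ".join(missing))
    else:
        print("hive, hbase and hdfs are all on PATH.")
```

   If the script reports missing commands, the node probably lacks the Hadoop 
binaries described above and is not yet a good place to run Kylin. 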
   On the other hand, I think deploying Kylin outside the Hadoop cluster is 
not suitable, because Kylin needs to upload/download large amounts of data 
to/from the Hadoop cluster. Deploying Kylin outside the cluster therefore makes 
the network a bottleneck, which hurts Kylin's performance.
   Second question: the entry "kylin.job.use-remote-cli=true" is intended for 
Kylin developers, not for Kylin users. If you are interested in it, please see 
http://kylin.apache.org/development/dev_env.html for details.
   Besides, I have invited you to a Slack 
channel (https://apache-kylin.slack.com). Some Kylin users have deployed Kylin 
successfully on EMR, so you may ask them further questions.





-----------------
Best wishes to you!
From: Xiaoxiang Yu



At 2019-07-09 00:34:01, "Fábio Teixeira" <[email protected]> wrote:
>Dear all,
>
>First of all, thank you very much for building and maintaining Apache Kylin; 
>the work you are doing is really awesome. 
>
>I had to try it out, so I first configured Apache Kylin into an AWS EMR 
>cluster which worked pretty well and then I wanted to really go crazy and have 
>it outside the AWS EMR cluster.
>
>I’ve already set up a Kylin cluster using MySQL as the metastore, but I am 
>struggling to make it interact with the EMR cluster.
>
>My issue:
>On the first build step of a cube, it fetches data using Sqoop and should 
>add it to the Hive table, but it times out there because it tries to 
>connect to 127.0.0.1:50010, which obviously is not the AWS EMR cluster. I 
>tried to find where I could change the IP for the DataNode, without success.
>
>Considering my issue, I was checking the code and I saw that there is the 
>possibility of running the jobs using remote cli and I was wondering if this 
>should be the way to go on a Production environment.
>
>Would you be so kind as to provide me some guidance on the following topics?
>Is setting kylin.job.use-remote-cli=true the configuration that one should 
>use when Apache Kylin is not inside the Hadoop cluster?
>If not then could you provide me any kind of guidance where I can find 
>documentation for doing that kind of configuration (Kylin and Hadoop 
>separated)?
>I have already looked at 
>https://github.com/apache/kylin/tree/master/examples/test_case_data/sandbox 
>but do you have more up-to-date documentation for having Kylin outside the 
>Hadoop cluster?
>Is it recommended to use Kylin outside the Hadoop cluster on a production 
>environment?
>
>Thank you in advance.
>
>I look forward to hearing from you.
>
>Kind regards,
>Fábio Teixeira
>
