Dear friend,

I am sorry to hear that you have run into this trouble. I have deployed Kylin on a CDH Hadoop cluster, but I know less about AWS EMR. Still, I will share what I know.

First question: how to deploy Kylin outside the Hadoop cluster? As far as I can see, you should deploy Kylin on a router/client node of the Hadoop cluster. A router node is a node that has the Hadoop binaries (such as Hive/HDFS) and configuration files deployed, but runs no DataNode/NodeManager, so it carries no heavy workload. The router/client node gives you full access to the Hive CLI, HBase CLI and HDFS CLI, which makes it suitable for deploying Kylin.

On the other hand, I do not think deploying Kylin outside the Hadoop cluster is suitable, because Kylin needs to upload and download large amounts of data to and from the cluster. Deploying Kylin outside the cluster makes the network a bottleneck, which hurts Kylin's performance.

Second question: the entry "kylin.job.use-remote-cli=true" is meant for Kylin developers, not for Kylin users. If you are interested in it, please check http://kylin.apache.org/development/dev_env.html for details.

Besides, I have invited you to a Slack channel (https://apache-kylin.slack.com). Some Kylin users have deployed Kylin successfully on EMR, so you may ask them further questions.
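
If it helps, here is a minimal sketch of how one might check that a machine has the client commands mentioned above (Hive, HBase, HDFS) on its PATH before installing Kylin there. This is only my own illustration, not part of Kylin: the function name missing_clis is hypothetical, and I am assuming the commands are installed under their usual names hive, hbase and hdfs.

```python
import shutil

# Client commands a router/client node is expected to expose.
# Assumption: they are installed under their usual names.
REQUIRED_CLIS = ("hive", "hbase", "hdfs")


def missing_clis(required=REQUIRED_CLIS, which=shutil.which):
    """Return the required commands that cannot be found on PATH."""
    return [name for name in required if which(name) is None]


if __name__ == "__main__":
    missing = missing_clis()
    if missing:
        print("Missing client commands:", ", ".join(missing))
    else:
        print("All Hadoop client commands were found on this node.")
```

Finding the commands only shows that the binaries are installed; it does not prove that the configuration files point at the right cluster.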
-----------------
Best wishes to you!
From: Xiaoxiang Yu

At 2019-07-09 00:34:01, "Fábio Teixeira" <fabio.so.teixe...@gmail.com> wrote:
>Dear all,
>
>First of all, thank you very much for building and maintaining Apache Kylin,
>it is a really awesome, the work you are doing.
>
>I had to try it out, so I first configured Apache Kylin into an AWS EMR
>cluster which worked pretty well and then I wanted to really go crazy and have
>it outside the AWS EMR cluster.
>
>I’ve already setup a Kylin cluster using MySQL as metastore but I am
>struggling on making it interacting with the EMR cluster.
>
>My issue:
>On the first build step of a cube, It is fetching data using sqoop and should
>add it to the Hive table, but there it is timing out because it tries to
>connect to 127.0.0.1:50010 which obviously is not the AWS EMR cluster. I was
>trying to find where I could change the ip for the datanode without success.
>
>Considering my issue, I was checking the code and I saw that there is the
>possibility of running the jobs using remote cli and I was wondering if this
>should be the way to go on a Production environment.
>
>Would you be so kind and provide me some guidance on the following topics?:
>Setting up kylin.job.use-remote-cli=true is the configuration that one should
>use when Apache Kylin is not inside the Hadoop cluster.
>If not then could you provide me any kind of guidance where I can find
>documentation for doing that kind of configuration (Kylin and Hadoop
>separated)?
>I was already investigating the
>https://github.com/apache/kylin/tree/master/examples/test_case_data/sandbox
>Do you have more updated documentation for having Kylin outside the Hadoop
>cluster?
>Is it recommended to use Kylin outside the Hadoop cluster on a production
>environment?
>
>Thank you in advance.
>
>I look forward to hearing from you.
>
>Kind regards,
>Fábio Teixeira
>