Dear all,

First of all, thank you very much for building and maintaining Apache Kylin. The work you are doing is really awesome.

I had to try it out, so I first set up Apache Kylin inside an AWS EMR cluster, which worked pretty well. Then I wanted to go further and run it outside the AWS EMR cluster. I have already set up a Kylin cluster using MySQL as the metastore, but I am struggling to make it interact with the EMR cluster.

My issue: in the first build step of a cube, it fetches data using Sqoop and should add it to the Hive table, but the step times out because it tries to connect to 127.0.0.1:50010, which obviously is not the AWS EMR cluster. I tried to find where I could change the IP for the DataNode, without success.

While looking into this, I checked the code and saw that it is possible to run the jobs using the remote CLI, and I was wondering whether this is the way to go in a production environment.

Would you be so kind as to provide some guidance on the following topics?

- Is setting kylin.job.use-remote-cli=true the configuration one should use when Apache Kylin is not inside the Hadoop cluster? If not, could you point me to documentation for that kind of setup (Kylin and Hadoop separated)? I have already looked at https://github.com/apache/kylin/tree/master/examples/test_case_data/sandbox
- Do you have more up-to-date documentation for running Kylin outside the Hadoop cluster?
- Is it recommended to run Kylin outside the Hadoop cluster in a production environment?

Thank you in advance. I look forward to hearing from you.

Kind regards,
Fábio Teixeira
