Dear all,

First of all, thank you very much for building and maintaining Apache Kylin. 
The work you are doing is really awesome. 

I had to try it out, so I first set up Apache Kylin inside an AWS EMR cluster, 
which worked pretty well. Then I wanted to really go crazy and run it outside 
the AWS EMR cluster.

I’ve already set up a Kylin cluster using MySQL as the metastore, but I am 
struggling to make it interact with the EMR cluster.

My issue:
In the first build step of a cube, Kylin fetches the data using Sqoop and 
should then load it into the Hive table. That step times out because it tries 
to connect to 127.0.0.1:50010, which obviously is not the AWS EMR cluster. I 
tried to find where I could change the IP address used for the DataNode, 
without success.
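
For anyone who wants to reproduce the symptom, here is a minimal Python sketch 
of how the address can be checked from the Kylin host. The host name 
emr-core-node.example.internal is only a placeholder (not my real node), and 
50010 is the DataNode port from the timeout above. It only tests TCP 
reachability; it does not show which address the HDFS client was told to use.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved host names.
        return False


if __name__ == "__main__":
    # Placeholder host name: replace it with the address of an EMR core node.
    for host in ("127.0.0.1", "emr-core-node.example.internal"):
        print(host, 50010, can_connect(host, 50010))
```

If the loopback address fails while the EMR node address succeeds, the network 
path is fine and the problem is the address the client resolves for the 
DataNode.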

While looking into this, I checked the code and saw that it is possible to run 
the jobs through a remote CLI. I was wondering whether this is the way to go 
in a production environment.

Would you be so kind as to give me some guidance on the following questions?

1. Is setting kylin.job.use-remote-cli=true the configuration one should use 
   when Apache Kylin is not inside the Hadoop cluster?
2. If not, could you point me to documentation for that kind of setup (Kylin 
   and Hadoop separated)? I have already looked at 
   https://github.com/apache/kylin/tree/master/examples/test_case_data/sandbox 
   Do you have more up-to-date documentation for running Kylin outside the 
   Hadoop cluster?
3. Is it recommended to run Kylin outside the Hadoop cluster in a production 
   environment?

Thank you in advance.

I look forward to hearing from you.

Kind regards,
Fábio Teixeira
