How to estimate cluster resources and storage required by Kylin

2019-12-03 Thread Kamal Bannuru
Hi Team,

This is Kamal Bannuru. I am new to the Kylin community; please help me with
"*How to estimate cluster resources and storage required by Kylin*".

Please find the details of the dimension table, fact table, and cube design
below.

*Dimension Table:* dim_audio_songs_name_mapping
--
Column Name  | Data Type | Sample Value
--
songid       | String    | s001
songname     | String    | yyy
artistname   | String    | XXX
country_code | String    | IN
--
Dimension table size in HDFS: 10 GB
No. of records: 5 million


*Fact Table:* tb_songs_tranasactions
--
Column Name    | Data Type | Sample Value
--
transactionid  | bigint    | 1001
country_code   | String    | IN
currency       | String    | INR
paid_money     | String    | 1000
songid         | String    | s001
--

Fact table size in HDFS: 20 GB
No. of records: 50 million


Cube engine: MapReduce (CubeEngineMR)

*Cube Design details:*

Column Type      | Column Name / Metric | Join Relation / Expression
Dimension Column | Country_code         | tb_songs_tranasactions.country_code=dim_audio_songs_name_mapping.country_code
Dimension Column | songid               | tb_songs_tranasactions.songid=dim_audio_songs_name_mapping.songid
Measure          | Count(transactionid) | count(tb_songs_tranasactions.transactionid)
Measure          | SUM(paid_money)      | sum(tb_songs_tranasactions.paid_money)




*Cube size and computation estimates*

*Storage estimates:*

1) Roughly how much storage is required, given the dimension columns, their
cardinalities, and the fact data?
2) How much Hive storage is required for the intermediate tables, and how large
will the cube be in HBase?
3) Are there any approximate formulas for estimating these sizes?
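For question 3, one rough starting point (a back-of-the-envelope sketch, not an official Kylin formula) is to bound the size of a full cube. With N dimensions there are at most 2^N dimension combinations (cuboids) before any aggregation-group pruning, and no cuboid can have more rows than the product of its dimensions' cardinalities or the number of fact rows. The sketch assumes songid cardinality is at most the 5 million rows of the dimension table; the country_code cardinality of 250 and the 50 bytes per stored row are hypothetical placeholders. Real builds are usually smaller, since actual value combinations are sparser and the bytes per row depend on encodings and measures.

```python
from itertools import combinations
from math import prod


def cuboid_row_upper_bounds(cardinalities, fact_rows):
    """Map every dimension subset (cuboid) to an upper bound on its row count.

    A full cube with N dimensions has at most 2**N such subsets.
    """
    names = sorted(cardinalities)
    bounds = {}
    for k in range(len(names) + 1):
        for dims in combinations(names, k):
            # A cuboid cannot hold more rows than the distinct value
            # combinations of its dimensions, nor more than the fact table.
            distinct = prod(cardinalities[d] for d in dims)  # empty product is 1
            bounds[dims] = min(distinct, fact_rows)
    return bounds


# Assumptions: songid cardinality <= 5,000,000 (dimension table rows);
# country_code = 250 is a HYPOTHETICAL value, not taken from the question.
cardinalities = {"country_code": 250, "songid": 5_000_000}
bounds = cuboid_row_upper_bounds(cardinalities, fact_rows=50_000_000)
total_rows = sum(bounds.values())

# HYPOTHETICAL average stored bytes per cuboid row (dimension keys + 2 measures).
BYTES_PER_ROW = 50
for dims, rows in bounds.items():
    print(dims or "(grand total)", rows)
print("upper bound on cube size (GiB):", total_rows * BYTES_PER_ROW / 1024**3)
```

With only two dimensions the base cuboid (country_code, songid) dominates, capped by the 50 million fact rows; adding dimensions doubles the number of cuboids each time, which is why pruning matters much more than the per-row size.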

*Computation estimates*
*Cube building:*
How many cluster resources are required for the intermediate Hive jobs when
using MapReduce (MR) as the cube engine?

*Cube query:*
How many compute resources are required to query the cube from HBase storage?

Are there any approximate formulas for estimating these resources?

If these questions have already been answered, please share the links. Please
let me know if you need any more details.

Thanks for the support.

Regards
Kamal Bannuru.





--
Sent from: http://apache-kylin.74782.x6.nabble.com/


Re: [ANNOUNCE] Please welcome Chunen Ni to the Apache Kylin PMC

2019-12-03 Thread nichunen
Thanks for all your support; it’s a great honor for me.
 
Let's make Kylin a more powerful project together.



Best regards,

 

Ni Chunen / George



On 12/2/2019 15:05, Temple Zhou wrote:
Congratulations!

On Mon, Dec 2, 2019 at 3:04 PM Chao Long  wrote:

Congratulations, Chunen.

On Mon, Dec 2, 2019 at 2:52 PM Guangxu Cheng wrote:

Congratulations, chunen!!!

On Mon, Dec 2, 2019 at 10:58 AM JiaTao Tao wrote:

Congratulations!

--

Regards!

Aron Tao


On Sun, Dec 1, 2019 at 10:47 AM ShaoFeng Shi wrote:

On behalf of the Apache Kylin PMC, I am pleased to announce that Chunen Ni
has accepted our invitation to become a PMC member on the Kylin project. We
appreciate Chunen stepping up to take more responsibility in the Kylin
project.

Please join me in welcoming Chunen to the Kylin PMC!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ:
https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org








Re: Kylin Service Starts Fine, Can't Reach Web UI

2019-12-03 Thread VivekRSahu
Hi Phil,

I faced a similar issue caused by SSH tunneling. I followed the steps below,
and it worked for me:

Part 1:
https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-ssh-tunnel.html
Part 2:
https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-connect-master-node-proxy.html

Regards,
Vivek
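The AWS guides linked above open a dynamic (SOCKS) tunnel and route the browser through a proxy. If a plain local port forward is used instead (an assumed alternative, not what those guides describe), it helps to confirm that the tunnel's local end is actually listening before blaming Kylin. A minimal sketch, assuming Kylin's default web port 7070, with the UI then served at http://localhost:7070/kylin; the key file name and master DNS in the comment are placeholders:

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


if __name__ == "__main__":
    # Assumes a local forward was opened first, for example:
    #   ssh -i mykeypair.pem -N -L 7070:localhost:7070 hadoop@<master-public-dns>
    # 7070 is Kylin's default web port; the UI is then at http://localhost:7070/kylin
    if port_is_open("localhost", 7070):
        print("Tunnel is up; open http://localhost:7070/kylin")
    else:
        print("Nothing is listening on localhost:7070; check the ssh tunnel")
```

If the check passes but the page still does not load, the problem is more likely on the Kylin side (service bound to a different port or host) than in the tunnel.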



Re: Error on EMR

2019-12-03 Thread Xiaoxiang Yu
Hi,
   I have successfully deployed the latest version of Kylin (3.0.beta) on AWS EMR
5.27 and built a few cubes with it; maybe you can give it a try.
   The cluster was created with the CLI command below, and I deployed Kylin on
the MASTER node:

aws emr create-cluster \
  --applications Name=Hadoop Name=Hive Name=Pig Name=Spark Name=Sqoop Name=Tez Name=Zeppelin Name=ZooKeeper Name=Ganglia \
  --release-label emr-5.27.0 \
  --instance-groups '[
    {"InstanceCount":4,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":200,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"CORE","InstanceType":"m4.2xlarge","Name":"Worker Cluster"},
    {"InstanceCount":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":100,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"MASTER","InstanceType":"c4.4xlarge","Name":"MasterQuery"}
  ]' \
  --configurations '[{"Classification":"hdfs-site","Properties":{"dfs.replication":"2"}}]' \
  --ebs-root-volume-size 100 \
  --enable-debugging \
  --name 'BenchmarkCluster' \
  --scale-down-behavior TERMINATE_AT_TASK_COMPLETION \
  --region cn-northwest-1
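To relate this layout to storage sizing, a small sketch can read the same JSON passed to --instance-groups and --configurations and estimate how much data HDFS can hold. It assumes HDFS lives only on the CORE nodes' EBS volumes (EMR runs DataNodes on CORE nodes) and treats the result as an upper bound, because YARN local directories, logs, and the OS share those volumes. The fallback replication of 3 is the stock Hadoop default; EMR may choose a different default depending on cluster size.

```python
import json

# Values copied from the --instance-groups and --configurations arguments above.
INSTANCE_GROUPS = json.loads("""[
 {"InstanceCount":4,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":200,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"CORE","InstanceType":"m4.2xlarge","Name":"Worker Cluster"},
 {"InstanceCount":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":100,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"MASTER","InstanceType":"c4.4xlarge","Name":"MasterQuery"}
]""")
CONFIGURATIONS = json.loads(
    '[{"Classification":"hdfs-site","Properties":{"dfs.replication":"2"}}]'
)


def core_ebs_gb(groups):
    """Total EBS capacity (GB) attached to CORE nodes, which host the DataNodes."""
    total = 0
    for group in groups:
        if group["InstanceGroupType"] != "CORE":
            continue
        for dev in group["EbsConfiguration"]["EbsBlockDeviceConfigs"]:
            total += (group["InstanceCount"]
                      * dev["VolumesPerInstance"]
                      * dev["VolumeSpecification"]["SizeInGB"])
    return total


def replication_factor(configurations, default=3):
    """Read dfs.replication from an hdfs-site classification, if present."""
    for conf in configurations:
        if conf["Classification"] == "hdfs-site":
            return int(conf["Properties"].get("dfs.replication", default))
    return default


raw_gb = core_ebs_gb(INSTANCE_GROUPS)
replication = replication_factor(CONFIGURATIONS)
# Upper bound only: YARN local dirs, logs and the OS share these volumes.
usable_gb = raw_gb / replication
print(f"raw CORE EBS: {raw_gb} GB, replication {replication}, <= {usable_gb:.0f} GB of data")
```

Lowering dfs.replication to 2, as this command does, trades some fault tolerance for more usable HDFS space on a small benchmark cluster.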


Best wishes,
Xiaoxiang Yu 
 

On 2019/12/2 20:38, “Tanmay Movva” wrote:

Hello,

We have installed Kylin on our EMR master node along with HBase, Hadoop, and
Hive, and I installed Spark using download-spark.sh from KYLIN_HOME/bin.
Following the "Install KYLIN on AWS EMR" guide, we configured the Kylin working
directory and HBase storage on S3 and made the necessary zkquorum changes.

Running sample.sh or check-env.sh produces no errors. However, when we run the
cube build job from the UI, it fails at step 2, "Redistribute Flat Hive
Tables". Since the previous step, "Create Intermediate Hive tables", completed
successfully, I don't think there is any problem with Hive itself.

Can anyone help us with this? Thank You.


java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
    at org.apache.kylin.source.hive.CLIHiveClient.<init>(CLIHiveClient.java:47)
    at org.apache.kylin.source.hive.HiveClientFactory.getHiveClient(HiveClientFactory.java:27)
    at org.apache.kylin.source.hive.RedistributeFlatHiveTableStep.computeRowCount(RedistributeFlatHiveTableStep.java:40)
    at org.apache.kylin.source.hive.RedistributeFlatHiveTableStep.doWork(RedistributeFlatHiveTableStep.java:91)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
    at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
    at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1928)
    at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1771)
    ... 11 more

-- 
Regards,
Tanmay Krishna Movva
Razorpay
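The trace shows the failing step running inside the Kylin server itself (Tomcat's WebappClassLoaderBase could not load org.apache.hadoop.hive.conf.HiveConf), which is likely why the earlier Hive step, run through the Hive command line, still succeeded. HiveConf normally ships in Hive's hive-common jar, so a first check is which jar on the node actually provides it. A minimal sketch, assuming the Hive libraries live under /usr/lib/hive/lib on the EMR master (an assumption about the layout; adjust the directory list as needed):

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, lib_dirs):
    """Yield every .jar under lib_dirs that contains the given class."""
    entry = class_name.replace(".", "/") + ".class"
    for lib_dir in lib_dirs:
        base = Path(lib_dir)
        if not base.is_dir():
            continue  # skip directories that do not exist on this node
        for jar in sorted(base.glob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as zf:
                    if entry in zf.namelist():
                        yield jar
            except (zipfile.BadZipFile, OSError):
                continue  # skip unreadable files or non-zip archives


if __name__ == "__main__":
    # ASSUMPTION: Hive jars live here on the EMR master; adjust as needed.
    hive_lib_dirs = ["/usr/lib/hive/lib"]
    hits = list(jars_containing("org.apache.hadoop.hive.conf.HiveConf", hive_lib_dirs))
    for jar in hits:
        print("found in", jar)
    if not hits:
        print("HiveConf not found in", hive_lib_dirs)
```

If the jar is present, the next thing to verify is whether Kylin's startup scripts (in Kylin 2.x/3.x, bin/find-hive-dependency.sh resolves the Hive dependencies) actually pick up that directory before the server starts.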