Hi community,

Indeed, we have also found that there is strong demand for such a library.

Actually, we have already done much of the work Hongbin mentioned in the Kylin 
environments of some of our clients. We call the project kylin-tools, and it is 
fully based on the Kylin REST APIs.
We created Python scripts capable of doing most of the work that Kylin's webapp 
can do.

The scripts define many command line options and can be integrated with 
crontab, Oozie or other scheduling systems. The functionalities are listed 
below:
1. Simple cube definition.
  It reads a simple cube definition from a CSV file and converts it into JSON 
data, so users can easily design their cubes and store them in a file.
2. Project create/delete, hive table synchronization, cache wipe
3. Cube batch create, build, enable/disable, delete
4. Automatic job status checks, with support for simple job failover
5. Cube information stored in mysql
6. Command lines to run kylin tasks
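
As an illustration of item 4, here is a minimal sketch of a status check with simple failover. It is not the kylin-tools code: the function name, the injected `get_status` and `resume` callables, and the status strings "FINISHED" and "ERROR" are assumptions made for this example.

```python
import time


def wait_for_job(get_status, resume, max_retries=3, poll_seconds=30.0,
                 sleep=time.sleep):
    """Poll a job until it finishes, resuming it after errors.

    get_status and resume are hypothetical callables supplied by the caller
    (for example thin wrappers around REST calls). Returns True once the job
    reports "FINISHED", False once it has failed more than max_retries times.
    """
    retries = 0
    while True:
        status = get_status()
        if status == "FINISHED":
            return True
        if status == "ERROR":
            if retries >= max_retries:
                return False  # give up and let the caller alert someone
            retries += 1
            resume()  # simple failover: ask the server to retry the job
        sleep(poll_seconds)
```

Passing the two callables in keeps the retry logic independent of how the job is actually queried, so the same loop can be driven from crontab or Oozie wrappers.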

However, this tool was developed for the customized demands of our clients, 
and it definitely needs extra formalization and further development.
Again, we'd like to contribute as much as we can to the community, but first of 
all we think we'd better discuss the features and design further. 
Can you give us some advice and tell us your requirements? If we can implement 
them, it would be our pleasure to make it open source.



Best Regards,
 
George/倪春恩
Software Engineer/软件工程师
Mobile:+86-13501723787| Fax:+8610-56842040
北京明略软件系统有限公司(www.mininglamp.com)
北京市昌平区东小口镇中东路398号中煤建设集团大厦1号楼4层
F4,1#,Zhongmei Construction Group Plaza,398# Zhongdong Road,Changping 
District,Beijing,102218
----------------------------------------------------------------------------------------------------------------------------
 
From: hongbin ma
Date: 2015-12-10 13:55
To: dev
Subject: [Request for comments] A client library to help automatic cube 
building/refreshing
Currently most users create/build/refresh cubes through our website by manual
clicks. Some advanced users might know that Kylin provides REST APIs to
operate on cubes (
http://kylin.apache.org/docs/howto/howto_build_cube_with_restapi.html). It
is our design intent to leave cubing job scheduling to the client
side. We don't like the idea of integrating complex cubing job scheduling
because it might complicate the server side and the frontend a lot.
 
Yet we've seen a lot of users who need to refresh the cube
every day. Some even need to update the latest N days' data every day. Let's
put aside how troublesome it is to click the build/refresh button every day,
or how much effort the user must spend learning to use the Kylin REST API
programmatically. Without experienced guidance, users tend to add a new
segment and also refresh the last N segments EVERY DAY, which is extremely
inefficient and hurts query performance.
 
A more sophisticated solution for such cases would be to organize the cube
segments by weeks/months/quarters (depending on how big N is; if N is less
than 30, organizing by month is usually optimal). Let's say N=30: then each
time only the latest 2 segments need refreshing, which is much cheaper than
the naive solution.
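
The month-based organization above can be sketched as follows. This is only an illustration under assumptions: the function names are invented for this example, segments are taken to align with calendar months, and each range is half-open [start, end). It reports however many monthly segments overlap the last N days, which is two in the N=30 example below.

```python
from datetime import date, timedelta


def next_month_start(d):
    """First day of the month after d (day 28 + 4 days always crosses over)."""
    return (d.replace(day=28) + timedelta(days=4)).replace(day=1)


def segments_to_refresh(today, n_days):
    """Monthly [start, end) ranges overlapping the last n_days days.

    The window covers n_days days ending at today, inclusive.
    """
    earliest = today - timedelta(days=n_days - 1)
    segments = []
    start = earliest.replace(day=1)
    while start <= today:
        end = next_month_start(start)
        segments.append((start, end))
        start = end
    return segments


# On 2015-12-10 with N=30, only the November and December segments overlap
# the window, instead of 30 daily segments.
for seg_start, seg_end in segments_to_refresh(date(2015, 12, 10), 30):
    print(seg_start, seg_end)
```

In a real deployment the newest segment would normally end at the latest loaded data rather than at the month boundary; the sketch ignores that detail.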
 
However, it's not trivial for every Kylin user to implement such a
scheduling algorithm on the client side, not to mention the error handling
logic, failover, etc. A more practical solution is for us to provide a
scheduler library (which can be treated as a child project of Kylin), so that
the user only needs to configure basic information like cube name, user
credentials, refresh frequency, N days to back refresh, etc. The client
library will take over the remaining dirty work.
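
To make the configuration idea concrete, here is a rough sketch of what such a library might ask for and how it could turn one segment into a REST request. Everything here is hypothetical: `RefreshConfig` and `build_refresh_request` are invented names, and the endpoint path and the `startTime`/`endTime`/`buildType` fields are my assumption about the REST API described on the howto page linked above, so they should be checked against it. The request is only constructed, not sent.

```python
import base64
import json
from dataclasses import dataclass
from datetime import date, datetime, timezone
from urllib.request import Request


@dataclass
class RefreshConfig:
    """Basic information the user would configure (hypothetical fields)."""
    base_url: str
    cube_name: str
    user: str
    password: str
    back_refresh_days: int


def to_epoch_millis(d):
    """Midnight UTC of date d, in milliseconds since the epoch."""
    midnight = datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
    return int(midnight.timestamp() * 1000)


def build_refresh_request(cfg, start, end):
    """Build (but do not send) a request to refresh one [start, end) segment."""
    # Assumed payload shape; verify against the Kylin REST API documentation.
    body = json.dumps({
        "startTime": to_epoch_millis(start),
        "endTime": to_epoch_millis(end),
        "buildType": "REFRESH",
    }).encode("utf-8")
    token = base64.b64encode(
        f"{cfg.user}:{cfg.password}".encode("utf-8")).decode("ascii")
    url = f"{cfg.base_url.rstrip('/')}/kylin/api/cubes/{cfg.cube_name}/rebuild"
    return Request(url, data=body, method="PUT", headers={
        "Authorization": "Basic " + token,
        "Content-Type": "application/json",
    })
```

A scheduler would call `urllib.request.urlopen` on the returned object for each segment it decides to refresh, then hand the resulting job over to its status checking and failover logic.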
 
I'm posting the idea here to get the community's opinion on it. Since
it's a truly independent task (it's actually an independent project),
volunteers are very welcome to take full charge of it.
 
 
-- 
Regards,
 
*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
