Hi community,

Indeed, we have also found that there is strong demand for such a library.
We have actually done much of the work Hongbin mentioned in the Kylin environments of some of our clients. We call the project kylin-tools, and it is based entirely on Kylin's REST APIs. We created Python scripts capable of doing most of the work that Kylin's webapp can do. The scripts define many command-line options and can be integrated with crontab, Oozie, or other scheduling systems.

The functionality is listed below:

1. Simple cube definition. It reads a simple cube definition from a CSV file and converts it into JSON data, so users can easily design their cubes and store them in a file.
2. Project create/delete, Hive table synchronization, cache wipe
3. Cube batch create, build, enable/disable, delete
4. Automatic checks on job status, with support for simple job failover
5. Cube information stored in MySQL
6. Command lines to run Kylin tasks

However, this tool was developed for the customized demands of our clients, and it definitely needs extra formalization and further development. Again, we'd like to contribute as much as we can to the community, but first of all we think we'd better discuss the features and design further. Could you give us some advice and tell us your requirements? If we can finish them, it would be our pleasure to make it open source.

Best Regards,
George/倪春恩
Software Engineer/软件工程师
Mobile:+86-13501723787| Fax:+8610-56842040
北京明略软件系统有限公司(www.mininglamp.com)
北京市昌平区东小口镇中东路398号中煤建设集团大厦1号楼4层
F4,1#,Zhongmei Construction Group Plaza,398# Zhongdong Road,Changping District,Beijing,102218

----------------------------------------------------------------------------------------------------------------------------
From: hongbin ma
Date: 2015-12-10 13:55
To: dev
Subject: [Request for comments] A client library to help automatic cube building/refreshing

Currently, most users create, build, and refresh cubes by clicking manually on our website.
Some advanced users might know that Kylin provides REST APIs to operate on cubes (http://kylin.apache.org/docs/howto/howto_build_cube_with_restapi.html).

It is by design that we leave cubing job scheduling to the client side. We don't like the idea of integrating complex cubing job scheduling, because it might complicate the server side and the frontend a lot. Yet we've seen many users who need to refresh the cube every day. Some even need to update the latest N days' data every day.

Let's put aside how troublesome it is to click the build/refresh button every day, or how much effort it takes a user to learn to use the Kylin REST API programmatically. Without experienced guidance, users tend to add a new segment as well as refresh the last N segments EVERY DAY. This is extremely inefficient and hurts query performance.

A more sophisticated solution for such cases would be to organize the cube segments by weeks/months/quarters (depending on how big N is; if N is less than 30, organizing by month is usually optimal). Let's say N=30: then each time only the latest 2 segments need refreshing, which is much cheaper than the naive solution.

However, it's not trivial for every Kylin user to implement such a scheduling algorithm on the client side, not to mention the error-handling logic, failover, etc. A more practical solution is for us to provide a scheduler library (which can be treated as a child project of Kylin), so that the user only needs to configure basic information such as cube name, user credentials, refresh frequency, N days to back-refresh, etc. The client library will take over the rest of the dirty work.

I'm posting the idea here to get the community's opinion on it. Since it's a really independent task (it's actually an independent project), volunteers are greatly welcome to take full charge of it.

--
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
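
To make the month-based segment idea from the original message concrete, here is a minimal sketch in Python of how a client could work out which monthly segments overlap the last N days. It only does the date arithmetic and does not call Kylin at all. The function names `next_month_start` and `segments_to_refresh` are hypothetical, invented for illustration, and the half-open [start, end) convention for segment ranges is an assumption of this sketch.

```python
from datetime import date, timedelta


def next_month_start(d):
    """Return the first day of the month following d's month."""
    # Day 28 exists in every month; adding 4 days always lands in the next month.
    return (d.replace(day=28) + timedelta(days=4)).replace(day=1)


def segments_to_refresh(today, n_days):
    """Hypothetical helper: list the monthly segments, as (start, end) pairs
    with an exclusive end, that overlap the last n_days days up to and
    including today."""
    oldest = today - timedelta(days=n_days - 1)
    segments = []
    start = oldest.replace(day=1)  # monthly segments begin on the 1st
    while start <= today:
        end = next_month_start(start)
        segments.append((start, end))
        start = end
    return segments


if __name__ == "__main__":
    # Example: a back-refresh window of N=30 days on 2015-12-10.
    for start, end in segments_to_refresh(date(2015, 12, 10), 30):
        print(start, end)
```

For 2015-12-10 with N=30 this yields two segments (November and December 2015), matching the "latest 2 segments" case described above. Late in a month the window can fit inside a single segment, and a window spanning a short February can touch three, so a real scheduler should not hard-code the count.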
