Currently most users create, build, and refresh cubes by clicking through our website manually. Some advanced users may know that Kylin also provides REST APIs to operate on cubes (http://kylin.apache.org/docs/howto/howto_build_cube_with_restapi.html). By design, we leave cubing job scheduling to the client side. We don't like the idea of integrating complex cubing job scheduling into Kylin, because it would complicate the server side and the frontend a lot.
Yet we've seen a lot of users who need to refresh their cube every day. Some even need to update the latest N days' data every day. Let's put aside how troublesome it is to click the build/refresh button every day, or how much effort it takes for a user to learn to use the Kylin REST API programmatically. Without experienced guidance, users tend to add a new segment as well as refresh the last N segments EVERY DAY. This is extremely inefficient and hurts query performance.

A more sophisticated solution for such cases would be to organize the cube segments by week/month/quarter (depending on how big N is; if N is less than 30, organizing by month is usually optimal). Let's say N=30: then each time only the latest 2 segments need refreshing, which is much cheaper than the naive solution.

However, it's not trivial for every Kylin user to implement such a scheduling algorithm on the client side, not to mention the error handling logic, failover, etc. A more practical solution is for us to provide a scheduler library (which can be treated as a child project of Kylin), so that the user only needs to configure basic information like cube name, user credentials, refresh frequency, N days to back-refresh, etc. The client library will take over the rest of the dirty work.

I'm posting the idea here to get the community's opinion on it. Since it's a really independent task (it's actually an independent project), volunteers are very welcome to take full charge of it.

--
Regards,
*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
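
P.S. To make the month-based idea a bit more concrete, here is a minimal Python sketch of the segment-selection step only. It is not existing Kylin code: the function names are hypothetical, and it assumes that segments are aligned to calendar months and that "the last N days" includes today. A real client library would still have to submit each returned range through the REST API and handle errors and failover.

```python
from datetime import date, timedelta


def month_start(d: date) -> date:
    """First day of the month containing d."""
    return d.replace(day=1)


def next_month_start(d: date) -> date:
    """First day of the month after the one containing d."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)


def segments_to_refresh(today: date, n_days: int):
    """Return the [start, end) ranges of the monthly segments that overlap
    the last n_days days. Assumption: the window includes today."""
    if n_days < 1:
        raise ValueError("n_days must be at least 1")
    window_start = today - timedelta(days=n_days - 1)
    segments = []
    start = month_start(window_start)
    # Walk month by month until we pass the month containing today.
    while start <= today:
        end = next_month_start(start)
        segments.append((start, end))
        start = end
    return segments


if __name__ == "__main__":
    # Hypothetical example date; with N=30 this touches May and June 2015.
    for start, end in segments_to_refresh(date(2015, 6, 10), 30):
        print(start, end)
```

With N=30 this yields two monthly segments for most dates, compared with a new segment plus N refreshed daily segments in the naive approach. Under the assumptions above the window can occasionally touch three months (for example, on March 1 after a 28-day February), so a client should refresh whatever ranges the function returns rather than hard-code "2".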