Logically, a cube contains cuboids representing every combination of its dimensions. A naive cube building strategy that materializes all of these cuboids quickly runs into the curse of dimensionality, because the number of cuboids doubles with each added dimension. Currently Kylin uses a strategy called "aggregation groups" to reduce the number of cuboids that need to be materialized.
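To make the growth concrete, here is a minimal Python sketch that enumerates every dimension combination for a small cube. The function name `all_cuboids` and the dimension names are hypothetical and chosen only for illustration; this is not Kylin code.

```python
from itertools import combinations

def all_cuboids(dimensions):
    """Return every subset of the dimensions, including the empty one."""
    dims = list(dimensions)
    return [
        frozenset(combo)
        for size in range(len(dims) + 1)
        for combo in combinations(dims, size)
    ]

# Five illustrative dimensions already give 2**5 = 32 combinations.
print(len(all_cuboids("ABCDE")))  # 32

# The count doubles with every extra dimension.
for n in (10, 20, 30):
    print(n, 2 ** n)
```

With 20 dimensions the full cube would already contain more than a million cuboids, which is why some pruning strategy is needed.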
However, if the query pattern is simple and fixed, the "aggregation group" strategy is still not efficient enough. For example, suppose there are five dimensions, namely A, B, C, D and E. The data modeler is sure that only the combinations (A,B,C), (D,E) and (A,E) will be queried, so they use the aggregation group tool to optimize the cube definition. However, whichever aggregation groups they choose, many useless combinations will still be materialized.

With a new strategy called "cuboid whitelist", data modelers can guide Kylin to materialize only the cuboids they are interested in. Given the whitelist, Kylin will materialize the minimal set of cuboids needed to cover every cuboid in the whitelist.

To support this, the following functionality should be added:

1. Front-end/UI for specifying whitelist members and persisting them to the cube description.
2. Enhanced job engine scheduler that calculates a minimal spanning build tree based on the whitelist.
3. (OPTIONAL) Enhanced job engine that supports a dynamic whitelist and triggers new builds for newly added whitelist members.

Hongbin Ma
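As a rough illustration of the build tree mentioned in item 2, the sketch below takes the whitelist from the example above and assigns each whitelisted cuboid a parent to be aggregated from. It rests on assumptions not stated in this proposal: the base cuboid (all dimensions) is always built and serves as the root, and each whitelisted cuboid is computed from the smallest already-planned strict superset. The name `build_tree` is hypothetical, and this simple heuristic is only one possible approach, not the scheduler's actual algorithm.

```python
def build_tree(dimensions, whitelist):
    """Map each whitelisted cuboid to the cuboid it would be aggregated from.

    Assumption: the base cuboid (all dimensions) is always materialized.
    Each whitelisted cuboid picks its smallest strict superset among the
    planned cuboids as its parent; ties are broken alphabetically.
    """
    base = frozenset(dimensions)
    targets = {frozenset(cuboid) for cuboid in whitelist}
    targets.discard(base)          # the base cuboid is the root, not a child
    planned = targets | {base}

    parent = {}
    for cuboid in targets:
        candidates = [other for other in planned if cuboid < other]
        parent[cuboid] = min(candidates, key=lambda c: (len(c), sorted(c)))
    return parent

def label(cuboid):
    """Render a cuboid such as frozenset({'A', 'E'}) as 'AE'."""
    return "".join(sorted(cuboid))

tree = build_tree("ABCDE", [("A", "B", "C"), ("D", "E"), ("A", "E")])
for child in sorted(tree, key=label):
    print(label(child), "<-", label(tree[child]))

# Cuboids to materialize: the whitelist members plus the base cuboid.
print(len(tree) + 1)  # 4
```

Under these assumptions the example needs only 4 cuboids (the three whitelisted ones plus the base cuboid) instead of the 32 in the full cube. A real scheduler might also add intermediate cuboids when doing so lowers the total build cost, which this sketch does not attempt.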
