ulysses-you opened a new issue #934: URL: https://github.com/apache/incubator-kyuubi/issues/934
# Describe the proposal

As we know, [Z-order](https://en.wikipedia.org/wiki/Z-order_curve) has the benefit of data skipping, since it maps multidimensional data to one dimension. Besides, Z-order provides a good compression ratio for column-based storage. The additional cost of Z-order is that we need to do a special "order", which includes an extra shuffle, before writing the data.

Given this, Kyuubi wants to find a better way to do the optimization using Z-order. In short, the basic questions are:

* how to choose a table and columns to optimize using Z-order
* how to confirm that the optimized table is effective

**For question 1:**
We can analyze the metrics to get the relationship between queries, then choose a hot table whose predicate distribution is concentrated.

**For question 2:**
We can also analyze the metrics to see whether the queries that scan the optimized table benefit or not. We can roll back if performance regresses.

# Task list

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
