Hi all, I just need a little clarification on the query cost. How do we compute the query cost currently? Do we calculate the cost of the overall query? Is this cost used only to stop users from running very expensive queries?
One commonly used approach is to build a DAG/tree of all the operations in the query (in our case, the Hive AST), with each node/operator having its own cost. From this we can calculate the cumulative cost of the query as the summation of the individual operator costs. This would give very granular control over the query cost. It could also help us later with query optimization, where certain operators can be removed or rearranged. Drill, Hive, and Phoenix use a similar approach via Calcite, though the implementation styles vary. Kylin is also supposedly following a similar approach. Should we explore this possibility? P.S. I am asking this question without assuming any technical infeasibility in, or coupling to, the current design. Just an open thought.
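
To make the per-operator cost idea a bit more concrete, here is a rough sketch of what I have in mind. It is not how our code or Calcite does it today; the `Operator` class, the `total_cost` and `exceeds_budget` helpers, and all the cost numbers are hypothetical and only there for illustration. It also treats the plan as a plain tree, so an operator shared between branches of a DAG would be counted once per parent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Operator:
    """Hypothetical plan node: a name, its own cost, and its input operators."""
    name: str
    self_cost: float
    children: List["Operator"] = field(default_factory=list)


def total_cost(op: Operator) -> float:
    """Cumulative cost = this operator's cost plus the cost of all its inputs."""
    return op.self_cost + sum(total_cost(child) for child in op.children)


def exceeds_budget(root: Operator, max_cost: float) -> bool:
    """Reject-style check: is the whole query more expensive than allowed?"""
    return total_cost(root) > max_cost


# Made-up plan for something like: SELECT ... FROM t WHERE ... GROUP BY ...
scan = Operator("TableScan", 100.0)
filt = Operator("Filter", 10.0, [scan])
agg = Operator("GroupBy", 25.0, [filt])

print(total_cost(agg))
print(exceeds_budget(agg, max_cost=100.0))
```

With something like this, the same tree could serve both purposes: limiting expensive queries (the budget check) and, later, comparing alternative operator arrangements by their total cost.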
