[DISCUSS] Query cost computation

Yash Sharma Thu, 25 Jun 2015 00:24:43 -0700

Hi All,
I just need a little clarification on the query cost.

How do we compute the query cost currently? Do we calculate the cost of the
overall query?
Is this cost used only to stop a user from running a very expensive query?


One commonly used approach is to build a DAG/tree of all the operations in
the query (in our case the Hive AST), with each node/operator carrying
its own cost.

With this we can calculate the cumulative cost of the query as the
summation of the individual operator costs. This would provide very
granular control over the query cost.
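
As a rough illustration of that summation, here is a minimal Python sketch.
The OperatorNode class, the operator names, and the cost numbers are all
hypothetical; none of them come from our current code or from Hive.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OperatorNode:
    """One operator in a query plan tree (hypothetical structure)."""
    name: str
    own_cost: float  # cost of this operator alone, in made-up units
    children: List["OperatorNode"] = field(default_factory=list)

    def cumulative_cost(self) -> float:
        """Own cost plus the cumulative cost of every child subtree."""
        return self.own_cost + sum(c.cumulative_cost() for c in self.children)


# Illustrative plan: a group-by over a filtered join of two table scans.
plan = OperatorNode("GroupBy", 4.0, [
    OperatorNode("Filter", 1.0, [
        OperatorNode("Join", 10.0, [
            OperatorNode("TableScan(a)", 5.0),
            OperatorNode("TableScan(b)", 3.0),
        ]),
    ]),
])

total_cost = plan.cumulative_cost()
print(total_cost)
```

This walks a tree. For a real DAG with shared sub-plans, each node would
have to be visited only once so that shared work is not counted twice.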

This approach can also help us later with query optimization, where certain
operators can be removed or rearranged. Drill/Hive/Phoenix use a similar
approach via Calcite, though the implementation styles vary. Kylin
reportedly follows a similar approach as well.
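
To make the rearranging point concrete, here is a toy Python comparison of
two equivalent plans. The cost model is an assumption made up for this
example (each operator costs as many units as the rows it reads, and a join
emits as many rows as its larger input); it is not Calcite's model or ours.

```python
from typing import NamedTuple


class Plan(NamedTuple):
    rows: int  # estimated output rows
    cost: int  # cumulative cost in made-up units


def scan(rows: int) -> Plan:
    # Assumed: scanning a table costs one unit per row.
    return Plan(rows=rows, cost=rows)


def filter_rows(child: Plan, keep_one_in: int) -> Plan:
    # Assumed: the filter reads every input row and keeps 1 in keep_one_in.
    return Plan(rows=child.rows // keep_one_in, cost=child.cost + child.rows)


def join(left: Plan, right: Plan) -> Plan:
    # Assumed: the join reads both inputs and emits max(left, right) rows.
    return Plan(rows=max(left.rows, right.rows),
                cost=left.cost + right.cost + left.rows + right.rows)


# Same query, two operator orders.
filter_after_join = filter_rows(join(scan(1000), scan(100)), keep_one_in=10)
filter_pushed_down = join(filter_rows(scan(1000), keep_one_in=10), scan(100))

cheaper = min(filter_after_join, filter_pushed_down, key=lambda p: p.cost)
print(filter_after_join.cost, filter_pushed_down.cost)
```

Under these assumptions the plan with the filter below the join comes out
cheaper, which is the kind of comparison per-operator costs make possible.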

Should we explore this possibility ?

P.S. I am asking this question without assuming any technical
infeasibility or coupling in the current design. Just an open thought.
