László Bodor created HIVE-27176: ----------------------------------- Summary: EXPLAIN SKEW <query> Key: HIVE-27176 URL: https://issues.apache.org/jira/browse/HIVE-27176 Project: Hive Issue Type: Improvement Reporter: László Bodor
Thinking about a new explain feature, which is actually not an explain, instead a set of analytical queries: considering a very complicated and large SQL statement (this below is a simple one, just for example's sake): {code} SELECT a FROM (SELECT b ... JOIN c on b.x = c.y) d JOIN e ON d.v = e.w {code} EXPLAIN skew should run a query like: {code} SELECT "b", "x", x, count distinct(b.x) as count order by count desc limit 50 UNION ALL SELECT "c", "y", y, count distinct(c.y) as count order by count desc limit 50 UNION ALL SELECT "d", "v", v count distinct(d.v) as count order by count desc limit 50 UNION ALL SELECT "e", "w", w, count distinct(e.w) as count order by count desc limit 50 {code} collecting some cardinality info about all the join columns found in the query, so result might be like: {code} table_name column_name column_value count b "x" x_skew_value1 100431234 b "x" x_skew_value2 234 c "y" y_skew_value1 350000 c "y" x_skew_value2 459999 c "y" x_skew_value3 42 ... {code} this doesn't solve the problem, instead shows data skew immediately for further analysis -- This message was sent by Atlassian Jira (v8.20.10#820010)