[jira] [Created] (HIVE-27176) EXPLAIN SKEW

Jira Sun, 26 Mar 2023 00:39:10 -0700

László Bodor created HIVE-27176:
-----------------------------------

             Summary: EXPLAIN SKEW <query>
                 Key: HIVE-27176
                 URL: https://issues.apache.org/jira/browse/HIVE-27176
             Project: Hive
          Issue Type: Improvement
            Reporter: László Bodor



I'm thinking about a new EXPLAIN feature that is not really an explain but 
rather a set of analytical queries. Consider a very complicated and large SQL 
statement (the one below is a simple one, just for example's sake):
{code}
SELECT a FROM (SELECT b ... JOIN c on b.x = c.y) d JOIN e ON d.v = e.w
{code}

EXPLAIN SKEW should run a query like:
{code}
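SELECT * FROM (SELECT "b" AS table_name, "x" AS column_name, CAST(x AS STRING) AS column_value, count(*) AS `count`
               FROM b GROUP BY x ORDER BY `count` DESC LIMIT 50) b_skew
UNION ALL
SELECT * FROM (SELECT "c" AS table_name, "y" AS column_name, CAST(y AS STRING) AS column_value, count(*) AS `count`
               FROM c GROUP BY y ORDER BY `count` DESC LIMIT 50) c_skew
UNION ALL
SELECT * FROM (SELECT "d" AS table_name, "v" AS column_name, CAST(v AS STRING) AS column_value, count(*) AS `count`
               FROM (SELECT b ... JOIN c on b.x = c.y) d GROUP BY v ORDER BY `count` DESC LIMIT 50) d_skew
UNION ALL
SELECT * FROM (SELECT "e" AS table_name, "w" AS column_name, CAST(w AS STRING) AS column_value, count(*) AS `count`
               FROM e GROUP BY w ORDER BY `count` DESC LIMIT 50) e_skew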
{code}

This collects cardinality information about all the join columns found in the 
query, so the result might look like:

{code}
table_name  column_name  column_value   count
b           x            x_skew_value1  100431234
b           x            x_skew_value2  234
c           y            y_skew_value1  350000
c           y            y_skew_value2  459999
c           y            y_skew_value3  42
...
{code}
This doesn't solve the problem; instead, it shows data skew immediately for 
further analysis.
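
As a rough illustration of how the skew query could be assembled from the join columns, here is a minimal Python sketch. The helper names (skew_branch, build_skew_query) and the (table, column) input format are hypothetical and not part of Hive; the sketch assumes each join column belongs to a plain table, so a derived table such as d would need its subquery text substituted in the FROM clause. Values are cast to STRING on the assumption that the join columns may have different types across the UNION ALL branches.

```python
# Hypothetical helpers (not Hive APIs): build a per-join-column skew query.


def skew_branch(table, column, limit=50):
    """Return one UNION ALL branch: the `limit` most frequent values of table.column."""
    return (
        "SELECT * FROM ("
        f"SELECT '{table}' AS table_name, '{column}' AS column_name, "
        f"CAST({column} AS STRING) AS column_value, count(*) AS `count` "
        f"FROM {table} GROUP BY {column} "
        f"ORDER BY `count` DESC LIMIT {limit}) {table}_skew"
    )


def build_skew_query(join_columns, limit=50):
    """Combine one branch per (table, column) pair into a single query string."""
    branches = [skew_branch(table, column, limit) for table, column in join_columns]
    return "\nUNION ALL\n".join(branches)


if __name__ == "__main__":
    # Join columns of the plain tables in the example statement.
    print(build_skew_query([("b", "x"), ("c", "y"), ("e", "w")]))
```

Each branch is wrapped in its own subquery so that ORDER BY and LIMIT apply to that branch alone rather than to the whole union.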

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
