[
https://issues.apache.org/jira/browse/HIVE-6267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889380#comment-13889380
]
Gunther Hagleitner commented on HIVE-6267:
------------------------------------------
Sorry [~navis]. I tried to keep the disruption as small as possible. I
collected everything I thought needed fixing before making the changes.
Then I waited until the weekend, for a time when the queue was empty, and tried
to get everything back in shape before people started working again.
I think it's worth it, otherwise I wouldn't have spent so much time on it. As I
mentioned above, I've repeatedly gotten feedback asking whether we can
improve explain. Unfortunately, that means updating tons of golden files. If you
can think of a better way, I can back out and try again. But it's not clear to me
how to avoid changing that many golden files, since we rely on them so heavily
in the q files.
> Explain explain
> ---------------
>
> Key: HIVE-6267
> URL: https://issues.apache.org/jira/browse/HIVE-6267
> Project: Hive
> Issue Type: Bug
> Reporter: Gunther Hagleitner
> Assignee: Gunther Hagleitner
> Fix For: 0.13.0
>
> Attachments: HIVE-6267.1.partial, HIVE-6267.2.partial,
> HIVE-6267.3.partial, HIVE-6267.4.patch, HIVE-6267.5.patch, HIVE-6267.6.patch,
> HIVE-6267.7.patch.gz, HIVE-6267.8.patch
>
>
> I've gotten feedback over time saying that it's very difficult to grok our
> explain command. There's supposedly a lot of information that mainly matters
> to developers or the testing framework. Comparing it to other major DBs it
> does seem like we're packing way more into explain than other folks.
> I've gone through the explain output, checking what could be done to improve
> readability. Here's a list of things I've found:
> - AST (unreadable in its "lisp" syntax, not really required for end users)
> - Vectorization (enough to display once per task and only when true)
> - Expressions representation is very lengthy, could be much more compact
> - "if not exists" on DDL (enough to display only on true, or maybe not at all)
> - bucketing info (enough if displayed only if table is actually bucketed)
> - external flag (show only if external)
> - GlobalTableId (don't need in plain explain, maybe in extended)
> - Position of big table (already clear from plan)
> - Stats always (most DBs mostly show only stats in explain; that gives a
> sense of what the planner thinks will happen)
> - skew join (only if true should be enough)
> - limit doesn't show the actual limit
> - "Alias -> Map Operator tree" -> alias is duplicated in TableScan operator
> - tag is only useful at runtime (move to explain extended)
> - Some names are camel case or abbreviated, clearer if full name
> - Tez is missing vertex map (aka edges)
> - explain formatted (json) is broken right now (swallows some information)
> Since changing explain results in many golden file updates, I'd like to take
> a stab at all of these at once.