[
https://issues.apache.org/jira/browse/HADOOP-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ashish Thusoo updated HADOOP-4084:
----------------------------------
Attachment: patch-4084
Uploaded a patch that adds support for explain plan, along with some other
minor fixes and cleanups.
A note on the implementation of explain plan: it is implemented through a
Java annotation, @explain (implemented in explain.java).
This annotation takes two optional arguments:
@explain(displayName="name", normalExplain=false)
By default, displayName is "" and normalExplain is true.
displayName is the string used to display this class, or the return value of a
method, in the plan. normalExplain=false means that this class, or the return
value of the method, is displayed only in an extended plan.
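For readers who have not opened the patch, here is a minimal sketch of what such an annotation could look like. The retention policy and the targets are assumptions inferred from the description above, and FilterDescSketch is a hypothetical class invented for illustration; the actual explain.java may differ.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Sketch only: the real explain.java in the patch may declare this differently.
@Retention(RetentionPolicy.RUNTIME)              // assumed: the plan printer reads it via reflection
@Target({ElementType.TYPE, ElementType.METHOD})  // assumed: classes and methods, per the note above
@interface explain {
    String displayName() default "";
    boolean normalExplain() default true;
}

// Hypothetical plan node, invented for illustration; not a class from the patch.
@explain(displayName = "Filter Operator")
class FilterDescSketch {
    // Displayed in both the normal and the extended plan.
    @explain(displayName = "predicate")
    public String getPredicate() { return "(c1 > 10)"; }

    // Displayed only in the extended plan.
    @explain(displayName = "sampled", normalExplain = false)
    public boolean isSampled() { return false; }
}
```

Runtime retention is what would let a printer discover the annotation with getAnnotation; with the default CLASS retention the annotation would not be visible to reflection.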
Additionally, there is an ExplainTask that does the actual explain. It contains
the explainWork, which holds the AST string and the rootTasks that need to be
explained.
The general format of the explain output is:
PARSE TREE
STAGE DEPENDENCIES
STAGE PLAN:
Plan for Stage 1
Plan for Stage 2
.
.
.
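To make the layout above concrete, here is a rough sketch of how the three sections could be assembled from an AST string and a list of root tasks. StageSketch and ExplainLayoutSketch are hypothetical names, and the wording of the dependency lines is an assumption; none of this is taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a task in the plan; not the class used by the patch.
final class StageSketch {
    final String name;
    final String plan;
    final List<StageSketch> dependents = new ArrayList<>();

    StageSketch(String name, String plan) {
        this.name = name;
        this.plan = plan;
    }
}

final class ExplainLayoutSketch {
    // Emits the three sections in the order given above.
    static String render(String ast, List<StageSketch> rootTasks) {
        List<StageSketch> stages = new ArrayList<>();
        StringBuilder deps = new StringBuilder();
        for (StageSketch root : rootTasks) {
            deps.append("  ").append(root.name).append(" is a root stage\n");
            collect(root, stages, deps);
        }
        StringBuilder out = new StringBuilder();
        out.append("PARSE TREE\n  ").append(ast).append('\n');
        out.append("STAGE DEPENDENCIES\n").append(deps);
        out.append("STAGE PLAN:\n");
        for (StageSketch stage : stages) {
            out.append("  ").append(stage.name).append('\n');
            out.append("    ").append(stage.plan).append('\n');
        }
        return out.toString();
    }

    // Depth-first walk from a root; a stage reachable twice is listed only once.
    private static void collect(StageSketch stage, List<StageSketch> stages, StringBuilder deps) {
        if (stages.contains(stage)) {
            return;
        }
        stages.add(stage);
        for (StageSketch child : stage.dependents) {
            deps.append("  ").append(child.name)
                .append(" depends on ").append(stage.name).append('\n');
            collect(child, stages, deps);
        }
    }
}
```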
Within the plan, the parent-child relationship is shown through indentation. The
names of operators are displayed as specified in the @explain annotation (if the
displayName is "" then the name is not displayed in the plan).
Additionally, each public method of the class that is annotated with @explain
is called. If the returned value is neither a primitive nor a string (that is,
it is a map, a list or another class), explain is called recursively on it.
For primitive and string values we just print the value.
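The traversal described above can be sketched with plain reflection. This is an illustration under assumptions, not the code in the patch: PlanPrinterSketch and FilterDescSketch are hypothetical names, the annotation is repeated so that the block compiles on its own, and methods are sorted by name only because reflection does not guarantee an order.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Assumed shape of the annotation; the real explain.java may differ.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface explain {
    String displayName() default "";
    boolean normalExplain() default true;
}

// Hypothetical plan node, invented for illustration.
@explain(displayName = "Filter Operator")
class FilterDescSketch {
    @explain(displayName = "predicate")
    public String getPredicate() { return "(c1 > 10)"; }

    @explain(displayName = "sampled", normalExplain = false)
    public boolean isSampled() { return false; }
}

final class PlanPrinterSketch {
    static String explainPlan(Object root, boolean extended) {
        StringBuilder out = new StringBuilder();
        render("", root, extended, 0, out);
        return out.toString();
    }

    // Primitives arrive boxed from Method.invoke, so the wrapper types are checked.
    private static boolean isLeaf(Object v) {
        return v instanceof String || v instanceof Number
            || v instanceof Boolean || v instanceof Character;
    }

    private static void line(StringBuilder out, int depth, String text) {
        for (int i = 0; i < depth; i++) out.append("  ");
        out.append(text).append('\n');
    }

    private static void render(String name, Object value, boolean extended,
                               int depth, StringBuilder out) {
        if (value == null) return;
        if (isLeaf(value)) {                       // primitive or string: just print it
            line(out, depth, name.isEmpty() ? value.toString() : name + ": " + value);
            return;
        }
        explain onClass = value.getClass().getAnnotation(explain.class);
        if (onClass != null && !onClass.normalExplain() && !extended) return;
        if (!name.isEmpty()) { line(out, depth, name + ":"); depth++; }
        if (value instanceof Map) {                // recurse into map entries
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet())
                render(String.valueOf(e.getKey()), e.getValue(), extended, depth, out);
        } else if (value instanceof List) {        // recurse into list items
            for (Object item : (List<?>) value) render("", item, extended, depth, out);
        } else {                                   // any other class: walk its annotated methods
            if (onClass != null && !onClass.displayName().isEmpty()) {
                line(out, depth, onClass.displayName());
                depth++;                           // children are indented under the parent
            }
            Method[] methods = value.getClass().getMethods();
            Arrays.sort(methods, Comparator.comparing(Method::getName));
            for (Method m : methods) {
                explain onMethod = m.getAnnotation(explain.class);
                if (onMethod == null || m.getParameterCount() != 0) continue;
                if (!onMethod.normalExplain() && !extended) continue;
                try {
                    render(onMethod.displayName(), m.invoke(value), extended, depth, out);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("cannot explain " + m, e);
                }
            }
        }
    }
}
```

Calling PlanPrinterSketch.explainPlan(new FilterDescSketch(), false) prints the operator name followed by an indented predicate line; passing true additionally prints the sampled value, which is marked normalExplain=false.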
In the future I can make this XML instead of the text blobs that I am producing
now.
The minor fixes/refactors include:
1. Added a -Doverwrite=true option for running tests in order to capture
results. To capture the results of TestCliDriver you can now run
ant -Dtestcase=TestCliDriver -Doverwrite=true clean-test test
The same is true for TestParse and TestParseNegative.
Additionally, for all these tests you can run a specific query by using
-Dqfile, e.g.
ant -Dtestcase=TestParse -Dqfile=input1.q -Doverwrite=true clean-test test
would run the parse test on input1.q and capture its output in the source
tree.
2. Fixed some warnings related to generics
3. Changed the location of velocity.log to be in the build directory for
TestCliDriver (this was already in the build location for TestParse and
TestParseNegative)
4. Unified the function registries for UDF and UDAF and introduced the notion
of displayName in FunctionInfo and FunctionRegistry, which is used to show the
function in the plan.
> Add explain plan capabilities to Hive QL
> ----------------------------------------
>
> Key: HADOOP-4084
> URL: https://issues.apache.org/jira/browse/HADOOP-4084
> Project: Hadoop Core
> Issue Type: New Feature
> Components: contrib/hive
> Reporter: Ashish Thusoo
> Assignee: Ashish Thusoo
> Attachments: patch-4084
>
>
> Adding explain plan for queries in hive.
> The current proposal is to support something like:
> EXPLAIN [EXTENDED]
> SELECT ....
> This will output the following:
> Abstract Syntax Tree:
> Number of Stages:
> Dependencies between Stages:
> Plan for each stage:
> If the EXTENDED keyword is used then much more information will be emitted,
> whereas without that keyword only logical information will be emitted.
> e.g. In case of a group by query
> EXPLAIN
> SELECT T.c1, count(1) FROM T GROUP BY T.c1;
> The explain plan itself has two stages
> Stage1 and Stage2
> Stage1 will have the plan for generating the partial aggregates
> and Stage2 will have the plan for generating the complete aggregates.
> I also plan to convert the parse and semantic analysis tests so that they use
> this for finding differences in the plan instead of the programmatic plan
> dumps that we are using today (tests/queries/positive).