Automatic memoization of intermediate data tables
-------------------------------------------------
Key: HIVE-449
URL: https://issues.apache.org/jira/browse/HIVE-449
Project: Hadoop Hive
Issue Type: Improvement
Reporter: Venky Iyer
Processing data with Hive encourages you to specify your data transformation in
the form of fairly complex nested joins, CLUSTER BYs, GROUP BYs, etc.,
supplementing functionality with custom transforms where necessary. This
has a disadvantage, however: it is hard to inspect the output of
intermediate phases. It is also inconvenient when a custom TRANSFORM
script at the end of a long chain of MapReduce jobs fails with syntax
errors or bugs, because you then need to rerun all the previous steps before you
can check whether you fixed the bugs in the custom script. This can be alleviated by
providing functionality to capture specific steps in intermediate tables
automatically, allowing me to be expressive in HiveQL without having to
keep track of all the intermediate tables myself.
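A minimal sketch of the memoization idea, written in Python rather than Hive and assuming each phase can be identified by a (query, phase) pair. The hypothetical `PhaseCache` class stands in for automatically captured intermediate tables; `pipeline`, `group_sum`, `buggy_transform` and `fixed_transform` are illustrative names, not Hive APIs. The sketch shows that after the final transform fails, rerunning with a fixed transform reuses the earlier phases instead of recomputing them.

```python
from typing import Callable, Dict, List, Tuple

Rows = List[tuple]


class PhaseCache:
    """Hypothetical stand-in for automatically materialized intermediate tables."""

    def __init__(self) -> None:
        self.tables: Dict[Tuple[str, str], Rows] = {}  # (query, phase) -> rows
        self.runs: Dict[str, int] = {}  # how often each phase actually executed

    def run_phase(self, query: str, phase: str, compute: Callable[[], Rows]) -> Rows:
        key = (query, phase)
        if key not in self.tables:
            # Cache miss: run the (expensive) phase once and keep its output.
            self.runs[phase] = self.runs.get(phase, 0) + 1
            self.tables[key] = compute()
        return self.tables[key]

    def invalidate(self, query: str, phase: str) -> None:
        # Drop a stale intermediate table, e.g. after its query text changed.
        self.tables.pop((query, phase), None)


def group_sum(rows: Rows) -> Rows:
    totals: Dict[str, int] = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return sorted(totals.items())


def pipeline(cache: PhaseCache, transform: Callable[[Rows], list]) -> list:
    raw = cache.run_phase("daily_report", "load", lambda: [("a", 1), ("b", 2), ("a", 3)])
    grouped = cache.run_phase("daily_report", "group_by", lambda: group_sum(raw))
    # The final custom transform is deliberately not memoized: it is what is being debugged.
    return transform(grouped)


def buggy_transform(rows: Rows) -> list:
    return [key.upper() + value for key, value in rows]  # TypeError: str + int


def fixed_transform(rows: Rows) -> list:
    return [f"{key.upper()}={value}" for key, value in rows]


cache = PhaseCache()
try:
    pipeline(cache, buggy_transform)
except TypeError:
    pass  # the script at the end of the chain failed
print(pipeline(cache, fixed_transform))  # ['A=4', 'B=2']
print(cache.runs)  # {'load': 1, 'group_by': 1}
```

The cache key here ignores changes to upstream data or to a phase's query text; a real implementation would need to invalidate entries in those cases, which `invalidate` only hints at.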
You may also need a way to name queries and their phases, so that you can
identify which intermediate tables belong to which phase of which query.
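One possible naming convention, sketched in Python under the assumption that table names should use only lowercase letters, digits and underscores: each intermediate table is named `<prefix>__<query>__<phase>`, so a table can be mapped back to the query phase that produced it. The `tmp` prefix, the `__` separator and all function names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

SEP = "__"  # assumed separator between prefix, query name and phase name


def sanitize(part: str) -> str:
    """Lowercase, replace each run of non-alphanumerics with one '_', trim edge '_'."""
    cleaned = re.sub(r"[^a-z0-9]+", "_", part.strip().lower()).strip("_")
    if not cleaned:
        raise ValueError(f"{part!r} contains no letters or digits")
    return cleaned


def intermediate_table_name(query: str, phase: str, prefix: str = "tmp") -> str:
    # sanitize() never emits "__", so SEP splits the name back unambiguously.
    return SEP.join(sanitize(p) for p in (prefix, query, phase))


def parse_intermediate_table_name(name: str, prefix: str = "tmp") -> Tuple[str, str]:
    parts = name.split(SEP)
    if len(parts) != 3 or parts[0] != sanitize(prefix):
        raise ValueError(f"{name!r} is not an intermediate table name")
    return parts[1], parts[2]


def tables_for_query(names: Iterable[str], query: str, prefix: str = "tmp") -> List[str]:
    target = sanitize(query)
    matches = []
    for name in names:
        try:
            owner, _phase = parse_intermediate_table_name(name, prefix)
        except ValueError:
            continue  # an ordinary table, not created by this scheme
        if owner == target:
            matches.append(name)
    return matches


names = [
    intermediate_table_name("Daily Report", "load"),
    intermediate_table_name("Daily Report", "group-by"),
    intermediate_table_name("weekly", "load"),
    "page_views",
]
print(names[1])  # tmp__daily_report__group_by
print(tables_for_query(names, "daily report"))
# ['tmp__daily_report__load', 'tmp__daily_report__group_by']
```

Encoding the owner in the name keeps the scheme self-describing without extra bookkeeping; the trade-off is that two query names that sanitize to the same string (e.g. "Daily Report" and "daily-report") would share tables.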
--
This message is automatically generated by JIRA.