Automatic memoization of intermediate data tables
-------------------------------------------------

                 Key: HIVE-449
                 URL: https://issues.apache.org/jira/browse/HIVE-449
             Project: Hadoop Hive
          Issue Type: Improvement
            Reporter: Venky Iyer


Processing data with Hive encourages you to specify your data transformation as
fairly complex nested joins, CLUSTER BYs, GROUP BYs, etc., supplementing the
built-in functionality with custom transforms where necessary. The disadvantage
is that it is hard to inspect the output of intermediate phases. It is also
inconvenient when a custom TRANSFORM script at the end of a long chain of
MapReduce jobs fails with syntax errors or bugs: you must rerun all the previous
steps before you can check whether you fixed the bugs in the custom script. This
could be alleviated by providing functionality to capture specific steps in
intermediate tables automatically, letting me be expressive in HiveQL without
having to keep track of all the intermediate tables myself.

This may require a way to name queries and their phases, so that you can
identify which intermediate table belongs to which phase of which query.
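As a rough illustration of the bookkeeping this would automate, here is a minimal
Python sketch. It is not Hive code: the PhaseMemo class, the tmp_<query>_<phase>
naming convention, and the in-memory dict standing in for warehouse tables are
all hypothetical. Each named phase is computed once and its output is stored
under a name derived from the query and phase names, so when the final transform
fails and is then fixed, rerunning the pipeline reuses the stored output of the
earlier phase instead of recomputing it.

```python
from typing import Callable, Dict, List, Tuple

Rows = List[Tuple]


class PhaseMemo:
    """Hypothetical sketch: caches each named phase's output as an 'intermediate
    table' keyed by (query name, phase name)."""

    def __init__(self) -> None:
        # Stand-in for the warehouse: intermediate table name -> materialized rows.
        self.tables: Dict[str, Rows] = {}
        # How many times each phase's output was successfully materialized.
        self.runs: Dict[str, int] = {}

    @staticmethod
    def table_name(query: str, phase: str) -> str:
        # Hypothetical naming convention; Hive itself defines nothing like this.
        return f"tmp_{query}_{phase}"

    def run(self, query: str, phase: str, compute: Callable[[], Rows]) -> Rows:
        name = self.table_name(query, phase)
        if name not in self.tables:
            # If compute() raises, nothing is stored and the phase is retried next time.
            self.tables[name] = compute()
            self.runs[name] = self.runs.get(name, 0) + 1
        return self.tables[name]


memo = PhaseMemo()


def grouped() -> Rows:
    # Stands in for an expensive GROUP BY phase: sum values per key.
    data = [("a", 1), ("b", 2), ("a", 3)]
    totals: Dict[str, int] = {}
    for key, value in data:
        totals[key] = totals.get(key, 0) + value
    return sorted(totals.items())


def buggy_transform(rows: Rows) -> Rows:
    # Stands in for a custom TRANSFORM script that fails.
    raise RuntimeError("bug in custom script")


def fixed_transform(rows: Rows) -> Rows:
    return [(key, total * 10) for key, total in rows]


def pipeline(transform: Callable[[Rows], Rows]) -> Rows:
    g = memo.run("daily", "grouped", grouped)
    return memo.run("daily", "transformed", lambda: transform(g))


try:
    pipeline(buggy_transform)  # first attempt: the final phase fails
except RuntimeError:
    pass

# Second attempt reuses tmp_daily_grouped and only runs the fixed transform.
result = pipeline(fixed_transform)
```

A failed phase stores nothing, so it is simply retried on the next run. The
sketch does not handle invalidation: if an earlier phase's logic changes, its
stored table would have to be dropped (or the key extended, for example with a
hash of the query text) to avoid reusing stale output.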
