[
https://issues.apache.org/jira/browse/HIVE-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702981#action_12702981
]
Jeff Hammerbacher commented on HIVE-449:
----------------------------------------
https://issues.apache.org/jira/browse/HIVE-29 would be another, potentially
more elegant, approach to this problem.
> Automatic memoization of intermediate data tables
> -------------------------------------------------
>
> Key: HIVE-449
> URL: https://issues.apache.org/jira/browse/HIVE-449
> Project: Hadoop Hive
> Issue Type: Improvement
> Reporter: Venky Iyer
>
> Processing data with Hive encourages you to specify your data transformation
> as fairly complex nested joins, CLUSTER BYs, GROUP BYs, etc., supplementing
> the built-in functionality with custom transforms where necessary. The
> drawback is that it is hard to inspect the output of intermediate phases.
> It is also inconvenient when a custom TRANSFORM script at the end of a long
> chain of MapReduce jobs fails with syntax errors or bugs, because you then
> have to rerun all the previous steps before you can check whether you fixed
> the script. This could be alleviated by providing functionality that
> automatically captures specific steps in intermediate tables, letting me be
> expressive in HiveQL without having to keep track of all the intermediate
> tables myself.
> You may need a way to name queries and phases, so that you can identify
> which intermediate tables belong to which phase of which query.
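A minimal sketch of the idea described above, assuming that each phase of a named query has a name and a version number that is bumped whenever its logic changes. This is not Hive code and not an existing Hive API. `IntermediateTables`, `run`, `table`, and the example phase functions are hypothetical, and an in-memory dict stands in for the intermediate tables that Hive would actually write. Each phase's output is stored under a key made of the query name plus the names and versions of every phase up to that point. Fixing a failing final TRANSFORM step therefore reruns only that step, while changing an earlier phase invalidates everything downstream of it.

```python
from typing import Callable, Dict, List, Tuple

Rows = List[tuple]
# Hypothetical phase description: (name, version, function over rows).
Phase = Tuple[str, int, Callable[[Rows], Rows]]


class IntermediateTables:
    """Hypothetical in-memory stand-in for automatically memoized intermediate tables."""

    def __init__(self) -> None:
        self._tables: Dict[tuple, Rows] = {}
        self.executed: List[str] = []  # phases actually computed, not reused

    def run(self, query_name: str, phases: List[Phase], source: Rows) -> Rows:
        data = source
        lineage: Tuple[Tuple[str, int], ...] = ()
        for name, version, fn in phases:
            # The key includes every upstream phase's name and version, so
            # changing an earlier phase invalidates all tables built after it.
            lineage = lineage + ((name, version),)
            key = (query_name, lineage)
            if key not in self._tables:
                self._tables[key] = fn(data)  # an exception leaves nothing stored
                self.executed.append(f"{query_name}.{name}")
            data = self._tables[key]
        return data

    def table(self, query_name: str, phase_name: str) -> Rows:
        """Return the most recently stored output of a named phase, for inspection."""
        for (q, lineage), rows in reversed(list(self._tables.items())):
            if q == query_name and lineage[-1][0] == phase_name:
                return rows
        raise KeyError(f"{query_name}.{phase_name}")


def group_sum(rows: Rows) -> Rows:
    # Stands in for an expensive GROUP BY stage early in the chain.
    totals: Dict[str, int] = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return sorted(totals.items())


def buggy_transform(rows: Rows) -> Rows:
    # Stands in for a custom TRANSFORM script with a bug.
    raise ValueError("bug in custom script")


def fixed_transform(rows: Rows) -> Rows:
    return [(key, value * 10) for key, value in rows]


store = IntermediateTables()
source = [("a", 1), ("b", 2), ("a", 3)]

try:
    store.run("daily_report", [("group_sum", 1, group_sum),
                               ("transform", 1, buggy_transform)], source)
except ValueError:
    pass  # group_sum's output was already captured before the failure

# After fixing the script (version 2), only the transform phase is recomputed.
result = store.run("daily_report", [("group_sum", 1, group_sum),
                                    ("transform", 2, fixed_transform)], source)
```

Keying on the full upstream lineage rather than on the phase name alone is a deliberate choice in this sketch: it avoids silently reusing stale intermediate data after an earlier stage is edited, at the cost of keeping older versions around until they are cleaned up.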