[jira] Commented: (HIVE-449) Automatic memoization of intermediate data tables

Jeff Hammerbacher (JIRA) Sun, 26 Apr 2009 20:56:56 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702981#action_12702981
 ]


Jeff Hammerbacher commented on HIVE-449:
----------------------------------------

https://issues.apache.org/jira/browse/HIVE-29 would be another, potentially 
more elegant, approach to this problem.

> Automatic memoization of intermediate data tables
> -------------------------------------------------
>
>                 Key: HIVE-449
>                 URL: https://issues.apache.org/jira/browse/HIVE-449
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Venky Iyer
>
> Processing data with Hive encourages you to specify your data transformation
> in the form of fairly complex nested joins, CLUSTER BYs, GROUP BYs, etc.,
> supplementing functionality with custom transforms where necessary. This,
> however, has the disadvantage that it's hard to inspect the output of
> intermediate phases. It's also an inconvenience when your custom TRANSFORM
> script at the end of a long chain of mapreduce jobs fails with syntax
> errors or bugs, because you then need to rerun all the previous steps before
> you can check whether you fixed the bugs in the custom script. This can be
> alleviated by providing functionality to capture specific steps in
> intermediate tables automatically, allowing you to be expressive in HiveQL
> without having to keep track of all the intermediate tables yourself.
> You may need a way to name queries and phases, so that you have a way of 
> identifying which intermediate tables belong to which queries' phases.
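
The naming idea in the description above could be pictured with the following
minimal Python sketch. It is not part of Hive: intermediate_table_name,
PhaseCache, and the "tmp_" prefix are hypothetical names chosen for
illustration, and the run callback stands in for whatever would actually
execute a phase and write its output table. The sketch only assumes that each
phase has a query name, a phase name, and the text of the phase's query.

```python
import hashlib


def intermediate_table_name(query_name, phase, phase_sql):
    """Build a deterministic table name for one phase of a named query.

    The short hash of the phase text means an edited phase gets a new name,
    so a stale intermediate table is not reused by accident.
    """
    digest = hashlib.sha1(phase_sql.encode("utf-8")).hexdigest()[:8]
    return "tmp_{}_{}_{}".format(query_name, phase, digest)


class PhaseCache:
    """Remembers which intermediate tables have already been materialized."""

    def __init__(self):
        self._tables = set()

    def materialize(self, query_name, phase, phase_sql, run):
        """Run the phase only if its table is not already known.

        run(table_name, phase_sql) is a caller-supplied (hypothetical)
        function that executes the phase and stores its output.
        """
        name = intermediate_table_name(query_name, phase, phase_sql)
        if name not in self._tables:
            run(name, phase_sql)
            self._tables.add(name)
        return name
```

With something like this, rerunning a query after fixing only the final
TRANSFORM script would skip every earlier phase whose text is unchanged, and
the query and phase names embedded in each table name show which intermediate
table belongs to which phase.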

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
