[jira] Commented: (HIVE-1107) Generic parallel execution framework for Hive (and Pig, and ...)

Jeff Hammerbacher (JIRA) Thu, 15 Jul 2010 11:43:17 -0700

    [ https://issues.apache.org/jira/browse/HIVE-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888886#action_12888886 ]


Jeff Hammerbacher commented on HIVE-1107:
-----------------------------------------

Russell,

Let's not focus too hard on the name of the particular workflow execution 
engine.

The idea here is that a program of some sort (Hive query or set of Pig 
statements) must be processed and a physical plan of MapReduce operators 
produced. Once you have a DAG of operators to carry out, you need:

1) A way to serialize and exchange this DAG (e.g. Avro, JSON, XML)
2) A service to execute the DAG and ensure it runs to completion

Of course, things aren't this simple; for example, we need a consistent way to 
handle side data generated by an operator.
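
As a rough illustration of the two requirements above, here is a minimal Python sketch. It is not Hive's or Pig's actual plan format: the stage names, the `serialize_plan` and `execute_plan` helpers, and the JSON layout are all hypothetical, and side data, retries, and real job submission are left out.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical physical plan: each stage maps to the stages it depends on.
plan = {
    "scan_a": [],
    "scan_b": [],
    "join": ["scan_a", "scan_b"],
    "aggregate": ["join"],
}


def serialize_plan(plan):
    """Encode the DAG as JSON so a front end can hand it to an executor."""
    return json.dumps({"stages": plan}, sort_keys=True)


def execute_plan(serialized, run_stage):
    """Decode a plan and run every stage after all of its dependencies.

    run_stage is a caller-supplied callback; a real service would submit
    a MapReduce job there and handle failures. Raises graphlib.CycleError
    if the decoded plan is not a DAG.
    """
    stages = json.loads(serialized)["stages"]
    completed = []
    for name in TopologicalSorter(stages).static_order():
        run_stage(name)
        completed.append(name)
    return completed


wire = serialize_plan(plan)
completed = execute_plan(wire, run_stage=lambda name: None)
```

Because the executor only sees the serialized form, any front end that can emit the same JSON could reuse it, which is the point of sharing one plan format.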

The goal of this proposal was to encourage Hive and Pig to target the same plan 
serialization format so that a single plan execution engine could be used. That 
way, work that is done on monitoring, capturing metadata from, and ensuring the 
reliability of multi-stage DAGs of MapReduce jobs can be reused rather than 
reimplemented in each system.

Some arguments against this idea: component modularity can introduce 
inefficiencies, may make the overall system feel more complex, and does not 
deliver user-visible features despite the large effort required for 
implementation.

I believe the convergence of Pig and Hive on this front would be beneficial to 
the larger Hadoop community, but it's a large undertaking, and each 
organization has its own goals for its infrastructure.

Later,
Jeff

> Generic parallel execution framework for Hive (and Pig, and ...)
> ----------------------------------------------------------------
>
>                 Key: HIVE-1107
>                 URL: https://issues.apache.org/jira/browse/HIVE-1107
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Carl Steinbach
>
> Pig and Hive each have their own libraries for handling plan execution. As we 
> prepare to invest more time improving Hive's plan execution mechanism we 
> should also start to consider ways of building a generic plan execution 
> mechanism that is capable of supporting the needs of Hive and Pig, as well as 
> other Hadoop data flow programming environments. 
