[ https://issues.apache.org/jira/browse/HIVE-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888886#action_12888886 ]
Jeff Hammerbacher commented on HIVE-1107:
-----------------------------------------

Russell,

Let's not focus too hard on the name of the particular workflow execution engine. The idea here is that a program of some sort (a Hive query or a set of Pig statements) must be processed and a physical plan of MapReduce operators produced. Once you have a DAG of operators to carry out, you need:

1) A way to serialize and exchange this DAG (e.g. Avro, JSON, XML)
2) A service to execute the DAG and ensure it runs to completion

Of course, things aren't this simple; for example, we need a consistent way to handle side data generated by an operator.

The goal of this proposal was to encourage Hive and Pig to target the same plan serialization format so that a single plan execution engine could be used. That way, work done on monitoring multi-stage DAGs of MapReduce jobs, capturing metadata from them, and ensuring their reliability can be reused rather than reimplemented in each system.

Some arguments against this idea: component modularity can introduce inefficiencies, may make the overall system feel more complex, and does not deliver user-visible features despite the large implementation effort required.

I believe the convergence of Pig and Hive on this front would be beneficial to the larger Hadoop community, but it's a large undertaking, and each organization has its own goals for its infrastructure.

Later,
Jeff

> Generic parallel execution framework for Hive (and Pig, and ...)
> ----------------------------------------------------------------
>
>                 Key: HIVE-1107
>                 URL: https://issues.apache.org/jira/browse/HIVE-1107
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Carl Steinbach
>
> Pig and Hive each have their own libraries for handling plan execution. As we
> prepare to invest more time improving Hive's plan execution mechanism we
> should also start to consider ways of building a generic plan execution
> mechanism that is capable of supporting the needs of Hive and Pig, as well as
> other Hadoop data flow programming environments.
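
To make the two requirements from the comment concrete (a DAG that can be serialized and exchanged, and a service that runs it to completion), here is a minimal Python sketch. It is only an illustration under stated assumptions: the plan layout, the operator ids, and the function names (serialize, deserialize, execution_order, run) are all hypothetical and are not taken from Hive, Pig, or any existing workflow engine. JSON is used because it is one of the formats mentioned above; Avro or XML would serve the same role.

```python
import json
from collections import deque

# Hypothetical plan layout (an assumption for illustration only):
# each operator has an id, a type, and the ids of its upstream operators.
plan = {
    "operators": [
        {"id": "scan", "type": "map", "inputs": []},
        {"id": "group", "type": "reduce", "inputs": ["scan"]},
        {"id": "join", "type": "reduce", "inputs": ["scan", "group"]},
    ]
}


def serialize(plan):
    """Requirement 1: turn the DAG into text that another system can read."""
    return json.dumps(plan, sort_keys=True)


def deserialize(text):
    """Rebuild the plan from its serialized form."""
    return json.loads(text)


def execution_order(plan):
    """Return operator ids so that every operator follows its inputs
    (Kahn's topological sort). Raises ValueError if the plan has a cycle."""
    ops = {op["id"]: op for op in plan["operators"]}
    pending = {op_id: len(op["inputs"]) for op_id, op in ops.items()}
    downstream = {op_id: [] for op_id in ops}
    for op_id, op in ops.items():
        for parent in op["inputs"]:
            downstream[parent].append(op_id)

    ready = deque(op_id for op_id, count in pending.items() if count == 0)
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for child in downstream[current]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    if len(order) != len(ops):
        raise ValueError("plan contains a cycle")
    return order


def run(plan, run_operator):
    """Requirement 2: run every operator, in dependency order, to completion.
    run_operator is a caller-supplied callback (hypothetical) that would
    submit one MapReduce job and block until it finishes."""
    completed = []
    for op_id in execution_order(plan):
        run_operator(op_id)
        completed.append(op_id)
    return completed


if __name__ == "__main__":
    received = deserialize(serialize(plan))  # simulate exchange between systems
    run(received, lambda op_id: print("running", op_id))
```

A real engine would also need retries, monitoring, and a convention for the side data an operator produces, which is exactly the shared work the proposal hopes to avoid duplicating.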