[ https://issues.apache.org/jira/browse/HIVE-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547189#comment-14547189 ]
Edward Capriolo edited comment on HIVE-10712 at 5/17/15 2:34 PM:
-----------------------------------------------------------------

I have a question I want us all to consider. Hive currently has three execution engines. What is the value of adding a fourth one?

I know, on one hand, that Hive is an open source project and we do not want to be outright rejecting ideas and directions, but we have to ask ourselves: is Flink so significantly different from Spark or Tez that we can justify the addition?

In terms of the project, having another engine means we have more code, more dependencies, more tests. The project is already divided down the lines of supporting hive-on-tez and hive-on-spark. What is the value of a third camp?

Hive has many different supported queries, but if Flink basically delivers the same performance as one of the existing back ends on the majority of the queries, I do not think it is a good direction. What if a 4th or 5th group comes up with their own "execution engine": hive-on-storm, hive-on-samza, hive-on-eds-query-engine? What value does an end user get from having to choose between this many engines, where they face conflicting advice from conflicting people over which one they should use? As well as conflicting debates across the community as to which is the fastest/best.

At this point I would like to have a real justification as to why we should add a 4th engine, for me not to be -1. We need some examples of a serious feature in Flink that makes a large number of end-user queries faster/better; otherwise I think this is just an academic pursuit that will further fragment us. Otherwise every data processing platform that has a map and reduce primitive can lobby for inclusion into Hive.


was (Author: appodictic):
I have a question I want us all to consider. Hive currently has three execution engines. What is the value of adding a fourth one?
I know, on one hand, that Hive is an open source project and we do not want to be outright rejecting ideas and directions, but we have to ask ourselves: is Flink so significantly different from Spark or Tez that we can justify the addition?

In terms of the project, having another engine means we have more code, more dependencies, more tests. The project is already divided down the lines of supporting hive-on-tez and hive-on-spark. What is the value of a third camp?

Hive has many different supported queries, but if Flink basically delivers the same performance as one of the existing back ends on the majority of the queries, I do not think it is a good direction. What if a 4th or 5th group comes up with their own "execution engine": hive-on-storm, hive-on-samza, hive-on-eds-query-engine? What value does an end user get from having to choose between this many engines, where they face conflicting advice from conflicting people over which one they should use? As well as conflicting debates across the community as to which is the fastest/best.

At this point I would like to have a real justification as to why we should add a 4th engine. For me not to be -1, we need some examples of a serious feature in Flink that makes a large number of end-user queries faster/better; otherwise I think this is just an academic pursuit that will further fragment us.

> Hive on Apache Flink
> --------------------
>
> Key: HIVE-10712
> URL: https://issues.apache.org/jira/browse/HIVE-10712
> Project: Hive
> Issue Type: Wish
> Reporter: Greg Senia
>
> Flink, as an open-source data analytics cluster computing framework, has gained
> some momentum recently. This initiative will provide users a new alternative
> so that those users can consolidate their backend.
> Secondly, providing such an alternative further increases Hive's adoption, as
> it exposes Flink users to a viable, feature-rich, de facto standard SQL tool
> on Hadoop.
> Finally, allowing Hive to run on Flink also has performance benefits. Hive
> queries, especially those involving multiple reducer stages, will run faster,
> thus improving user experience as Tez/Spark does.
> This is an umbrella JIRA which will cover many coming subtasks. Feedback from
> the community is greatly appreciated!

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)