Hi Xuefu,

Welcome to the Flink community and thanks for starting this discussion!
Better Hive integration would be really great!
Can you go into details of what you are proposing? I can think of a couple
ways to improve Flink in that regard:

* Support for Hive UDFs
* Support for Hive metadata catalog
* Support for HiveQL syntax
* ???

Best, Fabian

Am Di., 9. Okt. 2018 um 19:22 Uhr schrieb Zhang, Xuefu <
xuef...@alibaba-inc.com>:

> Hi all,
>
> Along with the community's effort, inside Alibaba we have explored Flink's
> potential as an execution engine not just for stream processing but also
> for batch processing. We are encouraged by our findings and have initiated
> our effort to make Flink's SQL capabilities full-fledged. When comparing
> what's available in Flink to the offerings from competitive data processing
> engines, we identified a major gap in Flink: a well integration with Hive
> ecosystem. This is crucial to the success of Flink SQL and batch due to the
> well-established data ecosystem around Hive. Therefore, we have done some
> initial work along this direction but there are still a lot of effort
> needed.
>
> We have two strategies in mind. The first one is to make Flink SQL
> full-fledged and well-integrated with Hive ecosystem. This is a similar
> approach to what Spark SQL adopted. The second strategy is to make Hive
> itself work with Flink, similar to the proposal in [1]. Each approach bears
> its pros and cons, but they don’t need to be mutually exclusive with each
> targeting at different users and use cases. We believe that both will
> promote a much greater adoption of Flink beyond stream processing.
>
> We have been focused on the first approach and would like to showcase
> Flink's batch and SQL capabilities with Flink SQL. However, we have also
> planned to start strategy #2 as the follow-up effort.
>
> I'm completely new to Flink(, with a short bio [2] below), though many of
> my colleagues here at Alibaba are long-time contributors. Nevertheless, I'd
> like to share our thoughts and invite your early feedback. At the same
> time, I am working on a detailed proposal on Flink SQL's integration with
> Hive ecosystem, which will be also shared when ready.
>
> While the ideas are simple, each approach will demand significant effort,
> more than what we can afford. Thus, the input and contributions from the
> communities are greatly welcome and appreciated.
>
> Regards,
>
>
> Xuefu
>
> References:
>
> [1] https://issues.apache.org/jira/browse/HIVE-10712
> [2] Xuefu Zhang is a long-time open source veteran, worked or working on
> many projects under Apache Foundation, of which he is also an honored
> member. About 10 years ago he worked in the Hadoop team at Yahoo where the
> projects just got started. Later he worked at Cloudera, initiating and
> leading the development of Hive on Spark project in the communities and
> across many organizations. Prior to joining Alibaba, he worked at Uber
> where he promoted Hive on Spark to all Uber's SQL on Hadoop workload and
> significantly improved Uber's cluster efficiency.
>
>
>

Reply via email to