Re: [DISCUSS] Flink's supported APIs and Hive query syntax

Ingo Bürk Mon, 07 Mar 2022 04:52:47 -0800

Hi,

thanks Martijn for bringing this up and raising very valid concerns. I agree with the notion that Flink supporting Hive should come with a proper commitment, and otherwise we should consider not supporting it at all (in Flink itself, that is).

Given that Hive is an Apache project, my first thought was whether we shouldn't just reach out to the project to understand their plans regarding vulnerabilities and the future of the project?



Best
Ingo

On 07.03.22 12:23, Martijn Visser wrote:

Hi everyone,
Flink currently has 4 APIs with multiple language support which can be used to develop applications:
* DataStream API, both Java and Scala
* Table API, both Java and Scala
* Flink SQL, both in Flink query syntax and Hive query syntax (partially)
* Python API
Since FLIP-152 [1] the Flink SQL support has been extended to also support the Hive query syntax. There is now a follow-up FLINK-26360 [2] to address more syntax compatibility issues.
I would like to open a discussion on Flink directly supporting the Hive query syntax. I have some concerns about whether 100% Hive query syntax compatibility is indeed something that we should aim for in Flink.
I can understand that having Hive query syntax support in Flink could help users due to interoperability and being able to migrate. However:
- Adding full Hive query syntax support will mean that we go from 6 fully supported API/language combinations to 7. I think we are currently already struggling with maintaining the existing combinations, let alone one more.
- Apache Hive is/appears to be a project that's not that actively developed anymore. The last release was made in January 2021. Its popularity is rapidly declining in Europe and the United States, also due to Hadoop becoming less popular.
- Related to the previous topic, other software like Snowflake, Trino/Presto, Databricks are becoming more and more popular. If we add full support for the Hive query syntax, then why not add support for Snowflake and the others?
- We are supporting Hive versions that are no longer supported by the Hive community with known security vulnerabilities. This makes Flink vulnerable to those types of vulnerabilities as well.
- The current Hive implementation is done by using a lot of internals of Flink, making Flink hard to maintain, with lots of tech debt and making things overly complex.
From my perspective, I think it would be better to not have Hive query syntax compatibility directly in Flink itself. Of course we should have a proper Hive connector and a proper Hive catalog to make connectivity with Hive (the versions that are still supported by the Hive community) itself possible. Alternatively, if Hive query syntax is so important, it should not rely on internals but be available as a dialect/pluggable option. That could also open up the possibility to add more syntax support for others in the future, but I really think we should just focus on Flink SQL itself. That's already hard enough to maintain and improve on.
I'm looking forward to the thoughts of both Developers and Users, so I'm cross-posting to both mailing lists.
Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82
[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=165227316
[2] https://issues.apache.org/jira/browse/FLINK-21529
