Thanks for bringing up the discussion, Rui!

Regarding the DDL part in the "Introduce HiveParser" section,
I would prefer the second option. If we could use one hive parser
to parse all hive SQLs, we would not need to copy the Calcite parser
code, and both the framework and the code would be much simpler.

Regarding the "Go Beyond Hive" section, is that the scope of this FLIP ?
Could you list all the extensions and give some examples ?

One minor suggestion about the name of ParserImplFactory:
how about renaming ParserImplFactory to DefaultParserFactory?
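
To make the suggestion a bit more concrete, here is a minimal, purely
illustrative sketch (the interface shape and names below are my assumptions
for illustration, not the actual design in the FLIP) of why a
"DefaultParserFactory" name would read naturally next to a hive-specific
factory:

    // Illustrative only -- not the real interfaces.
    interface Parser {
        Object parse(String statement); // placeholder return type
    }

    interface ParserFactory {
        String dialect(); // e.g. "default" or "hive"
        Parser create();
    }

    // Calcite-based factory for the default dialect:
    class DefaultParserFactory implements ParserFactory {
        public String dialect() { return "default"; }
        public Parser create() { return statement -> statement; }
    }

    // A hive counterpart would then naturally be named HiveParserFactory.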

Best,
Godfrey

Rui Li <lirui.fu...@gmail.com> wrote on Wed, Feb 3, 2021 at 11:16 AM:

> Hi Jingsong,
>
> Thanks for your comments and they're very good questions.
>
> Regarding # Version, we need to make a tradeoff here. Choosing the latest
> 3.x would cover all the features we want to support. But as you said, 3.x
> and 2.x have some differences, and it would require more effort to support
> lower versions. I decided to pick 2.x and evolve from there to support new
> features in 3.x, because I think most hive users, especially those who are
> likely to be interested in this feature, are still using 2.x or even 1.x.
> So the priority is to cover 2.x and 1.x first.
>
> Regarding # Hive Codes, in my PoC I simply copied the code and made as few
> changes as possible. I believe we can do some cleanup or refactoring to
> reduce it. With that in mind, I expect it to be over 10k lines of java
> code, and even more if we count the ANTLR grammar files as well.
>
> Regarding # Functions, you're right that HiveModule is more of a solution
> than a limitation. I just want to emphasize that HiveModule and the hive
> dialect need to be used together to achieve better compatibility.
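>
> As a rough sketch of what "used together" means in practice (a minimal,
> illustrative example assuming the existing TableEnvironment, HiveModule and
> SqlDialect APIs; the hive version string is just an example):
>
>     import org.apache.flink.table.api.EnvironmentSettings;
>     import org.apache.flink.table.api.SqlDialect;
>     import org.apache.flink.table.api.TableEnvironment;
>     import org.apache.flink.table.module.hive.HiveModule;
>
>     public class HiveDialectExample {
>         public static void main(String[] args) {
>             TableEnvironment tEnv = TableEnvironment.create(
>                     EnvironmentSettings.newInstance().inBatchMode().build());
>
>             // Load Hive's built-in functions so that queries written in
>             // hive syntax can also resolve hive UDFs.
>             tEnv.loadModule("hive", new HiveModule("2.3.4"));
>
>             // Switch the parser to the hive dialect for later statements.
>             tEnv.getConfig().setSqlDialect(SqlDialect.HIVE);
>
>             // From here on, statements are parsed with hive syntax, e.g.:
>             // tEnv.executeSql("SELECT ...");
>         }
>     }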
>
> Regarding # Keywords, newer hive versions can have more reserved keywords
> than older versions. Since we're based on hive 2.x code, it may not provide
> 100% keyword compatibility for 1.x users. But I expect it to be good enough
> for most cases. If not, we can provide different grammar files for lower
> versions.
>
> On Tue, Feb 2, 2021 at 5:10 PM Jingsong Li <jingsongl...@gmail.com> wrote:
>
> > Thanks Rui for the proposal. I think this FLIP is needed by many users,
> > and it is very valuable to traditional Hive users. I have a few questions:
> >
> > # Version
> >
> > Which Hive version do you want to choose? Hive 3.X and Hive 2.X may
> > have some differences?
> >
> > # Hive Codes
> >
> > Can you evaluate how much code we would need to copy into our
> > flink-hive-connector? Do we need to change it? We will need to maintain
> > it anyway.
> >
> > # Functions
> >
> > About Hive functions, I don't think it is a limitation; we are using
> > HiveModule to be compatible with Hive, right? So it is a solution rather
> > than a limitation.
> >
> > # Keywords
> >
> > Do you think there will be a keyword problem? Or can we be 100%
> > compatible with Hive?
> >
> > On the whole, the FLIP looks very good and I'm looking forward to it.
> >
> > Best,
> > Jingsong
> >
> > On Fri, Dec 11, 2020 at 11:35 AM Zhijiang
> > <wangzhijiang...@aliyun.com.invalid> wrote:
> >
> > > Thanks for the further info and explanations! I have no other concerns.
> > >
> > > Best,
> > > Zhijiang
> > >
> > >
> > > ------------------------------------------------------------------
> > > From:Rui Li <lirui.fu...@gmail.com>
> > > Send Time: Thursday, December 10, 2020 20:35
> > > To:dev <dev@flink.apache.org>; Zhijiang <wangzhijiang...@aliyun.com>
> > > Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility
> > >
> > > Hi Zhijiang,
> > >
> > > Glad to know you're interested in this FLIP. I wouldn't claim 100%
> > > compatibility with this FLIP, because Flink doesn't have the
> > > functionality to support all of Hive's features. To list a few examples:
> > >
> > >    1. Hive allows users to process data with shell scripts -- very
> > >    similar to UDFs [1]
> > >    2. Users can compile inline Groovy UDFs and use them in queries [2]
> > >    3. Users can dynamically add/delete jars, or even execute arbitrary
> > >    shell commands [3]
> > >
> > > These features cannot be supported merely by a parser/planner, and it's
> > > open to discussion whether Flink even should support them at all.
> > >
> > > So the ultimate goal of this FLIP is to provide Hive syntax
> > > compatibility for features that are already available in Flink, which
> > > I believe will cover most common use cases.
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform#LanguageManualTransform-TRANSFORMExamples
> > > [2]
> > > https://community.cloudera.com/t5/Community-Articles/Apache-Hive-Groovy-UDF-examples/ta-p/245060
> > > [3]
> > > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveInteractiveShellCommands
> > >
> > > On Thu, Dec 10, 2020 at 6:11 PM Zhijiang
> > > <wangzhijiang...@aliyun.com.invalid> wrote:
> > >
> > > > Thanks for launching the discussion and the FLIP, Rui!
> > > >
> > > > It is really nice to see our continuous efforts toward compatibility
> > > > with Hive, benefiting users in this area.
> > > > I am only curious whether there are any other compatibility
> > > > limitations for Hive users after this FLIP? Or can I say that Hive
> > > > compatibility will be completely resolved after this FLIP?
> > > > I am interested in the ultimate goal in this area. Maybe it is out of
> > > > this FLIP's scope, but I would still appreciate some insights from you
> > > > if possible. :)
> > > >
> > > > Best,
> > > > Zhijiang
> > > >
> > > >
> > > > ------------------------------------------------------------------
> > > > From:Rui Li <lirui.fu...@gmail.com>
> > > > Send Time: Thursday, December 10, 2020 16:46
> > > > To:dev <dev@flink.apache.org>
> > > > Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility
> > > >
> > > > Thanks Kurt for your inputs!
> > > >
> > > > I agree we should extend the Hive code to support non-Hive tables. I
> > > > have updated the wiki page to remove the limitations you mentioned,
> > > > and added typical use cases to the "Motivation" section.
> > > >
> > > > Regarding comment #b, the interface is defined in
> > > > flink-table-planner-blink and only used by the blink planner. So I
> > > > think "BlinkParserFactory" is a better name, WDYT?
> > > >
> > > > On Mon, Dec 7, 2020 at 12:28 PM Kurt Young <ykt...@gmail.com> wrote:
> > > >
> > > > > Thanks Rui for starting this discussion.
> > > > >
> > > > > I can see the benefit of improving hive compatibility further, as
> > > > > quite a few users are asking for this feature on the mailing lists
> > > > > [1][2][3] and in online chat tools such as DingTalk.
> > > > >
> > > > > I have 3 comments regarding the design doc:
> > > > >
> > > > > a) Could you add a section describing the typical use cases you
> > > > > want to support after this feature is introduced?
> > > > > That way, users can also get an impression of how to use this
> > > > > feature and what the behavior and outcome will be.
> > > > >
> > > > > b) Regarding the naming: "BlinkParserFactory", I suggest renaming it
> > > > > to "FlinkParserFactory".
> > > > >
> > > > > c) About the two limitations you mentioned:
> > > > >     1. Only works with Hive tables, and the current catalog needs
> > > > >     to be a HiveCatalog.
> > > > >     2. Queries cannot involve tables/views from multiple catalogs.
> > > > > I assume this is because the hive parser and analyzer don't support
> > > > > referring to a name in "x.y.z" fashion? Since we can control all the
> > > > > behavior by leveraging the code hive currently uses, is it possible
> > > > > to remove these limitations? The reason I ask is that I'm not sure
> > > > > users can make the whole story work purely depending on the hive
> > > > > catalog (that's also why I gave comment #a). If multiple catalogs
> > > > > are involved, I don't think any meaningful pipeline could be built
> > > > > with this limitation. For example, users may want to stream data
> > > > > from Kafka to Hive and fully use hive's dialect, including the query
> > > > > part. The kafka table could be a temporary table or saved in the
> > > > > default in-memory catalog.
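> > > > >
> > > > > Just as a purely illustrative sketch of that Kafka-to-Hive pipeline
> > > > > (assuming the limitation is lifted; the connector options and table
> > > > > names below are made-up examples):
> > > > >
> > > > >     import org.apache.flink.table.api.EnvironmentSettings;
> > > > >     import org.apache.flink.table.api.SqlDialect;
> > > > >     import org.apache.flink.table.api.TableEnvironment;
> > > > >
> > > > >     public class KafkaToHiveSketch {
> > > > >         public static void main(String[] args) {
> > > > >             TableEnvironment tEnv = TableEnvironment.create(
> > > > >                     EnvironmentSettings.newInstance().inStreamingMode().build());
> > > > >
> > > > >             // Kafka source declared as a temporary table outside the
> > > > >             // HiveCatalog (default dialect; options are examples only).
> > > > >             tEnv.executeSql(
> > > > >                     "CREATE TEMPORARY TABLE kafka_src (id BIGINT, msg STRING) WITH ("
> > > > >                             + " 'connector' = 'kafka',"
> > > > >                             + " 'topic' = 'events',"
> > > > >                             + " 'properties.bootstrap.servers' = 'localhost:9092',"
> > > > >                             + " 'format' = 'json')");
> > > > >
> > > > >             // Query written in hive syntax, streaming into a hive table.
> > > > >             tEnv.getConfig().setSqlDialect(SqlDialect.HIVE);
> > > > >             tEnv.executeSql(
> > > > >                     "INSERT INTO hive_db.hive_sink SELECT id, msg FROM kafka_src");
> > > > >         }
> > > > >     }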
> > > > >
> > > > >
> > > > > [1]
> > > > > http://apache-flink.147419.n8.nabble.com/calcite-td9059.html#a9118
> > > > > [2]
> > > > > http://apache-flink.147419.n8.nabble.com/hive-sql-flink-11-td9116.html
> > > > > [3]
> > > > > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-to-in-Flink-to-support-below-HIVE-SQL-td34162.html
> > > > >
> > > > > Best,
> > > > > Kurt
> > > > >
> > > > >
> > > > > On Wed, Dec 2, 2020 at 10:02 PM Rui Li <lirui.fu...@gmail.com>
> > wrote:
> > > > >
> > > > > > Hi guys,
> > > > > >
> > > > > > I'd like to start a discussion about providing HiveQL
> > > > > > compatibility for users connecting to a hive warehouse. FLIP-123
> > > > > > has already covered most DDLs. So now it's time to complement the
> > > > > > other big missing part -- queries. With FLIP-152, the hive dialect
> > > > > > covers more scenarios and makes it even easier for users to
> > > > > > migrate to Flink. More details are in the FLIP wiki page [1].
> > > > > > Looking forward to your feedback!
> > > > > >
> > > > > > [1]
> > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility
> > > > > >
> > > > > > --
> > > > > > Best regards!
> > > > > > Rui Li
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards!
> > > > Rui Li
> > > >
> > > >
> > >
> > > --
> > > Best regards!
> > > Rui Li
> > >
> > >
> >
> > --
> > Best, Jingsong Lee
> >
>
>
> --
> Best regards!
> Rui Li
>
