Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility

Rui Li Wed, 03 Feb 2021 22:02:57 -0800

Hi Jingsong, Godfrey and Jark,

I have updated the FLIP to incorporate your suggestions. Please let me know
if you have further comments. Thank you.


On Wed, Feb 3, 2021 at 4:51 PM Rui Li <lirui.fu...@gmail.com> wrote:

> Thanks Godfrey & Jark for your comments!
>
> Since both of you mentioned the naming of the parser factory, I'll answer
> this question first.
>
> I only intend to introduce the pluggable Parser for blink planner. So
> the BlinkParserFactory will be added to the flink-table-planner-blink
> module. We already have several factory classes there such as
> "BlinkPlannerFactory" and "BlinkExecutorFactory". So it seems
> "BlinkParserFactory" is following the naming convention of existing
> classes. But I'm also fine with "ParserFactory" if we're going to remove
> the "blink" prefix anyway.
> For the concrete implementation, I prefer "DefaultParserFactory". Having a
> "FlinkParserFactory" in blink planner seems a little weird. We can revisit
> this when we really get rid of the legacy planner.
>
> To answer the other questions:
>
> @Godfrey
> Yes I agree DDLs should also be handled by the HiveParser. That'll give
> users more consistent experience.
>
> The "Go Beyond Hive" section is mainly about decoupling the dialect from
> hive objects, so that users can write HiveQL to query non-hive tables or
> call non-hive functions. I think it's in the scope of this FLIP, but it
> doesn't have to be done in the first release. I'll update the FLIP to be
> more specific about it.
>
> @Jark
> Creating a new Parser only when dialect changes is a good idea. I'll
> update the FLIP to mention that.
>
> The 3.x new features we need to support in this FLIP include:
>
>    1. PK & NOT NULL constraints
>    2. Alter DB to change location
>
> These DDLs are already covered by FLIP-123. Missing them will be a
> regression.
>
> Other new features can be identified with more tests and user feedback,
> and will be supported incrementally.
>
> By "we can use a newer version to support older versions", I mean syntax
> in the new version is a superset of the old one. But we still need extra
> efforts to make sure the copied code works with different hive versions,
> e.g. more shim methods are required. So I'd rather start with a popular
> version than the newest version.
>
>
> On Wed, Feb 3, 2021 at 11:51 AM Jark Wu <imj...@gmail.com> wrote:
>
>> Thanks Rui for the great proposal, I believe this can be very attractive
>> for many Hive users.
>>
>> The FLIP looks good to me in general, I only have some minor comments:
>>
>> 1) BlinkParserFactory
>> IIUC, BlinkParserFactory is located in the flink-table-api-java module
>> with
>> the Parser interface there.
>> I suggest renaming it to `ParserFactory`, because it creates Parser
>> instead
>> of BlinkParser.
>> And the implementations can be `HiveParserFactory` and
>> `FlinkParserFactory`.
>> I think we should avoid the `Blink` keyword in interfaces, blink planner
>> is
>> already the default planner and
>> the old planner will be removed in the near future. There will be no
>> `blink` in the future then.
>>
>> 2) "create a new instance each time getParser is called"
>> Finding parser for every time getParser is called sounds heavy to me. I
>> think we can improve this by simplify
>> caching the Parser instance,  and creating a new one if current
>> sql-dialect
>> is different from the cached Parser.
>>
>> 3) Hive version
>> How much code needs to be done to support new features in 3.x based on
>> 2.x?
>> Is this also included in this FLIP/release?
>> I don't fully understand this because the FLIP says "we can use a newer
>> version to support older versions."
>>
>> Best,
>> Jark
>>
>> On Wed, 3 Feb 2021 at 11:48, godfrey he <godfre...@gmail.com> wrote:
>>
>> > Thanks for bringing up the discussion, Rui!
>> >
>> > Regarding the DDL part in the "Introduce HiveParser" section,
>> > I would like to choose the second option. Because if we could
>> > use one hive parser to parse all hive SQLs, we need not to copy
>> > Calcite parser code, and the framework and the code will be very simple.
>> >
>> > Regarding the "Go Beyond Hive" section, is that the scope of this FLIP ?
>> > Could you list all the extensions and give some examples ?
>> >
>> > One minor suggestion about the name of ParserImplFactory.
>> > How about renaming ParserImplFactory to DefaultParserFactory ?
>> >
>> > Best,
>> > Godfrey
>> >
>> > Rui Li <lirui.fu...@gmail.com> 于2021年2月3日周三 上午11:16写道：
>> >
>> > > Hi Jingsong,
>> > >
>> > > Thanks for your comments and they're very good questions.
>> > >
>> > > Regarding # Version, we need to do some tradeoff here. Choosing the
>> > latest
>> > > 3.x will cover all the features we want to support. But as you said,
>> 3.x
>> > > and 2.x can have some differences and requires more efforts to support
>> > > lower versions. I decided to pick 2.x and evolve from there to support
>> > new
>> > > features in 3.x. Because I think most hive users, especially those who
>> > are
>> > > likely to be interested in this feature, are still using 2.x or even
>> 1.x.
>> > > So the priority is to cover 2.x and 1.x first.
>> > >
>> > > Regarding # Hive Codes, in my PoC, I simply copy the code and make as
>> few
>> > > changes as possible. I believe we can do some clean up or refactor to
>> > > reduce it. With that in mind, I expect it to be over 10k lines of java
>> > > code, and even more if we count ANTLR grammar files as well.
>> > >
>> > > Regarding # Functions, you're right that HiveModule is more of a
>> solution
>> > > than limitation. I just want to emphasize that HiveModule and hive
>> > dialect
>> > > need to be used together to achieve better compatibility.
>> > >
>> > > Regarding # Keywords, new hive versions can have more reserved
>> keywords
>> > > than old versions. Since we're based on hive 2.x code, it may not
>> provide
>> > > 100% keyword-compatibility to 1.x users. But I expect it to be good
>> > enough
>> > > for most cases. If not, we can provide different grammar files for
>> lower
>> > > versions.
>> > >
>> > > On Tue, Feb 2, 2021 at 5:10 PM Jingsong Li <jingsongl...@gmail.com>
>> > wrote:
>> > >
>> > > > Thanks Rui for the proposal, I think this FLIP is required by many
>> > users,
>> > > > and it is very good to traditional Hive users. I have some
>> confusion:
>> > > >
>> > > > # Version
>> > > >
>> > > > Which Hive version do you want to choose? Maybe, Hive 3.X and Hive
>> 2.X
>> > > have
>> > > > some differences?
>> > > >
>> > > > # Hive Codes
>> > > >
>> > > > Can you evaluate how much code we need to copy to our
>> > > flink-hive-connector?
>> > > > Do we need to change them? We need to maintain them anyway.
>> > > >
>> > > > # Functions
>> > > >
>> > > > About Hive functions, I don't think it is a limitation, we are using
>> > > > HiveModule to be compatible with Hive, right? So it is a solution
>> > instead
>> > > > of a limitation.
>> > > >
>> > > > # Keywords
>> > > >
>> > > > Do you think there will be a keyword problem? Or can we be 100%
>> > > compatible
>> > > > with Hive?
>> > > >
>> > > > On the whole, the FLIP looks very good and I'm looking forward to
>> it.
>> > > >
>> > > > Best,
>> > > > Jingsong
>> > > >
>> > > > On Fri, Dec 11, 2020 at 11:35 AM Zhijiang
>> > > > <wangzhijiang...@aliyun.com.invalid> wrote:
>> > > >
>> > > > > Thanks for the further info and explanations! I have no other
>> > concerns.
>> > > > >
>> > > > > Best,
>> > > > > Zhijiang
>> > > > >
>> > > > >
>> > > > > ------------------------------------------------------------------
>> > > > > From:Rui Li <lirui.fu...@gmail.com>
>> > > > > Send Time:2020年12月10日(星期四) 20:35
>> > > > > To:dev <dev@flink.apache.org>; Zhijiang <
>> wangzhijiang...@aliyun.com>
>> > > > > Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility
>> > > > >
>> > > > > Hi Zhijiang,
>> > > > >
>> > > > > Glad to know you're interested in this FLIP. I wouldn't claim 100%
>> > > > > compatibility with this FLIP. That's because Flink doesn't have
>> the
>> > > > > functionalities to support all Hive's features. To list a few
>> > examples:
>> > > > >
>> > > > >    1. Hive allows users to process data with shell scripts -- very
>> > > > similar
>> > > > >    to UDFs [1]
>> > > > >    2. Users can compile inline Groovy UDFs and use them in queries
>> > [2]
>> > > > >    3. Users can dynamically add/delete jars, or even execute
>> > arbitrary
>> > > > >    shell command [3]
>> > > > >
>> > > > > These features cannot be supported merely by a parser/planner, and
>> > it's
>> > > > > open to discussion whether Flink even should support them at all.
>> > > > >
>> > > > > So the ultimate goal of this FLIP is to provide Hive syntax
>> > > compatibility
>> > > > > to features that are already available in Flink, which I believe
>> will
>> > > > cover
>> > > > > most common use cases.
>> > > > >
>> > > > > [1]
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform#LanguageManualTransform-TRANSFORMExamples
>> > > > > [2]
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://community.cloudera.com/t5/Community-Articles/Apache-Hive-Groovy-UDF-examples/ta-p/245060
>> > > > > [3]
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveInteractiveShellCommands
>> > > > >
>> > > > > On Thu, Dec 10, 2020 at 6:11 PM Zhijiang <
>> wangzhijiang...@aliyun.com
>> > > > > .invalid>
>> > > > > wrote:
>> > > > >
>> > > > > > Thanks for launching the discussion and the FLIP, Rui!
>> > > > > >
>> > > > > > It is really nice to see our continuous efforts for
>> compatibility
>> > > with
>> > > > > > Hive and benefiting users in this area.
>> > > > > > I am only curious that are there any other compatible
>> limitations
>> > for
>> > > > > Hive
>> > > > > > users after this FLIP? Or can I say that the Hive compatibility
>> is
>> > > > > > completely resolved after this FLIP?
>> > > > > > I am interested in the ultimate goal in this area. Maybe it is
>> out
>> > of
>> > > > > this
>> > > > > > FLIP scope, but still wish some insights from you if possible.
>> :)
>> > > > > >
>> > > > > > Best,
>> > > > > > Zhijiang
>> > > > > >
>> > > > > >
>> > > > > >
>> ------------------------------------------------------------------
>> > > > > > From:Rui Li <lirui.fu...@gmail.com>
>> > > > > > Send Time:2020年12月10日(星期四) 16:46
>> > > > > > To:dev <dev@flink.apache.org>
>> > > > > > Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility
>> > > > > >
>> > > > > > Thanks Kurt for your inputs!
>> > > > > >
>> > > > > > I agree we should extend Hive code to support non-Hive tables. I
>> > have
>> > > > > > updated the wiki page to remove the limitations you mentioned,
>> and
>> > > add
>> > > > > > typical use cases in the "Motivation" section.
>> > > > > >
>> > > > > > Regarding comment #b, the interface is defined in
>> > > > > flink-table-planner-blink
>> > > > > > and only used by the blink planner. So I think
>> "BlinkParserFactory"
>> > > is
>> > > > a
>> > > > > > better name, WDYT?
>> > > > > >
>> > > > > > On Mon, Dec 7, 2020 at 12:28 PM Kurt Young <ykt...@gmail.com>
>> > wrote:
>> > > > > >
>> > > > > > > Thanks Rui for starting this discussion.
>> > > > > > >
>> > > > > > > I can see the benefit that we improve hive compatibility
>> further,
>> > > as
>> > > > > > quite
>> > > > > > > some users are asking for this
>> > > > > > > feature in mailing lists [1][2][3] and some online chatting
>> tools
>> > > > such
>> > > > > as
>> > > > > > > DingTalk.
>> > > > > > >
>> > > > > > > I have 3 comments regarding to the design doc:
>> > > > > > >
>> > > > > > > a) Could you add a section to describe the typical use case
>> you
>> > > want
>> > > > to
>> > > > > > > support after this feature is introduced?
>> > > > > > > In that way, users can also have an impression how to use this
>> > > > feature
>> > > > > > and
>> > > > > > > what the behavior and outcome will be.
>> > > > > > >
>> > > > > > > b) Regarding the naming: "BlinkParserFactory", I suggest
>> renaming
>> > > it
>> > > > to
>> > > > > > > "FlinkParserFactory".
>> > > > > > >
>> > > > > > > c) About the two limitations you mentioned:
>> > > > > > >     1. Only works with Hive tables and the current catalog
>> needs
>> > to
>> > > > be
>> > > > > a
>> > > > > > > HiveCatalog.
>> > > > > > >     2. Queries cannot involve tables/views from multiple
>> > catalogs.
>> > > > > > > I assume this is because hive parser and analyzer doesn't
>> support
>> > > > > > > referring to a name with "x.y.z" fashion? Since
>> > > > > > > we can control all the behaviors by leveraging the codes hive
>> > > > currently
>> > > > > > > use. Is it possible that we can remove such
>> > > > > > > limitations? The reason is I'm not sure if users can make the
>> > whole
>> > > > > story
>> > > > > > > work purely depending on hive catalog (that's
>> > > > > > > the reason why I gave comment #a). If multiple catalogs are
>> > > involved,
>> > > > > > with
>> > > > > > > this limitation I don't think any meaningful
>> > > > > > > pipeline could be built. For example, users want to stream
>> data
>> > > from
>> > > > > > Kafka
>> > > > > > > to Hive, fully use hive's dialect including
>> > > > > > > query part. The kafka table could be a temporary table or
>> saved
>> > in
>> > > > > > default
>> > > > > > > memory catalog.
>> > > > > > >
>> > > > > > >
>> > > > > > > [1]
>> > > > http://apache-flink.147419.n8.nabble.com/calcite-td9059.html#a9118
>> > > > > > > [2]
>> > > > > >
>> > >
>> http://apache-flink.147419.n8.nabble.com/hive-sql-flink-11-td9116.html
>> > > > > > > [3]
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-to-in-Flink-to-support-below-HIVE-SQL-td34162.html
>> > > > > > >
>> > > > > > > Best,
>> > > > > > > Kurt
>> > > > > > >
>> > > > > > >
>> > > > > > > On Wed, Dec 2, 2020 at 10:02 PM Rui Li <lirui.fu...@gmail.com
>> >
>> > > > wrote:
>> > > > > > >
>> > > > > > > > Hi guys,
>> > > > > > > >
>> > > > > > > > I'd like to start a discussion about providing HiveQL
>> > > compatibility
>> > > > > for
>> > > > > > > > users connecting to a hive warehouse. FLIP-123 has already
>> > > covered
>> > > > > most
>> > > > > > > > DDLs. So now it's time to complement the other big missing
>> part
>> > > --
>> > > > > > > queries.
>> > > > > > > > With FLIP-152, the hive dialect covers more scenarios and
>> makes
>> > > it
>> > > > > even
>> > > > > > > > easier for users to migrate to Flink. More details are in
>> the
>> > > FLIP
>> > > > > wiki
>> > > > > > > > page [1]. Looking forward to your feedback!
>> > > > > > > >
>> > > > > > > > [1]
>> > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility
>> > > > > > > >
>> > > > > > > > --
>> > > > > > > > Best regards!
>> > > > > > > > Rui Li
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > > >
>> > > > > > --
>> > > > > > Best regards!
>> > > > > > Rui Li
>> > > > > >
>> > > > > >
>> > > > >
>> > > > > --
>> > > > > Best regards!
>> > > > > Rui Li
>> > > > >
>> > > > >
>> > > >
>> > > > --
>> > > > Best, Jingsong Lee
>> > > >
>> > >
>> > >
>> > > --
>> > > Best regards!
>> > > Rui Li
>> > >
>> >
>>
>
>
> --
> Best regards!
> Rui Li
>


-- 
Best regards!
Rui Li

Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility

Reply via email to