Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility

Jingsong Li Tue, 02 Feb 2021 01:10:50 -0800

Thanks Rui for the proposal. I think many users need this FLIP, and it will
be very helpful for traditional Hive users. I have a few questions:


# Version

Which Hive version do you plan to target? Hive 3.x and Hive 2.x may have
some differences.

# Hive Code

Can you estimate how much Hive code we need to copy into our
flink-hive-connector? Will we need to modify it? Either way, we will have
to maintain it.

# Functions

Regarding Hive functions, I don't think this is a limitation. We are using
HiveModule to be compatible with Hive, right? So it is a solution rather
than a limitation.
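
To illustrate the point, here is a minimal Python sketch of how a
module-based lookup can make Hive functions available next to the built-in
ones. It is a toy model, not Flink's actual API: `Module`, `ModuleManager`,
and the function entries are hypothetical names chosen for illustration, and
it assumes that functions are resolved in module load order.

```python
from typing import Callable, Dict, List, Optional


class Module:
    """Hypothetical stand-in for a function module (not Flink's Module class)."""

    def __init__(self, name: str, functions: Dict[str, Callable]):
        self.name = name
        self.functions = functions


class ModuleManager:
    """Toy registry that resolves a function against modules in load order."""

    def __init__(self) -> None:
        self._modules: List[Module] = []

    def load(self, module: Module) -> None:
        self._modules.append(module)

    def resolve(self, function_name: str) -> Optional[str]:
        """Return the name of the first loaded module defining the function."""
        key = function_name.lower()
        for module in self._modules:
            if key in module.functions:
                return module.name
        return None


manager = ModuleManager()
# Assumption: a "core" module is loaded first, then a "hive" module.
manager.load(Module("core", {"concat": lambda *parts: "".join(parts)}))
manager.load(Module("hive", {
    "concat": lambda *parts: "".join(parts),
    "get_json_object": lambda json_text, path: None,  # placeholder body
}))
```

In this sketch, a function defined only in the "hive" module becomes
resolvable once that module is loaded, while a name defined in both modules
still resolves to the one loaded first.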

# Keywords

Do you expect any keyword conflicts, or can we be 100% compatible with
Hive?

On the whole, the FLIP looks very good and I'm looking forward to it.

Best,
Jingsong

On Fri, Dec 11, 2020 at 11:35 AM Zhijiang
<[email protected]> wrote:

> Thanks for the further info and explanations! I have no other concerns.
>
> Best,
> Zhijiang
>
>
> ------------------------------------------------------------------
> From:Rui Li <[email protected]>
> Send Time: Dec 10, 2020 (Thursday) 20:35
> To:dev <[email protected]>; Zhijiang <[email protected]>
> Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility
>
> Hi Zhijiang,
>
> Glad to know you're interested in this FLIP. I wouldn't claim 100%
> compatibility with this FLIP, because Flink doesn't have the functionality
> to support all of Hive's features. To list a few examples:
>
>    1. Hive allows users to process data with shell scripts -- very similar
>    to UDFs [1]
>    2. Users can compile inline Groovy UDFs and use them in queries [2]
>    3. Users can dynamically add/delete jars, or even execute arbitrary
>    shell commands [3]
>
> These features cannot be supported merely by a parser/planner, and it's
> open to discussion whether Flink even should support them at all.
>
> So the ultimate goal of this FLIP is to provide Hive syntax compatibility
> for features that are already available in Flink, which I believe will
> cover most common use cases.
>
> [1] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform#LanguageManualTransform-TRANSFORMExamples
> [2] https://community.cloudera.com/t5/Community-Articles/Apache-Hive-Groovy-UDF-examples/ta-p/245060
> [3] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveInteractiveShellCommands
>
> On Thu, Dec 10, 2020 at 6:11 PM Zhijiang <[email protected]> wrote:
>
> > Thanks for launching the discussion and the FLIP, Rui!
> >
> > It is really nice to see our continuous efforts for compatibility with
> > Hive and benefiting users in this area.
> > I am just curious: will there be any other compatibility limitations for
> > Hive users after this FLIP? Or can I say that Hive compatibility is
> > completely resolved after this FLIP?
> > I am interested in the ultimate goal in this area. Maybe it is outside the
> > scope of this FLIP, but I would still appreciate some insights from you if
> > possible. :)
> >
> > Best,
> > Zhijiang
> >
> >
> > ------------------------------------------------------------------
> > From:Rui Li <[email protected]>
> > Send Time: Dec 10, 2020 (Thursday) 16:46
> > To:dev <[email protected]>
> > Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility
> >
> > Thanks Kurt for your inputs!
> >
> > I agree we should extend the Hive code to support non-Hive tables. I have
> > updated the wiki page to remove the limitations you mentioned, and added
> > typical use cases in the "Motivation" section.
> >
> > Regarding comment #b, the interface is defined in
> > flink-table-planner-blink and only used by the blink planner. So I think
> > "BlinkParserFactory" is a better name, WDYT?
> >
> > On Mon, Dec 7, 2020 at 12:28 PM Kurt Young <[email protected]> wrote:
> >
> > > Thanks Rui for starting this discussion.
> > >
> > > I can see the benefit of further improving Hive compatibility, as quite a
> > > few users are asking for this feature on the mailing lists [1][2][3] and
> > > in online chat tools such as DingTalk.
> > >
> > > I have 3 comments regarding the design doc:
> > >
> > > a) Could you add a section describing the typical use cases you want to
> > > support once this feature is introduced? That way, users can get an idea
> > > of how to use this feature and what the behavior and outcome will be.
> > >
> > > b) Regarding the naming: "BlinkParserFactory", I suggest renaming it to
> > > "FlinkParserFactory".
> > >
> > > c) About the two limitations you mentioned:
> > >     1. Only works with Hive tables, and the current catalog needs to be a
> > >     HiveCatalog.
> > >     2. Queries cannot involve tables/views from multiple catalogs.
> > > I assume this is because the Hive parser and analyzer don't support
> > > referring to a name in the "x.y.z" form? Since we can control all the
> > > behavior by leveraging the code Hive currently uses, is it possible to
> > > remove these limitations? The reason is that I'm not sure users can make
> > > the whole story work relying purely on a Hive catalog (that's why I gave
> > > comment #a). If multiple catalogs are involved, I don't think any
> > > meaningful pipeline could be built under this limitation. For example,
> > > users may want to stream data from Kafka to Hive and fully use Hive's
> > > dialect, including the query part. The Kafka table could be a temporary
> > > table or saved in the default in-memory catalog.
> > >
> > >
> > > [1] http://apache-flink.147419.n8.nabble.com/calcite-td9059.html#a9118
> > > [2] http://apache-flink.147419.n8.nabble.com/hive-sql-flink-11-td9116.html
> > > [3] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-to-in-Flink-to-support-below-HIVE-SQL-td34162.html
> > >
> > > Best,
> > > Kurt
> > >
> > >
> > > On Wed, Dec 2, 2020 at 10:02 PM Rui Li <[email protected]> wrote:
> > >
> > > > Hi guys,
> > > >
> > > > I'd like to start a discussion about providing HiveQL compatibility for
> > > > users connecting to a Hive warehouse. FLIP-123 has already covered most
> > > > DDLs. So now it's time to complement the other big missing part --
> > > > queries. With FLIP-152, the Hive dialect covers more scenarios and makes
> > > > it even easier for users to migrate to Flink. More details are in the
> > > > FLIP wiki page [1]. Looking forward to your feedback!
> > > >
> > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility
> > > >
> > > > --
> > > > Best regards!
> > > > Rui Li
> > > >
> > >
> >
> >
> > --
> > Best regards!
> > Rui Li
> >
> >
>
> --
> Best regards!
> Rui Li
>
>

-- 
Best, Jingsong Lee
