Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility

Zhijiang Thu, 10 Dec 2020 19:35:55 -0800

Thanks for the further info and explanations! I have no other concerns.

Best,
Zhijiang



------------------------------------------------------------------
From:Rui Li <lirui.fu...@gmail.com>
Send Time:December 10, 2020 (Thursday) 20:35
To:dev <dev@flink.apache.org>; Zhijiang <wangzhijiang...@aliyun.com>
Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility

Hi Zhijiang,

Glad to know you're interested in this FLIP. I wouldn't claim 100%
compatibility with this FLIP. That's because Flink doesn't have the
functionality to support all of Hive's features. To list a few examples:

   1. Hive allows users to process data with shell scripts -- very similar
   to UDFs [1]
   2. Users can compile inline Groovy UDFs and use them in queries [2]
   3. Users can dynamically add/delete jars, or even execute arbitrary
   shell commands [3]

These features cannot be supported merely by a parser/planner, and it's
open to discussion whether Flink should support them at all.

So the ultimate goal of this FLIP is to provide Hive syntax compatibility
for features that are already available in Flink, which I believe will cover
most common use cases.

[1]
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform#LanguageManualTransform-TRANSFORMExamples
[2]
https://community.cloudera.com/t5/Community-Articles/Apache-Hive-Groovy-UDF-examples/ta-p/245060
[3]
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveInteractiveShellCommands

On Thu, Dec 10, 2020 at 6:11 PM Zhijiang <wangzhijiang...@aliyun.com.invalid>
wrote:

> Thanks for launching the discussion and the FLIP, Rui!
>
> It is really nice to see our continuous efforts for compatibility with
> Hive and benefiting users in this area.
> I am only curious whether there are any other compatibility limitations for
> Hive users after this FLIP. Or can I say that Hive compatibility is
> completely resolved after this FLIP?
> I am interested in the ultimate goal in this area. Maybe it is outside this
> FLIP's scope, but I would still appreciate some insights from you if possible. :)
>
> Best,
> Zhijiang
>
>
> ------------------------------------------------------------------
> From:Rui Li <lirui.fu...@gmail.com>
> Send Time:December 10, 2020 (Thursday) 16:46
> To:dev <dev@flink.apache.org>
> Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility
>
> Thanks Kurt for your inputs!
>
> I agree we should extend Hive code to support non-Hive tables. I have
> updated the wiki page to remove the limitations you mentioned, and add
> typical use cases in the "Motivation" section.
>
> Regarding comment #b, the interface is defined in flink-table-planner-blink
> and only used by the blink planner. So I think "BlinkParserFactory" is a
> better name, WDYT?
>
> On Mon, Dec 7, 2020 at 12:28 PM Kurt Young <ykt...@gmail.com> wrote:
>
> > Thanks Rui for starting this discussion.
> >
> > I can see the benefit of improving Hive compatibility further, as quite
> > a few users are asking for this feature on the mailing lists [1][2][3] and
> > in online chat tools such as DingTalk.
> >
> > I have 3 comments regarding the design doc:
> >
> > a) Could you add a section to describe the typical use case you want to
> > support after this feature is introduced?
> > That way, users can also get an impression of how to use this feature and
> > what the behavior and outcome will be.
> >
> > b) Regarding the naming: "BlinkParserFactory", I suggest renaming it to
> > "FlinkParserFactory".
> >
> > c) About the two limitations you mentioned:
> >     1. Only works with Hive tables and the current catalog needs to be a
> > HiveCatalog.
> >     2. Queries cannot involve tables/views from multiple catalogs.
> > I assume this is because the Hive parser and analyzer don't support
> > referring to a name in "x.y.z" fashion? Since we can control all the
> > behaviors by reusing the code Hive currently uses, is it possible to
> > remove such limitations? The reason is that I'm not sure users can make
> > the whole story work relying purely on the Hive catalog (that's the reason
> > why I gave comment #a). If multiple catalogs are involved, I don't think
> > any meaningful pipeline could be built with this limitation. For example,
> > users may want to stream data from Kafka to Hive and fully use Hive's
> > dialect, including the query part. The Kafka table could be a temporary
> > table or saved in the default in-memory catalog.
> >
> >
> > [1] http://apache-flink.147419.n8.nabble.com/calcite-td9059.html#a9118
> > [2] http://apache-flink.147419.n8.nabble.com/hive-sql-flink-11-td9116.html
> > [3] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-to-in-Flink-to-support-below-HIVE-SQL-td34162.html
> >
> > Best,
> > Kurt
> >
> >
> > On Wed, Dec 2, 2020 at 10:02 PM Rui Li <lirui.fu...@gmail.com> wrote:
> >
> > > Hi guys,
> > >
> > > I'd like to start a discussion about providing HiveQL compatibility for
> > > users connecting to a hive warehouse. FLIP-123 has already covered most
> > > DDLs. So now it's time to complement the other big missing part -- queries.
> > > With FLIP-152, the hive dialect covers more scenarios and makes it even
> > > easier for users to migrate to Flink. More details are in the FLIP wiki
> > > page [1]. Looking forward to your feedback!
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility
> > >
> > > --
> > > Best regards!
> > > Rui Li
> > >
> >
>
>
> --
> Best regards!
> Rui Li
>
>

-- 
Best regards!
Rui Li
