Re: [DISCUSS] FLIP-311: Support Call Stored Procedure

Jing Ge Mon, 19 Jun 2023 05:57:49 -0700

Thanks for your reply. Nice feature!

Best regards,
Jing


On Wed, Jun 14, 2023 at 3:11 AM yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:

> Yes, you're right.
>
> Best regards,
> Yuxia
>
> ----- Original Message -----
> From: "Jing Ge" <j...@ververica.com.INVALID>
> To: "dev" <dev@flink.apache.org>
> Sent: Wednesday, June 14, 2023 4:46:58 AM
> Subject: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
>
> Hi yuxia,
>
> Thanks for your proposal and sorry for the late reply. The FLIP is in good
> shape. If I am not mistaken, everything that a stored procedure could do
> could also be done by a Flink job. The current stored procedure design is
> meant to empower Catalogs to provide users with some commonly used logic/functions
> centrally and out of the box, i.e. DRY. Is that correct?
>
> Best regards,
> Jing
>
> On Thu, Jun 8, 2023 at 10:32 AM Jark Wu <imj...@gmail.com> wrote:
>
> > Thank you for the proposal, yuxia! The FLIP looks good to me.
> >
> > Best,
> > Jark
> >
> > > On Jun 8, 2023, at 11:39, yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:
> > >
> > > Hi, all.
> > > Thanks everyone for the valuable input. If there are no further
> > concerns about this FLIP[1], I would like to start voting next Monday
> > (6/12).
> > >
> > > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-311%3A+Support+Call+Stored+Procedure
> > >
> > >
> > > Best regards,
> > > Yuxia
> > >
> > > ----- Original Message -----
> > > From: "Martijn Visser" <martijnvis...@apache.org>
> > > To: "dev" <dev@flink.apache.org>
> > > Sent: Tuesday, June 6, 2023 3:57:56 PM
> > > Subject: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
> > >
> > > Hi Yuxia,
> > >
> > > Thanks for the clarification. I would be +0 overall, because I think
> > > without actually allowing creation/customization of stored procedures,
> > the
> > > value for the majority of Flink users will be minimal.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > On Tue, Jun 6, 2023 at 3:52 AM yuxia <luoyu...@alumni.sjtu.edu.cn>
> > wrote:
> > >
> > >> Hi, Martijn.
> > >> Thanks for your feedback.
> > >> 1: In this FLIP we don't intend to allow users to customize their own
> > >> stored procedures, because we don't want to expose too much to users too
> > early, as
> > >> the FLIP says.
> > >> The procedures are supposed to be provided only by the Catalog. Catalog
> devs
> > >> can write their built-in procedures and return the procedure from the
> method
> > >> Catalog.getProcedure(ObjectPath procedurePath);
> > >> So, there won't be SQL syntax to create/save a stored procedure in
> this
> > >> FLIP. If we find we do need it, we can propose the SQL syntax to
> create a
> > >> stored procedure in another dedicated FLIP.
> > >>
> > >> 2: The syntax `Call procedure_name(xx)` proposed in this FLIP is the
> > >> default syntax in Calcite for calling stored procedures. Actually, we
> don't
> > >> need to make any modification in the flink-sql-parser module for the syntax of
> > calling
> > >> a procedure. MySQL[1], Postgres[2], and Oracle[3] also use this syntax to
> > call a
> > >> stored procedure.
> > >>
> > >>
> > >> [1] https://dev.mysql.com/doc/refman/8.0/en/call.html
> > >> [2] https://www.postgresql.org/docs/15/sql-call.html
> > >> [3]
> https://docs.oracle.com/javadb/10.8.3.0/ref/rrefcallprocedure.html
> > >>
> > >> Best regards,
> > >> Yuxia
> > >>
> > >> ----- Original Message -----
> > >> From: "Martijn Visser" <martijnvis...@apache.org>
> > >> To: "dev" <dev@flink.apache.org>
> > >> Sent: Monday, June 5, 2023 8:35:44 PM
> > >> Subject: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
> > >>
> > >> Hi Yuxia,
> > >>
> > >> Thanks for the FLIP. I have a couple of questions:
> > >>
> > >> 1. The syntax talks about how to CALL or SHOW the available stored
> > >> procedures, but not on how to create one. Will there not be a SQL
> > syntax to
> > >> create/save a stored procedure?
> > >> 2. Is there a default syntax in Calcite for stored procedures? What do
> > >> other databases do, do they use CALL/SHOW or something like EXEC, USE?
> > >>
> > >> Best regards,
> > >>
> > >> Martijn
> > >>
> > >> On Mon, Jun 5, 2023 at 3:23 AM yuxia <luoyu...@alumni.sjtu.edu.cn>
> > wrote:
> > >>
> > >>> Hi, Jane.
> > >>> Thanks for your input. I think we can add the auxiliary command SHOW
> > >>> PROCEDURES in this FLIP,
> > >>> following the syntax for SHOW FUNCTIONS proposed in FLIP-297[1].
> > >>> The syntax will be
> > >>> SHOW PROCEDURES [ ( FROM | IN ) [catalog_name.]database_name ] [
> [NOT]
> > >>> (LIKE | ILIKE) <sql_like_pattern> ].
> > >>> I have updated the FLIP accordingly.
> > >>>
> > >>> The other auxiliary commands may not be suitable currently or may need
> > >>> further/dedicated discussion. Let's keep this FLIP focused.
> > >>>
> > >>> [1]
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-297%3A+Improve+Auxiliary+Sql+Statements
> > >>>
> > >>> Best regards,
> > >>> Yuxia
> > >>>
> > >>> ----- Original Message -----
> > >>> From: "Jane Chan" <qingyue....@gmail.com>
> > >>> To: "dev" <dev@flink.apache.org>
> > >>> Sent: Saturday, June 3, 2023 7:04:39 PM
> > >>> Subject: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
> > >>>
> > >>> Hi Yuxia,
> > >>>
> > >>> Thanks for bringing this to the discussion. The call procedure is a
> > >> widely
> > >>> used feature and will be very useful for users.
> > >>>
> > >>> I just have one question regarding the usage. The FLIP mentioned that
> > >>>
> > >>> Flink will allow connector developers to develop their own built-in
> > >> stored
> > >>>> procedures, and then enable users to call these predefined stored
> > >>>> procedures.
> > >>>>
> > >>> In this FLIP, we don't intend to allow users to customize their own
> > >> stored
> > >>>> procedure, for we don't want to expose too much to users too early.
> > >>>
> > >>>
> > >>> If I understand correctly, we might need to provide some auxiliary
> > >> commands
> > >>> to inform users what built-in procedures are provided and how to use
> > >> them.
> > >>> For example, Snowflake provides commands like [1] [2], and MySQL
> > provides
> > >>> commands like [3] [4].
> > >>>
> > >>> [1] SHOW PROCEDURES,
> > >>> https://docs.snowflake.com/en/sql-reference/sql/show-procedures
> > >>> [2] DESCRIBE PROCEDURE <procedure_name>,
> > >>> https://docs.snowflake.com/en/sql-reference/sql/desc-procedure
> > >>> [3] SHOW PROCEDURE CODE,
> > >>> https://dev.mysql.com/doc/refman/5.7/en/show-procedure-code.html
> > >>> [4] SHOW PROCEDURE STATUS,
> > >>> https://dev.mysql.com/doc/refman/5.7/en/show-procedure-status.html
> > >>>
> > >>> Best,
> > >>> Jane
> > >>>
> > >>> On Sat, Jun 3, 2023 at 3:20 PM Benchao Li <libenc...@apache.org>
> > wrote:
> > >>>
> > >>>> Thanks Yuxia for the explanation, it makes sense to me. It would be
> > >> great
> > >>>> if you also add this to the FLIP doc.
> > >>>>
> > >>>> yuxia <luoyu...@alumni.sjtu.edu.cn> wrote on Thu, Jun 1, 2023 at 17:11:
> > >>>>
> > >>>>> Hi, Benchao.
> > >>>>> Thanks for your attention.
> > >>>>>
> > >>>>> Initially, I also wanted to pass `TableEnvironment` to the procedure. But
> > >>>>> according to my investigation and offline discussion with Jingsong, the
> > >>> real
> > >>>>> important thing for procedure devs is the ability to build a Flink
> > >>>>> datastream. But we can't get the `StreamExecutionEnvironment`, which
> > >> is
> > >>>> the
> > >>>>> entrypoint to build a datastream. That's to say we will lose the
> > >> ability
> > >>> to
> > >>>>> build a datastream if we just pass `TableEnvironment`.
> > >>>>>
> > >>>>> Of course, we can also pass `TableEnvironment` along with
> > >>>>> `StreamExecutionEnvironment` to Procedure. But I intend to be
> > >>> cautious
> > >>>>> about exposing too much too early to procedure devs. If someday we
> > >> find
> > >>>> we
> > >>>>> will need `TableEnvironment` to customize a procedure, we can then
> add a
> > >>>>> method like `getTableEnvironment()` in `ProcedureContext`.
> > >>>>>
> > >>>>> Best regards,
> > >>>>> Yuxia
> > >>>>>
> > >>>>> ----- Original Message -----
> > >>>>> From: "Benchao Li" <libenc...@apache.org>
> > >>>>> To: "dev" <dev@flink.apache.org>
> > >>>>> Sent: Thursday, June 1, 2023 12:58:08 PM
> > >>>>> Subject: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
> > >>>>>
> > >>>>> Thanks Yuxia for opening this discussion,
> > >>>>>
> > >>>>> The general idea looks good to me, I only have one question about
> the
> > >>>>> `ProcedureContext#getExecutionEnvironment`. Why are you proposing
> to
> > >>>> return
> > >>>>> a `StreamExecutionEnvironment` instead of `TableEnvironment`, could
> > >> you
> > >>>>> elaborate a little more on this?
> > >>>>>
> > >>>>> Jingsong Li <jingsongl...@gmail.com> wrote on Tue, May 30, 2023 at 17:58:
> > >>>>>
> > >>>>>> Thanks for your explanation.
> > >>>>>>
> > >>>>>> We can support Iterable in future. Current design looks good to
> me.
> > >>>>>>
> > >>>>>> Best,
> > >>>>>> Jingsong
> > >>>>>>
> > >>>>>> On Tue, May 30, 2023 at 4:56 PM yuxia <
> luoyu...@alumni.sjtu.edu.cn
> > >>>
> > >>>>> wrote:
> > >>>>>>>
> > >>>>>>> Hi, Jingsong.
> > >>>>>>> Thanks for your feedback.
> > >>>>>>>
> > >>>>>>>> Does this need to be a function call? Do you have some example?
> > >>>>>>> I think it'll be useful to support function calls when a user calls a
> > >>>>>> procedure.
> > >>>>>>> The following example is from iceberg:[1]
> > >>>>>>> CALL catalog_name.system.migrate('spark_catalog.db.sample',
> > >>>> map('foo',
> > >>>>>> 'bar'));
> > >>>>>>>
> > >>>>>>> It allows the user to use `map('foo', 'bar')` to pass map data to the
> > >>>>>> procedure.
> > >>>>>>>
> > >>>>>>> Another case that I can imagine is rolling back a table to the
> > >>>> snapshot
> > >>>>>> of one week ago.
> > >>>>>>> Then, with a function call, the user may call `rollback(table_name,
> > >>> now() -
> > >>>>>> INTERVAL '7' DAY)` to achieve such a purpose.
> > >>>>>>>
> > >>>>>>> Although the argument can be a function call, the eventual parameter received by
> > >> the
> > >>>>>> procedure will always be the evaluated literal.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>> Procedure looks like a TableFunction, do you consider using
> > >>>> Collector
> > >>>>>>> something like TableFunction? (Supports large amount of data)
> > >>>>>>>
> > >>>>>>> Yes, I had considered it. But returning T[] is for simplicity.
> > >>>>>>>
> > >>>>>>> First, regarding how to return the calling result of a procedure,
> > >>> it
> > >>>>>> looks more intuitive to me to use the return result of the `call`
> > >>>> method
> > >>>>>> instead of calling something like collector#collect.
> > >>>>>>> Introducing a collector will add unnecessary complexity.
> > >>>>>>>
> > >>>>>>> Second, regarding supporting large amounts of data, according to my
> > >>>>>> investigation, I haven't seen a requirement for
> > >> returning
> > >>>>> large
> > >>>>>> amounts of data.
> > >>>>>>> Iceberg also returns an array.[2] If you do think we should
> > >> support
> > >>>>> large
> > >>>>>> amounts of data, I think we can change the return type from T[] to
> > >>>>> Iterable<T>.
> > >>>>>>> [1]:
> > >>>> https://iceberg.apache.org/docs/latest/spark-procedures/#migrate
> > >>>>>>> [2]:
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> >
> https://github.com/apache/iceberg/blob/601c5af9b6abded79dabeba177331310d5487f43/spark/v3.2/spark/src/main/java/org/apache/spark/sql/connector/iceberg/catalog/Procedure.java#L44
> > >>>>>>>
> > >>>>>>> Best regards,
> > >>>>>>> Yuxia
> > >>>>>>>
> > >>>>>>> ----- Original Message -----
> > >>>>>>> From: "Jingsong Li" <jingsongl...@gmail.com>
> > >>>>>>> To: "dev" <dev@flink.apache.org>
> > >>>>>>> Sent: Monday, May 29, 2023 2:42:04 PM
> > >>>>>>> Subject: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
> > >>>>>>>
> > >>>>>>> Thanks Yuxia for the proposal.
> > >>>>>>>
> > >>>>>>>> CALL [catalog_name.][database_name.]procedure_name ([
> > >> expression
> > >>> [,
> > >>>>>> expression]* ] )
> > >>>>>>>
> > >>>>>>> The expression can be a function call. Does this need to be a
> > >>>> function
> > >>>>>>> call? Do you have some example?
> > >>>>>>>
> > >>>>>>>> Procedure returns T[]
> > >>>>>>>
> > >>>>>>> Procedure looks like a TableFunction, do you consider using
> > >>> Collector
> > >>>>>>> something like TableFunction? (Supports large amount of data)
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Jingsong
> > >>>>>>>
> > >>>>>>> On Mon, May 29, 2023 at 2:33 PM yuxia <
> > >> luoyu...@alumni.sjtu.edu.cn
> > >>>>
> > >>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>> Hi, everyone.
> > >>>>>>>>
> > >>>>>>>> I’d like to start a discussion about FLIP-311: Support Call
> > >>> Stored
> > >>>>>> Procedure [1]
> > >>>>>>>>
> > >>>>>>>> Stored procedures provide a convenient way to encapsulate
> > >> complex
> > >>>>>> logic to perform data manipulation or administrative tasks in
> > >>> external
> > >>>>>> storage systems. They are widely used in traditional databases and
> > >>> popular
> > >>>>>> compute engines like Trino for their convenience. Therefore, we
> > >>> propose
> > >>>>>> adding support for calling stored procedures in Flink to enable better
> > >>>>>> integration with external storage systems.
> > >>>>>>>>
> > >>>>>>>> With this FLIP, Flink will allow connector developers to
> > >> develop
> > >>>>> their
> > >>>>>> own built-in stored procedures, and then enable users to call
> > >> these
> > >>>>>> predefined stored procedures.
> > >>>>>>>>
> > >>>>>>>> Looking forward to your feedback.
> > >>>>>>>>
> > >>>>>>>> [1]:
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-311%3A+Support+Call+Stored+Procedure
> > >>>>>>>>
> > >>>>>>>> Best regards,
> > >>>>>>>> Yuxia
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>>
> > >>>>> Best,
> > >>>>> Benchao Li
> > >>>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>>
> > >>>> Best,
> > >>>> Benchao Li
> > >>>>
> > >>>
> > >>
> >
> >
>
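
For readers skimming the thread, the design discussed above (a catalog hands out
its own built-in procedures, the procedure receives already-evaluated literal
arguments, and the result comes back as an array) can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the API from
the FLIP: the interfaces `ProcedureContext` and `Procedure`, the class
`DemoCatalog`, the procedure path "system.rollback", and the method
`environmentName()` are hypothetical stand-ins written without any Flink
dependency, and the String path stands in for the `ObjectPath` mentioned in the
thread.

```java
import java.util.HashMap;
import java.util.Map;

public class ProcedureSketch {

    /** Hypothetical stand-in for the context a procedure receives. */
    interface ProcedureContext {
        String environmentName();
    }

    /** Hypothetical contract: evaluated literal arguments in, an array of rows out. */
    interface Procedure {
        String[] call(ProcedureContext context, Object... args);
    }

    /** Hypothetical catalog that serves only its own built-in procedures. */
    static final class DemoCatalog {
        private final Map<String, Procedure> builtIns = new HashMap<>();

        DemoCatalog() {
            // Built-in procedure registered by the catalog developer, not by users.
            builtIns.put("system.rollback", (context, args) -> {
                String table = (String) args[0];
                long snapshotId = (Long) args[1];
                return new String[] {"rolled back " + table + " to snapshot " + snapshotId};
            });
        }

        /** Looks up a built-in procedure by path, failing if it is unknown. */
        Procedure getProcedure(String procedurePath) {
            Procedure procedure = builtIns.get(procedurePath);
            if (procedure == null) {
                throw new IllegalArgumentException("No such procedure: " + procedurePath);
            }
            return procedure;
        }
    }

    /** Resolves the procedure from the catalog and calls it with literal arguments. */
    static String[] callRollback(String table, long snapshotId) {
        DemoCatalog catalog = new DemoCatalog();
        ProcedureContext context = () -> "demo-environment";
        return catalog.getProcedure("system.rollback").call(context, table, snapshotId);
    }

    public static void main(String[] args) {
        for (String row : callRollback("db.orders", 42L)) {
            System.out.println(row);
        }
    }
}
```

Returning an array keeps the call site simple, which matches the reasoning given
in the thread; switching the hypothetical `call` return type to `Iterable<String>`
would be the natural change if large results ever needed to be supported.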
