Re: [DISCUSS] FLIP-311: Support Call Stored Procedure

yuxia Wed, 07 Jun 2023 20:54:30 -0700

Hi, all.
Thanks everyone for the valuable input. If there are are no further concerns 
about this FLIP[1], I would like to start voting next monday (6/12).


[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-311%3A+Support+Call+Stored+Procedure


Best regards,
Yuxia

----- 原始邮件 -----
发件人: "Martijn Visser" <[email protected]>
收件人: "dev" <[email protected]>
发送时间: 星期二, 2023年 6 月 06日 下午 3:57:56
主题: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure

Hi Yuxia,

Thanks for the clarification. I would be +0 overall, because I think
without actually allowing creation/customization of stored procedures, the
value for the majority of Flink users will be minimal.

Best regards,

Martijn

On Tue, Jun 6, 2023 at 3:52 AM yuxia <[email protected]> wrote:

> Hi, Martijn.
> Thanks for you feedback.
> 1: In this FLIP we don't intend to allow users to customize their own
> stored procedure for we don't want to expose too much to users too early as
> the FLIP said.
> The procedures are supposed to be provided only by Catalog. Catalog devs
> can write their build-in procedures, and return the procedure in method
> Catalog.getProcedure(ObjectPath procedurePath);
> So, there won't be SQL syntax to create/save a stored procedure in this
> FLIP. If we find we do need it, we can propse the SQL syntax to create a
> stored procedure in another dedicated FLIP.
>
> 2: The syntax `Call procedure_name(xx)` proposed in this FLIP is the
> default syntax in Calcite for call stored procedures. Actaully, we don't
> need to do any modifcation in flink-sql-parser module for syntax of calling
> a procedure. MySQL[1], Postgres[2], Oracle[3] also use the syntax to call a
> stored procedure.
>
>
> [1] https://dev.mysql.com/doc/refman/8.0/en/call.html
> [2] https://www.postgresql.org/docs/15/sql-call.html
> [3] https://docs.oracle.com/javadb/10.8.3.0/ref/rrefcallprocedure.html
>
> Best regards,
> Yuxia
>
> ----- 原始邮件 -----
> 发件人: "Martijn Visser" <[email protected]>
> 收件人: "dev" <[email protected]>
> 发送时间: 星期一, 2023年 6 月 05日 下午 8:35:44
> 主题: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
>
> Hi Yuxia,
>
> Thanks for the FLIP. I have a couple of questions:
>
> 1. The syntax talks about how to CALL or SHOW the available stored
> procedures, but not on how to create one. Will there not be a SQL syntax to
> create/save a stored procedure?
> 2. Is there a default syntax in Calcite for stored procedures? What do
> other databases do, do they use CALL/SHOW or something like EXEC, USE?
>
> Best regards,
>
> Martijn
>
> On Mon, Jun 5, 2023 at 3:23 AM yuxia <[email protected]> wrote:
>
> > Hi, Jane.
> > Thanks for you input. I think we can add the auxiliary command show
> > procedures in this FLIP.
> > Following the syntax for show functions proposed in FLIP-297.
> > The syntax will be
> > SHOW PROCEDURES [ ( FROM | IN ) [catalog_name.]database_name ] [ [NOT]
> > (LIKE | ILIKE) <sql_like_pattern> ].
> > I have updated to this FLIP.
> >
> > The other auxiliary commands maybe not suitable currently or need a
> > further/dedicated dicussion. Let's keep this FLIP focus.
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-297%3A+Improve+Auxiliary+Sql+Statements
> >
> > Best regards,
> > Yuxia
> >
> > ----- 原始邮件 -----
> > 发件人: "Jane Chan" <[email protected]>
> > 收件人: "dev" <[email protected]>
> > 发送时间: 星期六, 2023年 6 月 03日 下午 7:04:39
> > 主题: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
> >
> > Hi Yuxia,
> >
> > Thanks for bringing this to the discussion. The call procedure is a
> widely
> > used feature and will be very useful for users.
> >
> > I just have one question regarding the usage. The FLIP mentioned that
> >
> > Flink will allow connector developers to develop their own built-in
> stored
> > > procedures, and then enables users to call these predefiend stored
> > > procedures.
> > >
> > In this FLIP, we don't intend to allow users to customize their own
> stored
> > > procedure  for we don't want to expose too much to users too early.
> >
> >
> > If I understand correctly, we might need to provide some auxiliary
> commands
> > to inform users what built-in procedures are provided and how to use
> them.
> > For example, Snowflake provides commands like [1] [2], and MySQL provides
> > commands like [3] [4].
> >
> > [1] SHOW PROCEDURES,
> > https://docs.snowflake.com/en/sql-reference/sql/show-procedures
> > [2] DESCRIBE PROCEDURE <procedure_name>,
> > https://docs.snowflake.com/en/sql-reference/sql/desc-procedure
> > [3] SHOW PROCEDURE CODE，
> > https://dev.mysql.com/doc/refman/5.7/en/show-procedure-code.html
> > [4] SHOW PROCEDURE STATUS,
> > https://dev.mysql.com/doc/refman/5.7/en/show-procedure-status.html
> >
> > Best,
> > Jane
> >
> > On Sat, Jun 3, 2023 at 3:20 PM Benchao Li <[email protected]> wrote:
> >
> > > Thanks Yuxia for the explanation, it makes sense to me. It would be
> great
> > > if you also add this to the FLIP doc.
> > >
> > > yuxia <[email protected]> 于2023年6月1日周四 17:11写道：
> > >
> > > > Hi, Benchao.
> > > > Thanks for your attention.
> > > >
> > > > Initially, I also want to pass `TableEnvironment` to procedure. But
> > > > according my investegation and offline discussion with Jingson, the
> > real
> > > > important thing for procedure devs is the ability to build Flink
> > > > datastream. But we can't get the `StreamExecutionEnvironment` which
> is
> > > the
> > > > entrypoint to build datastream. That's to say we will lost the
> ability
> > to
> > > > build a datastream if we just pass `TableEnvironment`.
> > > >
> > > > Of course, we can also pass `TableEnvironment` along with
> > > > `StreamExecutionEnvironment` to Procedure. But I'm intend to be
> > cautious
> > > > about exposing too much too early to procedure devs. If someday we
> find
> > > we
> > > > will need `TableEnvironment` to custom a procedure, we can then add a
> > > > method like `getTableEnvironment()` in `ProcedureContext`.
> > > >
> > > > Best regards,
> > > > Yuxia
> > > >
> > > > ----- 原始邮件 -----
> > > > 发件人: "Benchao Li" <[email protected]>
> > > > 收件人: "dev" <[email protected]>
> > > > 发送时间: 星期四, 2023年 6 月 01日 下午 12:58:08
> > > > 主题: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
> > > >
> > > > Thanks Yuxia for opening this discussion,
> > > >
> > > > The general idea looks good to me, I only have one question about the
> > > > `ProcedureContext#getExecutionEnvironment`. Why are you proposing to
> > > return
> > > > a `StreamExecutionEnvironment` instead of `TableEnvironment`, could
> you
> > > > elaborate a little more on this?
> > > >
> > > > Jingsong Li <[email protected]> 于2023年5月30日周二 17:58写道：
> > > >
> > > > > Thanks for your explanation.
> > > > >
> > > > > We can support Iterable in future. Current design looks good to me.
> > > > >
> > > > > Best,
> > > > > Jingsong
> > > > >
> > > > > On Tue, May 30, 2023 at 4:56 PM yuxia <[email protected]
> >
> > > > wrote:
> > > > > >
> > > > > > Hi, Jingsong.
> > > > > > Thanks for your feedback.
> > > > > >
> > > > > > > Does this need to be a function call? Do you have some example?
> > > > > > I think it'll be useful to support function call when user call
> > > > > procedure.
> > > > > > The following example is from iceberg:[1]
> > > > > > CALL catalog_name.system.migrate('spark_catalog.db.sample',
> > > map('foo',
> > > > > 'bar'));
> > > > > >
> > > > > > It allows user to use `map('foo', 'bar')` to pass a map data to
> > > > > procedure.
> > > > > >
> > > > > > Another case that I can imagine may be rollback a table to the
> > > snapshot
> > > > > of one week ago.
> > > > > > Then, with function call, user may call `rollback(table_name,
> > now() -
> > > > > INTERVAL '7' DAY)` to acheive such purpose.
> > > > > >
> > > > > > Although it can be function call, the eventual parameter got by
> the
> > > > > procedure will always be the literal evaluated.
> > > > > >
> > > > > >
> > > > > > > Procedure looks like a TableFunction, do you consider using
> > > Collector
> > > > > > something like TableFunction? (Supports large amount of data)
> > > > > >
> > > > > > Yes, I had considered it. But returns T[] is for simpility,
> > > > > >
> > > > > > First, regarding how to return the calling result of a procedure,
> > it
> > > > > looks more intuitive to me to use the return result of the `call`
> > > method
> > > > > instead of by calling something like collector#collect.
> > > > > > Introduce a collector will increase necessary complexity.
> > > > > >
> > > > > > Second, regarding supporting large amount of data,  acoording my
> > > > > investagtion, I haven't seen the requirement that supports
> returning
> > > > large
> > > > > amount of data.
> > > > > > Iceberg also return an array.[2] If you do think we should
> support
> > > > large
> > > > > amount of data, I think we can change to return type from T[] to
> > > > Iterable<T>
> > > > > >
> > > > > > [1]:
> > > https://iceberg.apache.org/docs/latest/spark-procedures/#migrate
> > > > > > [2]:
> > > > >
> > > >
> > >
> >
> https://github.com/apache/iceberg/blob/601c5af9b6abded79dabeba177331310d5487f43/spark/v3.2/spark/src/main/java/org/apache/spark/sql/connector/iceberg/catalog/Procedure.java#L44
> > > > > >
> > > > > > Best regards,
> > > > > > Yuxia
> > > > > >
> > > > > > ----- 原始邮件 -----
> > > > > > 发件人: "Jingsong Li" <[email protected]>
> > > > > > 收件人: "dev" <[email protected]>
> > > > > > 发送时间: 星期一, 2023年 5 月 29日 下午 2:42:04
> > > > > > 主题: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
> > > > > >
> > > > > > Thanks Yuxia for the proposal.
> > > > > >
> > > > > > > CALL [catalog_name.][database_name.]procedure_name ([
> expression
> > [,
> > > > > expression]* ] )
> > > > > >
> > > > > > The expression can be a function call. Does this need to be a
> > > function
> > > > > > call? Do you have some example?
> > > > > >
> > > > > > > Procedure returns T[]
> > > > > >
> > > > > > Procedure looks like a TableFunction, do you consider using
> > Collector
> > > > > > something like TableFunction? (Supports large amount of data)
> > > > > >
> > > > > > Best,
> > > > > > Jingsong
> > > > > >
> > > > > > On Mon, May 29, 2023 at 2:33 PM yuxia <
> [email protected]
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > Hi, everyone.
> > > > > > >
> > > > > > > I’d like to start a discussion about FLIP-311: Support Call
> > Stored
> > > > > Procedure [1]
> > > > > > >
> > > > > > > Stored procedure provides a convenient way to encapsulate
> complex
> > > > > logic to perform data manipulation or administrative tasks in
> > external
> > > > > storage systems. It's widely used in traditional databases and
> > popular
> > > > > compute engines like Trino for it's convenience. Therefore, we
> > propose
> > > > > adding support for call stored procedure in Flink to enable better
> > > > > integration with external storage systems.
> > > > > > >
> > > > > > > With this FLIP, Flink will allow connector developers to
> develop
> > > > their
> > > > > own built-in stored procedures, and then enables users to call
> these
> > > > > predefiend stored procedures.
> > > > > > >
> > > > > > > Looking forward to your feedbacks.
> > > > > > >
> > > > > > > [1]:
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-311%3A+Support+Call+Stored+Procedure
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Yuxia
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Best,
> > > > Benchao Li
> > > >
> > >
> > >
> > > --
> > >
> > > Best,
> > > Benchao Li
> > >
> >
>

Re: [DISCUSS] FLIP-311: Support Call Stored Procedure

Reply via email to