Re: [DISCUSS] FLIP-106: Support Python UDF in SQL Function DDL

2020-03-13 Thread Wei Zhong
Hi all,

Thanks for all of your responses.

If there are no more comments, I would like to start the VOTE thread.

Best,
Wei


Re: [DISCUSS] FLIP-106: Support Python UDF in SQL Function DDL

2020-03-13 Thread Xingbo Huang
Hi Wei,
Thanks a lot for drafting the FLIP and kicking off the discussion.
Big +1 for this feature.
This feature will make it much easier for PyFlink users to use Python UDFs in
SQL scenarios.

Best,
Xingbo


Re: [DISCUSS] FLIP-106: Support Python UDF in SQL Function DDL

2020-03-13 Thread Hequn Cheng
Big +1 on this feature! It would be great to extend the usage of Python UDF
in SQL scenarios.
The design doc looks good from my side now. Thank you for the update.

Best,
Hequn


Re: [DISCUSS] FLIP-106: Support Python UDF in SQL Function DDL

2020-03-10 Thread Wei Zhong
Hi Timo,

Thanks for your reply.

If we aim for option 1, it makes sense to me to include the change in this
FLIP, as option 1 does not change any public API. I'll update the FLIP page
to reflect this.

Best,
Wei


Re: [DISCUSS] FLIP-106: Support Python UDF in SQL Function DDL

2020-03-09 Thread Timo Walther

Hi Wei,

I agree with Dawid that we should defer the instantiation of temporary 
functions to compile time. In the long-term, we would like to integrate 
FunctionCatalog as a component of CatalogManager and unify the handling 
of catalog objects as much as possible.


We should aim for your proposed option 1. For fluent definition of 
functions in Table API, we would still like to offer passing instances 
like `t.select(call(new ScalarFunction() { ... }))` that would be 
registered as temporary system functions.
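
A minimal sketch of that idea, as a toy Python model rather than the Table API itself: an inline instance passed to a `call(...)` helper is stored under a generated name alongside other temporary system functions. The registry, the naming scheme, and both helper functions are assumptions made for illustration; this thread does not specify how Flink would name or store such instances.

```python
import itertools
from typing import Callable, Dict, Tuple

# Hypothetical registry standing in for temporary system functions.
temporary_system_functions: Dict[str, Callable[..., object]] = {}
_counter = itertools.count()


def call(function_instance: Callable[..., object], *args) -> Tuple[str, tuple]:
    """Register an inline instance under a generated name; return a call expression."""
    name = f"inline${next(_counter)}"  # generated name, assumed scheme
    temporary_system_functions[name] = function_instance
    return (name, args)


def evaluate(expression: Tuple[str, tuple]) -> object:
    """Resolve the function by name and apply it to the recorded arguments."""
    name, args = expression
    return temporary_system_functions[name](*args)


# The user never picks a name; the instance is still resolvable by name later.
expr = call(lambda s: s.upper(), "flink")
result = evaluate(expr)
```

The point of the sketch is only that fluent, instance-based calls and name-based lookup can share one registry.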


Regards,
Timo



Re: [DISCUSS] FLIP-106: Support Python UDF in SQL Function DDL

2020-03-09 Thread Wei Zhong
Hi Dawid,

I think deferring the instantiation of temporary functions to compile time is
quite a good idea, but it needs further discussion. As it is orthogonal to this
FLIP, we could continue the discussion in a new thread later. What do you think?

Best,
Wei


Re: [DISCUSS] FLIP-106: Support Python UDF in SQL Function DDL

2020-03-05 Thread Wei Zhong
Hi Dawid,

Thanks for your suggestion. 

After some investigation, I have two designs in mind for deferring the
instantiation of temporary system functions and temporary catalog functions
to compile time.

1. FunctionCatalog accepts both FunctionDefinitions and uninstantiated 
temporary functions. The uninstantiated temporary functions will be 
instantiated when compiling. There is no public API change in this design, but 
the FunctionCatalog needs to store and process both FunctionDefinitions and 
uninstantiated temporary functions.

2. FunctionCatalog accepts only uninstantiated temporary functions. In this
design we need to remove the APIs that accept FunctionDefinitions from
TableEnvironment, i.e. `void createTemporaryFunction(String path,
UserDefinedFunction functionInstance)` and `void
createTemporarySystemFunction(String name, UserDefinedFunction
functionInstance)`. But the FunctionCatalog only needs to store and process
uninstantiated temporary functions.
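
As a rough illustration of option 1, here is a toy model in Python. It is not Flink's actual FunctionCatalog, and every class and method name in it is hypothetical. The catalog holds either an already-instantiated definition or an uninstantiated descriptor, and resolves the descriptor only when a query is compiled:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Union


@dataclass
class FunctionDefinition:
    """Stand-in for an instantiated function (hypothetical, not Flink's class)."""
    name: str
    impl: Callable[..., object]


@dataclass
class UninstantiatedFunction:
    """Descriptor holding only what is needed to build the function later."""
    name: str
    factory: Callable[[], Callable[..., object]]


@dataclass
class ToyFunctionCatalog:
    """Option 1: stores both kinds of entries and instantiates lazily on lookup."""
    _entries: Dict[str, Union[FunctionDefinition, UninstantiatedFunction]] = field(
        default_factory=dict
    )
    instantiations: int = 0  # counts how often a factory was actually run

    def register(self, entry):
        # Registration is cheap: nothing is instantiated here.
        self._entries[entry.name] = entry

    def lookup_at_compile_time(self, name: str) -> FunctionDefinition:
        entry = self._entries[name]
        if isinstance(entry, UninstantiatedFunction):
            # Deferred work happens here, e.g. starting an external process.
            entry = FunctionDefinition(name, entry.factory())
            self._entries[name] = entry  # cache so instantiation happens once
            self.instantiations += 1
        return entry


catalog = ToyFunctionCatalog()
catalog.register(FunctionDefinition("upper", str.upper))
catalog.register(UninstantiatedFunction("add_one", lambda: (lambda x: x + 1)))
```

In this model, option 2 would amount to dropping the `FunctionDefinition` branch from `register`, so that the catalog stores descriptors only and every lookup goes through the lazy path.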

As I don't know the details of the plan to store temporary functions as
catalog functions instead of FunctionDefinitions, I'm not sure which solution
fits better. It would be great if you could share more details or some
thoughts on these two solutions.

Best,
Wei

> On Mar 4, 2020, at 16:17, Dawid Wysakowicz wrote:
> 
> Hi all,
> I had a really quick look and from my perspective the proposal looks fine.
> I share Jark's opinion that the instantiation could be done at a later
> stage. I agree with Wei that it requires some changes in the internal
> implementation of the FunctionCatalog, to store temporary functions as
> catalog functions instead of FunctionDefinitions, but we have that on our
> agenda anyway. I would suggest investigating whether we could do that as
> part of this FLIP already. Nevertheless, in theory this can also be done later.
> 
> Best,
> Dawid
> 
> On Mon, 2 Mar 2020, 14:58 Jark Wu,  wrote:
> 
>> Thanks for the explanation, Wei!
>> 
>> On Mon, 2 Mar 2020 at 20:59, Wei Zhong  wrote:
>> 
>>> Hi Jark,
>>> 
>>> Thanks for your suggestion.
>>> 
>>> Actually, the timing of starting a Python process depends on the UDF type,
>>> because the Python process is used to provide the information needed to
>>> instantiate the FunctionDefinition object of the Python UDF. For a catalog
>>> function, the FunctionDefinition will be instantiated when compiling the
>>> job, which means the Python process is required during compilation
>>> rather than during registration. For temporary system functions and
>>> temporary catalog functions, the FunctionDefinition will be instantiated
>>> during UDF registration, so the Python process needs to be started at that
>>> time.
>>> 
>>> But this FLIP will only support registering temporary system functions
>>> and temporary catalog functions in SQL DDL, because registering Python UDFs
>>> to the catalog is not supported yet. We plan to support the registration of
>>> Python catalog functions (via Table API and SQL DDL) in a separate FLIP.
>>> I'll add a non-goal section to the FLIP page to illustrate this.
>>> 
>>> Best,
>>> Wei
>>> 
>>> 
 在 2020年3月2日,15:11,Jark Wu  写道:
 
 Hi Weizhong,
 
 Thanks for proposing this feature. In geneal, I'm +1 from the table's
>>> view.
 
 I have one suggestion: I think the register python function into
>> catalog
 doesn't need to startup python process (the "High Level Sequence
>> Diagram"
 in your FLIP).
 Because only meta-information is persisted into catalog, we don't need
>> to
 store "return type", "input types" into catalog.
 I guess the python process is required when compiling a SQL job.
 
 Best,
 Jark
 
 
 
 On Fri, 28 Feb 2020 at 19:04, Benchao Li  wrote:
 
> Big +1 for this feature.
> 
> We built our SQL platform on Java Table API, and most common UDF are
> implemented in Java. However some python developers are not familiar
>>> with
> Java/Scala, and it's very inconvenient for these users to use UDF in
>>> SQL.
> 
> Wei Zhong  于2020年2月28日周五 下午6:58写道:
> 
>> Thank for your reply Dan!
>> 
>> By the way, this FLIP is closely related to the SQL API.  @Jark Wu <
>> imj...@gmail.com> @Timo  could you please take a
>> look?
>> 
>> Thanks,
>> Wei
>> 
>>> 在 2020年2月25日,16:25,zoudan  写道:
>>> 
>>> +1 for supporting Python UDF in Java/Scala Table API.
>>> This is a great feature and would be helpful for python users!
>>> 
>>> Best,
>>> Dan Zou
>>> 
>>> 
>> 
>> 
> 
> --
> 
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking
>>> University
> Tel:+86-15650713730
> Email: libenc...@gmail.com; libenc...@pku.edu.cn
> 
> 
>>> 
>>> 
>> 



Re: [DISCUSS] FLIP-106: Support Python UDF in SQL Function DDL

2020-03-04 Thread Dawid Wysakowicz
Hi all,
I had a really quick look and from my perspective the proposal looks fine.
I share Jark's opinion that the instantiation could be done at a later
stage. I agree with Wei that it requires some changes in the internal
implementation of the FunctionCatalog, to store temporary functions as
catalog functions instead of FunctionDefinitions, but we have that on our
agenda anyway. I would suggest investigating whether we could do that as part of
this FLIP already. Nevertheless, in theory this can also be done later.

Best,
Dawid

On Mon, 2 Mar 2020, 14:58 Jark Wu,  wrote:

> Thanks for the explanation, Wei!


Re: [DISCUSS] FLIP-106: Support Python UDF in SQL Function DDL

2020-03-02 Thread Jark Wu
Thanks for the explanation, Wei!

On Mon, 2 Mar 2020 at 20:59, Wei Zhong  wrote:

> Hi Jark,
>
> Thanks for your suggestion.
>
> Actually, the timing of starting a Python process depends on the UDF type,
> because the Python process is used to provide the necessary information to
> instantiate the FunctionDefinition object of the Python UDF. For a catalog
> function, the FunctionDefinition will be instantiated when compiling the
> job, which means the Python process is required during compilation
> instead of registration. For temporary system functions and temporary
> catalog functions, the FunctionDefinition will be instantiated during
> UDF registration, so the Python process needs to be started at that time.
>
> But this FLIP will only support registering temporary system functions
> and temporary catalog functions in SQL DDL, because registering a Python UDF to
> a catalog is not supported yet. We plan to support the registration of
> Python catalog functions (via Table API and SQL DDL) in a separate FLIP.
> I'll add a non-goal section to the FLIP page to illustrate this.
>
> Best,
> Wei


Re: [DISCUSS] FLIP-106: Support Python UDF in SQL Function DDL

2020-03-02 Thread Wei Zhong
Hi Jark,

Thanks for your suggestion. 

Actually, the timing of starting a Python process depends on the UDF type, 
because the Python process is used to provide the necessary information to 
instantiate the FunctionDefinition object of the Python UDF. For a catalog 
function, the FunctionDefinition will be instantiated when compiling the job, 
which means the Python process is required during compilation instead of 
registration. For temporary system functions and temporary catalog 
functions, the FunctionDefinition will be instantiated during UDF 
registration, so the Python process needs to be started at that time. 

But this FLIP will only support registering temporary system functions and 
temporary catalog functions in SQL DDL, because registering a Python UDF to a 
catalog is not supported yet. We plan to support the registration of Python 
catalog functions (via Table API and SQL DDL) in a separate FLIP. I'll add a 
non-goal section to the FLIP page to illustrate this.
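
The timing rule described in this message can be summarised as a small lookup. This is only an illustrative sketch in plain Python: `FunctionKind`, `python_process_needed_at` and `supported_in_this_flip` are hypothetical names, not Flink APIs, and the mapping simply restates the two paragraphs above.

```python
from enum import Enum


class FunctionKind(Enum):
    CATALOG = "catalog function"
    TEMPORARY_SYSTEM = "temporary system function"
    TEMPORARY_CATALOG = "temporary catalog function"


def python_process_needed_at(kind: FunctionKind) -> str:
    """Phase in which the FunctionDefinition is instantiated, and therefore
    the phase in which the Python process has to be running."""
    if kind is FunctionKind.CATALOG:
        # Catalog functions are instantiated when the job is compiled.
        return "compilation"
    # Both temporary kinds are instantiated when they are registered.
    return "registration"


def supported_in_this_flip(kind: FunctionKind) -> bool:
    """Only the two temporary kinds are in scope; Python catalog
    functions are left to a separate FLIP."""
    return kind is not FunctionKind.CATALOG
```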

Best,
Wei


> On 2 Mar 2020, at 15:11, Jark Wu wrote:
> 
> Hi Weizhong,
> 
> Thanks for proposing this feature. In general, I'm +1 from the table's view.
> 
> I have one suggestion: I think registering a Python function into the catalog
> doesn't need to start up a Python process (the "High Level Sequence Diagram"
> in your FLIP).
> Because only meta-information is persisted into the catalog, we don't need to
> store "return type" and "input types" in the catalog.
> I guess the Python process is required when compiling a SQL job.
> 
> Best,
> Jark



Re: [DISCUSS] FLIP-106: Support Python UDF in SQL Function DDL

2020-03-01 Thread Jark Wu
Hi Weizhong,

Thanks for proposing this feature. In general, I'm +1 from the table's view.

I have one suggestion: I think registering a Python function into the catalog
doesn't need to start up a Python process (the "High Level Sequence Diagram"
in your FLIP).
Because only meta-information is persisted into the catalog, we don't need to
store "return type" and "input types" in the catalog.
I guess the Python process is required when compiling a SQL job.

Best,
Jark



On Fri, 28 Feb 2020 at 19:04, Benchao Li  wrote:

> Big +1 for this feature.
>
> We built our SQL platform on the Java Table API, and most common UDFs are
> implemented in Java. However, some Python developers are not familiar with
> Java/Scala, and it's very inconvenient for these users to use UDFs in SQL.


Re: [DISCUSS] FLIP-106: Support Python UDF in SQL Function DDL

2020-02-28 Thread Benchao Li
Big +1 for this feature.

We built our SQL platform on the Java Table API, and most common UDFs are
implemented in Java. However, some Python developers are not familiar with
Java/Scala, and it's very inconvenient for these users to use UDFs in SQL.

On Fri, 28 Feb 2020 at 18:58, Wei Zhong wrote:

> Thanks for your reply, Dan!
>
> By the way, this FLIP is closely related to the SQL API.  @Jark Wu <
> imj...@gmail.com> @Timo  could you please take a look?
>
> Thanks,
> Wei

-- 

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenc...@gmail.com; libenc...@pku.edu.cn


Re: [DISCUSS] FLIP-106: Support Python UDF in SQL Function DDL

2020-02-28 Thread Wei Zhong
Thanks for your reply, Dan!

By the way, this FLIP is closely related to the SQL API. @Jark Wu @Timo 
could you please take a look?

Thanks,
Wei

> On 25 Feb 2020, at 16:25, zoudan wrote:
> 
> +1 for supporting Python UDF in Java/Scala Table API.
> This is a great feature and would be helpful for Python users!
> 
> Best,
> Dan Zou
> 
> 



Re: [DISCUSS] FLIP-106: Support Python UDF in SQL Function DDL

2020-02-25 Thread zoudan
+1 for supporting Python UDF in Java/Scala Table API.
This is a great feature and would be helpful for Python users!

Best,
Dan Zou




[DISCUSS] FLIP-106: Support Python UDF in SQL Function DDL

2020-02-24 Thread Wei Zhong
Hi everyone,

I would like to start a discussion about how to support Python UDFs in the SQL 
Function DDL.

The SQL Function DDL (FLIP-79 [1]) is a great feature which was introduced in 
release 1.10.0. However, it currently only supports creating Java/Scala UDFs. 
Although FLIP-79 already proposed a statement for creating Python UDFs in the 
SQL Function DDL, it is not supported yet.

We want to introduce support for Python UDFs in the SQL Function DDL to fill 
this gap. It provides another way of using Python UDFs and would make them 
available to users of the Java/Scala Table API, the SQL client, etc.
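
As a rough illustration of what such a statement could look like, the sketch below only builds the DDL text. It assumes the `CREATE ... FUNCTION ... AS ... LANGUAGE ...` shape from FLIP-79 with `PYTHON` as the language; the helper name `create_python_function_ddl`, the function name `my_upper` and the path `udfs.my_upper` are hypothetical and are not taken from the design doc.

```python
def create_python_function_ddl(
    name: str, python_path: str, temporary_system: bool = True
) -> str:
    """Build a CREATE FUNCTION statement for a Python UDF.

    `python_path` is assumed to be a fully qualified "module.function" path.
    Only the two temporary scopes are offered, matching the scope of this FLIP.
    """
    scope = "TEMPORARY SYSTEM" if temporary_system else "TEMPORARY"
    return f"CREATE {scope} FUNCTION {name} AS '{python_path}' LANGUAGE PYTHON"


ddl = create_python_function_ddl("my_upper", "udfs.my_upper")
# The resulting string would then be submitted through whichever SQL entry
# point the table environment or SQL client offers.
print(ddl)
```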

Here is the design doc:

https://cwiki.apache.org/confluence/display/FLINK/FLIP-106%3A+Support+Python+UDF+in+SQL+Function+DDL

Looking forward to your feedback!

Best,
Wei

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-79+Flink+Function+DDL+Support