Re: Re: Re: Re: Discussion about https://github.com/apache/pulsar/pull/11112

2021-07-09 Thread Jerry Peng
Hi Everyone,

Some of my thoughts:

1. If we are going to introduce new methods or stages in the lifecycle of a
function such as "open" and "close",  I would recommend we follow what we
already have for Pulsar IO sources and Sink so that we are consistent.

2. I am also not a particular fan of the name "HookFunction".  Can we call
it "RichFunction"?  This naming is consistent with other projects such as
Apache Storm and Flink.

3. In regards to the topic

If this rule must be followed, then one possible way is we define a
> separate `HookFunction` interface which includes `setup`, `process`,
> `tearDown` methods, and add support in the new runtime to handle this
> interface. Just as currently the runtime handles `pulsar.Function` and
> `java.util.Function` separately. The old runtime will not be able to
> recognize the new interface and thus won't execute functions which need
> initialization.
>

I would rather not define a completely new interface. Ideally we have a
"RichFunction" interface that extends the existing.  So that we can
minimize interface duplication and if we want to introduce new interfaces
in the future we can build on top of that.  In regards to the issue about
the old runtime executing a function that implements the new interface, is
it sufficient to document somewhere that the worker has to be updated to a
certain version to run the functions that implement the new interface?

Best,

Jerry



On Wed, Jul 7, 2021 at 9:45 AM Neng Lu  wrote:

> - Regarding the interface design:
> One thing we need to confirm is whether we **don't want old runtime to
> execute new function without proper initialization** or not.
>
> If this rule must be followed, then one possible way is we define a
> separate `HookFunction` interface which includes `setup`, `process`,
> `tearDown` methods, and add support in the new runtime to handle this
> interface. Just as currently the runtime handles `pulsar.Function` and
> `java.util.Function` separately. The old runtime will not be able to
> recognize the new interface and thus won't execute functions which need
> initialization.
>
> Documentation and examples can be added to help user learn and understand
> this new interface.
>
> - Regarding the exception:
> I think user is responsible for properly handle any checked exceptions and
> thus the interface should not allow throwing any exception. For any
> unchecked exceptions, the runtime/process will fail fast.
>
> - Regarding the contract:
> Given that checked exceptions are handled by user properly as described
> above, the `setup` method must be successfully executed or exited. So,
> there's no need to call the `tearDown` method.
>
>
> On 2021/07/06 21:02:17 Enrico Olivelli wrote:
> > Il Mar 6 Lug 2021, 21:30 Neng Lu  ha scritto:
> >
> > > IMHO, The old runtime should not be able to run the new functions.
> >
> >
> > This is not possible to enforce this because we cannot change old code.
> >
> > >
> > >
> > > New functions require resource initialization via hooks. If the actual
> > > `setup()` method is not called (or only a default no-op one is called),
> > > then the function is not properly initialized and there'll be problems
> if
> > > they are executed.
> > >
> >
> > I believe that using default methods is better, as it is clearer for new
> > users.
> >
> > If you add a new interface then it is difficult for a new user to
> discover
> > this feature.
> >
> > Additionally we have to define the contract for calling the teardown
> > function.
> > Will it be called in case of failure of the setup function?
> >
> > Are those functions allowed to throw exceptions. I believe it is good to
> > let them throw Exception in order to not force the user to catch
> everything
> > and re throw as a RuntimeException
> >
> > Enrico
> >
> >
> > >
> > > On 2021/07/06 19:03:09 Sijie Guo wrote:
> > > > So there are two compatibility issues we need to consider here.
> > > >
> > > > 1) Old runtime to run new functions.
> > > > 2) New runtime to run old functions.
> > > >
> > > > Making the methods with default no-op implementation will resolve 1).
> > > > Is that correct?
> > > >
> > > > We can use reflection to check if the methods exist or not to solve
> 2),
> > > no?
> > > >
> > > > - Sijie
> > > >
> > > > On Tue, Jul 6, 2021 at 11:34 AM Neng Lu  wrote:
> > > > >
> > > > > I think the reason is for keeping the original `Function`
> unchanged,
> > > so that existing implemented functions are not affected.
> > > > >
> > > > >
> > > > > On 2021/07/06 03:34:49 Sijie Guo wrote:
> > > > > > Thank you for starting the discussion!
> > > > > >
> > > > > > I have added this proposal to PIP-86:
> > > > > >
> > >
> https://github.com/apache/pulsar/wiki/PIP-86:-Pulsar-Functions:-Preload-and-release-external-resources
> > > > > >
> > > > > > I have one question: why do you introduce HookFunction? Why not
> just
> > > add
> > > > > > two default methods to the existing Functions API?
> > > > > >
> > > > > > - Sijie
> > > > > >
> > > > > > 

Re: Re: Re: Re: Discussion about https://github.com/apache/pulsar/pull/11112

2021-07-07 Thread Neng Lu
- Regarding the interface design:
One thing we need to confirm is whether we **don't want old runtime to execute 
new function without proper initialization** or not. 

If this rule must be followed, then one possible way is we define a separate 
`HookFunction` interface which includes `setup`, `process`, `tearDown` methods, 
and add support in the new runtime to handle this interface. Just as currently 
the runtime handles `pulsar.Function` and `java.util.Function` separately. The 
old runtime will not be able to recognize the new interface and thus won't 
execute functions which need initialization.

Documentation and examples can be added to help user learn and understand this 
new interface.

- Regarding the exception:
I think user is responsible for properly handle any checked exceptions and thus 
the interface should not allow throwing any exception. For any unchecked 
exceptions, the runtime/process will fail fast. 

- Regarding the contract:
Given that checked exceptions are handled by user properly as described above, 
the `setup` method must be successfully executed or exited. So, there's no need 
to call the `tearDown` method.


On 2021/07/06 21:02:17 Enrico Olivelli wrote:
> Il Mar 6 Lug 2021, 21:30 Neng Lu  ha scritto:
> 
> > IMHO, The old runtime should not be able to run the new functions.
> 
> 
> This is not possible to enforce this because we cannot change old code.
> 
> >
> >
> > New functions require resource initialization via hooks. If the actual
> > `setup()` method is not called (or only a default no-op one is called),
> > then the function is not properly initialized and there'll be problems if
> > they are executed.
> >
> 
> I believe that using default methods is better, as it is clearer for new
> users.
> 
> If you add a new interface then it is difficult for a new user to discover
> this feature.
> 
> Additionally we have to define the contract for calling the teardown
> function.
> Will it be called in case of failure of the setup function?
> 
> Are those functions allowed to throw exceptions. I believe it is good to
> let them throw Exception in order to not force the user to catch everything
> and re throw as a RuntimeException
> 
> Enrico
> 
> 
> >
> > On 2021/07/06 19:03:09 Sijie Guo wrote:
> > > So there are two compatibility issues we need to consider here.
> > >
> > > 1) Old runtime to run new functions.
> > > 2) New runtime to run old functions.
> > >
> > > Making the methods with default no-op implementation will resolve 1).
> > > Is that correct?
> > >
> > > We can use reflection to check if the methods exist or not to solve 2),
> > no?
> > >
> > > - Sijie
> > >
> > > On Tue, Jul 6, 2021 at 11:34 AM Neng Lu  wrote:
> > > >
> > > > I think the reason is for keeping the original `Function` unchanged,
> > so that existing implemented functions are not affected.
> > > >
> > > >
> > > > On 2021/07/06 03:34:49 Sijie Guo wrote:
> > > > > Thank you for starting the discussion!
> > > > >
> > > > > I have added this proposal to PIP-86:
> > > > >
> > https://github.com/apache/pulsar/wiki/PIP-86:-Pulsar-Functions:-Preload-and-release-external-resources
> > > > >
> > > > > I have one question: why do you introduce HookFunction? Why not just
> > add
> > > > > two default methods to the existing Functions API?
> > > > >
> > > > > - Sijie
> > > > >
> > > > > On Mon, Jul 5, 2021 at 6:49 PM 陈磊 
> > wrote:
> > > > >
> > > > > > Motivation
> > > > > >
> > > > > > It is very useful in many scenarios to provide safe and convenient
> > > > > > capabilities for function's external resource initialization and
> > release.
> > > > > > In addition to the normal data processing path, it enables
> > functions to use
> > > > > > HookFunction to manage external resources
> > > > > >
> > > > > > At present, in order to process data, only the logic of resource
> > > > > > initialization -> processing -> release and shutdown can be
> > written in the
> > > > > > process() of Function. This method is complicated, insecure, and
> > > > > > unnecessary.
> > > > > >
> > > > > > Instead, we should have a new standard way for users to use
> > Function
> > > > > > easily and safely. Summarized as follows:
> > > > > >
> > > > > > Before Function starts, some resources only need to be initialized
> > once,
> > > > > > and there is no need to make various judgments in the process()
> > method of
> > > > > > the Function interface
> > > > > >
> > > > > > After closing the Function, in the process of using process(), you
> > need to
> > > > > > manually close the referenced external resources, which need to be
> > released
> > > > > > separately in the close() of javaInstance
> > > > > > API and Implementation Changes
> > > > > >
> > > > > > The organization of the function implementation hierarchy has been
> > added,
> > > > > > it currently looks like the following figure:
> > > > > >
> > > > > > Use Cases:
> > > > > >
> > > > > > Before transformation
> > > > > > public class DemoFunction implements Function{

Re: Re: Re: Discussion about https://github.com/apache/pulsar/pull/11112

2021-07-06 Thread Enrico Olivelli
Il Mar 6 Lug 2021, 21:30 Neng Lu  ha scritto:

> IMHO, The old runtime should not be able to run the new functions.


This is not possible to enforce this because we cannot change old code.

>
>
> New functions require resource initialization via hooks. If the actual
> `setup()` method is not called (or only a default no-op one is called),
> then the function is not properly initialized and there'll be problems if
> they are executed.
>

I believe that using default methods is better, as it is clearer for new
users.

If you add a new interface then it is difficult for a new user to discover
this feature.

Additionally we have to define the contract for calling the teardown
function.
Will it be called in case of failure of the setup function?

Are those functions allowed to throw exceptions. I believe it is good to
let them throw Exception in order to not force the user to catch everything
and re throw as a RuntimeException

Enrico


>
> On 2021/07/06 19:03:09 Sijie Guo wrote:
> > So there are two compatibility issues we need to consider here.
> >
> > 1) Old runtime to run new functions.
> > 2) New runtime to run old functions.
> >
> > Making the methods with default no-op implementation will resolve 1).
> > Is that correct?
> >
> > We can use reflection to check if the methods exist or not to solve 2),
> no?
> >
> > - Sijie
> >
> > On Tue, Jul 6, 2021 at 11:34 AM Neng Lu  wrote:
> > >
> > > I think the reason is for keeping the original `Function` unchanged,
> so that existing implemented functions are not affected.
> > >
> > >
> > > On 2021/07/06 03:34:49 Sijie Guo wrote:
> > > > Thank you for starting the discussion!
> > > >
> > > > I have added this proposal to PIP-86:
> > > >
> https://github.com/apache/pulsar/wiki/PIP-86:-Pulsar-Functions:-Preload-and-release-external-resources
> > > >
> > > > I have one question: why do you introduce HookFunction? Why not just
> add
> > > > two default methods to the existing Functions API?
> > > >
> > > > - Sijie
> > > >
> > > > On Mon, Jul 5, 2021 at 6:49 PM 陈磊 
> wrote:
> > > >
> > > > > Motivation
> > > > >
> > > > > It is very useful in many scenarios to provide safe and convenient
> > > > > capabilities for function's external resource initialization and
> release.
> > > > > In addition to the normal data processing path, it enables
> functions to use
> > > > > HookFunction to manage external resources
> > > > >
> > > > > At present, in order to process data, only the logic of resource
> > > > > initialization -> processing -> release and shutdown can be
> written in the
> > > > > process() of Function. This method is complicated, insecure, and
> > > > > unnecessary.
> > > > >
> > > > > Instead, we should have a new standard way for users to use
> Function
> > > > > easily and safely. Summarized as follows:
> > > > >
> > > > > Before Function starts, some resources only need to be initialized
> once,
> > > > > and there is no need to make various judgments in the process()
> method of
> > > > > the Function interface
> > > > >
> > > > > After closing the Function, in the process of using process(), you
> need to
> > > > > manually close the referenced external resources, which need to be
> released
> > > > > separately in the close() of javaInstance
> > > > > API and Implementation Changes
> > > > >
> > > > > The organization of the function implementation hierarchy has been
> added,
> > > > > it currently looks like the following figure:
> > > > >
> > > > > Use Cases:
> > > > >
> > > > > Before transformation
> > > > > public class DemoFunction implements Function{
> > > > > RedisClient client;
> > > > > @Override
> > > > > public String process(String str, Context context) {
> > > > > 1.client=init();
> > > > > 2.Object object = client.get(key);
> > > > > //Data enrichment
> > > > > 3.client.close();
> > > > > return null;
> > > > > }
> > > > > }
> > > > >
> > > > > After the transformation
> > > > > public class DemoFunction implements HookFunction{
> > > > > RedisClient client;
> > > > > @Override
> > > > > public void setup(Context context) {
> > > > > Map connectInfo =
> context.getUserConfigMap();
> > > > > client=init();
> > > > > }
> > > > >
> > > > > @Override
> > > > > public  String process(String str, Context context) {
> > > > > Object object = client.get(key);
> > > > > //Data enrichment
> > > > > return null;
> > > > > }
> > > > >
> > > > > @Override
> > > > > public void cleanup() {
> > > > > client.close();
> > > > > }
> > > > > }
> > > > >
> > > > > It is quite simple and clear to use in function processing code.
> > > > > 
> > > > >
> > > > >
> > > > >
> > > >
> >
>


Re: Re: Re: Discussion about https://github.com/apache/pulsar/pull/11112

2021-07-06 Thread Neng Lu
IMHO, The old runtime should not be able to run the new functions. 

New functions require resource initialization via hooks. If the actual 
`setup()` method is not called (or only a default no-op one is called), then 
the function is not properly initialized and there'll be problems if they are 
executed.


On 2021/07/06 19:03:09 Sijie Guo wrote:
> So there are two compatibility issues we need to consider here.
> 
> 1) Old runtime to run new functions.
> 2) New runtime to run old functions.
> 
> Making the methods with default no-op implementation will resolve 1).
> Is that correct?
> 
> We can use reflection to check if the methods exist or not to solve 2), no?
> 
> - Sijie
> 
> On Tue, Jul 6, 2021 at 11:34 AM Neng Lu  wrote:
> >
> > I think the reason is for keeping the original `Function` unchanged, so 
> > that existing implemented functions are not affected.
> >
> >
> > On 2021/07/06 03:34:49 Sijie Guo wrote:
> > > Thank you for starting the discussion!
> > >
> > > I have added this proposal to PIP-86:
> > > https://github.com/apache/pulsar/wiki/PIP-86:-Pulsar-Functions:-Preload-and-release-external-resources
> > >
> > > I have one question: why do you introduce HookFunction? Why not just add
> > > two default methods to the existing Functions API?
> > >
> > > - Sijie
> > >
> > > On Mon, Jul 5, 2021 at 6:49 PM 陈磊  wrote:
> > >
> > > > Motivation
> > > >
> > > > It is very useful in many scenarios to provide safe and convenient
> > > > capabilities for function's external resource initialization and 
> > > > release.
> > > > In addition to the normal data processing path, it enables functions to 
> > > > use
> > > > HookFunction to manage external resources
> > > >
> > > > At present, in order to process data, only the logic of resource
> > > > initialization -> processing -> release and shutdown can be written in 
> > > > the
> > > > process() of Function. This method is complicated, insecure, and
> > > > unnecessary.
> > > >
> > > > Instead, we should have a new standard way for users to use Function
> > > > easily and safely. Summarized as follows:
> > > >
> > > > Before Function starts, some resources only need to be initialized once,
> > > > and there is no need to make various judgments in the process() method 
> > > > of
> > > > the Function interface
> > > >
> > > > After closing the Function, in the process of using process(), you need 
> > > > to
> > > > manually close the referenced external resources, which need to be 
> > > > released
> > > > separately in the close() of javaInstance
> > > > API and Implementation Changes
> > > >
> > > > The organization of the function implementation hierarchy has been 
> > > > added,
> > > > it currently looks like the following figure:
> > > >
> > > > Use Cases:
> > > >
> > > > Before transformation
> > > > public class DemoFunction implements Function{
> > > > RedisClient client;
> > > > @Override
> > > > public String process(String str, Context context) {
> > > > 1.client=init();
> > > > 2.Object object = client.get(key);
> > > > //Data enrichment
> > > > 3.client.close();
> > > > return null;
> > > > }
> > > > }
> > > >
> > > > After the transformation
> > > > public class DemoFunction implements HookFunction{
> > > > RedisClient client;
> > > > @Override
> > > > public void setup(Context context) {
> > > > Map connectInfo = context.getUserConfigMap();
> > > > client=init();
> > > > }
> > > >
> > > > @Override
> > > > public  String process(String str, Context context) {
> > > > Object object = client.get(key);
> > > > //Data enrichment
> > > > return null;
> > > > }
> > > >
> > > > @Override
> > > > public void cleanup() {
> > > > client.close();
> > > > }
> > > > }
> > > >
> > > > It is quite simple and clear to use in function processing code.
> > > > 
> > > >
> > > >
> > > >
> > >
> 


Re: Re: Discussion about https://github.com/apache/pulsar/pull/11112

2021-07-06 Thread Sijie Guo
So there are two compatibility issues we need to consider here.

1) Old runtime to run new functions.
2) New runtime to run old functions.

Making the methods with default no-op implementation will resolve 1).
Is that correct?

We can use reflection to check if the methods exist or not to solve 2), no?

- Sijie

On Tue, Jul 6, 2021 at 11:34 AM Neng Lu  wrote:
>
> I think the reason is for keeping the original `Function` unchanged, so that 
> existing implemented functions are not affected.
>
>
> On 2021/07/06 03:34:49 Sijie Guo wrote:
> > Thank you for starting the discussion!
> >
> > I have added this proposal to PIP-86:
> > https://github.com/apache/pulsar/wiki/PIP-86:-Pulsar-Functions:-Preload-and-release-external-resources
> >
> > I have one question: why do you introduce HookFunction? Why not just add
> > two default methods to the existing Functions API?
> >
> > - Sijie
> >
> > On Mon, Jul 5, 2021 at 6:49 PM 陈磊  wrote:
> >
> > > Motivation
> > >
> > > It is very useful in many scenarios to provide safe and convenient
> > > capabilities for function's external resource initialization and release.
> > > In addition to the normal data processing path, it enables functions to 
> > > use
> > > HookFunction to manage external resources
> > >
> > > At present, in order to process data, only the logic of resource
> > > initialization -> processing -> release and shutdown can be written in the
> > > process() of Function. This method is complicated, insecure, and
> > > unnecessary.
> > >
> > > Instead, we should have a new standard way for users to use Function
> > > easily and safely. Summarized as follows:
> > >
> > > Before Function starts, some resources only need to be initialized once,
> > > and there is no need to make various judgments in the process() method of
> > > the Function interface
> > >
> > > After closing the Function, in the process of using process(), you need to
> > > manually close the referenced external resources, which need to be 
> > > released
> > > separately in the close() of javaInstance
> > > API and Implementation Changes
> > >
> > > The organization of the function implementation hierarchy has been added,
> > > it currently looks like the following figure:
> > >
> > > Use Cases:
> > >
> > > Before transformation
> > > public class DemoFunction implements Function{
> > > RedisClient client;
> > > @Override
> > > public String process(String str, Context context) {
> > > 1.client=init();
> > > 2.Object object = client.get(key);
> > > //Data enrichment
> > > 3.client.close();
> > > return null;
> > > }
> > > }
> > >
> > > After the transformation
> > > public class DemoFunction implements HookFunction{
> > > RedisClient client;
> > > @Override
> > > public void setup(Context context) {
> > > Map connectInfo = context.getUserConfigMap();
> > > client=init();
> > > }
> > >
> > > @Override
> > > public  String process(String str, Context context) {
> > > Object object = client.get(key);
> > > //Data enrichment
> > > return null;
> > > }
> > >
> > > @Override
> > > public void cleanup() {
> > > client.close();
> > > }
> > > }
> > >
> > > It is quite simple and clear to use in function processing code.
> > > 
> > >
> > >
> > >
> >


Re: Re: Discussion about https://github.com/apache/pulsar/pull/11112

2021-07-06 Thread Neng Lu
I think the reason is for keeping the original `Function` unchanged, so that 
existing implemented functions are not affected.


On 2021/07/06 03:34:49 Sijie Guo wrote:
> Thank you for starting the discussion!
> 
> I have added this proposal to PIP-86:
> https://github.com/apache/pulsar/wiki/PIP-86:-Pulsar-Functions:-Preload-and-release-external-resources
> 
> I have one question: why do you introduce HookFunction? Why not just add
> two default methods to the existing Functions API?
> 
> - Sijie
> 
> On Mon, Jul 5, 2021 at 6:49 PM 陈磊  wrote:
> 
> > Motivation
> >
> > It is very useful in many scenarios to provide safe and convenient
> > capabilities for function's external resource initialization and release.
> > In addition to the normal data processing path, it enables functions to use
> > HookFunction to manage external resources
> >
> > At present, in order to process data, only the logic of resource
> > initialization -> processing -> release and shutdown can be written in the
> > process() of Function. This method is complicated, insecure, and
> > unnecessary.
> >
> > Instead, we should have a new standard way for users to use Function
> > easily and safely. Summarized as follows:
> >
> > Before Function starts, some resources only need to be initialized once,
> > and there is no need to make various judgments in the process() method of
> > the Function interface
> >
> > After closing the Function, in the process of using process(), you need to
> > manually close the referenced external resources, which need to be released
> > separately in the close() of javaInstance
> > API and Implementation Changes
> >
> > The organization of the function implementation hierarchy has been added,
> > it currently looks like the following figure:
> >
> > Use Cases:
> >
> > Before transformation
> > public class DemoFunction implements Function{
> > RedisClient client;
> > @Override
> > public String process(String str, Context context) {
> > 1.client=init();
> > 2.Object object = client.get(key);
> > //Data enrichment
> > 3.client.close();
> > return null;
> > }
> > }
> >
> > After the transformation
> > public class DemoFunction implements HookFunction{
> > RedisClient client;
> > @Override
> > public void setup(Context context) {
> > Map connectInfo = context.getUserConfigMap();
> > client=init();
> > }
> >
> > @Override
> > public  String process(String str, Context context) {
> > Object object = client.get(key);
> > //Data enrichment
> > return null;
> > }
> >
> > @Override
> > public void cleanup() {
> > client.close();
> > }
> > }
> >
> > It is quite simple and clear to use in function processing code.
> > 
> >
> >
> >
>