Re: [DISCUSSION] PIP-133 Pulsar Functions Add API For Accessing Other Function States

2022-01-20 Thread Michael Marshall
I am concerned that sharing function state will lead to tightly
coupled functions.

I tend to think of functions as actors, according to the actor model
[0]. As such, I view the state of a function (actor) as private and as
something that can only be modified by other functions (actors)
indirectly through messaging. Technically, our documentation does not
state that functions are completely isolated or that they are actors,
but I think we should consider making it part of their design. Without
isolation, functions could become very coupled, and that will
introduce additional challenges related to function scaling, version
management, and security.

> Certain uses of Pulsar functions could benefit from the ability to access
> the states of other functions. Currently functions can only access their
> own states, and so sharing information between functions requires other
> solutions such as writing to a separate database.

Instead of using a separate database, have you considered sharing
state by publishing messages to a topic that another function is
listening to?

- Michael

[0] - https://en.wikipedia.org/wiki/Actor_model



On Tue, Jan 18, 2022 at 11:38 AM Jerry Peng  wrote:
>
> I have concerns about security in this case and potential consistency
> issues.  We will need to define and implement some sort of ACLs system
> first on top of state for this to make sense.
>
> On Mon, Jan 17, 2022 at 5:52 PM Ethan Merrill 
> wrote:
>
> > Thanks for the feedback. I see your concerns.
> >
> > I've been thinking about some ways to mitigate these concerns without
> > expanding the scope of this too much. First, I think it could be a good
> > idea to limit state access to just functions within the same namespace.
> > This will at least avoid any issues that might arise with different
> > namespaces having different state storage implementations.
> >
> > Another thing we could consider is making states read-only by other
> > functions. This allows us to clearly define the owner of the data and avoid
> > unexpected issues with multiple functions trying to change the same state.
> > It would limit some potentially desirable functionality such as 2 functions
> > being able to increment the same counter, but keeping the data ownership
> > clearly defined may be more important for now.
> >
> > Another option to think about could be adding a way to differentiate
> > public/private states or defining what other functions are allowed to
> > access certain states would ease security concerns. It would require some
> > more development time, though, since it would complicate the current
> > implementation a bit. We'd have to address how and where state access is
> > defined.
> >
> > Another option we could consider could be having a separate public state
> > store that all functions in a namespace have access to. It would be simple
> > to implement and would at least separate a function's private states from
> > states it wants other functions to have access to. Data ownership is a bit
> > messy with the public states for this solution, but it would at least
> > provide a method of sharing data that needs to be shared.
> >
> > Let me know if you have any thoughts on any of these changes
> >
> > 
> > From: Enrico Olivelli 
> > Sent: Tuesday, January 11, 2022 1:45 PM
> > To: Dev 
> > Subject: Re: [DISCUSSION] PIP-133 Pulsar Functions Add API For Accessing
> > Other Function States
> >
> > Thank you for posting your PIP !
> >
> > I am sharing some of Neng's concerns.
> > We should define clearly how security works.
> >
> > Also, currently the function defines some "namespace" for the state
> > storage, and we recently added support for custom state storage
> > implementation. With this change each function will possibly access
> > other state storage namespaces (think about using a Database per each
> > tenant).
> >
> > We should also state something about guarantees while accessing
> > multiple storages and/or about transactional (atomic?) access
> >
> >
> > Enrico
> >
> > Il giorno mar 11 gen 2022 alle ore 21:38 Neng Lu  ha
> > scritto:
> > >
> > > Before we advance further, we first need to get on the same page of the
> > > pros and cons of allowing this feature.
> > >
> > > If functions can access (especially the write access) other functions'
> > > state, the data ownership will be a mess, isolation is broken and data
> > > security might be compromised.
> > >
> > >
> > >
> > >
> > >
> > > O

Re: [DISCUSSION] PIP-133 Pulsar Functions Add API For Accessing Other Function States

2022-01-18 Thread Jerry Peng
I have concerns about security in this case and potential consistency
issues.  We will need to define and implement some sort of ACLs system
first on top of state for this to make sense.

On Mon, Jan 17, 2022 at 5:52 PM Ethan Merrill 
wrote:

> Thanks for the feedback. I see your concerns.
>
> I've been thinking about some ways to mitigate these concerns without
> expanding the scope of this too much. First, I think it could be a good
> idea to limit state access to just functions within the same namespace.
> This will at least avoid any issues that might arise with different
> namespaces having different state storage implementations.
>
> Another thing we could consider is making states read-only by other
> functions. This allows us to clearly define the owner of the data and avoid
> unexpected issues with multiple functions trying to change the same state.
> It would limit some potentially desirable functionality such as 2 functions
> being able to increment the same counter, but keeping the data ownership
> clearly defined may be more important for now.
>
> Another option to think about could be adding a way to differentiate
> public/private states or defining what other functions are allowed to
> access certain states would ease security concerns. It would require some
> more development time, though, since it would complicate the current
> implementation a bit. We'd have to address how and where state access is
> defined.
>
> Another option we could consider could be having a separate public state
> store that all functions in a namespace have access to. It would be simple
> to implement and would at least separate a function's private states from
> states it wants other functions to have access to. Data ownership is a bit
> messy with the public states for this solution, but it would at least
> provide a method of sharing data that needs to be shared.
>
> Let me know if you have any thoughts on any of these changes
>
> ____
> From: Enrico Olivelli 
> Sent: Tuesday, January 11, 2022 1:45 PM
> To: Dev 
> Subject: Re: [DISCUSSION] PIP-133 Pulsar Functions Add API For Accessing
> Other Function States
>
> Thank you for posting your PIP !
>
> I am sharing some of Neng's concerns.
> We should define clearly how security works.
>
> Also, currently the function defines some "namespace" for the state
> storage, and we recently added support for custom state storage
> implementation. With this change each function will possibly access
> other state storage namespaces (think about using a Database per each
> tenant).
>
> We should also state something about guarantees while accessing
> multiple storages and/or about transactional (atomic?) access
>
>
> Enrico
>
> Il giorno mar 11 gen 2022 alle ore 21:38 Neng Lu  ha
> scritto:
> >
> > Before we advance further, we first need to get on the same page of the
> > pros and cons of allowing this feature.
> >
> > If functions can access (especially the write access) other functions'
> > state, the data ownership will be a mess, isolation is broken and data
> > security might be compromised.
> >
> >
> >
> >
> >
> > On Wed, Jan 5, 2022 at 3:45 PM Ethan Merrill 
> > wrote:
> >
> > > Original PIP:
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fpulsar%2Fissues%2F13633data=04%7C01%7Cethan.merrill%40legrand.us%7Cedd443d57f1e41c8588408d9d543508e%7C199686b5bef4496087867a6b1888fee3%7C1%7C0%7C637775307351827417%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=DbCae%2FULTgUiV3pIrUQbOtzvPlilATc%2Bcn50I1eg0Iw%3Dreserved=0
> > >
> > > Pasted below for quoting convenience.
> > >
> > > -
> > >
> > > ## Motivation
> > >
> > > Certain uses of Pulsar functions could benefit from the ability to
> access
> > > the states of other functions. Currently functions can only access
> their
> > > own states, and so sharing information between functions requires other
> > > solutions such as writing to a separate database.
> > >
> > > ## Goal
> > >
> > > The goal is to enable the ability for a function to access another
> > > function's state. Given another function's tenant, namespace, and
> name, any
> > > function should be able to access the other function's state for read
> and
> > > write purposes. This PIP is not concerned with expanding the
> capabilities
> > > of function states, It only deals with expanding access to function
> states.
> > >
> > > ## API Changes
> >

Re: [DISCUSSION] PIP-133 Pulsar Functions Add API For Accessing Other Function States

2022-01-17 Thread Ethan Merrill
Thanks for the feedback. I see your concerns.

I've been thinking about some ways to mitigate these concerns without expanding 
the scope of this too much. First, I think it could be a good idea to limit 
state access to just functions within the same namespace. This will at least 
avoid any issues that might arise with different namespaces having different 
state storage implementations.

Another thing we could consider is making states read-only by other functions. 
This allows us to clearly define the owner of the data and avoid unexpected 
issues with multiple functions trying to change the same state. It would limit 
some potentially desirable functionality such as 2 functions being able to 
increment the same counter, but keeping the data ownership clearly defined may 
be more important for now.

Another option to think about could be adding a way to differentiate 
public/private states or defining what other functions are allowed to access 
certain states would ease security concerns. It would require some more 
development time, though, since it would complicate the current implementation 
a bit. We'd have to address how and where state access is defined.

Another option we could consider could be having a separate public state store 
that all functions in a namespace have access to. It would be simple to 
implement and would at least separate a function's private states from states 
it wants other functions to have access to. Data ownership is a bit messy with 
the public states for this solution, but it would at least provide a method of 
sharing data that needs to be shared.

Let me know if you have any thoughts on any of these changes


From: Enrico Olivelli 
Sent: Tuesday, January 11, 2022 1:45 PM
To: Dev 
Subject: Re: [DISCUSSION] PIP-133 Pulsar Functions Add API For Accessing Other 
Function States

Thank you for posting your PIP !

I am sharing some of Neng's concerns.
We should define clearly how security works.

Also, currently the function defines some "namespace" for the state
storage, and we recently added support for custom state storage
implementation. With this change each function will possibly access
other state storage namespaces (think about using a Database per each
tenant).

We should also state something about guarantees while accessing
multiple storages and/or about transactional (atomic?) access


Enrico

Il giorno mar 11 gen 2022 alle ore 21:38 Neng Lu  ha scritto:
>
> Before we advance further, we first need to get on the same page of the
> pros and cons of allowing this feature.
>
> If functions can access (especially the write access) other functions'
> state, the data ownership will be a mess, isolation is broken and data
> security might be compromised.
>
>
>
>
>
> On Wed, Jan 5, 2022 at 3:45 PM Ethan Merrill 
> wrote:
>
> > Original PIP: 
> > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fpulsar%2Fissues%2F13633data=04%7C01%7Cethan.merrill%40legrand.us%7Cedd443d57f1e41c8588408d9d543508e%7C199686b5bef4496087867a6b1888fee3%7C1%7C0%7C637775307351827417%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=DbCae%2FULTgUiV3pIrUQbOtzvPlilATc%2Bcn50I1eg0Iw%3Dreserved=0
> >
> > Pasted below for quoting convenience.
> >
> > -
> >
> > ## Motivation
> >
> > Certain uses of Pulsar functions could benefit from the ability to access
> > the states of other functions. Currently functions can only access their
> > own states, and so sharing information between functions requires other
> > solutions such as writing to a separate database.
> >
> > ## Goal
> >
> > The goal is to enable the ability for a function to access another
> > function's state. Given another function's tenant, namespace, and name, any
> > function should be able to access the other function's state for read and
> > write purposes. This PIP is not concerned with expanding the capabilities
> > of function states, It only deals with expanding access to function states.
> >
> > ## API Changes
> >
> > The Pulsar function API would be modified to have the function context
> > implement the following interface for accessing function states using a
> > tenant, namespace, and name.
> >
> > ```
> > public interface SharedContext {
> > /**
> >  * Update the state value for the key.
> >  *
> >  * @param key   name of the key
> >  * @param value state value of the key
> >  * @param tenant the state tenant name
> >  * @param ns the state namespace name
> >  * @param name the state store name
> >  */
> > void putState(String key, ByteBuffer value, String te

Re: [DISCUSSION] PIP-133 Pulsar Functions Add API For Accessing Other Function States

2022-01-11 Thread Enrico Olivelli
Thank you for posting your PIP !

I am sharing some of Neng's concerns.
We should define clearly how security works.

Also, currently the function defines some "namespace" for the state
storage, and we recently added support for custom state storage
implementation. With this change each function will possibly access
other state storage namespaces (think about using a Database per each
tenant).

We should also state something about guarantees while accessing
multiple storages and/or about transactional (atomic?) access


Enrico

Il giorno mar 11 gen 2022 alle ore 21:38 Neng Lu  ha scritto:
>
> Before we advance further, we first need to get on the same page of the
> pros and cons of allowing this feature.
>
> If functions can access (especially the write access) other functions'
> state, the data ownership will be a mess, isolation is broken and data
> security might be compromised.
>
>
>
>
>
> On Wed, Jan 5, 2022 at 3:45 PM Ethan Merrill 
> wrote:
>
> > Original PIP: https://github.com/apache/pulsar/issues/13633
> >
> > Pasted below for quoting convenience.
> >
> > -
> >
> > ## Motivation
> >
> > Certain uses of Pulsar functions could benefit from the ability to access
> > the states of other functions. Currently functions can only access their
> > own states, and so sharing information between functions requires other
> > solutions such as writing to a separate database.
> >
> > ## Goal
> >
> > The goal is to enable the ability for a function to access another
> > function's state. Given another function's tenant, namespace, and name, any
> > function should be able to access the other function's state for read and
> > write purposes. This PIP is not concerned with expanding the capabilities
> > of function states, It only deals with expanding access to function states.
> >
> > ## API Changes
> >
> > The Pulsar function API would be modified to have the function context
> > implement the following interface for accessing function states using a
> > tenant, namespace, and name.
> >
> > ```
> > public interface SharedContext {
> > /**
> >  * Update the state value for the key.
> >  *
> >  * @param key   name of the key
> >  * @param value state value of the key
> >  * @param tenant the state tenant name
> >  * @param ns the state namespace name
> >  * @param name the state store name
> >  */
> > void putState(String key, ByteBuffer value, String tenant, String ns,
> > String name);
> >
> > /**
> >  * Update the state value for the key, but don't wait for the
> > operation to be completed
> >  *
> >  * @param key   name of the key
> >  * @param value state value of the key
> >  * @param tenant the state tenant name
> >  * @param ns the state namespace name
> >  * @param name the state store name
> >  */
> > CompletableFuture putStateAsync(String key, ByteBuffer value,
> > String tenant, String ns, String name);
> >
> > /**
> >  * Retrieve the state value for the key.
> >  *
> >  * @param key name of the key
> >  * @param tenant the state tenant name
> >  * @param ns the state namespace name
> >  * @param name the state store name
> >  * @return the state value for the key.
> >  */
> > ByteBuffer getState(String key, String tenant, String ns, String name);
> >
> > /**
> >  * Retrieve the state value for the key, but don't wait for the
> > operation to be completed
> >  *
> >  * @param key name of the key
> >  * @param tenant the state tenant name
> >  * @param ns the state namespace name
> >  * @param name the state store name
> >  * @return the state value for the key.
> >  */
> > CompletableFuture getStateAsync(String key, String tenant,
> > String ns, String name);
> >
> > /**
> >  * Delete the state value for the key.
> >  *
> >  * @param key   name of the key
> >  * @param tenant the state tenant name
> >  * @param ns the state namespace name
> >  * @param name the state store name
> >  */
> > void deleteState(String key, String tenant, String ns, String name);
> >
> > /**
> >  * Delete the state value for the key, but don't wait for the
> > operation to be completed
> >  *
> >  * @param key   name of the key
> >  * @param tenant the state tenant name
> >  * @param ns the state namespace name
> >  * @param name the state store name
> >  */
> > CompletableFuture deleteStateAsync(String key, String tenant,
> > String ns, String name);
> >
> > /**
> >  * Increment the builtin distributed counter referred by key.
> >  *
> >  * @param keyThe name of the key
> >  * @param amount The amount to be incremented
> >  * @param tenant the state tenant name
> >  * @param ns the state namespace name
> >  * @param name the state store name
> >  */
> > void incrCounter(String key, long amount, String tenant, String ns,
> > String name);
> >
> >  

Re: [DISCUSSION] PIP-133 Pulsar Functions Add API For Accessing Other Function States

2022-01-11 Thread Neng Lu
Before we advance further, we first need to get on the same page of the
pros and cons of allowing this feature.

If functions can access (especially the write access) other functions'
state, the data ownership will be a mess, isolation is broken and data
security might be compromised.





On Wed, Jan 5, 2022 at 3:45 PM Ethan Merrill 
wrote:

> Original PIP: https://github.com/apache/pulsar/issues/13633
>
> Pasted below for quoting convenience.
>
> -
>
> ## Motivation
>
> Certain uses of Pulsar functions could benefit from the ability to access
> the states of other functions. Currently functions can only access their
> own states, and so sharing information between functions requires other
> solutions such as writing to a separate database.
>
> ## Goal
>
> The goal is to enable the ability for a function to access another
> function's state. Given another function's tenant, namespace, and name, any
> function should be able to access the other function's state for read and
> write purposes. This PIP is not concerned with expanding the capabilities
> of function states, It only deals with expanding access to function states.
>
> ## API Changes
>
> The Pulsar function API would be modified to have the function context
> implement the following interface for accessing function states using a
> tenant, namespace, and name.
>
> ```
> public interface SharedContext {
> /**
>  * Update the state value for the key.
>  *
>  * @param key   name of the key
>  * @param value state value of the key
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  */
> void putState(String key, ByteBuffer value, String tenant, String ns,
> String name);
>
> /**
>  * Update the state value for the key, but don't wait for the
> operation to be completed
>  *
>  * @param key   name of the key
>  * @param value state value of the key
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  */
> CompletableFuture putStateAsync(String key, ByteBuffer value,
> String tenant, String ns, String name);
>
> /**
>  * Retrieve the state value for the key.
>  *
>  * @param key name of the key
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  * @return the state value for the key.
>  */
> ByteBuffer getState(String key, String tenant, String ns, String name);
>
> /**
>  * Retrieve the state value for the key, but don't wait for the
> operation to be completed
>  *
>  * @param key name of the key
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  * @return the state value for the key.
>  */
> CompletableFuture getStateAsync(String key, String tenant,
> String ns, String name);
>
> /**
>  * Delete the state value for the key.
>  *
>  * @param key   name of the key
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  */
> void deleteState(String key, String tenant, String ns, String name);
>
> /**
>  * Delete the state value for the key, but don't wait for the
> operation to be completed
>  *
>  * @param key   name of the key
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  */
> CompletableFuture deleteStateAsync(String key, String tenant,
> String ns, String name);
>
> /**
>  * Increment the builtin distributed counter referred by key.
>  *
>  * @param keyThe name of the key
>  * @param amount The amount to be incremented
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  */
> void incrCounter(String key, long amount, String tenant, String ns,
> String name);
>
> /**
>  * Increment the builtin distributed counter referred by key
>  * but dont wait for the completion of the increment operation
>  *
>  * @param keyThe name of the key
>  * @param amount The amount to be incremented
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  */
> CompletableFuture incrCounterAsync(String key, long amount,
> String tenant, String ns, String name);
>
> /**
>  * Retrieve the counter value for the key.
>  *
>  * @param key name of the key
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  * @return the amount of the counter value for this key
>  */
> long getCounter(String key, 

[DISCUSSION] PIP-133 Pulsar Functions Add API For Accessing Other Function States

2022-01-05 Thread Ethan Merrill
Original PIP: https://github.com/apache/pulsar/issues/13633

Pasted below for quoting convenience.

-

## Motivation

Certain uses of Pulsar functions could benefit from the ability to access the 
states of other functions. Currently functions can only access their own 
states, and so sharing information between functions requires other solutions 
such as writing to a separate database.

## Goal

The goal is to enable the ability for a function to access another function's 
state. Given another function's tenant, namespace, and name, any function 
should be able to access the other function's state for read and write 
purposes. This PIP is not concerned with expanding the capabilities of function 
states, It only deals with expanding access to function states.

## API Changes

The Pulsar function API would be modified to have the function context 
implement the following interface for accessing function states using a tenant, 
namespace, and name.

```
public interface SharedContext {
/**
 * Update the state value for the key.
 *
 * @param key   name of the key
 * @param value state value of the key
 * @param tenant the state tenant name
 * @param ns the state namespace name
 * @param name the state store name
 */
void putState(String key, ByteBuffer value, String tenant, String ns, 
String name);

/**
 * Update the state value for the key, but don't wait for the operation to 
be completed
 *
 * @param key   name of the key
 * @param value state value of the key
 * @param tenant the state tenant name
 * @param ns the state namespace name
 * @param name the state store name
 */
CompletableFuture putStateAsync(String key, ByteBuffer value, String 
tenant, String ns, String name);

/**
 * Retrieve the state value for the key.
 *
 * @param key name of the key
 * @param tenant the state tenant name
 * @param ns the state namespace name
 * @param name the state store name
 * @return the state value for the key.
 */
ByteBuffer getState(String key, String tenant, String ns, String name);

/**
 * Retrieve the state value for the key, but don't wait for the operation 
to be completed
 *
 * @param key name of the key
 * @param tenant the state tenant name
 * @param ns the state namespace name
 * @param name the state store name
 * @return the state value for the key.
 */
CompletableFuture getStateAsync(String key, String tenant, 
String ns, String name);

/**
 * Delete the state value for the key.
 *
 * @param key   name of the key
 * @param tenant the state tenant name
 * @param ns the state namespace name
 * @param name the state store name
 */
void deleteState(String key, String tenant, String ns, String name);

/**
 * Delete the state value for the key, but don't wait for the operation to 
be completed
 *
 * @param key   name of the key
 * @param tenant the state tenant name
 * @param ns the state namespace name
 * @param name the state store name
 */
CompletableFuture deleteStateAsync(String key, String tenant, String 
ns, String name);

/**
 * Increment the builtin distributed counter referred by key.
 *
 * @param keyThe name of the key
 * @param amount The amount to be incremented
 * @param tenant the state tenant name
 * @param ns the state namespace name
 * @param name the state store name
 */
void incrCounter(String key, long amount, String tenant, String ns, String 
name);

/**
 * Increment the builtin distributed counter referred by key
 * but dont wait for the completion of the increment operation
 *
 * @param keyThe name of the key
 * @param amount The amount to be incremented
 * @param tenant the state tenant name
 * @param ns the state namespace name
 * @param name the state store name
 */
CompletableFuture incrCounterAsync(String key, long amount, String 
tenant, String ns, String name);

/**
 * Retrieve the counter value for the key.
 *
 * @param key name of the key
 * @param tenant the state tenant name
 * @param ns the state namespace name
 * @param name the state store name
 * @return the amount of the counter value for this key
 */
long getCounter(String key, String tenant, String ns, String name);

/**
 * Retrieve the counter value for the key, but don't wait
 * for the operation to be completed
 *
 * @param key name of the key
 * @param tenant the state tenant name
 * @param ns the state namespace name
 * @param name the state store name
 * @return the amount of the counter value for this key
 */
CompletableFuture getCounterAsync(String key, String tenant, String 
ns, String name);
}
```

And the python context would have the following added:
```
class Context(object):
  @abstractmethod