Re: [DISCUSSION] PIP-133 Pulsar Functions Add API For Accessing Other Function States
I am concerned that sharing function state will lead to tightly coupled functions. I tend to think of functions as actors, according to the actor model [0]. As such, I view the state of a function (actor) as private and as something that can only be modified by other functions (actors) indirectly through messaging. Technically, our documentation does not state that functions are completely isolated or that they are actors, but I think we should consider making it part of their design. Without isolation, functions could become very coupled, and that will introduce additional challenges related to function scaling, version management, and security. > Certain uses of Pulsar functions could benefit from the ability to access > the states of other functions. Currently functions can only access their > own states, and so sharing information between functions requires other > solutions such as writing to a separate database. Instead of using a separate database, have you considered sharing state by publishing messages to a topic that another function is listening to? - Michael [0] - https://en.wikipedia.org/wiki/Actor_model On Tue, Jan 18, 2022 at 11:38 AM Jerry Peng wrote: > > I have concerns about security in this case and potential consistency > issues. We will need to define and implement some sort of ACLs system > first on top of state for this to make sense. > > On Mon, Jan 17, 2022 at 5:52 PM Ethan Merrill > wrote: > > > Thanks for the feedback. I see your concerns. > > > > I've been thinking about some ways to mitigate these concerns without > > expanding the scope of this too much. First, I think it could be a good > > idea to limit state access to just functions within the same namespace. > > This will at least avoid any issues that might arise with different > > namespaces having different state storage implementations. > > > > Another thing we could consider is making states read-only by other > > functions. This allows us to clearly define the owner of the data and avoid > > unexpected issues with multiple functions trying to change the same state. > > It would limit some potentially desirable functionality such as 2 functions > > being able to increment the same counter, but keeping the data ownership > > clearly defined may be more important for now. > > > > Another option to think about could be adding a way to differentiate > > public/private states or defining what other functions are allowed to > > access certain states would ease security concerns. It would require some > > more development time, though, since it would complicate the current > > implementation a bit. We'd have to address how and where state access is > > defined. > > > > Another option we could consider could be having a separate public state > > store that all functions in a namespace have access to. It would be simple > > to implement and would at least separate a function's private states from > > states it wants other functions to have access to. Data ownership is a bit > > messy with the public states for this solution, but it would at least > > provide a method of sharing data that needs to be shared. > > > > Let me know if you have any thoughts on any of these changes > > > > > > From: Enrico Olivelli > > Sent: Tuesday, January 11, 2022 1:45 PM > > To: Dev > > Subject: Re: [DISCUSSION] PIP-133 Pulsar Functions Add API For Accessing > > Other Function States > > > > Thank you for posting your PIP ! > > > > I am sharing some of Neng's concerns. > > We should define clearly how security works. > > > > Also, currently the function defines some "namespace" for the state > > storage, and we recently added support for custom state storage > > implementation. With this change each function will possibly access > > other state storage namespaces (think about using a Database per each > > tenant). > > > > We should also state something about guarantees while accessing > > multiple storages and/or about transactional (atomic?) access > > > > > > Enrico > > > > Il giorno mar 11 gen 2022 alle ore 21:38 Neng Lu ha > > scritto: > > > > > > Before we advance further, we first need to get on the same page of the > > > pros and cons of allowing this feature. > > > > > > If functions can access (especially the write access) other functions' > > > state, the data ownership will be a mess, isolation is broken and data > > > security might be compromised. > > > > > > > > > > > > > > > > > > O
Re: [DISCUSSION] PIP-133 Pulsar Functions Add API For Accessing Other Function States
I have concerns about security in this case and potential consistency issues. We will need to define and implement some sort of ACLs system first on top of state for this to make sense. On Mon, Jan 17, 2022 at 5:52 PM Ethan Merrill wrote: > Thanks for the feedback. I see your concerns. > > I've been thinking about some ways to mitigate these concerns without > expanding the scope of this too much. First, I think it could be a good > idea to limit state access to just functions within the same namespace. > This will at least avoid any issues that might arise with different > namespaces having different state storage implementations. > > Another thing we could consider is making states read-only by other > functions. This allows us to clearly define the owner of the data and avoid > unexpected issues with multiple functions trying to change the same state. > It would limit some potentially desirable functionality such as 2 functions > being able to increment the same counter, but keeping the data ownership > clearly defined may be more important for now. > > Another option to think about could be adding a way to differentiate > public/private states or defining what other functions are allowed to > access certain states would ease security concerns. It would require some > more development time, though, since it would complicate the current > implementation a bit. We'd have to address how and where state access is > defined. > > Another option we could consider could be having a separate public state > store that all functions in a namespace have access to. It would be simple > to implement and would at least separate a function's private states from > states it wants other functions to have access to. Data ownership is a bit > messy with the public states for this solution, but it would at least > provide a method of sharing data that needs to be shared. > > Let me know if you have any thoughts on any of these changes > > ____ > From: Enrico Olivelli > Sent: Tuesday, January 11, 2022 1:45 PM > To: Dev > Subject: Re: [DISCUSSION] PIP-133 Pulsar Functions Add API For Accessing > Other Function States > > Thank you for posting your PIP ! > > I am sharing some of Neng's concerns. > We should define clearly how security works. > > Also, currently the function defines some "namespace" for the state > storage, and we recently added support for custom state storage > implementation. With this change each function will possibly access > other state storage namespaces (think about using a Database per each > tenant). > > We should also state something about guarantees while accessing > multiple storages and/or about transactional (atomic?) access > > > Enrico > > Il giorno mar 11 gen 2022 alle ore 21:38 Neng Lu ha > scritto: > > > > Before we advance further, we first need to get on the same page of the > > pros and cons of allowing this feature. > > > > If functions can access (especially the write access) other functions' > > state, the data ownership will be a mess, isolation is broken and data > > security might be compromised. > > > > > > > > > > > > On Wed, Jan 5, 2022 at 3:45 PM Ethan Merrill > > wrote: > > > > > Original PIP: > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fpulsar%2Fissues%2F13633data=04%7C01%7Cethan.merrill%40legrand.us%7Cedd443d57f1e41c8588408d9d543508e%7C199686b5bef4496087867a6b1888fee3%7C1%7C0%7C637775307351827417%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=DbCae%2FULTgUiV3pIrUQbOtzvPlilATc%2Bcn50I1eg0Iw%3Dreserved=0 > > > > > > Pasted below for quoting convenience. > > > > > > - > > > > > > ## Motivation > > > > > > Certain uses of Pulsar functions could benefit from the ability to > access > > > the states of other functions. Currently functions can only access > their > > > own states, and so sharing information between functions requires other > > > solutions such as writing to a separate database. > > > > > > ## Goal > > > > > > The goal is to enable the ability for a function to access another > > > function's state. Given another function's tenant, namespace, and > name, any > > > function should be able to access the other function's state for read > and > > > write purposes. This PIP is not concerned with expanding the > capabilities > > > of function states, It only deals with expanding access to function > states. > > > > > > ## API Changes > >
Re: [DISCUSSION] PIP-133 Pulsar Functions Add API For Accessing Other Function States
Thanks for the feedback. I see your concerns. I've been thinking about some ways to mitigate these concerns without expanding the scope of this too much. First, I think it could be a good idea to limit state access to just functions within the same namespace. This will at least avoid any issues that might arise with different namespaces having different state storage implementations. Another thing we could consider is making states read-only by other functions. This allows us to clearly define the owner of the data and avoid unexpected issues with multiple functions trying to change the same state. It would limit some potentially desirable functionality such as 2 functions being able to increment the same counter, but keeping the data ownership clearly defined may be more important for now. Another option to think about could be adding a way to differentiate public/private states or defining what other functions are allowed to access certain states would ease security concerns. It would require some more development time, though, since it would complicate the current implementation a bit. We'd have to address how and where state access is defined. Another option we could consider could be having a separate public state store that all functions in a namespace have access to. It would be simple to implement and would at least separate a function's private states from states it wants other functions to have access to. Data ownership is a bit messy with the public states for this solution, but it would at least provide a method of sharing data that needs to be shared. Let me know if you have any thoughts on any of these changes From: Enrico Olivelli Sent: Tuesday, January 11, 2022 1:45 PM To: Dev Subject: Re: [DISCUSSION] PIP-133 Pulsar Functions Add API For Accessing Other Function States Thank you for posting your PIP ! I am sharing some of Neng's concerns. We should define clearly how security works. Also, currently the function defines some "namespace" for the state storage, and we recently added support for custom state storage implementation. With this change each function will possibly access other state storage namespaces (think about using a Database per each tenant). We should also state something about guarantees while accessing multiple storages and/or about transactional (atomic?) access Enrico Il giorno mar 11 gen 2022 alle ore 21:38 Neng Lu ha scritto: > > Before we advance further, we first need to get on the same page of the > pros and cons of allowing this feature. > > If functions can access (especially the write access) other functions' > state, the data ownership will be a mess, isolation is broken and data > security might be compromised. > > > > > > On Wed, Jan 5, 2022 at 3:45 PM Ethan Merrill > wrote: > > > Original PIP: > > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fpulsar%2Fissues%2F13633data=04%7C01%7Cethan.merrill%40legrand.us%7Cedd443d57f1e41c8588408d9d543508e%7C199686b5bef4496087867a6b1888fee3%7C1%7C0%7C637775307351827417%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=DbCae%2FULTgUiV3pIrUQbOtzvPlilATc%2Bcn50I1eg0Iw%3Dreserved=0 > > > > Pasted below for quoting convenience. > > > > - > > > > ## Motivation > > > > Certain uses of Pulsar functions could benefit from the ability to access > > the states of other functions. Currently functions can only access their > > own states, and so sharing information between functions requires other > > solutions such as writing to a separate database. > > > > ## Goal > > > > The goal is to enable the ability for a function to access another > > function's state. Given another function's tenant, namespace, and name, any > > function should be able to access the other function's state for read and > > write purposes. This PIP is not concerned with expanding the capabilities > > of function states, It only deals with expanding access to function states. > > > > ## API Changes > > > > The Pulsar function API would be modified to have the function context > > implement the following interface for accessing function states using a > > tenant, namespace, and name. > > > > ``` > > public interface SharedContext { > > /** > > * Update the state value for the key. > > * > > * @param key name of the key > > * @param value state value of the key > > * @param tenant the state tenant name > > * @param ns the state namespace name > > * @param name the state store name > > */ > > void putState(String key, ByteBuffer value, String te
Re: [DISCUSSION] PIP-133 Pulsar Functions Add API For Accessing Other Function States
Thank you for posting your PIP ! I am sharing some of Neng's concerns. We should define clearly how security works. Also, currently the function defines some "namespace" for the state storage, and we recently added support for custom state storage implementation. With this change each function will possibly access other state storage namespaces (think about using a Database per each tenant). We should also state something about guarantees while accessing multiple storages and/or about transactional (atomic?) access Enrico Il giorno mar 11 gen 2022 alle ore 21:38 Neng Lu ha scritto: > > Before we advance further, we first need to get on the same page of the > pros and cons of allowing this feature. > > If functions can access (especially the write access) other functions' > state, the data ownership will be a mess, isolation is broken and data > security might be compromised. > > > > > > On Wed, Jan 5, 2022 at 3:45 PM Ethan Merrill > wrote: > > > Original PIP: https://github.com/apache/pulsar/issues/13633 > > > > Pasted below for quoting convenience. > > > > - > > > > ## Motivation > > > > Certain uses of Pulsar functions could benefit from the ability to access > > the states of other functions. Currently functions can only access their > > own states, and so sharing information between functions requires other > > solutions such as writing to a separate database. > > > > ## Goal > > > > The goal is to enable the ability for a function to access another > > function's state. Given another function's tenant, namespace, and name, any > > function should be able to access the other function's state for read and > > write purposes. This PIP is not concerned with expanding the capabilities > > of function states, It only deals with expanding access to function states. > > > > ## API Changes > > > > The Pulsar function API would be modified to have the function context > > implement the following interface for accessing function states using a > > tenant, namespace, and name. > > > > ``` > > public interface SharedContext { > > /** > > * Update the state value for the key. > > * > > * @param key name of the key > > * @param value state value of the key > > * @param tenant the state tenant name > > * @param ns the state namespace name > > * @param name the state store name > > */ > > void putState(String key, ByteBuffer value, String tenant, String ns, > > String name); > > > > /** > > * Update the state value for the key, but don't wait for the > > operation to be completed > > * > > * @param key name of the key > > * @param value state value of the key > > * @param tenant the state tenant name > > * @param ns the state namespace name > > * @param name the state store name > > */ > > CompletableFuture putStateAsync(String key, ByteBuffer value, > > String tenant, String ns, String name); > > > > /** > > * Retrieve the state value for the key. > > * > > * @param key name of the key > > * @param tenant the state tenant name > > * @param ns the state namespace name > > * @param name the state store name > > * @return the state value for the key. > > */ > > ByteBuffer getState(String key, String tenant, String ns, String name); > > > > /** > > * Retrieve the state value for the key, but don't wait for the > > operation to be completed > > * > > * @param key name of the key > > * @param tenant the state tenant name > > * @param ns the state namespace name > > * @param name the state store name > > * @return the state value for the key. > > */ > > CompletableFuture getStateAsync(String key, String tenant, > > String ns, String name); > > > > /** > > * Delete the state value for the key. > > * > > * @param key name of the key > > * @param tenant the state tenant name > > * @param ns the state namespace name > > * @param name the state store name > > */ > > void deleteState(String key, String tenant, String ns, String name); > > > > /** > > * Delete the state value for the key, but don't wait for the > > operation to be completed > > * > > * @param key name of the key > > * @param tenant the state tenant name > > * @param ns the state namespace name > > * @param name the state store name > > */ > > CompletableFuture deleteStateAsync(String key, String tenant, > > String ns, String name); > > > > /** > > * Increment the builtin distributed counter referred by key. > > * > > * @param keyThe name of the key > > * @param amount The amount to be incremented > > * @param tenant the state tenant name > > * @param ns the state namespace name > > * @param name the state store name > > */ > > void incrCounter(String key, long amount, String tenant, String ns, > > String name); > > > >
Re: [DISCUSSION] PIP-133 Pulsar Functions Add API For Accessing Other Function States
Before we advance further, we first need to get on the same page of the pros and cons of allowing this feature. If functions can access (especially the write access) other functions' state, the data ownership will be a mess, isolation is broken and data security might be compromised. On Wed, Jan 5, 2022 at 3:45 PM Ethan Merrill wrote: > Original PIP: https://github.com/apache/pulsar/issues/13633 > > Pasted below for quoting convenience. > > - > > ## Motivation > > Certain uses of Pulsar functions could benefit from the ability to access > the states of other functions. Currently functions can only access their > own states, and so sharing information between functions requires other > solutions such as writing to a separate database. > > ## Goal > > The goal is to enable the ability for a function to access another > function's state. Given another function's tenant, namespace, and name, any > function should be able to access the other function's state for read and > write purposes. This PIP is not concerned with expanding the capabilities > of function states, It only deals with expanding access to function states. > > ## API Changes > > The Pulsar function API would be modified to have the function context > implement the following interface for accessing function states using a > tenant, namespace, and name. > > ``` > public interface SharedContext { > /** > * Update the state value for the key. > * > * @param key name of the key > * @param value state value of the key > * @param tenant the state tenant name > * @param ns the state namespace name > * @param name the state store name > */ > void putState(String key, ByteBuffer value, String tenant, String ns, > String name); > > /** > * Update the state value for the key, but don't wait for the > operation to be completed > * > * @param key name of the key > * @param value state value of the key > * @param tenant the state tenant name > * @param ns the state namespace name > * @param name the state store name > */ > CompletableFuture putStateAsync(String key, ByteBuffer value, > String tenant, String ns, String name); > > /** > * Retrieve the state value for the key. > * > * @param key name of the key > * @param tenant the state tenant name > * @param ns the state namespace name > * @param name the state store name > * @return the state value for the key. > */ > ByteBuffer getState(String key, String tenant, String ns, String name); > > /** > * Retrieve the state value for the key, but don't wait for the > operation to be completed > * > * @param key name of the key > * @param tenant the state tenant name > * @param ns the state namespace name > * @param name the state store name > * @return the state value for the key. > */ > CompletableFuture getStateAsync(String key, String tenant, > String ns, String name); > > /** > * Delete the state value for the key. > * > * @param key name of the key > * @param tenant the state tenant name > * @param ns the state namespace name > * @param name the state store name > */ > void deleteState(String key, String tenant, String ns, String name); > > /** > * Delete the state value for the key, but don't wait for the > operation to be completed > * > * @param key name of the key > * @param tenant the state tenant name > * @param ns the state namespace name > * @param name the state store name > */ > CompletableFuture deleteStateAsync(String key, String tenant, > String ns, String name); > > /** > * Increment the builtin distributed counter referred by key. > * > * @param keyThe name of the key > * @param amount The amount to be incremented > * @param tenant the state tenant name > * @param ns the state namespace name > * @param name the state store name > */ > void incrCounter(String key, long amount, String tenant, String ns, > String name); > > /** > * Increment the builtin distributed counter referred by key > * but dont wait for the completion of the increment operation > * > * @param keyThe name of the key > * @param amount The amount to be incremented > * @param tenant the state tenant name > * @param ns the state namespace name > * @param name the state store name > */ > CompletableFuture incrCounterAsync(String key, long amount, > String tenant, String ns, String name); > > /** > * Retrieve the counter value for the key. > * > * @param key name of the key > * @param tenant the state tenant name > * @param ns the state namespace name > * @param name the state store name > * @return the amount of the counter value for this key > */ > long getCounter(String key,
[DISCUSSION] PIP-133 Pulsar Functions Add API For Accessing Other Function States
Original PIP: https://github.com/apache/pulsar/issues/13633 Pasted below for quoting convenience. - ## Motivation Certain uses of Pulsar functions could benefit from the ability to access the states of other functions. Currently functions can only access their own states, and so sharing information between functions requires other solutions such as writing to a separate database. ## Goal The goal is to enable the ability for a function to access another function's state. Given another function's tenant, namespace, and name, any function should be able to access the other function's state for read and write purposes. This PIP is not concerned with expanding the capabilities of function states, It only deals with expanding access to function states. ## API Changes The Pulsar function API would be modified to have the function context implement the following interface for accessing function states using a tenant, namespace, and name. ``` public interface SharedContext { /** * Update the state value for the key. * * @param key name of the key * @param value state value of the key * @param tenant the state tenant name * @param ns the state namespace name * @param name the state store name */ void putState(String key, ByteBuffer value, String tenant, String ns, String name); /** * Update the state value for the key, but don't wait for the operation to be completed * * @param key name of the key * @param value state value of the key * @param tenant the state tenant name * @param ns the state namespace name * @param name the state store name */ CompletableFuture putStateAsync(String key, ByteBuffer value, String tenant, String ns, String name); /** * Retrieve the state value for the key. * * @param key name of the key * @param tenant the state tenant name * @param ns the state namespace name * @param name the state store name * @return the state value for the key. */ ByteBuffer getState(String key, String tenant, String ns, String name); /** * Retrieve the state value for the key, but don't wait for the operation to be completed * * @param key name of the key * @param tenant the state tenant name * @param ns the state namespace name * @param name the state store name * @return the state value for the key. */ CompletableFuture getStateAsync(String key, String tenant, String ns, String name); /** * Delete the state value for the key. * * @param key name of the key * @param tenant the state tenant name * @param ns the state namespace name * @param name the state store name */ void deleteState(String key, String tenant, String ns, String name); /** * Delete the state value for the key, but don't wait for the operation to be completed * * @param key name of the key * @param tenant the state tenant name * @param ns the state namespace name * @param name the state store name */ CompletableFuture deleteStateAsync(String key, String tenant, String ns, String name); /** * Increment the builtin distributed counter referred by key. * * @param keyThe name of the key * @param amount The amount to be incremented * @param tenant the state tenant name * @param ns the state namespace name * @param name the state store name */ void incrCounter(String key, long amount, String tenant, String ns, String name); /** * Increment the builtin distributed counter referred by key * but dont wait for the completion of the increment operation * * @param keyThe name of the key * @param amount The amount to be incremented * @param tenant the state tenant name * @param ns the state namespace name * @param name the state store name */ CompletableFuture incrCounterAsync(String key, long amount, String tenant, String ns, String name); /** * Retrieve the counter value for the key. * * @param key name of the key * @param tenant the state tenant name * @param ns the state namespace name * @param name the state store name * @return the amount of the counter value for this key */ long getCounter(String key, String tenant, String ns, String name); /** * Retrieve the counter value for the key, but don't wait * for the operation to be completed * * @param key name of the key * @param tenant the state tenant name * @param ns the state namespace name * @param name the state store name * @return the amount of the counter value for this key */ CompletableFuture getCounterAsync(String key, String tenant, String ns, String name); } ``` And the python context would have the following added: ``` class Context(object): @abstractmethod