Hi Enrico,

This is interesting. I'm thinking of the KTables part of Kafka Streams and
also Raft state machines.

You could build something equivalent to a Raft state machine on top of
Pulsar where WaitForExclusive acts as leader election and the topic as the
log. I would be interested in a PIP, that also includes considerations for
things like:

   - how applications can persist an applied index to avoid having to
   rebuild state from scratch
   - how to manage topic size such that it isn't easy for users to end up
   with inconsistent views of the state. Things like checkpointing, may be
   having more control over topic data retention.

Also how does this compare to PIP-104 Table View? (
http://mail-archives.apache.org/mod_mbox/pulsar-dev/202110.mbox/%3cCA+JmKXbGf3CMy5dxypX=so-gyxrdh0pmttkhy3wnnm4azgw...@mail.gmail.com%3e
)

Finally, have we looked at what the market is asking for? What kind of
product strategy do we have regarding Pulsar? This kind of thing could end
up being highly valuable but also a big investment and it would be good to
know how the community is steering Pulsar to make it the most relevant it
can be.

Jack


On Wed, Nov 10, 2021 at 5:18 PM Enrico Olivelli <eolive...@gmail.com> wrote:

> [ External sender. Exercise caution. ]
>
> Hello,
> With Pulsar 2.8.0 we have the Exclusive Producer, which allows you to use
> Pulsar as a consistent write-ahead-log for replicated state machines.
>
> It already happened to me a couple of times to need to build some
> replicated state storage on top of Pulsar and I would like to share some
> thoughts.
>
> We can provide some simple built-in mechanism to share some "state"  across
> several instances of an application without adding some Database or other
> components to the architecture:
> - metadata
> - dynamic configuration
> - task assignments
> - key-value database
>
> In general we can provide an API to handle a shared distributed Java
> Object: each client can access the Object and mutate the State,
> ensuring consistency.
>
> I have drafted a small API to build such an abstraction:
>
> public interface PulsarDatabase<V, O> {
>
>     /**
>      * Read from the current state.
>      * @param reader a function that accesses current state and returns a
> value
>      * @param latest ensure that the value is the latest
>      * @return an handle to the result of the operation
>      */
>     <K> CompletableFuture<K> read(Function<V, K> reader, boolean latest);
>
>     /*
>      * Execute a mutation on the state.
>      * The operationsGenerator generates a list of mutations to be
>      * written to the log, the operationApplier function
>      * is executed to mutate the state after each successful write
>      * to the log. Finally the reader function can read from
>      * the current status before releasing the write lock.
>      * @param operationsGenerator generates a list of mutations
>      * @param operationApplier apply each mutation to the current state
>      * @param reader read from the status while inside the write lock
>      * @param <K> the returned data type
>      * @param <O> the operation type
>      * @return a handle to the completion of the operation
>      */
>     <K> CompletableFuture<K> write(Function<V, List<O>>
> operationsGenerator,
>                                      Function<V, K> reader);
> }
>
> Using this simple abstraction it is easy to build for instance a
> distributed Java "Map" like this
>
> https://github.com/eolivelli/pulsar-db/blob/main/src/main/java/org/apache/pulsar/db/PulsarMap.java
>
>
> I believe that we should add this feature to the Pulsar Client API,
> maybe we can start by adding this in the pulsar-adapters module as it can
> be loosely coupled with the core Pulsar Client
>
> Building distributed data structures on top of that API is simple,
> but the underlying implementation of the core APi is not straightforward,
> because there are many
> edge cases to deal with.
>
> If we provide some recipes that are available out-of-the-box we will
> unleash the secret power
> of Exclusive producer and we will allow more applications to migrate to
> Pulsar or to choose Pulsar as storage backbone.
>
> You can find the code here https://github.com/eolivelli/pulsar-db, it is
> only a proof-of-concept, but it is already usable.
>
> If there is an interest in this I will be happy to draft a PIP
> and also to send the implementation to the pulsar-adapters repository.
>
> Best regards
>
> Enrico
>

Reply via email to