Hi all, I think this is an interesting area to look at, but one thing I see immediately is that the API is being designed in a way that encourages lifecycle issues. Specifically, the service interface's "subscribe" method receives a consumer function from the client. It would be *much* better if subscribe did not take a consumer; instead, the Subscription returned by the subscribe method should expose a PushStream.
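To make the contrast concrete, here is a minimal sketch of the two API shapes. All names (ConsumerBasedEvents, StreamBasedEvents, SimplePushStream, Message) are illustrative stand-ins, not the actual API under discussion; SimplePushStream is a tiny placeholder for org.osgi.util.pushstream.PushStream so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class SubscribeShapes {

    record Message(String topic, String payload) {}

    // Shape A (discouraged): the service holds client callbacks, so the
    // provider must maintain a registry of consumers per client bundle.
    interface ConsumerBasedEvents {
        AutoCloseable subscribe(String topic, Consumer<Message> listener);
    }

    // Shape B (proposed): subscribe returns a stream the client owns;
    // the client decides when to close it, from inside or outside the pipeline.
    interface StreamBasedEvents {
        SimplePushStream<Message> subscribe(String topic);
    }

    // Tiny illustrative stand-in for org.osgi.util.pushstream.PushStream.
    static class SimplePushStream<T> implements AutoCloseable {
        private final List<Consumer<T>> sinks = new ArrayList<>();
        private boolean closed;

        SimplePushStream<T> forEach(Consumer<T> sink) { sinks.add(sink); return this; }
        void push(T item) { if (!closed) sinks.forEach(s -> s.accept(item)); }
        @Override public void close() { closed = true; sinks.clear(); }
    }

    public static void main(String[] args) {
        SimplePushStream<Message> stream = new SimplePushStream<>();
        List<String> seen = new ArrayList<>();
        stream.forEach(m -> seen.add(m.payload()));
        stream.push(new Message("t", "a"));
        stream.close();                      // client controls the lifecycle
        stream.push(new Message("t", "b"));  // dropped after close
        System.out.println(seen);            // prints [a]
    }
}
```

The point of shape B is that the stream's lifecycle is owned by the client that created it, so nothing in the provider outlives the client bundle.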
Making this change spares the provider implementation from having to maintain a registry of instances from client bundles (the listener pattern is "considered harmful" in OSGi), which can leak memory and/or class loaders as client bundles are started, stopped, or updated. Allowing the client to create PushStream instances on demand gives the client finer-grained control over when the data-processing stream is closed (both from within and outside the stream), and provides easier fail-safe defaults for late-registering clients. You obviously get the further advantages of PushStreams as well, including buffering, windowing and transformation pipelines. It would also allow simpler optimisation of the fetch logic in the Kafka/Mongo/Memory clients when processing bulk messages from history.

Best Regards,

Tim

On 2 Jan 2019, at 07:30, Christian Schneider <ch...@die-schneider.net> wrote:

> On Wed, 2 Jan 2019 at 02:05, Timothee Maret <tma...@apache.org> wrote:
>
>> Hi,
>>
>> I looked at the API considering how we could use it for our replication
>> use case. I identified one key concept that seems to be missing: the
>> indexing of messages with monotonically increasing offsets. For
>> replication, we leverage those offsets extensively, for instance to
>> efficiently compute sub-ranges of messages, to skip ranges of messages,
>> to delay processing of messages, to clean up resources, etc. If we want
>> to leverage the journaled event API to guarantee portability, it seems
>> to me that we'd need to have the offset, or an equivalent construct, be
>> part of the API. How about adding a "getOffset" signature and
>> documenting the offset requirement in the Position interface?
>
> I just started implementing the in-memory impl of the API and also used
> an offset. For the cases I know, an offset makes sense. Alexei and I were
> just unsure whether the offset is really a general abstraction. If we all
> agree an offset makes sense, then I am in favour of adding it.
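A sketch of what the proposed offset-carrying Position could enable. The names here (Position, getOffset, JournalEntry, subRange) are hypothetical illustrations of the idea, not the discussed API; the point is that with a monotonically increasing offset, sub-ranges and skips become simple comparisons.

```java
import java.util.List;
import java.util.stream.Collectors;

public class OffsetPositions {

    // Hypothetical: a Position documented to carry a monotonically
    // increasing offset within a topic.
    interface Position {
        long getOffset();
    }

    record JournalEntry(long offset, String payload) implements Position {
        @Override public long getOffset() { return offset; }
    }

    // Computing a sub-range of the journal is a pair of comparisons,
    // no inspection of message content required.
    static List<JournalEntry> subRange(List<JournalEntry> journal, long from, long to) {
        return journal.stream()
                .filter(e -> e.getOffset() >= from && e.getOffset() < to)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<JournalEntry> journal = List.of(
                new JournalEntry(10, "a"),
                new JournalEntry(11, "b"),
                new JournalEntry(12, "c"),
                new JournalEntry(13, "d"));
        // Resume after offset 11, stop before 13: only "c" is processed.
        System.out.println(subRange(journal, 12, 13).get(0).payload());
    }
}
```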
> Actually, in the case of no partitions (which we currently assume), the
> position is no more than an offset.
>
>> Another intention in the API that is unclear to me is the support for
>> partitions (similar to Kafka). The documentation indicates it is not a
>> goal, yet the API seems to contain some hints of multi-partition support,
>> such as the "TopicPosition" interface. How about supporting multiple
>> partitions in the API by allowing a key (with semantics similar to
>> Kafka) to be specified in the "newMessage" signature?
>
> I removed the TopicPosition interface again a few days ago. It was not
> part of the API Alexei and I discussed, and it makes no sense when we
> limit ourselves to no partitions (or one partition in the case of Kafka).
> So the main question is whether limiting ourselves is a good idea. I
> think it is, but I would be very interested in other opinions.
>
> Cheers
> Christian
>
> --
> Christian Schneider
> http://www.liquid-reality.de
>
> Computer Scientist
> http://www.adobe.com
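For reference, the Kafka-style keyed-partitioning semantics mentioned in the quoted proposal can be sketched as below. This is a hypothetical illustration only (the "newMessage" signature and partition count are assumptions, and the thread above leans towards a single-partition design): equal keys always map to the same partition, which is what preserves per-key ordering across multiple partitions.

```java
public class KeyedPartitions {

    record Message(String key, String payload, int partition) {}

    // Hypothetical keyed newMessage: a deterministic hash of the key
    // selects the partition, so all messages for one key stay ordered.
    static Message newMessage(String key, String payload, int partitionCount) {
        int partition = Math.floorMod(key.hashCode(), partitionCount);
        return new Message(key, payload, partition);
    }

    public static void main(String[] args) {
        Message m1 = newMessage("user-42", "first", 4);
        Message m2 = newMessage("user-42", "second", 4);
        // Same key, same partition: per-key ordering is retained.
        System.out.println(m1.partition() == m2.partition());
    }
}
```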