Hi all,

I think this is an interesting area to look at, but one thing I see immediately 
is that the API is designed in a way that encourages lifecycle issues. 
Specifically, the “subscribe” method on the service interface receives a 
consumer function from the client. It would be *much* better if the subscribe 
method did not take a consumer; instead, the Subscription returned by the 
subscribe method should provide a PushStream.

Making this change avoids the provider implementation having to maintain a 
registry of instances from client bundles (the listener pattern is “considered 
harmful” in OSGi), which can leak memory and/or class loaders as client bundles 
are started/stopped/updated. Allowing the client to create PushStream instances 
on demand gives the client finer-grained control over when the stream of data 
processing is closed (both from within and from outside the data stream), and 
provides easier fail-safe defaults for late-registering clients.
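To make the shape concrete, here is a minimal sketch of what I mean. The names 
are illustrative, not the actual draft API, and a tiny stand-in replaces the 
real org.osgi.util.pushstream.PushStream so the example is self-contained:

```java
import java.util.List;
import java.util.function.Consumer;

public class SubscriptionSketch {

    // Minimal stand-in for OSGi's PushStream, just enough to show the shape.
    interface MiniPushStream<T> extends AutoCloseable {
        void forEach(Consumer<T> action);
        @Override void close();
    }

    // The service interface takes no consumer from the client...
    interface JournaledEventSource {
        Subscription subscribe(String topic);
    }

    // ...instead the client creates streams on demand from the Subscription.
    // Closing the stream or the subscription releases provider-side resources,
    // so the provider never holds client instances across bundle restarts.
    interface Subscription extends AutoCloseable {
        MiniPushStream<String> events();
        @Override void close();
    }

    // Toy in-memory provider, for illustration only: replays a fixed journal.
    static JournaledEventSource inMemorySource(List<String> journal) {
        return topic -> new Subscription() {
            @Override public MiniPushStream<String> events() {
                return new MiniPushStream<String>() {
                    @Override public void forEach(Consumer<String> action) {
                        journal.forEach(action);
                    }
                    @Override public void close() { /* release any buffers */ }
                };
            }
            @Override public void close() { /* drop provider-side state */ }
        };
    }
}
```

The key point is that the only long-lived object the provider hands out is the 
Subscription; the client decides when a stream exists and when it dies.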

You obviously get the further advantages of PushStreams, including buffering, 
windowing and transformation pipelines. Using these would allow simpler 
optimisation of the fetch logic in the Kafka/Mongo/Memory implementations when 
processing bulk messages from history.
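For the bulk-history case, the gain is that a windowing stage can turn a replay 
into batches. This plain-Java sketch shows the batching shape a PushStream 
window(...) stage would deliver downstream (the real API lives in 
org.osgi.util.pushstream; everything here is illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class WindowSketch {

    // Split a replayed history into fixed-size batches, so the consumer can
    // process (and the provider can fetch) many messages per round trip
    // instead of one callback per message.
    static List<List<String>> windows(List<String> history, int size) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < history.size(); i += size) {
            out.add(new ArrayList<>(
                    history.subList(i, Math.min(i + size, history.size()))));
        }
        return out;
    }
}
```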

Best Regards,

Tim

On 2 Jan 2019, at 07:30, Christian Schneider 
<ch...@die-schneider.net> wrote:

On Wed, 2 Jan 2019 at 02:05, Timothee Maret 
<tma...@apache.org> wrote:

Hi,

I looked at the API considering how we could use it for our replication use
case. I identified one key concept that seemed to be missing, the indexing
of messages with monotonically increasing offsets.

For replication, we leverage those offsets extensively, for instance to
efficiently compute sub ranges of messages, to skip range of messages, to
delay processing of messages, to clean up resources, etc. If we want to
leverage the journaled event API to guarantee portability, it seems to me
that we'd need to have the offset or an equivalent construct part of the
API.

How about adding a "getOffset" signature and documenting the offset
requirement in the Position interface ?


I just started implementing the in-memory impl of the API and also used an
offset.
For the cases I know, an offset makes sense. Alexei and I were just unsure
if the offset is really a general abstraction. If we all agree an offset
makes sense then I am in favour of adding it.
Actually, in the case of no partitions (which we currently assume) the
position is not more than an offset.
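A minimal sketch of what that could look like (the exact shape is assumed here,
not part of the current draft):

```java
// Hypothetical sketch: with a single (implicit) partition, a Position is
// fully described by a monotonically increasing offset, which is enough to
// compute sub-ranges of messages, skip ranges, and compare positions.
public interface Position extends Comparable<Position> {

    // Monotonically increasing within a topic.
    long getOffset();

    // Ordering falls out of the offset directly.
    @Override
    default int compareTo(Position other) {
        return Long.compare(getOffset(), other.getOffset());
    }
}
```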


Another intention that is unclear to me in the API is the support for
partitions (similar to Kafka). The documentation indicates it is not a goal,
however the API seems to contain some hints of multi-partition support, such
as the "TopicPosition" interface. How about supporting multiple partitions in
the API by allowing a key (with semantics similar to Kafka) to be specified
in the "newMessage" signature?


I removed the TopicPosition interface again a few days ago. It was not part
of the API Alexei and I discussed, and it makes no sense when we limit
ourselves to no partitions (or one partition, in the case of Kafka).
So the main question is whether limiting ourselves is a good idea. I think it
is, but I would be very interested in other opinions.

Cheers
Christian

--
Christian Schneider
http://www.liquid-reality.de

Computer Scientist
http://www.adobe.com
