Hey Colt! Thanks for the KIP -- this will be a great addition to Streams, I
can't believe we've gone so long without this.

Overall the proposal makes sense, but I had a handful of fairly minor
questions and suggestions/requests

1. Seems like the last sentence in the 2nd paragraph of the Motivation
section is cut off and incomplete -- "want to be able to know " what
exactly?

2. This isn't that important since the motivation as a whole is clear to me
and convincing enough, but I'm not quite sure I understand the example at
the end of the Motivation section. How are standby tasks (and the ability
to hook into and monitor their status) related to the session.timeout.ms
config?

3. To help both old and new users of Kafka Streams understand this new
restore listener and its purpose/semantics, can we try to name the class and
 callbacks in a way that's more consistent with the active task restore
listener?

3a. StandbyTaskUpdateListener:
The existing restore listener is called StateRestoreListener, so the new
one could be called something like StandbyStateRestoreListener. Although
we typically refer to standby tasks as "processing" rather than "restoring"
records -- ie restoration is a term for active task state specifically. I
actually
like the original suggestion if we just drop the "Task" part of the name,
ie StandbyUpdateListener. I think either that or StandbyRestoreListener
would be fine and probably the two best options.
Also, this probably goes without saying but any change to the name of this
class should of course be reflected in the KafkaStreams#setXXX API as well

3b. #onTaskCreated
 I know the "start" callback feels a bit different for the standby task
updater vs an active task beginning restoration, but I think we should try
to
keep the various callbacks aligned to their active restore listener
counterpart. We can/should just replace the term "restore" with "update"
for the
callback method names the same way we do for the class name, which in this
case would give us #onUpdateStart. Personally I like this better,
but it's ultimately up to you. However, I would push back against anything
that includes the word "Task" (eg #onTaskCreated) as the listener
 is actually not scoped to the task itself but instead to the individual
state store(s). This is the main reason I would prefer calling it something
like #onUpdateStart, which keeps the focus on the store being updated
rather than the task that just happens to own this store
One last thing on this callback -- do we really need both the
`earliestOffset` and `startingOffset`? I feel like this might be more
confusing than it
is helpful (tbh even I'm not completely sure I know what the earliestOffset
is supposed to represent) More importantly, is this all information
that is already available and able to be passed in to the callback by
Streams? I haven't checked on this but it feels like the earliestOffset is
likely to require a remote call, either by the embedded consumer or via the
admin client. If so, the ROI on including this parameter seems
quite low (if not outright negative)

3c. #onBatchRestored
If we opt to use the term "update" in place of "restore" elsewhere, then we
should consider doing so here as well. What do you think about
#onBatchUpdated, or even #onBatchProcessed?
I'm actually not super concerned about this particular API, and honestly I
think we can use restore or update interchangeably here, so if you
 don't like any of the suggested names (and no one can think of anything
better), I would just stick with #onBatchRestored. In this case,
it kind of makes the most sense.

3d. #onTaskSuspended
Along the same lines as 3b above, #onUpdateSuspended or just
#onRestoreSuspended probably makes more sense for this callback. Also,
 I notice the StateRestoreListener passes in the total number of records
restored to its #onRestoreSuspended. Assuming we already track
that information in Streams and have it readily available to pass in at
whatever point we would be invoking this callback, that might be a
useful  parameter for the standby listener to have as well

4. I totally love the SuspendReason thing, just two notes/requests:

4a. Feel free to push back against adding onto the scope of this KIP, but
it would be great to expand the active state restore listener with this
SuspendReason enum as well. It would be really useful for both variants of
restore listener

4b. Assuming we do 4a, let's rename PROMOTED to RECYCLED -- for standby
tasks it means basically the same thing, the point is that active
tasks can also be recycled into standbys through the same mechanism. This
way they can share the SuspendReason enum -- not that it's
necessary for them to share, I just think it would be a good idea to keep
the two restore listeners aligned to the highest degree possible for as
we can.
I was actually considering proposing a short KIP with a new
RecyclingListener (or something) specifically for this exact kind of thing,
since we
currently have literally zero insight into the recycling process. It's
practically impossible to tell when a store has been converted from active
to
standby, or vice versa. So having access to the SuspendReason, and more
importantly having a callback guaranteed to notify you when a
state store is recycled whether active or standby, would be amazing.

Thanks for the KIP!

-Sophie "otterStandbyTaskUpdateListener :P" Blee-Goldman


---------- Forwarded message ---------
> From: Colt McNealy <c...@littlehorse.io>
> Date: Tue, Oct 3, 2023 at 12:48 PM
> Subject: [DISCUSS] KIP-988 Streams Standby Task Update Listener
> To: <dev@kafka.apache.org>
>
>
> Hi all,
>
> We would like to propose a small KIP to improve the ability of Streams apps
> to monitor the progress of their standby tasks through a callback
> interface.
>
> We have a nearly-working implementation on our fork and are curious for
> feedback.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-988%3A+Streams+Standby+Task+Update+Listener
>
> Thank you,
> Colt McNealy
>
> *Founder, LittleHorse.dev*
>

Reply via email to