Greg Fodor created KAFKA-4043:
---------------------------------
Summary: User-defined handler for topology restart
Key: KAFKA-4043
URL: https://issues.apache.org/jira/browse/KAFKA-4043
Project: Kafka
Issue Type: Improvement
Components: streams
Affects Versions: 0.10.0.1
Reporter: Greg Fodor
Assignee: Guozhang Wang
Since Kafka Streams is just a library, there's a lot of cool stuff we've been
able to do that would be trickier if it were part of a larger cluster-oriented
job execution system that had assumptions about the semantics of a job. One of
the jobs we have uses Kafka Streams to do top level data flow, and then one of
our processors actually will kick off background threads to do work based upon
the data flow state. Happy to fill in more details of our use-case, but
fundamentally the model is that we have a Kafka Streams data flow that is
reading state from upstream, and that state dictates that work needs to be
done, which results in a dedicated work thread to be spawned by our job.
This works great, but we're running into an issue when there is partition
reassignment, since we have no way to detect this and cleanly shut down these
threads. In our case, we'd like to shut down the background worker threads if
there is a partition rebalance or if the job raises an exception and attempts
to restart. In practice what is happening is we are getting duplicate threads
for the same work on a partition rebalance.
Implementation-wise, this seems like some type of event handler that can be
attached to the topology at build time that can will be called when the data
flow needs to rebalance or rebuild its task threads in general (ideally passing
as much information about the reason along.) I could imagine this being
factored similarly to the KafkaStreams#setUncaughtExceptionHandler.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)