Re: [DISCUSS] KIP-663: API to Start and Shut Down Stream Threads and to Request Closing of Kafka Streams Clients

Guozhang Wang Sat, 19 Sep 2020 17:25:04 -0700

I'm fine with always removing threads at DEAD state from
localThreadsMetadata().


On the other hand, if we want to give users more info for debugging, we can
consider eating the complexity ourselves by guaranteeing that, within the
user registered uncaught exception handler, the localThreadsMetadata()
would include the thread who's dying / throwing at that moment.
Implementation wise we can always register our own internal handler which
calles the user registered one if it is set, and only after that we remove
the thread metadata from the instance cache. Personally I think it is a bit
too much, so I'm in favor of "adding it later when people complain" :)


Guozhang


On Fri, Sep 18, 2020 at 5:17 PM Sophie Blee-Goldman <sop...@confluent.io>
wrote:

> Makes sense to me :)
>
> On Thu, Sep 17, 2020 at 9:34 AM Bruno Cadonna <br...@confluent.io> wrote:
>
> > Hi Sophie,
> >
> > Thank you for the feedback! I replied inline.
> >
> > Best,
> > Bruno
> >
> > On 16.09.20 19:19, Sophie Blee-Goldman wrote:
> > >>
> > >> We guarantee that the metadata of the dead stream threads  will be
> > >> returned by KafkaStreams#localThreadsMetadata() at least until the
> next
> > >> call to KafkaStreams#addStreamThread() or
> > >> KafkaStreams#removeStreamThread() after the stream thread transited to
> > >> DEAD
> > >
> > >
> > > This seems kind of tricky...personally I would find it pretty odd if I
> > > queried the
> > > local thread metadata and found two threads, A (alive) and B (dead),
> and
> > > then
> > > called removeStreamThread() and now suddenly I have zero. Or if I call
> > > addStreamThread and now I still have two threads.
> > >
> >
> > The behavior might be unusual, but it is well defined and not random by
> > any means.
> >
> > > Both of those results seem to indicate that only live threads "count"
> and
> > > are returned
> > > by localThreadsMetadata(). But in reality we do temporarily keep the
> dead
> > > thread,
> > > but only for the arbitrary amount of time until the next time you want
> to
> > > add or
> > > remove some other stream thread? That seems like a weird side effect of
> > the
> > > add/removeStreamThread APIs.
> > >
> >
> > This is not a side effect that just happens to occur. This is a
> > guarantee that users get. It gives users the possibility to retrieve the
> > metadata of the dead stream threads since the last call to
> > add/removeStreamThread. Admittedly, this guarantee overlap with the
> > current/planned implementation. But that is more a coincidence.
> >
> > I would be more concerned about when add/removeStreamThread is called
> > from different threads which could happen if an uncaught exception
> > handler is called that wants to replace a stream thread and a thread
> > that is responsible for automated scaling up is running.
> >
> > > If we really think users might want to log the metadata of dead
> threads,
> > > then
> > > let's just do that for them or give them a way to do exactly that.
> > >
> >
> > Logging the metatdata of dead stream threads for the user is a valid
> > alternative. Giving users the way to do exactly that is hard because the
> > StreamThread class is not part of the public API. They would always need
> > to call a method on the KafkaStreams object where we already have
> > localThreadsMetadata().
> >
> > > I'm not that concerned about the backwards compatibility of removing
> dead
> > > threads from the localThreadsMetadata, because I find it hard to
> believe
> > > that
> > > users do anything other than just skip over them in the list (set?)
> that
> > > gets
> > > returned. But maybe someone can chime in with an example use case.
> > >
> >
> > I am also not too much concerned about backwards compatibility. That
> > would indeed be a side effect of the current proposal.
> >
> > > I'm actually even a little skeptical that any users might want to log
> the
> > > metadata of a
> > > dead thread, since all of the metadata is only useful for IQ on live
> > > threads or
> > > already covered by other easily discoverable logging elsewhere, or
> both.
> > >
> >
> > Said all of the above, I actually agree with you that there is not that
> > much information in the metadata of a dead stream thread that is
> > interesting. The name of the stream thread is known in the uncaught
> > exception handler. The names of the clients, like consumer etc., used by
> > the stream thread can be derived from the name of the stream thread.
> > Finally, the sets of active and standby tasks should be empty for a dead
> > stream thread.
> >
> > Hence, I backpedal and propose to filter out dead stream threads from
> > localThreadsMetadata(). WDYT?
> >
> > > On Wed, Sep 16, 2020 at 2:07 AM Bruno Cadonna <br...@confluent.io>
> > wrote:
> > >
> > >> Hi again,
> > >>
> > >> I just realized that if we filter out DEAD stream threads in
> > >> localThreadsMetadata(), users cannot log the metadata of dying stream
> > >> threads in the uncaught exception handler.
> > >>
> > >> I realized this thanks to the example Guozhang requested in the KIP.
> > >> Thank you for that, Guozhang!
> > >>
> > >> Hence, I adapted the KIP as follows:
> > >>
> > >> - We do not filter out DEAD stream threads in
> > >> KafkaStreams#localThreadsMetadata()
> > >>
> > >> - We guarantee that the metadata of the dead stream threads  will be
> > >> returned by KafkaStreams#localThreadsMetadata() at least until the
> next
> > >> call to KafkaStreams#addStreamThread() or
> > >> KafkaStreams#removeStreamThread() after the stream thread transited to
> > >> DEAD. Besides giving users the opportunity to log the metadata of a
> > >> dying stream thread in its uncaught exception handler, this guarantee
> > >> makes KafkaStreams#localThreadsMetadata() completely backward
> compatible
> > >> to the current behavior, because if KafkaStreams#addStreamThread() and
> > >> KafkaStreams#removeStreamThread() are never called,
> > >> KafkaStreams#localThreadsMetadata() will also return the metadata of
> all
> > >> streams threads that have ever died which corresponds to the current
> > >> behavior.
> > >>
> > >> - We guarantee that dead stream threads are removed from a Kafka
> Streams
> > >> client at latest after the next call to KafkaStreams#addStreamThread()
> > >> or KafkaStreams#removeStreamThread() following the transition of the
> > >> stream thread to DEAD. This guarantees that the number of maintained
> > >> stream threads does not grow indefinitely.
> > >>
> > >>
> > >> Best,
> > >> Bruno
> > >>
> > >>
> > >>
> > >> On 16.09.20 09:23, Bruno Cadonna wrote:
> > >>> Hi Guozhang,
> > >>>
> > >>> Good point! I would propose to filter out DEAD stream threads in
> > >>> localThreadsMetadata() to get consistent results that do not depend
> on
> > >>> timing. I will update the KIP accordingly.
> > >>>
> > >>> Best,
> > >>> Bruno
> > >>>
> > >>> On 16.09.20 06:02, Guozhang Wang wrote:
> > >>>> Thanks Bruno, your replies make sense to me. As for
> > >>>> localThreadsMetadata() itself,
> > >>>> I'd like to clarify if it would return any still-bookkept threads,
> or
> > >>>> would
> > >>>> it specifically filter out those DEAD threads even if they are not
> yet
> > >>>> removed.
> > >>>>
> > >>>> Otherwise, the KIP LGTM.
> > >>>>
> > >>>> Guozhang
> > >>>>
> > >>>> On Tue, Sep 15, 2020 at 2:58 AM Bruno Cadonna <br...@confluent.io>
> > >> wrote:
> > >>>>
> > >>>>> Hi Guozhang,
> > >>>>>
> > >>>>> Thank you for your feedback. I replied inline.
> > >>>>>
> > >>>>> Best,
> > >>>>> Bruno
> > >>>>>
> > >>>>> On 09.09.20 23:43, Guozhang Wang wrote:
> > >>>>>> Hello Bruno,
> > >>>>>>
> > >>>>>> Finally got some time to review your KIP and the discussion thread
> > >>>>>> now..
> > >>>>> a
> > >>>>>> few comments below:
> > >>>>>>
> > >>>>>> 1) I'm with Matthias about the newly added
> > numberOfAliveStreamThreads
> > >>>>> v.s.
> > >>>>>> existing localThreadsMetadata: to me it seems we can always
> achieve
> > >> the
> > >>>>>> first based on the second. It seems not worthy to provide some
> > "syntax
> > >>>>>> sugar" to the API but just let users do the filtering themselves.
> > >>>>>
> > >>>>> I am not married to that method. I removed it.
> > >>>>>
> > >>>>>> Furthermore, I'm wondering what's the rationale behind removing
> the
> > >>>>>> DEAD
> > >>>>>> threads from localThreadsMetadata()? Personally I feel returning
> all
> > >>>>>> threads, including those who are ever closed, either due to
> > >>>>>> exception or
> > >>>>>> due to removeStreamThread, would be beneficial for debugging
> > >>>>>> purposes, as
> > >>>>>> long as within the lifetime of an instance we expect the amount of
> > >> such
> > >>>>>> dead threads will not increase linearly --- and if we agree with
> > that,
> > >>>>>> maybe we can rename "removeStreamThread" to sth. like
> > >>>>>> "terminateStreamThread" indicating it is only terminated but not
> > >>>>>> removed
> > >>>>>> --- and of course if users do not want to see those DEAD threads
> > >>>>>> they can
> > >>>>>> always filter them out. I'm just proposing that we should still
> > >>>>>> leave the
> > >>>>>> door open for those who want to check those ever terminated
> threads.
> > >>>>>>
> > >>>>>
> > >>>>> I actually think the number of dead stream threads might increase
> > >>>>> linearly. Assume users have a systematic error that continuously
> > kills
> > >> a
> > >>>>> stream thread and they blindly start a new stream thread in the
> > >> uncaught
> > >>>>> exception handler. This scenario might be a mistake but if the
> > >>>>> systematic error does not occur at a high rate, it could also be a
> > >>>>> strategy to keep the application running during the investigation
> of
> > >> the
> > >>>>> systematic error.
> > >>>>>
> > >>>>> IMO, removing dead stream threads makes Kafka Streams more robust
> > >>>>> because it prevent a possibly unbounded increase of memory usage.
> If
> > >>>>> users want to debug the dead stream threads they can monitor the
> > number
> > >>>>> of dead threads with the metric proposed in the KIP and they could
> > >>>>> additionally log the metadata of the dying stream thread in the
> > >> uncaught
> > >>>>> exception handler. I do not think that there is need to keep dead
> > >> stream
> > >>>>> threads around.
> > >>>>>
> > >>>>>> 2) I think it would help to write down some example user code in
> > >>>>> exception
> > >>>>>> handler e.g. to illustrate how this would be implemented -- e.g.
> we
> > >>>>>> know
> > >>>>>> that practically the handler need to maintain a "this" reference
> of
> > >> the
> > >>>>>> instance anyways in order to shutdown the whole instance or,
> > >>>>> add/terminate
> > >>>>>> threads dynamically, but I want to see if we have listed all
> > possible
> > >>>>> call
> > >>>>>> paths like: a) a thread's handler logic to terminate another
> thread,
> > >>>>>> b) a
> > >>>>>> thread handler to add new threads, etc are all appropriately
> > supported
> > >>>>>> without deadlocks.
> > >>>>>>
> > >>>>>
> > >>>>> I added an example for an uncaught exception handler that adds a
> > stream
> > >>>>> thread to the KIP. Removing a stream thread in an uncaught
> exception
> > >>>>> handler doesn't seem a common use case to me. Nevertheless, we need
> > to
> > >>>>> make sure that we do not run in a deadlock in that case. I will
> > >> consider
> > >>>>> that during the implementation and write tests to check for
> > deadlocks.
> > >>>>>
> > >>>>> Shutting down the Kafka Streams client from inside an uncaught
> > >> exception
> > >>>>> handler is outside the scope of this KIP. In the beginning it was
> > part
> > >>>>> of the KIP, but during the discussion it turned out that we can fix
> > our
> > >>>>> existing close() method to accomplish the shutdown from inside an
> > >>>>> uncaught exception handler. But I completely agree with you that we
> > >> need
> > >>>>> to ensure that we do not run into a deadlock in this case.
> > >>>>>
> > >>>>>
> > >>>>>>
> > >>>>>> Guozhang
> > >>>>>>
> > >>>>>>
> > >>>>>> On Wed, Sep 9, 2020 at 11:35 AM Matthias J. Sax <mj...@apache.org
> >
> > >>>>> wrote:
> > >>>>>>
> > >>>>>>> I would prefer to not add a new method. It seems unnecessary.
> > >>>>>>> `localThreadMetadata` does return all threads in all states(*)
> and
> > >>>>>>> thus
> > >>>>>>> provides full insight.
> > >>>>>>>
> > >>>>>>> (*) A thread in state DEAD could be returned as long as it's not
> > >>>>>>> removed
> > >>>>>>> yet.
> > >>>>>>>
> > >>>>>>> I don't see any advantage to pre-filter threads and to exclude
> > >> threads
> > >>>>>>> in state CREATE or PENDING_SHUTDOWN. Even if a CREATED thread is
> > not
> > >>>>>>> started yet, it is still "alive" in a broader sense. For example,
> > if
> > >> a
> > >>>>>>> user wants to scale out to 10 thread, and 8 are RUNNING and 2 are
> > in
> > >>>>>>> state CREATED, a user won't need to add 2 more threads -- there
> are
> > >>>>>>> already 10 threads.
> > >>>>>>>
> > >>>>>>> For PENDING_SHUTDOWN and scale in it would be different I guess,
> as
> > >>>>>>> the
> > >>>>>>> proposal would be to filter them out right away. However,
> filtering
> > >>>>>>> them
> > >>>>>>> seems actually not to be "correct", as a thread in
> PENDING_SHUTDOWN
> > >>>>>>> might still process data and it's thus still "alive".
> > >>>>>>>
> > >>>>>>> If there is still a need later to add a new method about "alive
> > >>>>>>> thread"
> > >>>>>>> we can always add as a follow up -- removing things is much
> harder.
> > >>>>>>>
> > >>>>>>> I also don't think that there is value in returning names of dead
> > >>>>>>> threads, as we recycle names.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> -Matthias
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On 9/9/20 10:04 AM, Sophie Blee-Goldman wrote:
> > >>>>>>>> I agree that the current behavior of localThreadsMetadata() does
> > not
> > >>>>> seem
> > >>>>>>>> to match, but it seems like we will be forced to change it to
> only
> > >>>>> return
> > >>>>>>>> currently-alive threads. For one thing, we plan to recycle old
> > >> thread
> > >>>>>>> names.
> > >>>>>>>> It would be pretty confusing for a user to get two (or more)
> > >>>>>>> ThreadMetadata
> > >>>>>>>> objects returned with the same name, since AFAICT this is the
> only
> > >>>>>>>> distinguishing identifier of stream threads. I think we should
> > >>>>>>>> enforce
> > >>>>>>> that
> > >>>>>>>> only live threads are returned by localThreadsMetadata(). Plus,
> as
> > >>>>>>> Matthias
> > >>>>>>>> pointed out, we plan to remove dead threads from the
> KafkaStreams
> > >>>>> client,
> > >>>>>>>> so still returning them in the metadata would be extremely odd.
> > >>>>>>>>
> > >>>>>>>> If we think that there might be some use case that requires
> > knowing
> > >>>>> which
> > >>>>>>>> threads have died, we could consider adding a method that
> returns
> > >> the
> > >>>>>>>> names of dead threads. But the only use case I can imagine would
> > >>>>> probably
> > >>>>>>>> be better served by a callback that gets invoked when the thread
> > >>>>>>>> dies,
> > >>>>>>> which
> > >>>>>>>> we already have.
> > >>>>>>>>
> > >>>>>>>> On Tue, Sep 8, 2020 at 11:46 PM Bruno Cadonna <
> br...@confluent.io
> > >
> > >>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi Matthias and Sophie,
> > >>>>>>>>>
> > >>>>>>>>> I agree that localThreadsMetadata() can be used here. However,
> > >>>>>>>>> localThreadsMetadata() returns all stream threads
> irrespectively
> > of
> > >>>>>>>>> their states. Alive stream threads are specified as being in
> one
> > of
> > >>>>> the
> > >>>>>>>>> following states: RUNNING, STARTING, PARTITIONS_REVOKED, and
> > >>>>>>>>> PARTITIONS_ASSIGNED. Hence, users would need to filter the
> result
> > >> of
> > >>>>>>>>> localThreadsMetadata(). I thought, it would be neat to have a
> > >> method
> > >>>>>>>>> that hides this filtering and returns the number of alive
> stream
> > >>>>>>>>> threads, because that is the most basic information you might
> > >>>>>>>>> need to
> > >>>>>>>>> decide about adding or removing stream threads. For all more
> > >>>>>>>>> advanced
> > >>>>>>>>> use cases users should use localThreadsMetadata(). I am also
> > happy
> > >>>>> with
> > >>>>>>>>> removing the method. WDYT?
> > >>>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>> Bruno
> > >>>>>>>>>
> > >>>>>>>>> On 09.09.20 03:51, Matthias J. Sax wrote:
> > >>>>>>>>>> Currently we, don't cleanup dead threads, but the KIP proposes
> > to
> > >>>>>>> change
> > >>>>>>>>>> this:
> > >>>>>>>>>>
> > >>>>>>>>>>> Stream threads that are in state DEAD will be removed from
> the
> > >>>>> stream
> > >>>>>>>>> threads of a Kafka Streams client.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> -Matthias
> > >>>>>>>>>>
> > >>>>>>>>>> On 9/8/20 2:37 PM, Sophie Blee-Goldman wrote:
> > >>>>>>>>>>> Ah, I forgot about localThreadsMetadata(). In that. case I
> > agree,
> > >>>>>>>>> there's
> > >>>>>>>>>>> no reason
> > >>>>>>>>>>> to introduce a new method when we can get both the names and
> > >>>>>>>>>>> number
> > >>>>> of
> > >>>>>>>>> all
> > >>>>>>>>>>> running threads from this.
> > >>>>>>>>>>>
> > >>>>>>>>>>> I assume that we would update localThreadsMetadata to only
> > return
> > >>>>>>>>> currently
> > >>>>>>>>>>> alive threads as part of this KIP -- at a quick glance, it
> > seems
> > >>>>> like
> > >>>>>>> we
> > >>>>>>>>>>> don't do
> > >>>>>>>>>>> any pruning of dead threads at the moment
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Tue, Sep 8, 2020 at 1:58 PM Matthias J. Sax <
> > mj...@apache.org
> > >>>
> > >>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> I am not sure if we need a new method? There is already
> > >>>>>>>>>>>> `localThreadsMetadata()`. What do we gain by adding a new
> one?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Returning the thread's name (as `Optional<String>`) for both
> > >>>>>>>>>>>> add()
> > >>>>>>> and
> > >>>>>>>>>>>> remove() is fine with me.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> -Matthias
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On 9/8/20 12:58 PM, Sophie Blee-Goldman wrote:
> > >>>>>>>>>>>>> Sorry Bruno, I think I missed the end of your message with
> > the
> > >>>>>>>>>>>>> numberOfAliveStreamThreads()
> > >>>>>>>>>>>>> proposal. I agree, that would be better than the
> > alternatives I
> > >>>>>>>>> listed.
> > >>>>>>>>>>>>> That said:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> They rather suggest that the method returns a list of
> > >>>>>>>>>>>>>> handles to
> > >>>>>>> the
> > >>>>>>>>>>>>> stream threads.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I hadn't thought of that originally, but now that you
> mention
> > >>>>>>>>>>>>> it,
> > >>>>>>> this
> > >>>>>>>>>>>>> might be a good idea.
> > >>>>>>>>>>>>> I don't think we should return actual handles on the
> threads,
> > >>>>>>>>>>>>> but
> > >>>>>>>>> maybe a
> > >>>>>>>>>>>>> list of the thread
> > >>>>>>>>>>>>> names rather than a single number of currently alive
> threads.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Since we seem to think it would be difficult if not
> > >>>>>>>>>>>>> impossible to
> > >>>>>>> keep
> > >>>>>>>>>>>>> track of the number
> > >>>>>>>>>>>>> of running stream threads, we should apply the same
> reasoning
> > >> to
> > >>>>> the
> > >>>>>>>>>>>> names
> > >>>>>>>>>>>>> and not
> > >>>>>>>>>>>>> assume the user can/will keep track of every thread
> returned
> > by
> > >>>>>>>>>>>>> addStreamThread() or
> > >>>>>>>>>>>>> removeStreamThread(). Users should generally take any
> > required
> > >>>>>>> action
> > >>>>>>>>>>>>> immediately
> > >>>>>>>>>>>>> after adding/removing the thread -- eg deregistering the
> > thread
> > >>>>>>>>> metrics
> > >>>>>>>>>>>> --
> > >>>>>>>>>>>>> but it might
> > >>>>>>>>>>>>> still be useful to provide a convenience method listing all
> > >>>>>>>>>>>>> of the
> > >>>>>>>>>>>> current
> > >>>>>>>>>>>>> threads
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> And of course you could still get the number of threads
> > >>>>>>>>>>>>> easily by
> > >>>>>>>>>>>> invoking
> > >>>>>>>>>>>>> size() on the
> > >>>>>>>>>>>>> returned list (or ordered set?).
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Tue, Sep 8, 2020 at 12:16 PM Bruno Cadonna
> > >>>>>>>>>>>>> <br...@confluent.io
> > >>>>>>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Thank you again for the feedback Sophie!
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> As I tried to point out in my previous e-mail, removing a
> > >>>>>>>>>>>>>> stream
> > >>>>>>>>> thread
> > >>>>>>>>>>>>>> from a Kafka Streams client that does not have alive
> stream
> > >>>>> threads
> > >>>>>>>>> is
> > >>>>>>>>>>>>>> nothing exceptional for the client per se. However, it can
> > >>>>>>>>>>>>>> become
> > >>>>>>>>>>>>>> exceptional within the context of the user. For example,
> if
> > >>>>>>>>>>>>>> users
> > >>>>>>>>> want
> > >>>>>>>>>>>>>> to remove a stream thread from a client without alive
> stream
> > >>>>>>> threads
> > >>>>>>>>>>>>>> because one if their metrics say so, then this is
> > >>>>>>>>>>>>>> exceptional in
> > >>>>>>> the
> > >>>>>>>>>>>>>> context of that user metric not in the context of the
> Kafka
> > >>>>> Streams
> > >>>>>>>>>>>>>> client. In that case, users should throw an exception and
> > >>>>>>>>>>>>>> handle
> > >>>>>>> it.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Regarding returning null, I do not like to return null
> > because
> > >>>>>>> from a
> > >>>>>>>>>>>>>> development point of view there is no distinction between
> > >>>>> returning
> > >>>>>>>>> null
> > >>>>>>>>>>>>>> because we have a bug in the code or returning null
> because
> > >>>>>>>>>>>>>> there
> > >>>>>>>>> are no
> > >>>>>>>>>>>>>> alive stream threads. Additionally, Optional<String> makes
> > it
> > >>>>> more
> > >>>>>>>>>>>>>> explicit that the result could also be empty.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Thank you for the alternative method names! However, with
> > the
> > >>>>> names
> > >>>>>>>>> you
> > >>>>>>>>>>>>>> propose it is not immediately clear that the method
> returns
> > an
> > >>>>>>>>> amount of
> > >>>>>>>>>>>>>> stream threads. They rather suggest that the method
> returns
> > a
> > >>>>> list
> > >>>>>>> of
> > >>>>>>>>>>>>>> handles to the stream threads. I chose to use
> > >>>>> "aliveStreamThreads"
> > >>>>>>>>> to be
> > >>>>>>>>>>>>>> consistent with the client-level metric
> > "alive-stream-threads"
> > >>>>>>> which
> > >>>>>>>>>>>>>> reports the same number of stream threads that
> > >>>>>>>>>>>>>> numberOfAliveStreamThreads() should report. If others also
> > >>>>>>>>>>>>>> think
> > >>>>>>> that
> > >>>>>>>>>>>>>> the proposed name in the KIP is too clumsy, I am open to
> > >> rename
> > >>>>> it,
> > >>>>>>>>>>>> though.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>> Bruno
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On 08.09.20 20:12, Sophie Blee-Goldman wrote:
> > >>>>>>>>>>>>>>>> it's never a good sign when the discussion moves into
> the
> > >>>>>>>>>>>>>>>> vote
> > >>>>>>>>> thread
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Hah, sorry, the gmail consolidation of [VOTE] and
> [DISCUSS]
> > >>>>>>> threads
> > >>>>>>>>>>>>>> strikes
> > >>>>>>>>>>>>>>> again.
> > >>>>>>>>>>>>>>> Thanks for redirecting me Bruno
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> I suppose it's unfair to expect the callers to keep
> perfect
> > >>>>> track
> > >>>>>>> of
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>>>> current
> > >>>>>>>>>>>>>>>       number of stream threads, but it also seems like
> you
> > >>>>> shouldn't
> > >>>>>>> be
> > >>>>>>>>>>>>>> calling
> > >>>>>>>>>>>>>>> removeStreamThread() when there are no threads left.
> Either
> > >>>>> you're
> > >>>>>>>>> just
> > >>>>>>>>>>>>>>> haphazardly removing threads and could unintentionally
> slip
> > >>>>> into a
> > >>>>>>>>>>>> state
> > >>>>>>>>>>>>>> of
> > >>>>>>>>>>>>>>> no
> > >>>>>>>>>>>>>>> running threads without realizing it, or more
> > realistically,
> > >>>>>>> you're
> > >>>>>>>>>>>>>>> carefully
> > >>>>>>>>>>>>>>> removing threads based on some metric(s) that convey
> > >>>>>>>>>>>>>>> whether the
> > >>>>>>>>> system
> > >>>>>>>>>>>>>> is
> > >>>>>>>>>>>>>>> over or under-provisioned. If your metrics say you're
> > >>>>>>>>> over-provisioned
> > >>>>>>>>>>>>>> but
> > >>>>>>>>>>>>>>> there's
> > >>>>>>>>>>>>>>> not one thread running, well, that certainly sounds
> > >>>>>>>>>>>>>>> exceptional
> > >>>>> to
> > >>>>>>>>> me.
> > >>>>>>>>>>>> Or
> > >>>>>>>>>>>>>>> you might
> > >>>>>>>>>>>>>>> be right in that the cluster is over-provisioned but have
> > >> just
> > >>>>>>> been
> > >>>>>>>>>>>>>>> directing the
> > >>>>>>>>>>>>>>> removeStreamThread() and addStreamThread() calls to
> > >>>>>>>>>>>>>>> instances at
> > >>>>>>>>>>>> random,
> > >>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>> end up with one massive instance and one with no threads
> at
> > >>>>>>>>>>>>>>> all.
> > >>>>>>>>> Again,
> > >>>>>>>>>>>>>>> this
> > >>>>>>>>>>>>>>> probably merits some human intervention (or system
> > redesign)
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> That said, I don't think there's any real harm to just
> > >>>>>>>>>>>>>>> returning
> > >>>>>>>>> null
> > >>>>>>>>>>>> in
> > >>>>>>>>>>>>>>> this case, but I hope
> > >>>>>>>>>>>>>>> that users would pay attention to this since it seems
> > >>>>>>>>>>>>>>> likely to
> > >>>>>>>>>>>> indicate
> > >>>>>>>>>>>>>>> something has gone
> > >>>>>>>>>>>>>>> seriously wrong. I suppose Optional<String> would be a
> > >>>>> reasonable
> > >>>>>>>>>>>>>>> compromise.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> As for the method name, what about activeStreamThreads()
> or
> > >>>>>>>>>>>>>>> liveStreamThreads() ?
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Mon, Sep 7, 2020 at 1:45 AM Bruno Cadonna <
> > >>>>> br...@confluent.io>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Hi John,
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I agree with you except for checking null. I would
> rather
> > >>>>> prefer
> > >>>>>>> to
> > >>>>>>>>>>>> use
> > >>>>>>>>>>>>>>>> Optional<String> as the return type to both methods.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I changed the subject from [VOTE] to [DISCUSS] so that
> we
> > >> can
> > >>>>>>>>> follow
> > >>>>>>>>>>>> up
> > >>>>>>>>>>>>>>>> in the discussion thread.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>>> Bruno
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On 04.09.20 23:12, John Roesler wrote:
> > >>>>>>>>>>>>>>>>> Hi Sophie,
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Uh, oh, it's never a good sign when the discussion
> moves
> > >>>>>>>>>>>>>>>>> into the vote thread :)
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> I agree with you, it seems like a good touch for
> > >>>>>>>>>>>>>>>>> removeStreamThread() to return the name of the thread
> > that
> > >>>>>>>>>>>>>>>>> got removed, rather than a boolean flag. Maybe the
> return
> > >>>>>>>>>>>>>>>>> value would be `null` if there is no thread to remove.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> If we go that way, I'd suggest that addStreamThread()
> > also
> > >>>>>>>>>>>>>>>>> return the name of the newly created thread, or null if
> > no
> > >>>>>>>>>>>>>>>>> thread can be created right now.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> I'm not completely sure if I think that callers of this
> > >>>>>>>>>>>>>>>>> method would know exactly how many threads there are.
> > Sure,
> > >>>>>>>>>>>>>>>>> if a human being is sitting there looking at the
> metrics
> > or
> > >>>>>>>>>>>>>>>>> logs and decides to call the method, it would work out,
> > but
> > >>>>>>>>>>>>>>>>> I'd expect this kind of method to find its way into
> > >>>>>>>>>>>>>>>>> automated tooling that reacts to things like current
> > system
> > >>>>>>>>>>>>>>>>> load or resource saturation. Those kinds of toolchains
> > >> often
> > >>>>>>>>>>>>>>>>> are part of a distributed system, and it's probably not
> > >> that
> > >>>>>>>>>>>>>>>>> easy to guarantee that the thread count they observe is
> > >>>>>>>>>>>>>>>>> fully consistent with the number of threads that are
> > >>>>>>>>>>>>>>>>> actually running. Therefore, an in-situ `int
> > >>>>>>>>>>>>>>>>> numStreamThreads()` method might not be a bad idea.
> Then
> > >>>>>>>>>>>>>>>>> again, it seems sort of optional. A caller can catch an
> > >>>>>>>>>>>>>>>>> exception or react to a `null` return value just the
> same
> > >>>>>>>>>>>>>>>>> either way. Having both add/remove methods behave
> > similarly
> > >>>>>>>>>>>>>>>>> is probably more valuable.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>>>> -John
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On Thu, 2020-09-03 at 12:15 -0700, Sophie Blee-Goldman
> > >>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>> Hey, sorry for the late reply, I just have one minor
> > >>>>>>> suggestion.
> > >>>>>>>>>>>> Since
> > >>>>>>>>>>>>>>>> we
> > >>>>>>>>>>>>>>>>>> don't
> > >>>>>>>>>>>>>>>>>> make any guarantees about which thread gets removed or
> > >>>>>>>>>>>>>>>>>> allow
> > >>>>>>> the
> > >>>>>>>>>>>> user
> > >>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>>> specify, I think we should return either the index or
> > full
> > >>>>> name
> > >>>>>>>>> of
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>> thread
> > >>>>>>>>>>>>>>>>>> that does get removed by removeThread().
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> I know you just updated the KIP to return true/false
> if
> > >>>>>>>>>>>>>>>>>> there
> > >>>>>>>>>>>>>>>> are/aren't any
> > >>>>>>>>>>>>>>>>>> threads to be removed, but I think this would be more
> > >>>>>>>>> appropriate as
> > >>>>>>>>>>>>>> an
> > >>>>>>>>>>>>>>>>>> exception than as a return type. I think it's
> reasonable
> > >> to
> > >>>>>>>>> expect
> > >>>>>>>>>>>>>>>> users to
> > >>>>>>>>>>>>>>>>>> have some sense to how many threads are remaining, and
> > not
> > >>>>> try
> > >>>>>>> to
> > >>>>>>>>>>>>>> remove
> > >>>>>>>>>>>>>>>>>> a thread when there is none left. To me, that
> indicates
> > >>>>>>> something
> > >>>>>>>>>>>>>> wrong
> > >>>>>>>>>>>>>>>>>> with the user application code and should be treated
> as
> > an
> > >>>>>>>>>>>> exceptional
> > >>>>>>>>>>>>>>>> case.
> > >>>>>>>>>>>>>>>>>> I don't think the same code clarify argument applies
> > >>>>>>>>>>>>>>>>>> here as
> > >>>>> to
> > >>>>>>>>> the
> > >>>>>>>>>>>>>>>>>> addStreamThread() case, as there's no reason for an
> > >>>>> application
> > >>>>>>>>> to
> > >>>>>>>>>>>> be
> > >>>>>>>>>>>>>>>>>> looping and retrying removeStreamThread()  since if
> that
> > >>>>> fails,
> > >>>>>>>>> it's
> > >>>>>>>>>>>>>>>> because
> > >>>>>>>>>>>>>>>>>> there are no threads left and thus it will continue to
> > >>>>>>>>>>>>>>>>>> always
> > >>>>>>>>> fail.
> > >>>>>>>>>>>>>> And
> > >>>>>>>>>>>>>>>> if
> > >>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>> user actually wants to shut down all threads, they
> > should
> > >>>>> just
> > >>>>>>>>> close
> > >>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>> whole application rather than call
> removeStreamThread()
> > >>>>>>>>>>>>>>>>>> in a
> > >>>>>>>>> loop.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> While I generally think it should be straightforward
> for
> > >>>>> users
> > >>>>>>> to
> > >>>>>>>>>>>>>> track
> > >>>>>>>>>>>>>>>> how
> > >>>>>>>>>>>>>>>>>> many stream threads they have running, maybe it would
> be
> > >>>>>>>>>>>>>>>>>> nice
> > >>>>>>> to
> > >>>>>>>>> add
> > >>>>>>>>>>>>>>>>>> a small utility method that does this for them.
> > Something
> > >>>>> like
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> // Returns the number of currently alive threads
> > >>>>>>>>>>>>>>>>>> boolean runningStreamThreads();
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> On Thu, Sep 3, 2020 at 7:41 AM Matthias J. Sax <
> > >>>>>>> mj...@apache.org
> > >>>>>>>>>>
> > >>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> +1 (binding)
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On 9/3/20 6:16 AM, Bruno Cadonna wrote:
> > >>>>>>>>>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> I would like to start the voting on KIP-663 that
> > >> proposes
> > >>>>> to
> > >>>>>>>>> add
> > >>>>>>>>>>>>>>>> methods
> > >>>>>>>>>>>>>>>>>>>> to the Kafka Streams client to add and remove stream
> > >>>>> threads
> > >>>>>>>>>>>> during
> > >>>>>>>>>>>>>>>>>>>> execution.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-663%3A+API+to+Start+and+Shut+Down+Stream+Threads
> > >>>>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>>>>>>> Bruno
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>>
> > >>
> > >
> >
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-663: API to Start and Shut Down Stream Threads and to Request Closing of Kafka Streams Clients

Reply via email to