Hi Wlaker (pun intended),

Thank you for your feedback! Please find my answers inline.

Best,
Bruno

On 26.08.20 18:46, Walker Carlson wrote:
Hello Burno,

Thanks for the KIP!

Not to pile on, but I also had a couple additional questions. I am not
super familiar with the StreamThread internals so please forgive any
misconceptions if these are not relevant questions.

1. In requestClose if a thread does not close properly and deadlocks for
some reason how will we avoid the client waiting on the thread to close?
like when you try to close the Kafka Streams client from a thread
UncaughtExcetiopnHandler now.

The kip said it would improve the handling of these conditions, However I
did not find it clear what strategy this improvement  would use. Maybe
handling broken threads is out of the scope of this KIP or am I missing
something?


The main goal of requestClose() is to avoid the deadlock when closing the Kafka Streams client from the uncaught exception handler of a stream thread. The case in which the close may encounter broken stream threads is orthogonal to this KIP, IMO. But as stated in my previous e-mail, I will rethink the necessity of a method requestClose().

2a. Will the removal of Stream Threads in state DEAD be automatic? And will
it be for all in that state or just for those closed with
shutDownStreamThread?


Yes, it would be automatic and it would be for all. I will some words about that in the KIP.


2b. From the wording it seems that removing DEAD threads form the Kafka
Streams client will be a new feature of this kip. If that is the case is
the reasonable possibility that keeping the dead threads in metadata might
be useful? For example if a thread is continually erroring and restarting a
replacement


Yes, that is a good question. I do not know how reasonable it is to keep DEAD stream threads around. However, what we could do is introducing a metric in the KIP to keep track of the stream threads that have died. IMO, such a metric is better than keeping around DEAD stream thread. I will add the metric to the KIP.


3. Maybe instead of addThread() we could use startNewThread()?
I agree with John that startStreamThread could easily be misinterpreted
i.e. as startStreamThreads.


I will change the names in the KIP.

Thanks,
Walker

On Wed, Aug 26, 2020 at 8:48 AM John Roesler <vvcep...@apache.org> wrote:

Hi Bruno,

Thanks for the well motivated and throrough KIP!

It's a good point that the record cache should be re-
distributed over the threads.

Reading your KIP leads me to a few questions:

1. Start vs. Add

Maybe this is paranoid, but I'm mildly concerned that users
who don't read the docs too carefully might think they need
to call "start thread" to start their application's threads
after calling `KafkaStreams.start` to start the app.
Definitely not sure about this, but I'm wondering if it
would be more clear to say `addThread` and
`remove/dropThread` to make it clear that we are adding to
or subtracting from the total number of threads, not just
starting and stopping threads that are already in the pool.

2. requestClose() vs. close(Duration.ZERO)

It's a very good point about deadlocks. Can you explain why
we need a new method, though? The specified behavior seems
the same as `close(Duration.ZERO)`.

3. Thread Naming

Maybe this point is silly, but how will newly added threads
be numbered? Will dead and hence removed threads' names be
reused? Or will there be a monotonic counter for the
lifetime of the instance?

It seems relevant to mention this because it will affect
metrics and logs. I guess _maybe_ it would be nice for the
thread that replaces a crashed thread to take over its name,
but since the crashed thread still exists while the
UncaughtExceptionHandler is executing, its name wouldn't be
up for grabs in any case yet.

On the other hand, it might be nicer for operators to be
able to distinguish the logs/metrics of the replacement
thread from the dead one, so not reusing thread names might
be better.

On the other hand, not reusing thread names in a
"replacement" exception handler means that a crashy
application would report an unbounded number of thread ids
over its lifespan. This might be an issue for people using
popular metrics aggregation services that charge per unique
combination of metrics tags. Then again, maybe this is a
pathological case not worth considering.

And yes, I realized I just implied that I have three hands.

4. ERROR State

Can you elaborate why users explicitly stopping all threads
should put the application into ERROR state? It does seem
like it's not exactly "running" at that point, but it also
doesn't seem like an error.

Right now, ERROR is a terminal state that indicates users
must discard the application instance and create a new one.
If there is still a possiblity that we'd need a terminally
corrupted state, it would probably be a mistake to add an
out-transition from it.

The documentation on that state says that it happens when
all StreamThreads die or when the Global thread dies. We
have a proposal from Navinder (KIP-406) to allow the Global
thread to automatically come back to life after some errors,
but presumably others would still be fatal.

I guess your reasoning is that if the cause of the ERROR
state happens to be just from all the StreamThreads dying,
and if there's a way to replace them, then it should be
possible to recover from this ERROR state.

Maybe that makes sense, and we should just note that if the
Global thread is dead, it won't be possible to transition
out of ERROR state.

5. Global thread

Should it be in the scope of this KIP to consider replacing
the GlobalStreamThread?

6. Restarting

It seems like, if I configure Streams to have eg. 1 thread
and then add more threads over its lifetime (maybe up to 8),
then I might be dismayed after a restart of the app to see
it went back to 1.

I guess it could just be my job as a user to update the
config to match when I add or remove threads. Maybe that's
the best place to land right now. It still might be a good
point to mention in the KIP (and the docs).


Thanks again!
-John


On Wed, 2020-08-26 at 16:31 +0200, Bruno Cadonna wrote:
Hi,

I would like to propose the following KIP to start and shut down stream
threads during execution as well as to shut down asynchronously a Kafka
Streams client from an uncaught exception handler.


https://cwiki.apache.org/confluence/display/KAFKA/KIP-663%3A+API+to+Start+and+Shut+Down+Stream+Threads+and+to+Request+Closing+of+Kafka+Streams+Clients


Best,
Bruno



Reply via email to