[
https://issues.apache.org/jira/browse/KAFKA-18066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900609#comment-17900609
]
Peter Lee commented on KAFKA-18066:
-----------------------------------
Hi [~ableegoldman],
I’m working on this and would like to hear your thoughts. :D
Currently, if we move the *creation* logic directly into the {{StreamThread}}
constructor, it becomes harder to refactor tests that use mocks. For reference,
see the [current test
cases|https://github.com/peterxcli/kafka/blob/2519e4af0c19d2540093c283f14dfe4111a5a21e/streams/src/test/java/org/apache/kafka/streams/processor/internals/StreamThreadTest.java#L1391-L1461].
To address this, I’m considering splitting the initialization process as
follows:
# Keep the {{StreamThread}} constructor focused on mandatory, static
dependencies:
{code:java}
final StreamThread streamThread = new StreamThread(
    time,
    config,
    adminClient,
    streamsMetrics,
    topologyMetadata,
    threadId,
    logContext,
    referenceContainer.assignmentErrorCode,
    referenceContainer.nextScheduledRebalanceMs,
    referenceContainer.nonFatalExceptionsToHandle,
    shutdownErrorHook,
    streamsUncaughtExceptionHandler,
    cache::resize
);
{code}
# Add an {{initializeComponents}} method for setting up additional components:
{code:java}
streamThread.initializeComponents(
    mainConsumer,
    restoreConsumer,
    changelogReader,
    originalReset,
    taskManager,
    stateUpdater
);
{code}
However, this approach requires removing the {{final}} modifier from the
fields set in {{initializeComponents}}. While it simplifies testing with
mocks, it makes those fields mutable after construction, which may be a concern.
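One way to limit the mutability concern is to make {{initializeComponents}} a one-shot call that must happen before {{start()}}. The following is only a minimal sketch of that idea, not the actual {{StreamThread}} code: the class {{TwoPhaseThread}}, its two placeholder {{Object}} fields, and the {{isInitialized}} helper are hypothetical names used for illustration.

```java
import java.util.Objects;

// Hypothetical stand-in for StreamThread; all names here are illustrative only.
final class TwoPhaseThread extends Thread {
    // Phase 1: mandatory, static dependencies can stay final.
    private final String threadId;

    // Phase 2: assigned by initializeComponents, so they cannot be final.
    private Object mainConsumer;
    private Object taskManager;
    private boolean initialized = false;

    TwoPhaseThread(final String threadId) {
        super(threadId); // the thread name matches the id used in log prefixes
        this.threadId = threadId;
    }

    // One-shot setter: a second call fails instead of silently replacing components.
    synchronized void initializeComponents(final Object mainConsumer, final Object taskManager) {
        if (initialized) {
            throw new IllegalStateException("components already initialized for " + threadId);
        }
        this.mainConsumer = Objects.requireNonNull(mainConsumer, "mainConsumer");
        this.taskManager = Objects.requireNonNull(taskManager, "taskManager");
        this.initialized = true;
    }

    synchronized boolean isInitialized() {
        return initialized;
    }

    // Refuse to start a thread whose components were never supplied.
    @Override
    public void start() {
        if (!isInitialized()) {
            throw new IllegalStateException("initializeComponents must be called before start() for " + threadId);
        }
        super.start();
    }

    @Override
    public void run() {
        // The real processing loop would use mainConsumer and taskManager here.
        Objects.requireNonNull(mainConsumer);
        Objects.requireNonNull(taskManager);
    }
}
```

With a guard like this, tests can still pass mocks to {{initializeComponents}}, while a forgotten or repeated call fails fast rather than leaving the thread half-configured.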
I’d appreciate any suggestions or insights! Thanks!
> Misleading/mismatched StreamThread id in logging
> ------------------------------------------------
>
> Key: KAFKA-18066
> URL: https://issues.apache.org/jira/browse/KAFKA-18066
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Reporter: A. Sophie Blee-Goldman
> Assignee: Peter Lee
> Priority: Minor
> Labels: newbie, newbie++
>
> While debugging a test application I was confused to see a number of log
> lines where the StreamThread name appeared twice but had a different thread
> id/index in the same message. For example:
> {code:java}
> [INFO ] 2024-11-19 04:59:14.541
> [e2e-963c5b74-0353-4253-bdf2-b71881d9d9f2-StreamThread-1] StreamThread -
> stream-thread [e2e-963c5b74-0353-4253-bdf2-b71881d9d9f2-StreamThread-3]
> Creating thread producer client{code}
> Generally you would expect that the actual Logger prefix (the first thread
> name, in this case StreamThread-1) is the same as the LogContext prefix (the
> second thread name, ie the StreamThread-3 in this example). I dug into it and
> figured out that this happens for all of the messages logged during the
> StreamThread#create method, ie before the new thread is actually created.
> What happened was StreamThread-1 had actually died, and started up a new
> thread (StreamThread-3) to replace itself before shutting down. So we were
> logging things _about_ StreamThread-3, but _from_ StreamThread-1.
> While this doesn't necessarily harm anyone, it's quite confusing to see and
> requires extensive knowledge of Streams to understand (a) that it's not a
> bug, and (b) which thread the messages are actually referring to. It also
> makes things harder to parse and read – for example I often filter logs on
> the Logger prefix to gather everything related to a particular thread and eg
> the clients it owns. The name of the currently executing thread is more
> reliable and gathers everything whereas not every logger is configured with
> the LogContext prefix (eg `stream-thread
> [e2e-963c5b74-0353-4253-bdf2-b71881d9d9f2-StreamThread-3]`).
> We should move things out of the static StreamThread#create method and into
> the thread constructor to make the logging consistent and reliable.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)