Re: [DISCUSS] KIP-449: Add connector contexts to Connect worker logs

Konstantine Karantasis Mon, 15 Apr 2019 12:03:13 -0700

Thanks for the quick updates on the KIP Randall!

3. Indeed. I thought I remembered some ambiguity in the numbering of tasks,
but looking closer I think we are fine, especially now that you mention
explicitly 0-based indexing for task ids.
4. Fine with the example as is.
5. Sounds good.
6. "client.id" is set by default. It's just that it's not unique across the
connect cluster by default. I think it's ok to use it as a default value
for "worker.id" in MDC. Regarding placement, I'd consider adding it
wherever there's the least redundancy in logging and also make it
optional/configurable based on the logger's conversion pattern.
Up to you if you think there's room for inclusion in this KIP.


Konstantine



On Fri, Apr 12, 2019 at 3:24 PM Randall Hauch <[email protected]> wrote:

> Thanks for the review and feedback, Konstantine.
>
> 1. Great suggestion. I've updated the KIP to hopefully make it more clear
> that the uncommented line is unchanged from the existing Log4J
> configuration file.
>
> 2. Regarding including a `-` before the task number is acceptable if it
> makes it easier to, read and filter. I've updated the KIP and PR to
> incorporate your suggestion.
>
> 3. Task numbers do start at 0, as seen by the DistributedHerder code that
> creates the tasks (
>
> https://github.com/apache/kafka/blob/02221bd907a23041c95ce6446986bff631652b3a/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L608-L611
> ).
> I've updated the KIP to highlight that task numbers are 0-based. As you can
> see in the sample log included in the KIP, "task-0" corresponds to the
> first task, not to the connector. In fact, the following are examples of
> the log messages with the "worker" scope where the Connector implementation
> is called:
>
> [2019-04-02 17:01:38,315] INFO [local-file-source|worker] Creating
> connector local-file-source of type FileStreamSource
> (org.apache.kafka.connect.runtime.Worker:227)
> [2019-04-02 17:01:38,317] INFO [local-file-source|worker] Instantiated
> connector local-file-source with version 2.3.0-SNAPSHOT of type class
> org.apache.kafka.connect.file.FileStreamSourceConnector
> (org.apache.kafka.connect.runtime.Worker:230)
> ...
>
> [2019-04-02 17:11:46,408] INFO [local-file-sink|worker] Stopping connector
> local-file-sink (org.apache.kafka.connect.runtime.Worker:334)
> [2019-04-02 17:11:46,408] INFO [local-file-sink|worker] Stopped connector
> local-file-sink (org.apache.kafka.connect.runtime.Worker:350)
>
>
> The FileStreamSourceConnector class does not actually log any messages
> itself, but if it did  those would be with the "[local-file-source|worker]"
> context and would include the name of the class and line number in
> parentheses (instead of e.g.,
> "org.apache.kafka.connect.runtime.Worker:230").
>
> 4. I thought about doing a more complex example, but IMO the extra log
> lines made the *sample* in the KIP quite a bit longer harder to understand.
> I thought it important to keep the KIP a bit more readable while showing
> which scope appear for the different log messages. A second task would
> essentially have the nearly the same log messages, just with a task number
> in the scope.
>
> 5. The PR already changes the `runtime/src/test/resources/log4j.properties`
> to include the connector context MDC parameter. Because that's not a public
> API, I've not mentioned it in the KIP.
>
> 6. I guess I understand your goal - it's not always clear which worker is
> being referenced. However, I'm not sure whether it's that valuable to
> include the "client.id" (if set) just in the "worker" scope. That means
> that it's maybe more useful to introduce a second MDC parameter (e.g., "%{
> worker.id}") and optionally include that on all log messages. We'd have to
> set the MDC context in the code in each thread, which isn't too much
> effort. The other advantage of this approach is that it doesn't have to be
> configurable: you can control it via your own logging configuration file
> (rather than optionally including it in the "worker" scope on some of the
> log messages). Thoughts? What would the "%{worker.id}" MDC value be set to
> if "client.id" is not set?
>
> Final notes: Removed the placeholders, and corrected the typos and grammar.
>
> Thanks again for the detailed review!
>
> On Fri, Apr 12, 2019 at 2:05 PM Konstantine Karantasis <
> [email protected]> wrote:
>
> > Thanks for the KIP Randall.
> >
> > It might not be obvious right away, but this is a great improvement when
> > running Connect with multiple connectors or when debugging Connect, for
> > instance in integration tests or system tests. KIP looks good to me
> > overall, I have just a few comments below:
> >
> > 1. In the snippet of the config, by including an uncommented line, maybe
> > it's not immediately clear that this line is an existing line in
> > connect-log4j.properties and not an addition. Should this be mentioned
> in a
> > separate code block or in a different way?
> >
> > 2. Currently when adding the taskId to a connector or connector task, we
> > precede it with a dash (-). I feel this addition would make it easier to
> > parse the taskId visually as well as with parsing tools (for example I
> can
> > use `-` as a delimiter easier than 'k' or even a multi-character
> delimiter
> > such as 'task'). Do you think it's an overhead?
> >
> > 3. I think a specific mention to the convention used on the base-index
> for
> > taskIds would be useful. For example, taskId equal to zero in most cases
> > represents the (source/sink) connector, while taskId > 0 corresponds to
> > (source/sink) tasks. By reading the KIP, I assume that task0 (or task-0
> if
> > you agree with my previous comment) will be the connector. Is this the
> > case?
> >
> > 4. Should the example include a snippet for taskId > 0?
> >
> > 5. Do you intend to enable connector MDC in tests? In this case how would
> > you distinguish between multiple workers in integrations tests? Do you
> > intend to address this here, in the PR corresponding to this KIP or
> follow
> > up later?
> >
> > 6. Related to my previous comment (5). The property 'client.id' is
> already
> > used to identify a Connect worker. Have you thought of offering the
> ability
> > to identify the worker MDC if this property is set, keeping at the same
> > time with MDC the ability to not include it as identifier?
> >
> > Finally, I'd suggest removing text placeholders. I think it will be
> easier
> > to follow the KIP. Currently I see placeholder text below Contents and in
> > Rejected alternatives.
> >
> > And my least favorite section, but I can't help it :)
> > Typos/grammar:
> > to understand what (is) happening within the worker,
> > Kafka Connect use(s) ...
> > quotes maybe not needed, text already verbatim in
> > `config/connect-log4j.properties`
> >
> > Best,
> > Konstantine
> >
> > On Tue, Apr 2, 2019 at 4:20 PM Randall Hauch <[email protected]> wrote:
> >
> > > I've been working on https://github.com/apache/kafka/pull/5743 for a
> > > while,
> > > but there were a number of comment, suggestions, and mild concerns on
> the
> > > PR. One of those comments was that maybe changing the Connect log
> content
> > > in this way probably warrants a KIP. So here it is:
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-449%3A+Add+connector+contexts+to+Connect+worker+logs
> > >
> > > I've also updated my PR reflect the KIP. Please reply with comments
> > and/or
> > > feedback.
> > >
> > > Best regards,
> > >
> > > Randall
> > >
> >
>

Re: [DISCUSS] KIP-449: Add connector contexts to Connect worker logs

Reply via email to