Re: [PR] Increasing timeout for wait_for_loggers to 30s [kafka]

via GitHub Tue, 04 Jun 2024 10:52:07 -0700


C0urante commented on PR #15915:
URL: https://github.com/apache/kafka/pull/15915#issuecomment-2148092416

Even under heavy load, ten seconds is an extremely long time for loggers to
be set. I think something else may be wrong.

It looks like the startup check for distributed workers could be
insufficient. By default, we wait to see if a worker's REST API is initialized,
which is done by querying the `/connectors` endpoint (see
[here](https://github.com/apache/kafka/blob/55d38efcc5505a5a1bddb08ba05f4d923f8050f9/tests/kafkatest/services/connect.py#L115)).
However, as was noted in https://github.com/apache/kafka/pull/15249, that
check does little beyond ensuring that a worker has a valid config
and has initialized its REST server. Is it possible that the failures you've
seen occurred because workers in the cluster were still starting up when
we issued the logging level adjustment request and waited for it to take effect?

If so, we can first try changing the startup mode for this test from
`STARTUP_MODE_LISTEN` (the default) to `STARTUP_MODE_JOIN`, which should give
stronger guarantees about worker readiness. And as a follow-up, there's
[KIP-1017](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1017%3A+Health+check+endpoint+for+Kafka+Connect),
which can be used in situations like this to avoid having to use hacks like
parsing log files or checking for nonexistent connectors to determine a
worker's health and readiness.
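
To illustrate the gap between the two startup modes, here is a rough, self-contained sketch. It is not the kafkatest implementation: `FakeWorker`, `start_and_wait`, and the `"listen"`/`"join"` strings are hypothetical stand-ins, and the timings are made up. It only shows that a "REST server is up" check can pass well before a worker has actually joined the cluster.

```python
import time


def wait_until(condition, timeout_sec, backoff_sec=0.01):
    """Poll condition() until it returns True or timeout_sec elapses."""
    deadline = time.monotonic() + timeout_sec
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(backoff_sec)


class FakeWorker:
    """Hypothetical stand-in for a distributed worker; not kafkatest code."""

    def __init__(self, listen_after_sec, join_after_sec):
        self._started = time.monotonic()
        self._listen_after = listen_after_sec
        self._join_after = join_after_sec

    def _elapsed(self):
        return time.monotonic() - self._started

    def rest_listening(self):
        # Assumption: roughly what a successful GET /connectors tells us.
        return self._elapsed() >= self._listen_after

    def joined_group(self):
        # Assumption: the stronger condition a "join" startup mode waits for.
        return self._elapsed() >= self._join_after


def start_and_wait(worker, mode, timeout_sec=5.0):
    """Block until the worker is 'ready' under the given (hypothetical) mode."""
    ready = worker.joined_group if mode == "join" else worker.rest_listening
    return wait_until(ready, timeout_sec)
```

With a worker whose REST server comes up immediately but which needs longer to join, `start_and_wait(worker, "listen")` returns right away while `worker.joined_group()` is still false. Any request sent at that point that depends on cluster membership may appear to hang, which would match the symptom described above.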

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
