Re: [PR] Log slow controller events [kafka]
github-actions[bot] commented on PR #15622: URL: https://github.com/apache/kafka/pull/15622#issuecomment-2297941376 This PR is being marked as stale since it has not had any activity in 90 days. If you would like to keep this PR alive, please ask a committer for review. If the PR has merge conflicts, please update it with the latest from trunk (or appropriate release branch) If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Log slow controller events [kafka]
soarez commented on PR #15622: URL: https://github.com/apache/kafka/pull/15622#issuecomment-2026356492 > events have relatively unique names (we include offset in some) so the cardinality is quite high I see. What I meant would result in too many more metrics. Makes sense then. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Log slow controller events [kafka]
mumrah commented on code in PR #15622: URL: https://github.com/apache/kafka/pull/15622#discussion_r1543844228 ## metadata/src/main/java/org/apache/kafka/controller/metrics/SlowEventsLogger.java: ## @@ -0,0 +1,58 @@ +package org.apache.kafka.controller.metrics; + +import org.apache.kafka.common.utils.Time; +import org.apache.kafka.common.utils.Timer; +import org.slf4j.Logger; + +import java.util.function.Supplier; + +import static java.util.concurrent.TimeUnit.MICROSECONDS; +import static java.util.concurrent.TimeUnit.MILLISECONDS; +import static java.util.concurrent.TimeUnit.NANOSECONDS; + +public class SlowEventsLogger { +/** + * Don't report any p99 events below this threshold. This prevents the controller from reporting p99 event + * times in the idle case. + */ +private static final int MIN_SLOW_EVENT_TIME_MS = 100; + +/** + * Calculating the p99 from the histogram consumes some resources, so only update it every so often. + */ +private static final int P99_REFRESH_INTERVAL_MS = 3; + +private final Supplier p99Supplier; +private final Logger log; +private final Time time; +private double p99; +private Timer percentileUpdateTimer; + +public SlowEventsLogger( +Supplier p99Supplier, +Logger log, +Time time +) { +this.p99Supplier = p99Supplier; +this.log = log; +this.time = time; +this.percentileUpdateTimer = time.timer(P99_REFRESH_INTERVAL_MS); +} + +public void maybeLog(String name, long durationNs) { +long durationMs = MILLISECONDS.convert(durationNs, NANOSECONDS); +if (durationMs > MIN_SLOW_EVENT_TIME_MS && durationMs > p99) { Review Comment: Yea, i need to initialize it in the constructor -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Log slow controller events [kafka]
mumrah commented on PR #15622: URL: https://github.com/apache/kafka/pull/15622#issuecomment-2026321417 @soarez wdym by tagging here? Note that the events have relatively unique names (we include offset in some) so the cardinality is quite high. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Log slow controller events [kafka]
soarez commented on code in PR #15622: URL: https://github.com/apache/kafka/pull/15622#discussion_r1543829135 ## metadata/src/main/java/org/apache/kafka/controller/metrics/SlowEventsLogger.java: ## @@ -0,0 +1,58 @@ +package org.apache.kafka.controller.metrics; + +import org.apache.kafka.common.utils.Time; +import org.apache.kafka.common.utils.Timer; +import org.slf4j.Logger; + +import java.util.function.Supplier; + +import static java.util.concurrent.TimeUnit.MICROSECONDS; +import static java.util.concurrent.TimeUnit.MILLISECONDS; +import static java.util.concurrent.TimeUnit.NANOSECONDS; + +public class SlowEventsLogger { +/** + * Don't report any p99 events below this threshold. This prevents the controller from reporting p99 event + * times in the idle case. + */ +private static final int MIN_SLOW_EVENT_TIME_MS = 100; + +/** + * Calculating the p99 from the histogram consumes some resources, so only update it every so often. + */ +private static final int P99_REFRESH_INTERVAL_MS = 3; + +private final Supplier p99Supplier; +private final Logger log; +private final Time time; +private double p99; +private Timer percentileUpdateTimer; + +public SlowEventsLogger( +Supplier p99Supplier, +Logger log, +Time time +) { +this.p99Supplier = p99Supplier; +this.log = log; +this.time = time; +this.percentileUpdateTimer = time.timer(P99_REFRESH_INTERVAL_MS); +} + +public void maybeLog(String name, long durationNs) { +long durationMs = MILLISECONDS.convert(durationNs, NANOSECONDS); +if (durationMs > MIN_SLOW_EVENT_TIME_MS && durationMs > p99) { Review Comment: It seems possible that `p99 = p99Supplier.get();` may not have run yet at this point. Should `p99` have an explicit default value? ## metadata/src/main/java/org/apache/kafka/controller/metrics/QuorumControllerMetrics.java: ## @@ -28,6 +28,8 @@ import java.util.Optional; import java.util.concurrent.atomic.AtomicLong; import java.util.function.Consumer; +import java.util.function.Function; +import java.util.function.Supplier; Review Comment: Is `Supplier` needed here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Log slow controller events [kafka]
mumrah opened a new pull request, #15622: URL: https://github.com/apache/kafka/pull/15622 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org