aho135 commented on PR #18598: URL: https://github.com/apache/druid/pull/18598#issuecomment-3374783855
> Thanks for the fix, @aho135! I have left some minor suggestions.
>
> Could you share some screenshots where we can see stale metrics being reported?

Thanks for the review @kfaraz!

This is the ingest/kafka/partitionlag metric being emitted by 2 Coordinators. The active one is emitting the proper metric (0) but the previous Coordinator is emitting a stale metric that doesn't get reset until we manually restart it.

The scenario we run into is that if a leader change occurs while there is lag on a topic, then the old Coordinator continues to emit that stale lag metric. We have some alerting set up for lag, so the stale value ends up triggering false alarms.

<img width="1645" height="422" alt="Screenshot 2025-10-06 at 1 59 05 PM" src="https://github.com/user-attachments/assets/25ed5bf2-0986-478c-880d-8075c0978739" />
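
To make the scenario concrete, here is a minimal sketch of the failure mode and one way to avoid it. It is a simplified, hypothetical model and not Druid's actual code or the change in this PR: the class `LagReporter` and its methods are names invented for illustration. The assumption is that a node caches the last observed lag per partition and a periodic emitter reports whatever is cached.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; these names do not exist in Druid.
class LagReporter {
    // Last observed lag per partition, read by a periodic metric emitter.
    private final Map<Integer, Long> partitionLag = new ConcurrentHashMap<>();
    private volatile boolean leader = false;

    void becomeLeader() {
        leader = true;
    }

    void stopBeingLeader() {
        leader = false;
        // Without this reset, the old leader would keep the last lag value
        // cached and could keep reporting it after losing leadership.
        partitionLag.clear();
    }

    void recordLag(int partition, long lag) {
        // Only the current leader tracks lag.
        if (leader) {
            partitionLag.put(partition, lag);
        }
    }

    // Values the periodic emitter would report on this node.
    Map<Integer, Long> snapshotForEmission() {
        if (!leader) {
            return Collections.emptyMap();
        }
        return new HashMap<>(partitionLag);
    }
}
```

In this sketch the non-leader reports nothing at all, so a lag alert only ever sees values from the active node.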
