Hi, Jose,

Thanks for the explanation. Other than depending on KIP-1022 being approved, the KIP looks good to me now.

Jun

On Thu, Mar 28, 2024 at 2:56 PM José Armando García Sancio
<jsan...@confluent.io.invalid> wrote:

> Hi Jun,
>
> See my comments below.
>
> On Thu, Mar 28, 2024 at 11:09 AM Jun Rao <j...@confluent.io.invalid> wrote:
> > If I am adding a new voter and it takes a long time (because the new voter
> > is catching up), I'd want to know if the request is indeed being processed.
> > I thought that's the usage of uncommitted-voter-change.
>
> They can get related information by using the 'kafka-metadata describe
> --replication' command (or the log-end-offset metric from KIP-595).
> That command (and metric) displays the LEO of all of the replicas
> (voters and observers), according to the leader. They can use that
> output to discover if the observer they are trying to add is lagging
> or is not replicating at all.
>
> When the user runs the command above, they don't know the exact offset
> that the new controller needs to reach but they can do some rough
> estimation of how far behind it is. What do you think? Is this good
> enough?
>
> > Also, I am still not sure about having multiple brokers reporting the same
> > metric. For example, if they don't report the same value (e.g. because one
> > broker is catching up), how does a user know which value is correct?
>
> They are all correct according to the local view. Here are two
> examples of monitors that the user can write:
>
> 1. Is there a voter that I need to remove from the quorum? They can
>    create a monitor that fires, if the number-of-offline-voters metric
>    has been greater than 0 for the past hour.
> 2. Is there a cluster that doesn't have 3 voters? They can create a
>    monitor that fires, if any replica doesn't report three for
>    number-of-voters for the past hour.
>
> Is there a specific metric that you have in mind that should only be
> reported by the KRaft leader?
>
> Thanks,
> --
> -José
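
For illustration only, the two monitors described in the quoted message could be sketched roughly as follows. This is a minimal sketch and not part of the KIP: it assumes the metric values have already been scraped into lists of (timestamp, value) samples, and the function names, the per-replica dictionary, and the "no samples means no alert" choice are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Tuple

# One scraped metric sample: (time of scrape, metric value).
# How samples are collected is outside this sketch (assumption).
Sample = Tuple[datetime, int]


def held_for_window(
    samples: List[Sample],
    predicate: Callable[[int], bool],
    now: datetime,
    window: timedelta = timedelta(hours=1),
) -> bool:
    """Return True if every sample within the window satisfies the predicate.

    An empty window returns False, so missing data does not fire the monitor
    (a hypothetical design choice).
    """
    recent = [value for ts, value in samples if now - window <= ts <= now]
    return bool(recent) and all(predicate(value) for value in recent)


def offline_voter_alert(samples: List[Sample], now: datetime) -> bool:
    """Monitor 1: number-of-offline-voters has been > 0 for the past hour."""
    return held_for_window(samples, lambda v: v > 0, now)


def voter_count_alert(
    per_replica: Dict[str, List[Sample]], now: datetime, expected: int = 3
) -> bool:
    """Monitor 2: any replica has not reported `expected` for
    number-of-voters during the past hour."""
    return any(
        held_for_window(samples, lambda v: v != expected, now)
        for samples in per_replica.values()
    )
```

Because each replica reports its own local view, the second check looks at every replica's series separately and fires if any one of them has disagreed with the expected count for the whole window.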