David Ribeiro Alves has posted comments on this change.

Change subject: WIP: KUDU-1506 Add consensus lag metrics
......................................................................
Patch Set 6:

We could probably do the time thing by tracking assigned timestamps in addition to indexes. The problem is that it's largely arbitrary. A replica might be lagging by a little time and have thousands of ops in the queue, or be lagging by a large chunk of time but have a relatively small number of ops in the queue.

I think the problem we set out to solve here is to give users some insight into whether a replica is lagging, and this cannot be conveyed by a single number; it's something that needs to be tracked over time to have meaning. That is, a user won't care about, or even understand, that a replica is lagging by 1000 ops, or that it's lagging by 5 minutes (where 5 minutes is the timestamp diff between the last appended op on the leader and the last received op by the replica). They care whether this number goes down over time (replica is catching up) or goes up over time (replica won't ever catch up). To this point, how accurately we define this number is largely irrelevant as long as we do it consistently.

--
To view, visit http://gerrit.cloudera.org:8080/6451
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ida8e992cc2397ca8d5873e62961a65f618d52c36
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: No
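
To illustrate the point in the comment about trend versus absolute value, here is a rough sketch in Python. It is not Kudu code and not the patch's implementation: `LagSample`, `lag_trend`, and `describe` are hypothetical names, and the least-squares slope is just one possible way to turn a series of lag samples into "going up" or "going down". The same approach would work if `ops_lag` were replaced by a timestamp difference, which is why the exact definition of the number matters less than measuring it consistently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LagSample:
    """One observation of a replica's lag (hypothetical shape, not a Kudu type)."""
    wall_time_s: float   # when the sample was taken, in seconds
    leader_index: int    # index of the last op appended on the leader
    replica_index: int   # index of the last op received by the replica

    @property
    def ops_lag(self) -> int:
        # How many ops the replica is behind the leader at this instant.
        return self.leader_index - self.replica_index


def lag_trend(samples: List[LagSample]) -> float:
    """Least-squares slope of ops_lag over time, in ops per second.

    Negative means the replica is catching up; positive means it is
    falling further behind.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a trend")
    n = len(samples)
    mean_t = sum(s.wall_time_s for s in samples) / n
    mean_lag = sum(s.ops_lag for s in samples) / n
    cov = sum((s.wall_time_s - mean_t) * (s.ops_lag - mean_lag) for s in samples)
    var = sum((s.wall_time_s - mean_t) ** 2 for s in samples)
    if var == 0:
        raise ValueError("samples must span a non-zero time interval")
    return cov / var


def describe(samples: List[LagSample]) -> str:
    """Summarize the trend in words; the absolute lag value is ignored."""
    slope = lag_trend(samples)
    if slope < 0:
        return "catching up"
    if slope > 0:
        return "falling behind"
    return "steady"
```

Under this sketch, a replica that is 1000 ops behind but shrinking the gap is reported as "catching up", while one that is only 10 ops behind but growing is reported as "falling behind", which matches the argument that a single number says little on its own.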