Andrew Wong has posted comments on this change.

Change subject: KUDU-1506 Add consensus lag metrics
......................................................................


Patch Set 3:

(5 comments)

> 1. I am having trouble deciding whether these metrics are meant to
 > have meaning on nodes other than the leader. I think yes.
Yeah, so the leader will determine the max replicated index that it knows about 
from all the peers it's tracking and send that index out to its followers via 
the ConsensusRequestPB. To a follower, this can indicate how far behind in 
terms of number of ops it is. The follower should see this when it 
UpdateReplica()s

 > 2. I am not sure what this lag metric means exactly. It seems like
 > it means means highest index that the leader knows about or highest
 > index the current node has ever seen.
It's the highest index of an op committed by a peer that's being tracked by the 
leader.

 > Is there any way we can get this metric to be time-based instead of
 > op-based? op-based has the downside that it's dependent on the rate
 > of incoming operations?
Have been thinking about this as well, since using op-id was more a means to 
get familiar with consensus. Moving forward, I think so. If the follower can 
track the time it receives a replicate request, it just needs to update the 
metric when it finishes applying.

http://gerrit.cloudera.org:8080/#/c/6451/3/src/kudu/consensus/consensus.proto
File src/kudu/consensus/consensus.proto:

PS3, Line 358: any peer in the config
> Just looking at the .proto file, it's hard for me to understand what this m
Yep! I think that's an apt description. Also important to note that the request 
gets sent to the followers, allowing them to track their own op lag.


http://gerrit.cloudera.org:8080/#/c/6451/3/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

PS3, Line 106: max
> How about calling this 'latest' instead of 'max'? Here and elsewhere. When 
This has been changed to "num_ops_behind_max_replicated


PS3, Line 157: queue_state_.max_replicated_index = committed_index
> Hmm, can't we get a more accurate / tighter bound than this?
Hmm, good point, think we can make it max(max_replicated_index, committed_index)


http://gerrit.cloudera.org:8080/#/c/6451/3/src/kudu/consensus/consensus_queue.h
File src/kudu/consensus/consensus_queue.h:

PS3, Line 338: max_replicated_index
> Shouldn't this be bounded by queue_state_.last_appended? Is not, is that a 
That's the case if this is the leader, but consensus_queue still functions when 
not in LEADER mode. If another follower has a higher index, 
max_replicated_index could be higher than this follower's last_appended.


http://gerrit.cloudera.org:8080/#/c/6451/3/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS3, Line 1047: queue_->UpdateLagMetrics(request->max_replicated_index());
> The leader's validity is only checked in LOC 1145, so this would have to be
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6451
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ida8e992cc2397ca8d5873e62961a65f618d52c36
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to