Will Berkeley has posted comments on this change.

Change subject: Add tablet state summary metrics and fix KUDU-2044
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7618/2//COMMIT_MSG
Commit Message:

PS2, Line 11: The numbers are computed by the heartbeater. There's two reasons
            : for this:
            : 1. the heartbeater was already computing the number of RUNNING
            : and BOOTSTRAPPING tablets by holding the appropriate lock and
            : iterating over the tablet map
            : 2. the alternative to having some thread periodically iterate
            : over the tablet map is to increment and decrement the metrics
            : when tablets transition states. This is error-prone,
            : particularly if new states are added, and mistakes will
            : accumulate until the metric is worse than useless.
> I appreciate the justification, but I don't think the heartbeater is a grea
I could use a function gauge, but a really fast metric poll could contend with 
other access to the tablet manager's tablet map: I would need n function gauges 
for n tablet states, which is n locks and iterations each time someone hits 
/metrics. That's why I liked the heartbeat approach despite the drawbacks...it 
just adapts the iteration over the tablet map that was already done for 
heartbeats (which already iterated k times for k masters).

I don't think multiply-instantiating a metrics has bad effects. It doesn't 
create multiple metrics.

Is the alternative you support to track the state transitions directly? I.e. in 
TransitionFromFooToBar() do
num_foos->IncrementBy(-1);
num_bars->Increment();


PS2, Line 23: A metric entity can now be
            : marked as hidden, so it will not print to /metrics, and
            : tablet metric entities are marked as hidden when the tablet is
            : tombstoned, and un-hidden if and when the tablet is revived.
> Why hide them and not remove them outright? Don't "hidden" entities still a
They take up some memory as part of the TabletReplica that remains in the 
tablet manager's map, and they need to be iterated over and skipped when 
metrics are gathered. I think that's small overhead.

The benefit of making them hidden is that it's very simple. Actually removing 
an entity means
1. removing all references to it
2. removing all reference to its child metrics
3. Hitting /metrics after the retirement period has passed (default 2 minutes)
I think only the Tablet actually holds a reference to its entity, so 1 is easy. 
2 is harder, because a number of other objects hold references to tablet 
metrics, e.g. the transaction tracker and log. Some are destroyed when the 
TabletReplica is shut down, for example the Tablet itself, and some aren't, 
e.g. the log and the transaction tracker.

But, if you do think it's worth it to remove and then re-init the metrics in 
the various places when a tablet is tombstoned and then revived, I can do that.


-- 
To view, visit http://gerrit.cloudera.org:8080/7618
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8c82987ffe4f37e8af95972bde97841e44c521d9
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wdberke...@gmail.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>
Gerrit-HasComments: Yes

Reply via email to