Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14630 )

Change subject: [metrics] Add metric severity level
......................................................................


Patch Set 7:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/14630/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14630/6//COMMIT_MSG@13
PS6, Line 13: Additionally, as the tablet count grows within the cluster, the 
tablet level
            : metrics may overwhelm users and some metric collection tools. For 
example
            : a server with 37
> I'd rephrase this a bit: "Additionally, as the tablet count grows within th
Done


http://gerrit.cloudera.org:8080/#/c/14630/6//COMMIT_MSG@30
PS6, Line 30:
> over time
Done


http://gerrit.cloudera.org:8080/#/c/14630/6/docs/administration.adoc
File docs/administration.adoc:

http://gerrit.cloudera.org:8080/#/c/14630/6/docs/administration.adoc@111
PS6, Line 111: - `/metrics?level=info` - limits the returned metrics based on 
their severity level.
> Should probably doc the default level here too.
Done


http://gerrit.cloudera.org:8080/#/c/14630/6/src/kudu/consensus/log_metrics.cc
File src/kudu/consensus/log_metrics.cc:

http://gerrit.cloudera.org:8080/#/c/14630/6/src/kudu/consensus/log_metrics.cc@30
PS6, Line 30:                         kudu::MetricLevel::kInfo,
> Likewise.
Done


http://gerrit.cloudera.org:8080/#/c/14630/6/src/kudu/consensus/log_metrics.cc@36
PS6, Line 36:                         kudu::MetricLevel::kInfo,
> This one is more useful, kInfo?
Done


http://gerrit.cloudera.org:8080/#/c/14630/6/src/kudu/consensus/log_reader.cc
File src/kudu/consensus/log_reader.cc:

http://gerrit.cloudera.org:8080/#/c/14630/6/src/kudu/consensus/log_reader.cc@55
PS6, Line 55:                         kudu::MetricLevel::kInfo,
> kInfo
Done


http://gerrit.cloudera.org:8080/#/c/14630/6/src/kudu/master/sentry_client_metrics.cc
File src/kudu/master/sentry_client_metrics.cc:

http://gerrit.cloudera.org:8080/#/c/14630/6/src/kudu/master/sentry_client_metrics.cc@46
PS6, Line 46:                         kudu::MetricLevel::kInfo,
> kInfo on this one too.
Done


http://gerrit.cloudera.org:8080/#/c/14630/6/src/kudu/tablet/tablet_metrics.cc
File src/kudu/tablet/tablet_metrics.cc:

http://gerrit.cloudera.org:8080/#/c/14630/6/src/kudu/tablet/tablet_metrics.cc@30
PS6, Line 30: METRIC_DEFINE_counter(tablet, rows_inserted, "Rows Inserted",
            :     kudu::MetricUnit::kRows,
            :     "Number of rows inserted into this tablet since service 
start",
            :     kudu::MetricLevel::kDebug);
            : METRIC_DEFINE_counter(tablet, rows_upserted, "Rows Upserted",
            :     kudu::MetricUnit::kRows,
            :     "Number of rows upserted into this tablet since service 
start",
            :     kudu::MetricLevel::kDebug);
            : METRIC_DEFINE_counter(tablet, rows_updated, "Rows Updated",
            :     kudu::MetricUnit::kRows,
            :     "Number of row update operations performed on this tablet 
since service start",
            :     kudu::MetricLevel::kDebug);
            : METRIC_DEFINE_counter(tablet, rows_deleted, "Rows Deleted",
            :     kudu::MetricUnit::kRows,
            :     "Number of row delete operations performed on this tablet 
since service start",
            :     kudu::MetricLevel::kDebug);
> I think these should be kInfo.
Why are these especially useful? Given they are cumulative since service start 
I find them harder to use and interpret.


http://gerrit.cloudera.org:8080/#/c/14630/6/src/kudu/tablet/tablet_metrics.cc@247
PS6, Line 247: METRIC_DEFINE_histogram(tablet, flush_dms_duration,
> These histograms are super useful; they should be kInfo.
Done


http://gerrit.cloudera.org:8080/#/c/14630/6/src/kudu/tablet/tablet_metrics.cc@307
PS6, Line 307:   kudu::MetricLevel::kWarn);
> kWarn, this could indicate too much memory pressure.
Done


http://gerrit.cloudera.org:8080/#/c/14630/6/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/14630/6/src/kudu/tablet/tablet_replica.cc@64
PS6, Line 64: METRIC_DEFINE_histogram(tablet, op_prepare_queue_length, 
"Operation Prepare Queue Length",
            :                         kudu::MetricUnit::kTasks,
            :                         "Number of operations waiting to be 
prepared within this tablet. "
            :                         "High queue lengths indicate that the 
server is unable to process "
            :                         "operations as fast as they are being 
written to the WAL.",
            :                         kudu::MetricLevel::kInfo,
            :                         10000, 2);
            :
            : METRIC_DEFINE_histogram(tablet, op_prepare_queue_time, "Operation 
Prepare Queue Time",
            :                         kudu::MetricUnit::kMicroseconds,
            :                         "Time that operations spent waiting in 
the prepare queue before being "
            :                         "processed. High queue times indicate 
that the server is unable to "
            :                         "process operations as fast as they are 
being written to the WAL.",
            :                         kudu::MetricLevel::kInfo,
            :                         10000000, 2);
            :
            : METRIC_DEFINE_histogram(tablet, op_prepare_run_time, "Operation 
Prepare Run Time",
            :                         kudu::MetricUnit::kMicroseconds,
            :                         "Time that operations spent being 
prepared in the tablet. "
            :                         "High values may indicate that the server 
is under-provisioned or "
            :                         "that operations are experiencing high 
contention with one another for "
            :                         "locks.",
            :                         kudu::MetricLevel::kInfo,
            :                         10000000, 2);
> kInfo, these are often useful in diagnosing lock contention (i.e. several "
Done


http://gerrit.cloudera.org:8080/#/c/14630/6/src/kudu/tablet/tablet_replica_mm_ops.cc
File src/kudu/tablet/tablet_replica_mm_ops.cc:

http://gerrit.cloudera.org:8080/#/c/14630/6/src/kudu/tablet/tablet_replica_mm_ops.cc@84
PS6, Line 84: METRIC_DEFINE_histogram(tablet, log_gc_duration,
> Generally speaking I think all histograms associated with MM ops should be
Done


http://gerrit.cloudera.org:8080/#/c/14630/6/src/kudu/util/metrics.h
File src/kudu/util/metrics.h:

http://gerrit.cloudera.org:8080/#/c/14630/6/src/kudu/util/metrics.h@420
PS6, Line 420: always wa
> Nit: second use of 'generally', drop
Done


http://gerrit.cloudera.org:8080/#/c/14630/6/src/kudu/util/metrics.h@427
PS6, Line 427: enum class MetricLevel {
> Why wrap it in a struct? You could define Name as a free function and just
Done


http://gerrit.cloudera.org:8080/#/c/14630/6/src/kudu/util/metrics.h@452
PS6, Line 452:
             : struct MergeAttributes {
> Why a vector and not just a Level? Doesn't one level imply all the levels b
Done


http://gerrit.cloudera.org:8080/#/c/14630/6/src/kudu/util/metrics.cc
File src/kudu/util/metrics.cc:

http://gerrit.cloudera.org:8080/#/c/14630/6/src/kudu/util/metrics.cc@41
PS6, Line 41:
            : // Process/server-wide metrics should go into the 'server' entity.
            : // More complex applications will define other entities.
            : METRIC_DEFINE_entity(server);
            : 
            : namespace kudu {
            :
            : using std::string;
> On second thought, I don't think we want a gflag for this at all. Level-bas
The idea here was that if we know certain metrics aren't commonly used and 
introduce significant load at scale (either through logging or URL) we could 
omit them by default but allow users to opt-in to seeing them.

I can remove the flag for now. We can add it back if we find it useful.



--
To view, visit http://gerrit.cloudera.org:8080/14630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5aa7d2323bb75700104c348a3ae859fc449e1715
Gerrit-Change-Number: 14630
Gerrit-PatchSet: 7
Gerrit-Owner: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <abu...@apache.org>
Gerrit-Reviewer: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <405403...@qq.com>
Gerrit-Reviewer: helifu <hzhel...@corp.netease.com>
Gerrit-Comment-Date: Thu, 07 Nov 2019 22:44:34 +0000
Gerrit-HasComments: Yes

Reply via email to