Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13426 )
Change subject: KUDU-2797: the master aggregates tablet statistics ...................................................................... Patch Set 15: (5 comments) Looking good for the most part! I'm less familiar with our metrics framework; Mike, could you take a look if you can? http://gerrit.cloudera.org:8080/#/c/13426/15//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13426/15//COMMIT_MSG@12 PS15, Line 12: In this patch, tablet statistics including disk size and live row : counts of all the replicas are aggregated on the master and only : the leader ones are summed up and displayed on the 'TablePage'. : At the same time, every replica's statistics are displayed on the : 'TabletPage'. Could you also add the detail that these are also exposed as metrics on the masters and tablet servers. http://gerrit.cloudera.org:8080/#/c/13426/15/src/kudu/integration-tests/ts_tablet_manager-itest.cc File src/kudu/integration-tests/ts_tablet_manager-itest.cc: http://gerrit.cloudera.org:8080/#/c/13426/15/src/kudu/integration-tests/ts_tablet_manager-itest.cc@632 PS15, Line 632: while (true) { : NO_FATALS(TriggerHeartbeat()); : ASSERT_EQ(0, table_info->metrics()->live_row_count->value()); : if (table_info->metrics()->on_disk_size->value() > 0) { : break; : } : } nit: if you use ASSERT_EVENTUALLY, it will do some back-off for you. E.g. ASSERT_EVENTUALLY([&] { NO_FATALS(TriggerHeartbeat()); ASSERT_EQ(0, table_info->metrics()->live_row_count->value()); ASSERT_LT(0, table_info->metrics()->on_disk_size->value()); }); Same below. http://gerrit.cloudera.org:8080/#/c/13426/15/src/kudu/master/master_stats.cc File src/kudu/master/master_stats.cc: PS15: This has more to do with "metrics" than it does "stats". How about calling it TableMetrics? http://gerrit.cloudera.org:8080/#/c/13426/15/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: http://gerrit.cloudera.org:8080/#/c/13426/15/src/kudu/tablet/tablet_replica.cc@845 PS15, Line 845: const ReportedTabletStatsPB& TabletReplica::stats_pb() const { : return stats_pb_; : } On second thought, I think for more obvious thread-safety, this should indeed return a copy instead of a cref. And also maybe move stats_lock_ from the TsTableManager into this class here instead, rather than taking a server-wide lock for each replica. http://gerrit.cloudera.org:8080/#/c/13426/15/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/13426/15/src/kudu/tserver/ts_tablet_manager.cc@1308 PS15, Line 1308: lock_stats_.lock_shared(); : ReportedTabletStatsPB stats_pb = replica->stats_pb(); : lock_stats_.unlock_shared(); The fact that we're taking a lock just for this makes me think the locking should actually be in the TabletReplica. See my comment in tablet_replica.cc -- To view, visit http://gerrit.cloudera.org:8080/13426 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I74406ab7cca7c22fda455c328b8ee9989a6b2d99 Gerrit-Change-Number: 13426 Gerrit-PatchSet: 15 Gerrit-Owner: helifu <hzhel...@corp.netease.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: helifu <hzhel...@corp.netease.com> Gerrit-Comment-Date: Mon, 24 Jun 2019 22:20:43 +0000 Gerrit-HasComments: Yes