helifu has posted comments on this change. ( http://gerrit.cloudera.org:8080/13426 )
Change subject: KUDU-2797: the master aggregates tablet metrics ...................................................................... Patch Set 5: (16 comments) http://gerrit.cloudera.org:8080/#/c/13426/5/src/kudu/master/catalog_manager.h File src/kudu/master/catalog_manager.h: http://gerrit.cloudera.org:8080/#/c/13426/5/src/kudu/master/catalog_manager.h@189 PS5, Line 189: ReportedTabletMetricsPB peer_metrics_return(); > +1 Done http://gerrit.cloudera.org:8080/#/c/13426/5/src/kudu/master/catalog_manager.h@189 PS5, Line 189: ReportedTabletMetricsPB peer_metrics_return(); > There may be multiple replica metrics associated with a tablet. Which gets Done http://gerrit.cloudera.org:8080/#/c/13426/5/src/kudu/master/catalog_manager.h@187 PS5, Line 187: // Upsert or delete the metrics. : void peer_metrics_upsert(const std::string& peer, const ReportedTabletMetricsPB& metrics); : ReportedTabletMetricsPB peer_metrics_return(); : void peer_metrics_delete(const std::string& peer); > Looking at the other functions of the CatalogManager, perhaps these names w Done http://gerrit.cloudera.org:8080/#/c/13426/5/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/13426/5/src/kudu/master/catalog_manager.cc@5243 PS5, Line 5243: peer_metrics_upsert > In newer code, we've generally used "replica" instead of "peer" when referr Done http://gerrit.cloudera.org:8080/#/c/13426/5/src/kudu/master/master.proto File src/kudu/master/master.proto: http://gerrit.cloudera.org:8080/#/c/13426/5/src/kudu/master/master.proto@233 PS5, Line 233: MetricsPB > I don't love the idea that this is called a "metric", since I think this co Done http://gerrit.cloudera.org:8080/#/c/13426/5/src/kudu/master/master.proto@233 PS5, Line 233: MetricsPB > +1, statistics seems more descriptive of what we're building in here Done http://gerrit.cloudera.org:8080/#/c/13426/5/src/kudu/tserver/heartbeater.h File src/kudu/tserver/heartbeater.h: http://gerrit.cloudera.org:8080/#/c/13426/5/src/kudu/tserver/heartbeater.h@77 PS5, Line 77: void SetUpdateTimeForTests(const MonoTime& time); > nit: If possible, make the ForTests() stuff private and add the test as a f Done http://gerrit.cloudera.org:8080/#/c/13426/5/src/kudu/tserver/heartbeater.h@76 PS5, Line 76: const MonoTime GetUpdateTime(); : void SetUpdateTimeForTests(const MonoTime& time); : void UpdateTabletMetrics(); : Status GetTabletMetricsPB(const std::string& tablet_id, master::ReportedTabletMetricsPB* pb); > Could you add comments describing these and how a user of the Heartbeater c Done http://gerrit.cloudera.org:8080/#/c/13426/5/src/kudu/tserver/heartbeater.cc File src/kudu/tserver/heartbeater.cc: http://gerrit.cloudera.org:8080/#/c/13426/5/src/kudu/tserver/heartbeater.cc@318 PS5, Line 318: // 1. > nit: Could you remove these numbers? This function is short enough that it' Done http://gerrit.cloudera.org:8080/#/c/13426/5/src/kudu/tserver/heartbeater.cc@319 PS5, Line 319: lock_.try_lock() > Please ignore my question about the threads, I left that as a note to mysel Done http://gerrit.cloudera.org:8080/#/c/13426/5/src/kudu/tserver/heartbeater.cc@319 PS5, Line 319: lock_.try_lock() > nit: this would be more resistant to bugs caused by future code changes if Done http://gerrit.cloudera.org:8080/#/c/13426/5/src/kudu/tserver/heartbeater.cc@344 PS5, Line 344: for (auto it = tablet_metrics.begin(); it != tablet_metrics.end(); ++it) { > Should we remove metrics that don't exist in tablet_metrics (i.e. tablets t Done http://gerrit.cloudera.org:8080/#/c/13426/5/src/kudu/tserver/heartbeater.cc@343 PS5, Line 343: vector<string> dirty_tablets; : for (auto it = tablet_metrics.begin(); it != tablet_metrics.end(); ++it) { : auto iter = tablet_metrics_.find(it->first); : if (iter != tablet_metrics_.end() && : iter->second->disk_size() == it->second->disk_size() && : iter->second->row_count() == it->second->row_count()) { : continue; : } : : tablet_metrics_[it->first] = std::move(it->second); : dirty_tablets.emplace_back(it->first); : } > It's hard to follow this because of how similar 'iter' and 'it' are. Maybe Done http://gerrit.cloudera.org:8080/#/c/13426/5/src/kudu/tserver/heartbeater.cc@357 PS5, Line 357: update_time_ = MonoTime::Now() + : MonoDelta::FromMilliseconds(FLAGS_update_tablet_metrics_interval_ms); > nit: I think it'd be more intuitive to keep around a 'time_of_last_update_' Done http://gerrit.cloudera.org:8080/#/c/13426/5/src/kudu/tserver/heartbeater.cc@642 PS5, Line 642: // If that time expires, let's update the tablet metrics. : if (MonoTime::Now() > parent_->GetUpdateTime()) { : parent_->UpdateTabletMetrics(); : } > How about encapsulating this check into the function itself, e.g. UpdateTab Done http://gerrit.cloudera.org:8080/#/c/13426/5/src/kudu/tserver/ts_tablet_manager-test.cc File src/kudu/tserver/ts_tablet_manager-test.cc: http://gerrit.cloudera.org:8080/#/c/13426/5/src/kudu/tserver/ts_tablet_manager-test.cc@338 PS5, Line 338: 1. > nit: when formatting numbering, please add a space after the dot, like: Done -- To view, visit http://gerrit.cloudera.org:8080/13426 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I74406ab7cca7c22fda455c328b8ee9989a6b2d99 Gerrit-Change-Number: 13426 Gerrit-PatchSet: 5 Gerrit-Owner: helifu <hzhel...@corp.netease.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: helifu <hzhel...@corp.netease.com> Gerrit-Comment-Date: Tue, 11 Jun 2019 13:39:30 +0000 Gerrit-HasComments: Yes