Hello Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/16938 to look at the new patch set (#2).

Change subject: [util] add a few new metrics in MaintenanceManager
......................................................................

[util] add a few new metrics in MaintenanceManager

This patch adds a couple of metrics for MaintenanceManager to track
the duration of choosing the best candidate among available maintenance
operations and the number of times the Prepare() method for a
maintenance operation failed:
  * maintenance_op_find_best_candidate_duration
  * maintenance_op_prepare_failed

In addition, it adds SCOPED_LOG_SLOW_EXECUTION with the threshold of
10 seconds into the MaintenanceManager::FindBestOp() method.

At this point, I manually verified that those metrics are present and
show relevant information. I'm planning to add an automated test to
cover the behavior of these new metrics in [1] to have fewer conflicts
with the mentioned patch.

The motivation for this change is a finding that FindBestOp()'s
computational complexity is O(n^2) in the number of replicas per tablet
server (each tablet replica registers about 8 maintenance operations).
Also, BudgetedCompactionPolicy::RunApproximation()'s computational
complexity is O(n^2) in the number of rowsets in max and min keys.
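For readers unfamiliar with the pattern, here is a minimal, self-contained
C++ sketch of what the two metrics and the slow-execution warning measure.
It does not use Kudu's metric classes or the SCOPED_LOG_SLOW_EXECUTION
macro: SchedulerMetrics, ScopedDurationRecorder, Op, and ScheduleOnce are
hypothetical names introduced only for illustration, and the real patch may
be structured differently.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-ins for the two metrics named in the commit message.
// These are plain fields, not Kudu's histogram/counter metric types.
struct SchedulerMetrics {
  std::vector<int64_t> find_best_candidate_duration_us;  // one sample per call
  int64_t prepare_failed = 0;                            // failed Prepare() calls
};

// RAII helper: on destruction, records the elapsed time as a sample and
// prints a warning if the scope took longer than the given threshold.
class ScopedDurationRecorder {
 public:
  ScopedDurationRecorder(SchedulerMetrics* metrics,
                         std::chrono::milliseconds slow_threshold)
      : metrics_(metrics),
        slow_threshold_(slow_threshold),
        start_(std::chrono::steady_clock::now()) {}

  ~ScopedDurationRecorder() {
    const auto elapsed = std::chrono::steady_clock::now() - start_;
    metrics_->find_best_candidate_duration_us.push_back(
        std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count());
    if (elapsed > slow_threshold_) {
      std::cerr << "slow execution: finding the best op exceeded threshold\n";
    }
  }

 private:
  SchedulerMetrics* metrics_;
  std::chrono::milliseconds slow_threshold_;
  std::chrono::steady_clock::time_point start_;
};

// Hypothetical maintenance operation: a name, a score, and a Prepare() hook.
struct Op {
  std::string name;
  double score;
  std::function<bool()> prepare;
};

// Picks the highest-scoring op while timing the whole selection.
const Op* FindBestOp(const std::vector<Op>& ops, SchedulerMetrics* metrics) {
  ScopedDurationRecorder timer(metrics, std::chrono::seconds(10));
  const Op* best = nullptr;
  for (const auto& op : ops) {
    if (best == nullptr || op.score > best->score) best = &op;
  }
  return best;
}

// One scheduling round: choose an op, then count a failed Prepare().
// Returns true only if an op was chosen and prepared successfully.
bool ScheduleOnce(const std::vector<Op>& ops, SchedulerMetrics* metrics) {
  const Op* best = FindBestOp(ops, metrics);
  if (best == nullptr) return false;
  if (!best->prepare()) {
    ++metrics->prepare_failed;
    return false;
  }
  return true;
}
```

In this sketch every call to FindBestOp() contributes one duration sample,
so a scheduler stuck in long selection rounds would show up both in the
recorded durations and in the warning output.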
In the wild, there was an instance of a Kudu cluster with a high data
ingest rate with the following stack showing in every snapshot in the
diagnostic logs for many hours in a row:

  0xa11735 kudu::tablet::BudgetedCompactionPolicy::RunApproximation()
  0xa129c9 kudu::tablet::BudgetedCompactionPolicy::PickRowSets()
  0x9c8d80 kudu::tablet::Tablet::UpdateCompactionStats()
  0x9ec848 kudu::tablet::CompactRowSetsOp::UpdateStats()
  0x1b3de5c kudu::MaintenanceManager::FindBestOp()
  0x1b3f3c5 kudu::MaintenanceManager::RunSchedulerThread()
  0x1b86014 kudu::Thread::SuperviseThread()

[1] https://gerrit.cloudera.org/#/c/16937/

Change-Id: If5420afd605f9bd22207af142b49e73336907486
---
M src/kudu/master/master.cc
M src/kudu/tserver/tablet_server.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
A src/kudu/util/maintenance_manager_metrics.cc
A src/kudu/util/maintenance_manager_metrics.h
8 files changed, 138 insertions(+), 10 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/38/16938/2
--
To view, visit http://gerrit.cloudera.org:8080/16938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If5420afd605f9bd22207af142b49e73336907486
Gerrit-Change-Number: 16938
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)