Hello Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16938

to look at the new patch set (#2).

Change subject: [util] add a few new metrics in MaintenanceManager
......................................................................

[util] add a few new metrics in MaintenanceManager

This patch adds a couple of metrics for MaintenanceManager to track the
duration of choosing the best candidate among available maintenance
operations and the number of times the Prepare() method of a
maintenance operation failed:
  * maintenance_op_find_best_candidate_duration
  * maintenance_op_prepare_failed

In addition, it adds SCOPED_LOG_SLOW_EXECUTION with a threshold of
10 seconds to the MaintenanceManager::FindBestOp() method.
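
For illustration only, here is a minimal standalone sketch of the two
measurements described above: a scoped timer around the candidate
search and a counter bumped when preparation fails. It is not taken
from the patch and does not use Kudu's metrics framework or
SCOPED_LOG_SLOW_EXECUTION; all names ending in "Sketch", as well as
ScopedDurationRecorder, are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the two metrics; not Kudu's metric classes.
struct MaintenanceMetricsSketch {
  std::atomic<int64_t> find_best_calls{0};     // number of timed searches
  std::atomic<int64_t> find_best_total_us{0};  // summed duration, microseconds
  std::atomic<int64_t> prepare_failed{0};      // number of failed Prepare() calls
};

// RAII helper: adds the elapsed time to the metrics when it leaves scope.
class ScopedDurationRecorder {
 public:
  explicit ScopedDurationRecorder(MaintenanceMetricsSketch* metrics)
      : metrics_(metrics), start_(std::chrono::steady_clock::now()) {
    assert(metrics_ != nullptr);
  }
  ~ScopedDurationRecorder() {
    const auto elapsed = std::chrono::steady_clock::now() - start_;
    const int64_t us =
        std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
    metrics_->find_best_calls.fetch_add(1);
    metrics_->find_best_total_us.fetch_add(us);
  }
  ScopedDurationRecorder(const ScopedDurationRecorder&) = delete;
  ScopedDurationRecorder& operator=(const ScopedDurationRecorder&) = delete;

 private:
  MaintenanceMetricsSketch* metrics_;
  std::chrono::steady_clock::time_point start_;
};

// Hypothetical maintenance operation: a score and whether Prepare() succeeds.
struct OpSketch {
  int score;
  bool prepare_ok;
};

// Returns the highest-scoring op (nullptr if there are none) and records
// how long the search took.
const OpSketch* FindBestOpSketch(const std::vector<OpSketch>& ops,
                                 MaintenanceMetricsSketch* metrics) {
  ScopedDurationRecorder timer(metrics);
  const OpSketch* best = nullptr;
  for (const auto& op : ops) {
    if (best == nullptr || op.score > best->score) {
      best = &op;
    }
  }
  return best;
}

// Returns true if the op could be prepared; otherwise counts the failure.
bool PrepareSketch(const OpSketch& op, MaintenanceMetricsSketch* metrics) {
  if (!op.prepare_ok) {
    metrics->prepare_failed.fetch_add(1);
    return false;
  }
  return true;
}
```

The timer records on scope exit, so every return path of the search is
measured without extra bookkeeping at each exit point.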

At this point, I have manually verified that those metrics are present
and show relevant information.  I'm planning to add an automated test
covering the behavior of these new metrics in [1], to have fewer
conflicts with that patch.

The motivation for this change is a finding that FindBestOp()'s
computational complexity is O(n^2) in the number of replicas per tablet
server (each tablet replica registers about 8 maintenance operations).
Also, BudgetedCompactionPolicy::RunApproximation()'s computational
complexity is O(n^2) in the number of rowsets in max and min keys.
In the wild, there was an instance of a Kudu cluster with a high data
ingest rate where the following stack showed up in every snapshot in
the diagnostic logs for many hours in a row:

     0xa11735 kudu::tablet::BudgetedCompactionPolicy::RunApproximation()
     0xa129c9 kudu::tablet::BudgetedCompactionPolicy::PickRowSets()
     0x9c8d80 kudu::tablet::Tablet::UpdateCompactionStats()
     0x9ec848 kudu::tablet::CompactRowSetsOp::UpdateStats()
    0x1b3de5c kudu::MaintenanceManager::FindBestOp()
    0x1b3f3c5 kudu::MaintenanceManager::RunSchedulerThread()
    0x1b86014 kudu::Thread::SuperviseThread()

[1] https://gerrit.cloudera.org/#/c/16937/

Change-Id: If5420afd605f9bd22207af142b49e73336907486
---
M src/kudu/master/master.cc
M src/kudu/tserver/tablet_server.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
A src/kudu/util/maintenance_manager_metrics.cc
A src/kudu/util/maintenance_manager_metrics.h
8 files changed, 138 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/38/16938/2
--
To view, visit http://gerrit.cloudera.org:8080/16938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If5420afd605f9bd22207af142b49e73336907486
Gerrit-Change-Number: 16938
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
