Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/24125

to look at the new patch set (#2).

Change subject: KUDU-829 Add online gc op to clean orphaned blocks
......................................................................

KUDU-829 Add online gc op to clean orphaned blocks

This patch adds a background maintenance GC operation that uses the
persisted set of orphaned blocks, i.e. blocks that could not be deleted
earlier due to transient errors, and retries deleting them in the
background.

Main highlights:
- Add RetryOrphanedBlockDeletion as the main driving function: it fetches
  the orphaned blocks and calls DeleteOrphanedBlocks on that set.
- Ensure DeleteOrphanedBlocks doesn't erase all the block entries from
  the set of orphaned blocks when only some or none of the blocks in the
  set were deleted successfully. Blocks whose deletion failed stay in the
  set so that the next iteration of either GC or LoadSuperBlock can retry
  them.
- With this change, when the superblock is loaded, try to delete the
  stale orphaned blocks only once. Regardless of the outcome of those
  deletions, erase all of the block entries from the in-memory set. This
  prevents those blocks from being retried for deletion unnecessarily.
- Add unit tests to:
  * Test that OrphanedGCOp runs when there are orphaned blocks to delete.
  * Benchmark the KUDU-1060 startup overhead.
  * Test startup overhead with many tombstoned tablets.
- To avoid scheduling the GC operation too frequently when there are
  persistent I/O errors, add backoff logic that increases the delay
  exponentially after each consecutive failure to clean up orphaned
  blocks (for example, due to a persistent disk error). No orphaned
  blocks GC op is scheduled until this delay has elapsed.
- Add a unit test to verify that the backoff logic throttles scheduling
  of the orphaned blocks GC op when consecutive cleanup failures occur.
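To make the two deletion policies above concrete (keep failures on the GC path, try once and then forget on the superblock-load path), here is a minimal C++ sketch. It is not the code in this patch: BlockIdT, DeleteFn, DeleteOrphanedBlocksKeepFailures and DeleteOrphanedBlocksOnce are hypothetical names, and a plain std::set<uint64_t> stands in for Kudu's real block ID container and block manager.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <set>

// Hypothetical stand-in for a block ID (Kudu itself uses BlockId).
using BlockIdT = uint64_t;

// Hypothetical deleter: returns true if the block was deleted.
using DeleteFn = std::function<bool(BlockIdT)>;

// GC path (sketch): erase only the entries whose deletion succeeded and
// keep the failed ones so a later GC pass or superblock load can retry.
// Returns the number of blocks still awaiting deletion.
size_t DeleteOrphanedBlocksKeepFailures(std::set<BlockIdT>* orphaned,
                                        const DeleteFn& del) {
  assert(orphaned != nullptr);
  for (auto it = orphaned->begin(); it != orphaned->end();) {
    if (del(*it)) {
      it = orphaned->erase(it);  // deleted: drop it from the set
    } else {
      ++it;                      // failed: keep it for a retry
    }
  }
  return orphaned->size();
}

// Superblock-load path (sketch): attempt each deletion exactly once,
// then clear the in-memory set regardless of the outcome.
// Returns the number of deletions that failed.
size_t DeleteOrphanedBlocksOnce(std::set<BlockIdT>* orphaned,
                                const DeleteFn& del) {
  assert(orphaned != nullptr);
  size_t failures = 0;
  for (BlockIdT id : *orphaned) {
    if (!del(id)) {
      ++failures;
    }
  }
  orphaned->clear();  // never retry these from memory
  return failures;
}
```

With a deleter that fails for even IDs, the GC variant leaves {2, 4} out of {1, 2, 3, 4} in the set, while the load variant reports one failure for {1, 2, 3} and leaves the set empty.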
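The backoff rule can be sketched as follows, assuming the delay starts at a base interval, doubles after each consecutive failed GC pass, is capped at a maximum, and resets after a fully successful pass. OrphanedGcBackoff, its parameters and the caller-supplied millisecond clock are hypothetical; the actual flags, cap and reset behavior in the patch may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper; names and constants are illustrative only.
class OrphanedGcBackoff {
 public:
  OrphanedGcBackoff(int64_t base_ms, int64_t max_ms)
      : base_ms_(base_ms), max_ms_(max_ms) {
    assert(base_ms > 0 && max_ms >= base_ms);
  }

  // Whether the orphaned blocks GC op may be scheduled at 'now_ms'.
  bool CanSchedule(int64_t now_ms) const { return now_ms >= next_allowed_ms_; }

  // Record the outcome of a GC pass that finished at 'now_ms'.
  void RecordResult(bool all_deleted, int64_t now_ms) {
    if (all_deleted) {
      consecutive_failures_ = 0;
      next_allowed_ms_ = now_ms;  // no delay after a clean pass
      return;
    }
    ++consecutive_failures_;
    next_allowed_ms_ = now_ms + CurrentDelayMs();
  }

  // base_ms * 2^(failures - 1), capped at max_ms; 0 with no failures.
  int64_t CurrentDelayMs() const {
    if (consecutive_failures_ == 0) return 0;
    int64_t delay = base_ms_;
    for (int i = 1; i < consecutive_failures_; ++i) {
      if (delay >= max_ms_ / 2) {  // doubling would reach the cap
        delay = max_ms_;
        break;
      }
      delay *= 2;
    }
    return std::min(delay, max_ms_);
  }

 private:
  const int64_t base_ms_;
  const int64_t max_ms_;
  int consecutive_failures_ = 0;
  int64_t next_allowed_ms_ = 0;
};
```

With base_ms = 1000 and max_ms = 8000, consecutive failures produce delays of 1000, 2000, 4000 and then 8000 ms, after which the delay stays at the cap until a successful pass resets it.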

=== KUDU-1060 Startup Overhead Benchmark ===
  Tablets                    : 1000
  Orphaned blocks / tablet   : 1000000
  With cleanup + Flush()     : 613387 ms
  Without cleanup + Flush()  : 253331 ms
  Extra overhead (Flush x N) : 360056 ms
  Avg overhead per tablet    : 360 ms

=== KUDU-1060 One-Time-Cost Benchmark ===
  Tablets                                    : 1000
  Orphaned blocks / tablet                   : 1000000
  Pass 1 - 1st restart (fix ON, dirty)       : 629216 ms  [cleanup+Flush for all tablets]
  Pass 2 - 2nd restart (fix ON, clean)       : 114 ms  [superblocks already clean]
  Pass 3 - clean baseline (fix OFF, clean)   : 112 ms  [no orphaned blocks to parse]
  Pass 4 - legacy baseline (fix OFF, dirty)  : 262499 ms  [parses orphan IDs, no Flush]
  Pass1 overhead vs Pass4 legacy             : 366717 ms
  Pass2 vs Pass3 clean baseline              : 2 ms  (should be ~0)

Change-Id: I732133b460df2fe4c91d05f420dda6f0274d440e
---
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet_metadata-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tablet/tablet_mm_ops-test.cc
M src/kudu/tablet/tablet_mm_ops.cc
M src/kudu/tablet/tablet_mm_ops.h
10 files changed, 982 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/25/24125/2
--
To view, visit http://gerrit.cloudera.org:8080/24125

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I732133b460df2fe4c91d05f420dda6f0274d440e
Gerrit-Change-Number: 24125
Gerrit-PatchSet: 2
Gerrit-Owner: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
