Hello Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/24125
to look at the new patch set (#2).
Change subject: KUDU-829 Add online gc op to clean orphaned blocks
......................................................................
KUDU-829 Add online gc op to clean orphaned blocks
This patch adds a background maintenance GC operation that uses the
persisted set of orphaned blocks that could not be deleted earlier due
to transient errors, and retries their deletion in the background.
Main highlights:
- Add RetryOrphanedBlockDeletion as the main driver function: it fetches
the set of orphaned blocks and calls DeleteOrphanedBlocks on that set.
- Ensure DeleteOrphanedBlocks doesn't erase all the block entries from
the set of orphaned blocks when only some, or none, of the blocks in the
set were successfully deleted. The blocks whose deletion failed are
retried in the next iteration by either the GC op or LoadSuperBlock.
- With this change, when the superblock load kicks in, try to delete the
stale orphaned blocks only once. Irrespective of the outcome of the
deletion, erase all the block entries from the in-memory set. This
avoids retrying deletion of those blocks unnecessarily.
- Add unit tests to:
* Test that OrphanedGCOp runs when there are orphaned blocks to be deleted.
* Measure performance benchmark for the KUDU-1060 startup overhead.
* Test startup overhead with many tombstoned tablets.
- To avoid scheduling the GC operation too frequently when there are
persistent I/O errors, add backoff logic that increases the wait
duration exponentially after each consecutive failure to clean up
orphaned blocks (possibly due to a persistent disk error). Until this
duration has elapsed, no orphaned blocks GC op is scheduled.
- Add a unit test to verify that the backoff logic slows down scheduling
of the orphaned blocks GC op when consecutive cleanup failures are
seen.
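
For illustration, the retain-on-failure and backoff behavior described in
the highlights above could look roughly like the sketch below. It is not
the code from this patch: the class OrphanedBlockGc, its methods, the
try_delete callback, and the initial/maximum backoff parameters are all
hypothetical names chosen for the example, and it assumes a pass counts
as failed whenever at least one block could not be deleted.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <set>

// Hypothetical illustration only; none of these names come from the patch.
using BlockId = uint64_t;
using Clock = std::chrono::steady_clock;
using Millis = std::chrono::milliseconds;

class OrphanedBlockGc {
 public:
  OrphanedBlockGc(Millis initial_backoff, Millis max_backoff)
      : initial_backoff_(initial_backoff), max_backoff_(max_backoff) {}

  void AddOrphaned(BlockId id) { orphaned_.insert(id); }
  size_t Pending() const { return orphaned_.size(); }
  Millis CurrentBackoff() const { return backoff_; }

  // The op is schedulable only if there is work to do and the backoff
  // window set by the previous failed pass has elapsed.
  bool Runnable(Clock::time_point now) const {
    return !orphaned_.empty() && now >= next_allowed_;
  }

  // One GC pass. Blocks that are deleted leave the set; blocks whose
  // deletion fails stay in the set for a later pass. Returns the number
  // of blocks deleted in this pass.
  size_t RunOnce(Clock::time_point now,
                 const std::function<bool(BlockId)>& try_delete) {
    size_t deleted = 0;
    for (auto it = orphaned_.begin(); it != orphaned_.end();) {
      if (try_delete(*it)) {
        it = orphaned_.erase(it);  // erase() returns the next iterator
        ++deleted;
      } else {
        ++it;  // keep the entry so it can be retried
      }
    }
    if (orphaned_.empty()) {
      // Assumption: a fully successful pass resets the backoff.
      consecutive_failures_ = 0;
      backoff_ = Millis(0);
      next_allowed_ = now;
    } else {
      // Double the wait after each consecutive failed pass, up to a cap.
      backoff_ = (consecutive_failures_ == 0)
                     ? initial_backoff_
                     : std::min<Millis>(backoff_ * 2, max_backoff_);
      ++consecutive_failures_;
      next_allowed_ = now + std::chrono::duration_cast<Clock::duration>(backoff_);
    }
    return deleted;
  }

 private:
  const Millis initial_backoff_;
  const Millis max_backoff_;
  Millis backoff_{0};
  int consecutive_failures_ = 0;
  Clock::time_point next_allowed_{};
  std::set<BlockId> orphaned_;
};
```

In this sketch the cap on the backoff is a design choice of the example;
the commit message above only states that the duration grows
exponentially with repeated failures.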
=== KUDU-1060 Startup Overhead Benchmark ===
Tablets : 1000
Orphaned blocks / tablet : 1000000
With cleanup + Flush() : 613387 ms
Without cleanup + Flush() : 253331 ms
Extra overhead (Flush x N) : 360056 ms
Avg overhead per tablet : 360 ms
=== KUDU-1060 One-Time-Cost Benchmark ===
Tablets : 1000
Orphaned blocks / tablet : 1000000
Pass 1 - 1st restart (fix ON, dirty) : 629216 ms [cleanup+Flush for all tablets]
Pass 2 - 2nd restart (fix ON, clean) : 114 ms [superblocks already clean]
Pass 3 - clean baseline (fix OFF, clean) : 112 ms [no orphaned blocks to parse]
Pass 4 - legacy baseline (fix OFF, dirty) : 262499 ms [parses orphan IDs, no Flush]
Pass1 overhead vs Pass4 legacy : 366717 ms
Pass2 vs Pass3 clean baseline : 2 ms (should be ~0)
Change-Id: I732133b460df2fe4c91d05f420dda6f0274d440e
---
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet_metadata-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tablet/tablet_mm_ops-test.cc
M src/kudu/tablet/tablet_mm_ops.cc
M src/kudu/tablet/tablet_mm_ops.h
10 files changed, 982 insertions(+), 13 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/25/24125/2
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I732133b460df2fe4c91d05f420dda6f0274d440e
Gerrit-Change-Number: 24125
Gerrit-PatchSet: 2
Gerrit-Owner: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)