Hello Tidy Bot, Alexey Serbin, Attila Bukor, Kudu Jenkins, Andrew Wong, Adar 
Dembo, Grant Henke, Todd Lipcon,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11554

to look at the new patch set (#11).

Change subject: [tools] KUDU-2179: Have ksck not use a single snapshot for all 
tablets
......................................................................

[tools] KUDU-2179: Have ksck not use a single snapshot for all tablets

ksck lets the user run checksums as snapshot scans, so that a checksum
can be done even as tablets are mutated. It also allows users to omit
the snapshot timestamp. Previously, in this case, the
snapshot timestamp would be retrieved from some healthy tablet server at
the beginning of the checksum process, and used for every replica. This
didn't work well for checksumming large tables, because eventually the
snapshot timestamp fell before the ancient history mark, and subsequent
checksum scans were rejected by the tablet servers.

This changes how checksum scans work to address this problem:
1. A background process periodically updates timestamps from tablet
   servers.
2. The checksum process is reorganized so the replicas of one tablet
   are checksummed together.
3. When a tablet is about to be checksummed, and the checksum scan is a
   snapshot scan with no user-provided timestamp, the tablet is assigned
   an up-to-date timestamp from one of the tablet servers that hosts a
   replica. Every replica is then checksummed using this snapshot
   timestamp.
4. The original default timeout of 3600 seconds for a checksum scan was
   too low, but it didn't really matter because the default tablet
   history max age was 900 seconds. Now that checksum scans can continue
   for many hours, the default timeout is raised to 86400 seconds (1
   day), and a new idle timeout is added. If a checksum process does not
   checksum any additional rows within the idle timeout (default 10
   minutes), it times out.
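
As a rough illustration of items 1 and 3, the per-tablet timestamp
choice might be sketched as below. This is not the code in the patch:
TimestampCache, PickSnapshotTimestamp, and the convention that a user
timestamp of 0 means "not provided" are all assumptions made for the
example. A background thread is assumed to call Update() periodically.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical cache of the most recent timestamp seen from each tablet
// server. A background thread is assumed to call Update() periodically.
class TimestampCache {
 public:
  void Update(const std::string& server, uint64_t timestamp) {
    std::lock_guard<std::mutex> lock(mutex_);
    latest_[server] = timestamp;
  }

  // Returns true and sets *timestamp if a value is known for 'server'.
  bool Get(const std::string& server, uint64_t* timestamp) const {
    assert(timestamp != nullptr);
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = latest_.find(server);
    if (it == latest_.end()) return false;
    *timestamp = it->second;
    return true;
  }

 private:
  mutable std::mutex mutex_;
  std::map<std::string, uint64_t> latest_;
};

// Chooses the snapshot timestamp shared by every replica of one tablet.
// Assumption for this sketch: user_timestamp == 0 means "not provided".
// Returns false if no server hosting a replica has reported a timestamp.
bool PickSnapshotTimestamp(const TimestampCache& cache,
                           const std::vector<std::string>& replica_servers,
                           uint64_t user_timestamp,
                           uint64_t* snapshot_timestamp) {
  assert(snapshot_timestamp != nullptr);
  if (user_timestamp != 0) {
    *snapshot_timestamp = user_timestamp;
    return true;
  }
  for (const auto& server : replica_servers) {
    if (cache.Get(server, snapshot_timestamp)) return true;
  }
  return false;
}
```

Because the timestamp is chosen just before a tablet's replicas are
scanned, it stays recent no matter how long the overall checksum has
been running.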

Note that there is a new scheduling problem given #2: each tablet server
has a fixed number of slots for checksum scans, but every tablet server
hosting a replica must have a slot available before any replica's
checksum can start, so deciding in which order to checksum tablets and
how to find which are available to schedule is important. Given that the
bulk of the time in checksums is occupied waiting for tablet servers to
read lots of data off disk, materialize it as rows, and checksum it,
it's worth spending a lot of effort to make sure the cluster is fully
utilized given the scan concurrency constraints. So, the tool uses a
brute force approach and simply checks all tablets to see which can be
checksummed, any time a replica checksum finishes and frees a slot.
Tablets are considered in tablet id order. Since tablet ids are UUIDs,
there should be no correlation between a tablet's id and how its
replicas are distributed across tablet servers.
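
The brute-force pass described above could look roughly like the
following. It is a simplified sketch, not the implementation in
ksck_checksum.cc: TabletInfo, SlotScheduler, and their methods are
hypothetical names, and each tablet is assumed to have at most one
replica per tablet server.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tablet and the servers hosting its replicas.
struct TabletInfo {
  std::string tablet_id;
  std::vector<std::string> replica_servers;
};

class SlotScheduler {
 public:
  SlotScheduler(const std::vector<std::string>& servers,
                int slots_per_server) {
    assert(slots_per_server > 0);
    for (const auto& server : servers) {
      free_slots_[server] = slots_per_server;
    }
  }

  void AddTablet(TabletInfo tablet) {
    std::string id = tablet.tablet_id;
    pending_.emplace(std::move(id), std::move(tablet));
  }

  // Checks every pending tablet in tablet id order (std::map keeps keys
  // sorted) and starts each one whose replica servers all have a free
  // slot. Returns the ids of the tablets started in this pass.
  std::vector<std::string> ScheduleRunnable() {
    std::vector<std::string> started;
    for (auto it = pending_.begin(); it != pending_.end();) {
      if (AllReplicasHaveSlot(it->second)) {
        for (const auto& server : it->second.replica_servers) {
          --free_slots_[server];
        }
        started.push_back(it->first);
        it = pending_.erase(it);
      } else {
        ++it;
      }
    }
    return started;
  }

  // Called when one replica checksum finishes on 'server'.
  void ReleaseSlot(const std::string& server) {
    auto it = free_slots_.find(server);
    assert(it != free_slots_.end());
    ++it->second;
  }

 private:
  bool AllReplicasHaveSlot(const TabletInfo& tablet) const {
    for (const auto& server : tablet.replica_servers) {
      auto it = free_slots_.find(server);
      if (it == free_slots_.end() || it->second <= 0) return false;
    }
    return true;
  }

  std::map<std::string, int> free_slots_;
  std::map<std::string, TabletInfo> pending_;
};
```

In this sketch a caller would invoke ReleaseSlot() and then
ScheduleRunnable() each time a replica checksum completes. The full
rescan is linear in the number of pending tablets, which is cheap
compared with the time the tablet servers spend reading data.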

There are several tests added:
1. For the KUDU-2179 fix itself.
2. For the idle timeout.
3. For when a checksum finds mismatches. Yes, we didn't have a test for
   this before. After adding this test I saw that the output was a
   little confusing, since it reported the number of replicas with
   mismatches rather than the number of tablets, so I altered the output
   to fix that.
4. A couple of tests exercising situations when all tablet servers are
   unavailable and when all peers of a tablet are unavailable.

I also checksummed a very large cluster with 500TB of data or so, across
about 37000 replicas. The checksum scan completed successfully after
more than 12 hours.

Change-Id: Iff0905c2099e6f56ed1cb651611918acbaf75476
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_checksum.cc
M src/kudu/tools/ksck_checksum.h
M src/kudu/tools/ksck_remote-test.cc
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
7 files changed, 869 insertions(+), 276 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/11554/11
--
To view, visit http://gerrit.cloudera.org:8080/11554
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iff0905c2099e6f56ed1cb651611918acbaf75476
Gerrit-Change-Number: 11554
Gerrit-PatchSet: 11
Gerrit-Owner: Will Berkeley <wdberke...@gmail.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <abu...@apache.org>
Gerrit-Reviewer: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>
