Todd Lipcon has posted comments on this change.

Change subject: log_block_manager: fix corruption after re-opening compacted metadata
......................................................................
Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/7113/3//COMMIT_MSG
Commit Message:

Line 32: The bulk of the changes here are to tests:
> As this is a severe bug, I'd like to better understand why our existing bla

From what I can tell, block_manager-stress-test missed it for a couple of reasons:

- it only restarted once, as you mentioned
- it set the fd cache size to 512, which was enough to cause some churn in the FBM, but not much in the LBM. For this bug to trigger, you need "just the right amount" of churn: it has to be big enough that you will reuse a cached FD when re-appending to the metadata after restart, but small enough that you will then evict and re-open the same FD again before the next restart
- block deletion is faster than block creation, so the deleter threads were "keeping up with" the writer threads, and on restart we always had only a very small number of live containers. Most were 100% dead and therefore rarely got compacted, but rather just removed.

Even with the changes I made, it seemed to reproduce reliably with some runtimes (e.g. 4 seconds) but not with slightly longer or shorter ones. Since it was hard to get it to reproduce reliably, I went and added the more specific test.

As for other tests (not specific to the block manager), the typical amounts of data are small enough that the FD cache doesn't really churn. Given that you need the specific interleaving described above, we were unlikely to hit it in our other various tests which do multiple restarts.

As for tests outside the automated ctests, we did actually hit this on the automated long-lived YCSB runs that we run internally at Cloudera. We didn't hit it sooner due to some test infra subtleties: the runs weren't properly tracking master for the last week or two. When it rains, it pours, though, and at least 3 of our test clusters have now hit it, so we do at least have some ability to feel confident in our tests.
http://gerrit.cloudera.org:8080/#/c/7113/3/src/kudu/fs/log_block_manager-test.cc
File src/kudu/fs/log_block_manager-test.cc:

PS3, Line 1242: Since we have more metadata files than
             : // we have file_cache capacity, this will also generate a mix of cache hits,
             : // misses, and re-insertions.
> Without the fix, how often does this test trigger the bug?

Pretty much every time (at least, it triggered every time I ran it).

http://gerrit.cloudera.org:8080/#/c/7113/3/src/kudu/util/file_cache-test.cc
File src/kudu/util/file_cache-test.cc:

Line 242: // fd.
> Nit: odd wrapping?

Done

Line 245: ASSERT_EQ(size, kData1.size());
> Nit: expected should come first (as per L259).

Done

http://gerrit.cloudera.org:8080/#/c/7113/3/src/kudu/util/file_cache.cc
File src/kudu/util/file_cache.cc:

Line 143: void MarkInvalidated() {
> Nit: add a comment explaining what it means to be invalidated.

Done

Line 525: void FileCache<FileType>::Invalidate(const string& file_name) {
> There are three interesting things going on here and I want to make sure I

Yes, your understanding is correct. I think that to handle the race we can do the following, in order:

1. under the lock: mark any existing descriptor as invalidated, or insert an already-invalidated descriptor if there is nothing in the map
2. erase the entry from the cache
3. under the lock again: erase the descriptor from the map

Thinking about a concurrent OpenExistingFile() call at each of these points:
- before 1: continues to open the old file (i.e. serialized before Invalidate())
- before 2: fetches the invalidated descriptor and gets an "invalidated" error on first access (i.e. serialized before Invalidate())
- before 3: continues to get the invalidated descriptor (same as above)
- after 3: creates a new Descriptor and gets a cache miss, thus opening the _new_ path (serialized after Invalidate())

For concurrent access to _existing_ descriptors:
- before 1: nothing notable (serialized before)
- any time after 1: gets an error because the descriptor is invalidated

If we expected this to be used concurrently, I'd write up a stress test for it, but the expected use case is that callers externally synchronize, so I don't think it's worth the effort.

--
To view, visit http://gerrit.cloudera.org:8080/7113
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I14b2c64685e24d27591258911db4aeb9e8020a4d
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes
