Adar Dembo has posted comments on this change.

Change subject: WIP KUDU-1943: Add BlockTransaction to Block Manager
......................................................................


Patch Set 8:

(33 comments)

I reviewed everything outside the LBM, and then began reviewing the LBM 
changes. I stopped because I saw two pretty serious issues that we need to 
think about:

Container thread safety. WritableBlocks aren't supposed to be shared by 
multiple threads, which meant that containers were also single threaded as 
blocks were added to them. Now that's no longer the case, and this introduces 
some synchronization issues when accessing container state. I think they can be 
addressed with the introduction of a per-container lock (and careful thinking 
about the right order in which to acquire the container lock and the LBM lock).

Consistent container in-memory accounting when operations fail. KUDU-1793 was a 
serious issue, and to fix it, I tried to guarantee that container in-memory 
state is only updated after the "point of no return" (that is, when a block was 
guaranteed to be successfully written). This change explicitly does away with 
that due to how Finalize() works. There's no way around this, but it means we 
need to think very carefully about what happens to container in-memory state in 
the event of a failure after a block is finalized.

http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/cfile/bloomfile.h
File src/kudu/cfile/bloomfile.h:

PS8, Line 47: blocks
It's just one block, right? Why 'blocks'?


http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/cfile/cfile_writer.cc
File src/kudu/cfile/cfile_writer.cc:

Line 283:     block_->Finalize();
RETURN_NOT_OK.


Line 289:     block_->Finalize();
RETURN_NOT_OK.


http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/cfile/cfile_writer.h
File src/kudu/cfile/cfile_writer.h:

PS8, Line 109: blocks
Just one block.


http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/fs/block_manager-stress-test.cc
File src/kudu/fs/block_manager-stress-test.cc:

Line 396: 
Nit: why add an empty line here and not in the WriterThread/ReaderThread 
functions, which do the same thing? Or was this a mistake?


http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

Line 995:   ASSERT_OK(transaction.CommitWrites());
Before committing, how about trying to open all the blocks in block_ids and 
verifying that they can't be found?


http://gerrit.cloudera.org:8080/#/c/7207/3/src/kudu/fs/block_manager.h
File src/kudu/fs/block_manager.h:

Line 143: 
> yes, there is some cases the block is flushed but not finalized, you can ca
I think FLAGS_cfile_do_on_finish has mostly outlived its usefulness (it was 
used for performance experiments early on). So let's merge the "block is 
finalized" and "block is flushing" states. This means you should make two 
gflags changes:
- Change cfile_do_on_finish to something like cfile_close_block_on_finish. It'd 
be a boolean with a default value of false.
- Add something like block_manager_flush_on_finalize. It'd be a boolean with a 
default value of true.


http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/fs/block_manager.h
File src/kudu/fs/block_manager.h:

PS8, Line 73: For LogWritableBlock, container
            : //    is released for further usage to avoid unnecessarily 
creating new containers.
Generalize this example so that it doesn't refer to implementation details.


PS8, Line 76:  For LogWritableBlock,
            : //    it also finishes the blocks belong to the same container 
together to
            : //    reduce fsync() usage.
Generalize this so that it doesn't refer to a particular implementation.


Line 91:     // The block is finalized as it has been fully written.
How about: "There is some dirty data in the block and no more may be written."


Line 146:   // Finalize a fully written block.
This doesn't explain what restrictions the FINALIZED state places on the 
WritableBlock. Can I keep writing to it? Can I call FlushDataAsync() on it? Is 
the block data now durably on disk? Use the comments in 
FlushDataAsync()/Append()/Close() for inspiration.


Line 147:   virtual Status Finalize() = 0;
Nit: since this has a real effect on the state of the block, please move it to 
be just above BytesAppended(). That way the "simple accessor" functions are 
grouped together, and the "complicated" functions likewise.


Line 149:   virtual int64_t offset() const { return 0; }
I don't think this belongs in WritableBlock. It doesn't make any sense to 
anyone outside a block manager implementation, because it's not even clear what 
it's an offset into (a file? a container? something else?). Why do you need it?


Line 285: // Group a set of blocks writes together in a transaction.
The names and comments here conflate "write" and "create". Please choose one 
and use it consistently.


Line 320:   std::vector<WritableBlock*> created_blocks_;
Since we're now writing C++11 code, could you change this to be a vector of 
unique_ptr<WritableBlock>? Then you won't need the STLDeleteElements() call in 
the destructor, you could call created_blocks_.clear() in CommitWrites(), and 
you could probably remove the stl_util.h include.


http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/fs/file_block_manager.cc
File src/kudu/fs/file_block_manager.cc:

Line 237:   virtual Status Finalize() OVERRIDE;
Nit: move to be just above BytesAppended().


http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/fs/log_block_manager-test.cc
File src/kudu/fs/log_block_manager-test.cc:

Line 377: // Test a container can be reused when the block is finalized
Please also add a variant of this for block_manager-test, to exercise FBM code.


Line 382:     unique_ptr<WritableBlock> writer;
The writer goes out of scope (and thus is closed) with every iteration of the 
loop, so wouldn't this test pass even without the call to Finalize()? Seems to 
me you'd need to retain the writers in-memory and not allow them to be closed, 
then test that only one container is created.


Line 385:     writer->Finalize();
ASSERT_OK


http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

Line 272:   Status FinishBlocks(std::vector<WritableBlock*> blocks);
Don't need std:: prefix.


Line 1174:   return DoClose(SYNC, true);
What happens if I Close() (or Abort()) a block that's FINALIZED? Won't we call 
MakeContainerAvailable() on it twice, once in Finalize() and once in Close()?

What happens if I Close() multiple FINALIZED blocks, each from its own thread? 
Won't we have thread safety issues in various LogBlockContainer functions?

Please add test cases for both of these. The first case could be in 
block_manager-test (in some test that brings a block through various states). 
Actually, both could probably live in block_manager-test.


PS8, Line 1260: logBlockManager
How about just 'lbm' or 'bm'?


Line 1265:     logBlockManager->RoundUpContainer(container_, block_offset_,
How does this work when a FINALIZED block fails to Close() (or Abort())? Won't 
the in-memory container accounting be all messed up?

See BlockManagerTest.TestMetadataOkayDespiteFailedWrites for a test that tries 
to simulate this. I bet if you added Finalize() calls to the blocks being 
written, you'd see all sorts of interesting failures.


Line 1304:       if (!should_finish) Finalize();
Finalize() can fail; you should update s if it does.


Line 1726:       std::lock_guard<simple_spinlock> l(lock_);
Why do you need to take the lock?


Line 1732:   for (const auto &entry : created_block_map) {
const auto&


Line 1813: void LogBlockManager::RoundUpContainer(LogBlockContainer* container,
AFAICT, the only reason to go through the LBM for this (and not call 
container->RoundUpContainer() directly) is so you can acquire lock_. Is that 
right?

I don't think that makes sense. For one, it's n (where n == number of blocks) 
lock acquisitions when we're closing a lot of blocks together. For two, if we 
need to protect container state, I think it'd be cleaner to introduce a 
per-container lock.


Line 1816:   RoundUpContainerUnlocked(container, offset, length);
Since RoundUpContainerUnlocked is only ever called here, you can omit the 
Unlocked variant.


http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

Line 195:   FRIEND_TEST(LogBlockManagerTest, TestFinalizeBlock);
Nit: alphabetize.


Line 264:                        int64_t offset, int64_t length);
Nit: indentation


http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/tablet/deltafile.h
File src/kudu/tablet/deltafile.h:

Line 72:   // Closes the delta file, releasing the underlying block to 'closer'.
Update


http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/tablet/diskrowset.h
File src/kudu/tablet/diskrowset.h:

PS8, Line 91: it
them


http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/tablet/multi_column_writer.h
File src/kudu/tablet/multi_column_writer.h:

Line 69:   // to 'closer'.
Update.


-- 
To view, visit http://gerrit.cloudera.org:8080/7207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a88884bd
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Hao Hao <hao....@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Hao Hao <hao....@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

Reply via email to