Adar Dembo has posted comments on this change. Change subject: WIP KUDU-1943: Add BlockTransaction to Block Manager ......................................................................
Patch Set 8: (33 comments) I reviewed everything outside the LBM, and then began reviewing the LBM changes. I stopped because I saw two pretty serious issues that we need to think about: Container thread safety. WritableBlocks aren't supposed to be shared by multiple threads, which meant that containers were also single threaded as blocks were added to them. Now that's no longer the case, and this introduces some synchronization issues when accessing container state. I think they can be addressed with the introduction of a per-container lock (and careful thinking about the right order in which to acquire the container lock and the LBM lock). Consistent container in-memory accounting when operations fail. KUDU-1793 was a serious issue, and to fix it, I tried to guarantee that container in-memory state is only updated after the "point of no return" (that is, when a block was guaranteed to be successfully written). This change explicitly does away with that due to how Finalize() works. There's no way around this, but it means we need to think very carefully about what happens to container in-memory state in the event of a failure after a block is finalized. http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/cfile/bloomfile.h File src/kudu/cfile/bloomfile.h: PS8, Line 47: blocks It's just one block, right? Why 'blocks'? http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/cfile/cfile_writer.cc File src/kudu/cfile/cfile_writer.cc: Line 283: block_->Finalize(); RETURN_NOT_OK. Line 289: block_->Finalize(); RETURN_NOT_OK. http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/cfile/cfile_writer.h File src/kudu/cfile/cfile_writer.h: PS8, Line 109: blocks Just one block. http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/fs/block_manager-stress-test.cc File src/kudu/fs/block_manager-stress-test.cc: Line 396: Nit: why add an empty line here and not in the WriterThread/ReaderThread functions, which do the same thing? Or was this a mistake? http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/fs/block_manager-test.cc File src/kudu/fs/block_manager-test.cc: Line 995: ASSERT_OK(transaction.CommitWrites()); Before committing, how about trying to open all the blocks in block_ids and verifying that they can't be found? http://gerrit.cloudera.org:8080/#/c/7207/3/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: Line 143: > yes, there is some cases the block is flushed but not finalized, you can ca I think FLAGS_cfile_do_on_finish has mostly outlived its usefulness (it was used for performance experiments early on). So let's merge the "block is finalized" and "block is flushing" states. This means you should make two gflags changes: - Change cfile_do_on_finish to something like cfile_close_block_on_finish. It'd be a boolean with a default value of false. - Add something like block_manager_flush_on_finalize. It'd be a boolean with a default value of true. http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: PS8, Line 73: For LogWritableBlock, container : // is released for further usage to avoid unnecessarily creating new containers. Generalize this example so that it doesn't refer to implementation details. PS8, Line 76: For LogWritableBlock, : // it also finishes the blocks belong to the same container together to : // reduce fsync() usage. Generalize this so that it doesn't refer to a particular implementation. Line 91: // The block is finalized as it has been fully written. How about: "There is some dirty data in the block and no more may be written." Line 146: // Finalize a fully written block. This doesn't explain what restrictions the FINALIZED state places on the WritableBlock. Can I keep writing to it? Can I call FlushDataAsync() on it? Is the block data now durably on disk? Use the comments in FlushDataAsync()/Append()/Close() for inspiration. Line 147: virtual Status Finalize() = 0; Nit: since this has a real effect on the state of the block, please move it to be just above BytesAppended(). That way the "simple accessor" functions are grouped together, and the "complicated" functions likewise. Line 149: virtual int64_t offset() const { return 0; } I don't think this belongs in WritableBlock. It doesn't make any sense to anyone outside a block manager implementation, because it's not even clear what it's an offset into (a file? a container? something else?). Why do you need it? Line 285: // Group a set of blocks writes together in a transaction. The names and comments here conflate "write" and "create". Please choose one and use it consistently. Line 320: std::vector<WritableBlock*> created_blocks_; Since we're now writing C++11 code, could you change this to be a vector of unique_ptr<WritableBlock>? Then you won't need the STLDeleteElements() call in the destructor, you could call created_blocks_.clear() in CommitWrites(), and you could probably remove the stl_util.h include. http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/fs/file_block_manager.cc File src/kudu/fs/file_block_manager.cc: Line 237: virtual Status Finalize() OVERRIDE; Nit: move to be just above BytesAppended(). http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/fs/log_block_manager-test.cc File src/kudu/fs/log_block_manager-test.cc: Line 377: // Test a container can be reused when the block is finalized Please also add a variant of this for block_manager-test, to exercise FBM code. Line 382: unique_ptr<WritableBlock> writer; The writer goes out of scope (and thus is closed) with every iteration of the loop, so wouldn't this test pass even without the call to Finalize()? Seems to me you'd need to retain the writers in-memory and not allow them to be closed, then test that only one container is created. Line 385: writer->Finalize(); ASSERT_OK http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 272: Status FinishBlocks(std::vector<WritableBlock*> blocks); Don't need std:: prefix. Line 1174: return DoClose(SYNC, true); What happens if I Close() (or Abort()) a block that's FINALIZED? Won't we call MakeContainerAvailable() on it twice, once in Finalize() and once in Close()? What happens if I Close() multiple FINALIZED blocks, each from its own thread? Won't we have thread safety issues in various LogBlockContainer functions? Please add test cases for both of these. The first case could be in block_manager-test (in some test that brings a block through various states). Actually, both could probably live in block_manager-test. PS8, Line 1260: logBlockManager How about just 'lbm' or 'bm'? Line 1265: logBlockManager->RoundUpContainer(container_, block_offset_, How does this work when a FINALIZED block fails to Close() (or Abort())? Won't the in-memory container accounting be all messed up? See BlockManagerTest.TestMetadataOkayDespiteFailedWrites for a test that tries to simulate this. I bet if you added Finalize() calls to the blocks being written, you'd see all sorts of interesting failures. Line 1304: if (!should_finish) Finalize(); Finalize() can fail; you should update s if it does. Line 1726: std::lock_guard<simple_spinlock> l(lock_); Why do you need to take the lock? Line 1732: for (const auto &entry : created_block_map) { const auto& Line 1813: void LogBlockManager::RoundUpContainer(LogBlockContainer* container, AFAICT, the only reason to go through the LBM for this (and not call container->RoundUpContainer() directly) is so you can acquire lock_. Is that right? I don't think that makes sense. For one, it's n (where n == number of blocks) lock acquisitions when we're closing a lot of blocks together. For two, if we need to protect container state, I think it'd be cleaner to introduce a per-container lock. Line 1816: RoundUpContainerUnlocked(container, offset, length); Since RoundUpContainerUnlocked is only ever called here, you can omit the Unlocked variant. http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/fs/log_block_manager.h File src/kudu/fs/log_block_manager.h: Line 195: FRIEND_TEST(LogBlockManagerTest, TestFinalizeBlock); Nit: alphabetize. Line 264: int64_t offset, int64_t length); Nit: indentation http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/tablet/deltafile.h File src/kudu/tablet/deltafile.h: Line 72: // Closes the delta file, releasing the underlying block to 'closer'. Update http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/tablet/diskrowset.h File src/kudu/tablet/diskrowset.h: PS8, Line 91: it them http://gerrit.cloudera.org:8080/#/c/7207/8/src/kudu/tablet/multi_column_writer.h File src/kudu/tablet/multi_column_writer.h: Line 69: // to 'closer'. Update. -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a88884bd Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao Hao <hao....@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Dan Burkert <danburk...@apache.org> Gerrit-Reviewer: Hao Hao <hao....@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes