Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )
Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata ...................................................................... Patch Set 62: (22 comments) http://gerrit.cloudera.org:8080/#/c/18569/61//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18569/61//COMMIT_MSG@50 PS61, Line 50: constructed > nit: constructed Done http://gerrit.cloudera.org:8080/#/c/18569/61//COMMIT_MSG@52 PS61, Line 52: Construct > nit: Construct Done http://gerrit.cloudera.org:8080/#/c/18569/61//COMMIT_MSG@56 PS61, Line 56: logr > "logr" a little short and not clear. use 'rocksdb' directly? Now there are block manage types named "file" and "log", where "file" type indicate the FileBlockManager which maps each block to its own file on disk, and "log" type indicate the LogBlockManagerNativeMeta which combine tens of thousands of blocks to a shared data file, the data file is a log-backed file (i.e. sequentially allocated file). The newly introduced type (class LogBlockManagerRdbMeta) is the same to LogBlockManagerNativeMeta when write the data file, the difference between them are the metadata part, while the former store metadata in RocksDB and the latter store metadata in a native file, both of them inherit from class LogBlockManager. So I keep the "log" part and append "r" to indicate rocksdb. If using "rocksdb", users may misunderstand this is to store all data (both data and metadata) in rocksdb. http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h File src/kudu/fs/dir_manager.h: http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h@49 PS61, Line 49: tus. > nit: is used only internally Done http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h@109 PS61, Line 109: > Does it make sense to use the sub-classing here, creating a class that inhe I've updated the psatch to do this. Because the inheritance hierarchy is a bit of complex, if just adding a sub-class, say RdbDir, it would be more complex, the graph show the inheritance hierarchy: Dir <-------- DataDir ^ ^ | | RdbDir <--- RdbDataDir And this will make the RdbDataDir constructor hard to implement [1]. > Since B, C inherit A virtually, A must be constructed in each child Additionally, there are unique pointers in the constructor's parameter list. So I made a small refactor before this (merge that if necessary), remove the meaningless DataDir, see [2]. After [1], the inheritance hierarchy would be: Dir <--- RdbDir 1. https://en.wikipedia.org/wiki/Virtual_inheritance 2. https://gerrit.cloudera.org/#/c/20833/ http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h@121 PS61, Line 121: ures s > nit: returns Done http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h@199 PS61, Line 199: > nit: fields Done http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h@201 PS61, Line 201: > nit: . Done http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h@308 PS61, Line 308: return failed_dirs_.size() == dirs_.size(); : } : : // Return a list of the canonicalized root directory names. > If this is test-only, is it feasible to move this into the protected/privat Done http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.cc File src/kudu/fs/dir_manager.cc: http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.cc@134 PS61, Line 134: return; > It seems DCHECK_STREQ() is used for the same invariant check on line 187. Done http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/fs_report.h File src/kudu/fs/fs_report.h: http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/fs_report.h@291 PS61, Line 291: corrupted_rdb_record_check > This field seems no used. So add a TODO? It is used actually, garthered in [1] and repaired in [2]. 1. https://gerrit.cloudera.org/c/18569/61/src/kudu/fs/log_block_manager.cc#2054 2. https://gerrit.cloudera.org/c/18569/61/src/kudu/fs/log_block_manager.cc#4031 http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager-test-util.cc File src/kudu/fs/log_block_manager-test-util.cc: http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager-test-util.cc@626 PS61, Line 626: auto* > Why CHECK() here and DCHECK() elsewhere? Is it possible to switch to DCHEC Done http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h File src/kudu/fs/log_block_manager.h: http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h@420 PS61, Line 420: Some initialize work > Could you be more specific what exactly is supposed to be initialized by th It depends on the sub-class implementation, LogBlockManagerNativeMeta::InitDataDir() is noop, LogBlockManagerRdbMeta::InitDataDir() needs to initialize something, I've added some comments to the latter. http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h@599 PS61, Line 599: in backgro > in background Done http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h@599 PS61, Line 599: The data in RocksDB will be compacted > BTW, what happens if a tablet server shuts down while the RocksDB backgroun When the tserver shut down normally, the RocksDB object will be destoryed in gracefully [1][2] (I've updated the code to call db_->Close() to expose the error messages if any). If the tserver crashed unexpectedly, the RocksDB internal threads (includes the compaction/flush threads) will also exit. If the server crashed unexpectedly, it will not cause inconsistency. Simply speaking, RocksDB has a mechanism of "VersionSet" (Similar to Kudu's tablet-meta to keep rowsets and block-ids consistency), which maintains a set of data in different "versions", to ensure that different versions of data are correctly linked and referenced, even during compaction and deletion processes. Data files before and after comptation are in different versions, so as long as the old version is valid, the data files are exist and the data is consistency. 1. https://gerrit.cloudera.org/c/18569/61/src/kudu/fs/dir_manager.cc#205 2. https://github.com/facebook/rocksdb/wiki/Basic-Operations#closing-a-database http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h@657 PS61, Line 657: > failing to repair an inconsistency In fact, I copied this comment from the super class, I will leave the "fatal" word here for clarification. There are 4 type of inconsistencies in Kudu fs layer, see [1]. The difference is whether they are "fatal" - For the non-fatal inconsistencies, the repair procedure may fail, but just log it and will not return non-OK to the caller, the server will still run happily - If the fatal inconsistencies repair failed, non-OK will be returned from DoRepair() to the caller, the Kudu process will exit in this case. 1. https://github.com/apache/kudu/blob/master/src/kudu/fs/fs_report.h#L187 http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.cc@1987 PS61, Line 1987: t auto kEncryptionHea > nit: why is this name for the variable that's not a static constant? It's copied from LogBlockContainerNativeMeta::CheckContainerFiles(). I've updated to use static constants. http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.cc@2093 PS61, Line 2093: // Construct key. > this statement happens several times in this file, may be a simple function Good idea! I've updated all related code, including tests. http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.cc@4035 PS61, Line 4035: ast<RdbDir*>(dir > Should this key be id + "." + block_id ? Sorry it's the bad naming make you misunderstand, the "block_id" should be "rocksdb_key" which is pushed in line 2054. It is the key in rocksdb which is in the form of "<container_id>.<block_id>" already. I've updated the name. http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/tserver/tablet_server-test.cc File src/kudu/tserver/tablet_server-test.cc: http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/tserver/tablet_server-test.cc@763 PS61, Line 763: will fail start > will fail starting Done http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/tserver/tablet_server-test.cc@763 PS61, Line 763: se of > of the failure Done http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/tserver/tablet_server-test.cc@764 PS61, Line 764: t it > case Done -- To view, visit http://gerrit.cloudera.org:8080/18569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f Gerrit-Change-Number: 18569 Gerrit-PatchSet: 62 Gerrit-Owner: Yingchun Lai <laiyingc...@apache.org> Gerrit-Reviewer: Abhishek Chennaka <achenn...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <ale...@apache.org> Gerrit-Reviewer: KeDeng <kdeng...@gmail.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Yifan Zhang <chinazhangyi...@163.com> Gerrit-Reviewer: Yingchun Lai <laiyingc...@apache.org> Gerrit-Reviewer: Yuqi Du <shenxingwuy...@gmail.com> Gerrit-Comment-Date: Tue, 26 Dec 2023 03:03:55 +0000 Gerrit-HasComments: Yes