Yingchun Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 62:

(22 comments)

http://gerrit.cloudera.org:8080/#/c/18569/61//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18569/61//COMMIT_MSG@50
PS61, Line 50: constructed
> nit: constructed
Done


http://gerrit.cloudera.org:8080/#/c/18569/61//COMMIT_MSG@52
PS61, Line 52: Construct
> nit: Construct
Done


http://gerrit.cloudera.org:8080/#/c/18569/61//COMMIT_MSG@56
PS61, Line 56: logr
> "logr" a little short  and not clear.  use 'rocksdb' directly?
Now there are block manage types named "file" and "log", where "file" type 
indicate the FileBlockManager which maps each block to its own file on disk, 
and "log" type indicate the LogBlockManagerNativeMeta which combine tens of 
thousands of blocks to a shared data file, the data file is a log-backed file 
(i.e. sequentially allocated file).

The newly introduced type (class LogBlockManagerRdbMeta) is the same to 
LogBlockManagerNativeMeta when write the data file, the difference between them 
are the metadata part, while the former store metadata in RocksDB and the 
latter store metadata in a native file, both of them inherit from class 
LogBlockManager.

So I keep the "log" part and append "r" to indicate rocksdb. If using 
"rocksdb", users may misunderstand this is to store all data (both data and 
metadata) in rocksdb.


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h
File src/kudu/fs/dir_manager.h:

http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h@49
PS61, Line 49: tus.
> nit: is used only internally
Done


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h@109
PS61, Line 109:
> Does it make sense to use the sub-classing here, creating a class that inhe
I've updated the psatch to do this.

Because the inheritance hierarchy is a bit of complex, if just adding a 
sub-class, say RdbDir, it would be more complex, the graph show the inheritance 
hierarchy:

 Dir <-------- DataDir
  ^               ^
  |               |
 RdbDir <---  RdbDataDir

And this will make the RdbDataDir constructor hard to implement [1].
> Since B, C inherit A virtually, A must be constructed in each child

Additionally, there are unique pointers in the constructor's parameter list.
So I made a small refactor before this (merge that if necessary), remove the 
meaningless DataDir, see [2].

After [1], the inheritance hierarchy would be:

 Dir <--- RdbDir

1. https://en.wikipedia.org/wiki/Virtual_inheritance
2. https://gerrit.cloudera.org/#/c/20833/


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h@121
PS61, Line 121: ures s
> nit: returns
Done


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h@199
PS61, Line 199:
> nit: fields
Done


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h@201
PS61, Line 201:
> nit: .
Done


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h@308
PS61, Line 308:     return failed_dirs_.size() == dirs_.size();
              :   }
              :
              :   // Return a list of the canonicalized root directory names.
> If this is test-only, is it feasible to move this into the protected/privat
Done


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.cc
File src/kudu/fs/dir_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.cc@134
PS61, Line 134:   return;
> It seems DCHECK_STREQ() is used for the same invariant check on line 187.
Done


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/fs_report.h
File src/kudu/fs/fs_report.h:

http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/fs_report.h@291
PS61, Line 291: corrupted_rdb_record_check
> This field seems no used. So add a TODO?
It is used actually, garthered in [1] and repaired in [2].

1. https://gerrit.cloudera.org/c/18569/61/src/kudu/fs/log_block_manager.cc#2054
2. https://gerrit.cloudera.org/c/18569/61/src/kudu/fs/log_block_manager.cc#4031


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager-test-util.cc
File src/kudu/fs/log_block_manager-test-util.cc:

http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager-test-util.cc@626
PS61, Line 626: auto*
> Why CHECK() here and DCHECK() elsewhere?  Is it possible to switch to DCHEC
Done


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h@420
PS61, Line 420: Some initialize work
> Could you be more specific what exactly is supposed to be initialized by th
It depends on the sub-class implementation, 
LogBlockManagerNativeMeta::InitDataDir() is noop, 
LogBlockManagerRdbMeta::InitDataDir() needs to initialize something, I've added 
some comments to the latter.


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h@599
PS61, Line 599: in backgro
> in background
Done


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h@599
PS61, Line 599: The data in RocksDB will be compacted
> BTW, what happens if a tablet server shuts down while the RocksDB backgroun
When the tserver shut down normally, the RocksDB object will be destoryed in 
gracefully [1][2] (I've updated the code to call db_->Close() to expose the 
error messages if any). If the tserver crashed unexpectedly, the RocksDB 
internal threads (includes the compaction/flush threads) will also exit.
If the server crashed unexpectedly, it will not cause inconsistency. Simply 
speaking, RocksDB has a mechanism of "VersionSet" (Similar to Kudu's 
tablet-meta to keep rowsets and block-ids consistency), which maintains a set 
of data in different "versions", to ensure that different versions of data are 
correctly linked and referenced, even during compaction and deletion processes. 
Data files before and after comptation are in different versions, so as long as 
the old version is valid, the data files are exist and the data is consistency.

1. https://gerrit.cloudera.org/c/18569/61/src/kudu/fs/dir_manager.cc#205
2. https://github.com/facebook/rocksdb/wiki/Basic-Operations#closing-a-database


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h@657
PS61, Line 657:
> failing to repair an inconsistency
In fact, I copied this comment from the super class, I will leave the "fatal" 
word here for clarification.

There are 4 type of inconsistencies in Kudu fs layer, see [1]. The difference 
is whether they are "fatal"
- For the non-fatal inconsistencies, the repair procedure may fail, but just 
log it and will not return non-OK to the caller, the server will still run 
happily
- If the fatal inconsistencies repair failed, non-OK will be returned from 
DoRepair() to the caller, the Kudu process will exit in this case.

1. https://github.com/apache/kudu/blob/master/src/kudu/fs/fs_report.h#L187


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.cc@1987
PS61, Line 1987: t auto kEncryptionHea
> nit: why is this name for the variable that's not a static constant?
It's copied from LogBlockContainerNativeMeta::CheckContainerFiles().

I've updated to use static constants.


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.cc@2093
PS61, Line 2093:     // Construct key.
> this statement happens several times in this file, may be a simple function
Good idea!

I've updated all related code, including tests.


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.cc@4035
PS61, Line 4035: ast<RdbDir*>(dir
> Should this key be      id + "." + block_id ?
Sorry it's the bad naming make you misunderstand, the "block_id" should be 
"rocksdb_key" which is pushed in line 2054. It is the key in rocksdb which is 
in the form of "<container_id>.<block_id>" already.

I've updated the name.


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/tserver/tablet_server-test.cc
File src/kudu/tserver/tablet_server-test.cc:

http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/tserver/tablet_server-test.cc@763
PS61, Line 763: will fail start
> will fail starting
Done


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/tserver/tablet_server-test.cc@763
PS61, Line 763: se of
> of the failure
Done


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/tserver/tablet_server-test.cc@764
PS61, Line 764: t it
> case
Done



--
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 62
Gerrit-Owner: Yingchun Lai <laiyingc...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <achenn...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <ale...@apache.org>
Gerrit-Reviewer: KeDeng <kdeng...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>
Gerrit-Reviewer: Yifan Zhang <chinazhangyi...@163.com>
Gerrit-Reviewer: Yingchun Lai <laiyingc...@apache.org>
Gerrit-Reviewer: Yuqi Du <shenxingwuy...@gmail.com>
Gerrit-Comment-Date: Tue, 26 Dec 2023 03:03:55 +0000
Gerrit-HasComments: Yes

Reply via email to