Adar Dembo has posted comments on this change.

Change subject: separate DataDirManager from BlockManagers
......................................................................


Patch Set 6:

(34 comments)

http://gerrit.cloudera.org:8080/#/c/7602/6//COMMIT_MSG
Commit Message:

PS6, Line 23: directory
directly


http://gerrit.cloudera.org:8080/#/c/7602/6/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

PS6, Line 205:   // Pass in a report to prevent the block manager from logging
             :   // unnecessarily.
             :   FsReport report;
Nit: move this down to just below Open().


Line 691:     ASSERT_OK(env_util::CreateDirIfMissing(this->env_, paths[i]));
Why not a simple CreateDir() here and below? Under what circumstances would 
this path already be created?


http://gerrit.cloudera.org:8080/#/c/7602/6/src/kudu/fs/block_manager.h
File src/kudu/fs/block_manager.h:

PS6, Line 260:   // Exposes the underlying DataDirManager.
             :   virtual DataDirManager* dd_manager() = 0;
Why does this still exist? Shouldn't callers outside the block managers get the 
DataDirManager via the FsManager now?


http://gerrit.cloudera.org:8080/#/c/7602/6/src/kudu/fs/data_dirs-test.cc
File src/kudu/fs/data_dirs-test.cc:

Line 65:     LOG(INFO) << GetDirNames(kNumDirs)[0];
Still want this?


Line 77:       CHECK_OK(env_util::CreateDirIfMissing(env_, ret[i]));
Why not a simpler CreateDir?


http://gerrit.cloudera.org:8080/#/c/7602/6/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

Line 90: TAG_FLAG(fs_lock_data_dirs, unsafe);
Also tag as 'evolving', for completeness.


Line 272: const char* kInvalidPath = "";
Where is this used?


PS6, Line 288:   if (metric_entity) {
             :     metrics_.reset(new DataDirMetrics(metric_entity));
             :   }
You changed metric_entity to be pass-by-cref because it's not being moved here, 
right? How about adding an std::move here  instead?


Line 334:   data_root_map_.swap(data_root_map);
Maybe write to the member at the end of the function, after all the 
intermediate state has been assembled?


Line 340:     if (!canonicalized_data_fs_root.empty()) {
When is this empty?


Line 372:     // Ensure the data dirs exist and create the instance files.
IMHO this comment was better placed just before the for loop, so it's clear 
that it applies to the _entirety_ of the loop rather than the first few lines.


Line 375:     // In test, this is the input path, IRL, this is the paths with 
kDataDirName
Please reword; not exactly sure what "input path" means, this should be more 
specific regarding kDataDirName (i.e. paths with it _appended_), and should say 
"In production" rather than IRL.


Line 526:   group_by_tablet_map_.clear();
Isn't this empty at Open() time anyway? Why bother?


http://gerrit.cloudera.org:8080/#/c/7602/6/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

PS6, Line 195:   static const int kFlagCreateTestHolePunch = 0x1;
             :   static const int kFlagCreateFsync = 0x2;
These were only used in Create(), so they needn't exist at all anymore.


Line 198:   static const char* kDataDirName;
Why does this need to be public? Doc it.


Line 220:                  AccessMode mode = AccessMode::READ_WRITE);
Can we get by without the default value? There aren't that many constructor 
calls, are there?


Line 240:   // max allowed, or if 'mode' is MANDATORY and locks could not be 
taken.
'mode' refers to a parameter that no longer exists.


PS6, Line 229:   // Initializes, sanitizes, and canonicalizes the directories.
             :   Status Init();
             : 
             :   // Initializes the data directories on disk.
             :   //
             :   // Returns an error if initialized directories already exist.
             :   Status Create();
             : 
             :   // Opens existing data roots from disk and indexes the files 
found.
             :   //
             :   // Returns an error if the number of on-disk directories found 
exceeds the
             :   // max allowed, or if 'mode' is MANDATORY and locks could not 
be taken.
             :   Status Open();
Normally we try to provide at most one, sometimes two functions with which to 
construct an object:
1. Just one: public constructor. For simple objects that can be either stack or 
heap allocated. Initialization cannot fail (since constructors do not return 
anything and our style guide forbids throwing exceptions).
2. Two: constructor and separate initializer. Sometimes the constructor is 
private, in which case the initializer is static and returns a heap allocated 
object. The initializer usually returns Status to indicate failures. We use 
this pattern ("two-phase initialization") when something in initialization may 
fail and we want to expose that.

In rare cases, we apply pattern #2 with two different branches: one branch for 
creating something new on disk, and another for loading an existing something 
from disk.

All this is to say: a public constructor, Init(), Create(), and Open() seems 
like overkill for DataDirManager. Given that it's a singleton in most cases, 
you should be able to whittle this down to two public methods: CreateNew() and 
OpenExisting(). You can make the constructor private.


PS6, Line 312:   std::vector<std::string> GetDataRoots() const;
             : 
             :   // Return a list of data directory names.
             :   std::vector<std::string> GetDataRootDirs() const;
How about "GetDataRoots" and "GetCanonicalizedDataRoots" to make the 
distinction more clear?


PS6, Line 371:   // Input directories verbatim from the user input fs_data_dirs.
             :   const std::vector<std::string> data_fs_roots_;
Style nit: members set by the constructor (including const members) should 
precede other members. That is, this should join mode_, env_, and 
block_manager_type_.


PS6, Line 380: fs_data_dirs
Nit: let's not refer to this gflag by name (since a DataDirManager can be 
constructed with an arbitrary value for 'data_roots'). Just refer back to the 
verbatim representation provided at construction time.


Line 382:   RootMap data_root_map_;
When reading through the code, it was tough to differentiate between 
data_fs_roots_ and data_root_map_, especially with respect to canonicalization. 
Perhaps change this to "canonicalized_data_fs_root_map_"?

Alternatively, use an std::map (which retains insertion order I believe) and do 
away with canonicalized_data_fs_roots_ altogether.


http://gerrit.cloudera.org:8080/#/c/7602/6/src/kudu/fs/file_block_manager.cc
File src/kudu/fs/file_block_manager.cc:

PS6, Line 686:   uint16_t uuid_idx = 
internal::FileBlockLocation::GetDataDirIdx(block_id);
             :   if (PREDICT_FALSE(dd_manager_->IsDataDirFailed(uuid_idx))) {
             :     return Status::OK();
             :   }
What's this doing here?


http://gerrit.cloudera.org:8080/#/c/7602/6/src/kudu/fs/file_block_manager.h
File src/kudu/fs/file_block_manager.h:

Line 72:   static std::string BlockManagerType();
Where is this used?


PS6, Line 74: 'env', 'dd_manager', and 'error_manager'
Nit: maybe condense with something like "all objects passed by pointer"


Line 121:   // Manages and owns all of the block manager's data directories.
Should probably be reworded given the new relationship and responsibility 
structure.


http://gerrit.cloudera.org:8080/#/c/7602/6/src/kudu/fs/fs_manager.cc
File src/kudu/fs/fs_manager.cc:

PS6, Line 185:     // Sanitize and canonicalize the wal root.
             :     string canonicalized;
             :     RETURN_NOT_OK_PREPEND(FsManager::SanitizePath(wal_fs_root_),
             :         "Failed to sanitize Write-Ahead-Log directory 
(fs_wal_dir)");
             :     
RETURN_NOT_OK_PREPEND(env_->Canonicalize(DirName(wal_fs_root_), &canonicalized),
             :         Substitute("Failed to canonicalize $0", 
DirName(wal_fs_root_)));
             :     canonicalized_wal_fs_root_ = JoinPathSegments(canonicalized, 
BaseName(wal_fs_root_));
Is this the same as the logic in data_dirs.cc? Maybe it can all be incorporated 
into FsManager::SanitizePath?


http://gerrit.cloudera.org:8080/#/c/7602/6/src/kudu/fs/fs_manager.h
File src/kudu/fs/fs_manager.h:

Line 217:   static Status SanitizePath(const std::string& path);
Needs doc.


http://gerrit.cloudera.org:8080/#/c/7602/6/src/kudu/fs/log_block_manager-test.cc
File src/kudu/fs/log_block_manager-test.cc:

PS6, Line 89:     // The directory manager must outlive the block manager. 
Destroy the block
            :     // manager first to enforce this.
This seems like a misplaced comment; there's no destroying of either manager 
here.


http://gerrit.cloudera.org:8080/#/c/7602/6/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

Line 168:   static std::string BlockManagerType();
Where is this used?


PS6, Line 169: 'env', 'dd_manager', and 'error_manager' 
Nit: see file_block_manager.h.


Line 361:   // Manages and owns all of the block manager's data directories.
Should probably be reworded given the new relationship and responsibility 
structure.


http://gerrit.cloudera.org:8080/#/c/7602/6/src/kudu/util/path_util.cc
File src/kudu/util/path_util.cc:

PS6, Line 56:   std::vector<std::string> out;
            :   for (const std::string& path : v) {
Remove std:: prefixes.


-- 
To view, visit http://gerrit.cloudera.org:8080/7602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I387e4e88bd65298970a1e4201879de08fac07ace
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

Reply via email to