Andrew Wong has posted comments on this change.

Change subject: separate DataDirManager from BlockManagers
......................................................................


Patch Set 6:

(34 comments)

http://gerrit.cloudera.org:8080/#/c/7602/6//COMMIT_MSG
Commit Message:

PS6, Line 23: directory
> directly
Done


http://gerrit.cloudera.org:8080/#/c/7602/6/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

PS6, Line 205:   // Pass in a report to prevent the block manager from logging
             :   // unnecessarily.
             :   FsReport report;
> Nit: move this down to just below Open().
Done


Line 691:     ASSERT_OK(env_util::CreateDirIfMissing(this->env_, paths[i]));
> Why not a simple CreateDir() here and below? Under what circumstances would
These shouldn't have been created so it should be safe to just create.


http://gerrit.cloudera.org:8080/#/c/7602/6/src/kudu/fs/block_manager.h
File src/kudu/fs/block_manager.h:

PS6, Line 260:   // Exposes the underlying DataDirManager.
             :   virtual DataDirManager* dd_manager() = 0;
> Why does this still exist? Shouldn't callers outside the block managers get
Had been used in tests, but those tests can use the dd_manager directly; 
removed.


http://gerrit.cloudera.org:8080/#/c/7602/6/src/kudu/fs/data_dirs-test.cc
File src/kudu/fs/data_dirs-test.cc:

Line 65:     LOG(INFO) << GetDirNames(kNumDirs)[0];
> Still want this?
Done


Line 77:       CHECK_OK(env_util::CreateDirIfMissing(env_, ret[i]));
> Why not a simpler CreateDir?
Done


http://gerrit.cloudera.org:8080/#/c/7602/6/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

Line 90: TAG_FLAG(fs_lock_data_dirs, unsafe);
> Also tag as 'evolving', for completeness.
Done


Line 272: const char* kInvalidPath = "";
> Where is this used?
Future patch. Eventually we'll need something stored when the directories fail 
to canonicalize. This isn't it, but it was a precursor to what's implemented 
later on. Removed.


PS6, Line 288:   if (metric_entity) {
             :     metrics_.reset(new DataDirMetrics(metric_entity));
             :   }
> You changed metric_entity to be pass-by-cref because it's not being moved h
Done


Line 334:   data_root_map_.swap(data_root_map);
> Maybe write to the member at the end of the function, after all the interme
Done


Line 340:     if (!canonicalized_data_fs_root.empty()) {
> When is this empty?
Here and elsewhere you saw traces of an implementation where things that failed 
to canonicalize were given an invalid "" path name. This has changed, and this 
should be removed (particularly since this patch is independent of disk 
failures).


Line 372:     // Ensure the data dirs exist and create the instance files.
> IMHO this comment was better placed just before the for loop, so it's clear
Done


Line 375:     // In test, this is the input path, IRL, this is the paths with 
kDataDirName
> Please reword; not exactly sure what "input path" means, this should be mor
Yeah, this is a pretty dumb comment and shouldn't have been in this patch.


Line 526:   group_by_tablet_map_.clear();
> Isn't this empty at Open() time anyway? Why bother?
In tests, I found it convenient to have calls to Open() be idempotent, i.e. 
calling Open() multiple times would create leave the directory manager in the 
same state each time. Given the reset behavior of Init(), this should always be 
the case even without this.


http://gerrit.cloudera.org:8080/#/c/7602/6/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

PS6, Line 195:   static const int kFlagCreateTestHolePunch = 0x1;
             :   static const int kFlagCreateFsync = 0x2;
> These were only used in Create(), so they needn't exist at all anymore.
Done


Line 198:   static const char* kDataDirName;
> Why does this need to be public? Doc it.
Was only used publicly in tests. Made private.


Line 220:                  AccessMode mode = AccessMode::READ_WRITE);
> Can we get by without the default value? There aren't that many constructor
Done.

It's used in a handful of tests, but we needn't make that an issue in 
production code.


Line 240:   // max allowed, or if 'mode' is MANDATORY and locks could not be 
taken.
> 'mode' refers to a parameter that no longer exists.
Done


PS6, Line 229:   // Initializes, sanitizes, and canonicalizes the directories.
             :   Status Init();
             : 
             :   // Initializes the data directories on disk.
             :   //
             :   // Returns an error if initialized directories already exist.
             :   Status Create();
             : 
             :   // Opens existing data roots from disk and indexes the files 
found.
             :   //
             :   // Returns an error if the number of on-disk directories found 
exceeds the
             :   // max allowed, or if 'mode' is MANDATORY and locks could not 
be taken.
             :   Status Open();
> Normally we try to provide at most one, sometimes two functions with which 
I whittled it down to Init(), Create(), and Open(). The logic in Init() 
(canonicalization of directories) must always be run first, so Init() is really 
just a replacement for the public constructor.


PS6, Line 312:   std::vector<std::string> GetDataRoots() const;
             : 
             :   // Return a list of data directory names.
             :   std::vector<std::string> GetDataRootDirs() const;
> How about "GetDataRoots" and "GetCanonicalizedDataRoots" to make the distin
The difference here isn't the canonicalization, but rather that GetDataRoots() 
returns the roots (i.e. those missing the kDataDirName suffix). I've changed it 
to GetDataDirs() and GetDataRoots().


PS6, Line 371:   // Input directories verbatim from the user input fs_data_dirs.
             :   const std::vector<std::string> data_fs_roots_;
> Style nit: members set by the constructor (including const members) should 
Done


PS6, Line 380: fs_data_dirs
> Nit: let's not refer to this gflag by name (since a DataDirManager can be c
Removed this map since it's not unnecessary.


Line 382:   RootMap data_root_map_;
> When reading through the code, it was tough to differentiate between data_f
That's fair. The only non-canonicalized strings are data_fs_roots_, which I'm 
taking out of the DataDirManager. The original intent was to give the directory 
manager access to the user-input directories so it could canonicalize them and 
keep track of failed canonicalization. This isn't particularly necessary given 
my next patch, which just prepends failed directories with a non-sanitized 
string.


http://gerrit.cloudera.org:8080/#/c/7602/6/src/kudu/fs/file_block_manager.cc
File src/kudu/fs/file_block_manager.cc:

PS6, Line 686:   uint16_t uuid_idx = 
internal::FileBlockLocation::GetDataDirIdx(block_id);
             :   if (PREDICT_FALSE(dd_manager_->IsDataDirFailed(uuid_idx))) {
             :     return Status::OK();
             :   }
> What's this doing here?
Done


http://gerrit.cloudera.org:8080/#/c/7602/6/src/kudu/fs/file_block_manager.h
File src/kudu/fs/file_block_manager.h:

Line 72:   static std::string BlockManagerType();
> Where is this used?
It's used in parameterized block manager tests to create the dd_manager. I've 
since changed the implementation so this is unnecessary.


PS6, Line 74: 'env', 'dd_manager', and 'error_manager'
> Nit: maybe condense with something like "all objects passed by pointer"
Done


Line 121:   // Manages and owns all of the block manager's data directories.
> Should probably be reworded given the new relationship and responsibility s
Done


http://gerrit.cloudera.org:8080/#/c/7602/6/src/kudu/fs/fs_manager.cc
File src/kudu/fs/fs_manager.cc:

PS6, Line 185:     // Sanitize and canonicalize the wal root.
             :     string canonicalized;
             :     RETURN_NOT_OK_PREPEND(FsManager::SanitizePath(wal_fs_root_),
             :         "Failed to sanitize Write-Ahead-Log directory 
(fs_wal_dir)");
             :     
RETURN_NOT_OK_PREPEND(env_->Canonicalize(DirName(wal_fs_root_), &canonicalized),
             :         Substitute("Failed to canonicalize $0", 
DirName(wal_fs_root_)));
             :     canonicalized_wal_fs_root_ = JoinPathSegments(canonicalized, 
BaseName(wal_fs_root_));
> Is this the same as the logic in data_dirs.cc? Maybe it can all be incorpor
I'd wanted to keep the logic of SanitizePath separate since it's inherently 
static and is useful in error checking (e.g. a faulty path can be "injected" by 
prepending with a prefix with spaces, as an easy indicator that 
canonicalization has failed).

Regardless, having a full Canonicalize method seems reasonable.


http://gerrit.cloudera.org:8080/#/c/7602/6/src/kudu/fs/fs_manager.h
File src/kudu/fs/fs_manager.h:

Line 217:   static Status SanitizePath(const std::string& path);
> Needs doc.
Done


http://gerrit.cloudera.org:8080/#/c/7602/6/src/kudu/fs/log_block_manager-test.cc
File src/kudu/fs/log_block_manager-test.cc:

PS6, Line 89:     // The directory manager must outlive the block manager. 
Destroy the block
            :     // manager first to enforce this.
> This seems like a misplaced comment; there's no destroying of either manage
You're right, belongs in ReopenBlockManager.


http://gerrit.cloudera.org:8080/#/c/7602/6/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

Line 168:   static std::string BlockManagerType();
> Where is this used?
It's used to help certain parameterized tests that require the block name. 
There's a way to do this without anything in prod code.


PS6, Line 169: 'env', 'dd_manager', and 'error_manager' 
> Nit: see file_block_manager.h.
Done


Line 361:   // Manages and owns all of the block manager's data directories.
> Should probably be reworded given the new relationship and responsibility s
Done


http://gerrit.cloudera.org:8080/#/c/7602/6/src/kudu/util/path_util.cc
File src/kudu/util/path_util.cc:

PS6, Line 56:   std::vector<std::string> out;
            :   for (const std::string& path : v) {
> Remove std:: prefixes.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I387e4e88bd65298970a1e4201879de08fac07ace
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

Reply via email to