Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8312 )
Change subject: KUDU-2191 (4/n): WIP: Hive Metastore catalog manager integration ...................................................................... Patch Set 3: (11 comments) curious about timeouts - I dont see any timeouts set here but I'm guessing they're necessary (what if you kill -STOP the minihms, what happens to our calls?) http://gerrit.cloudera.org:8080/#/c/8312/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8312/3//COMMIT_MSG@23 PS3, Line 23: master_hms-itest before this is committed: any plan for fault tests? eg crashing either the HMS or the master at some point in the middle of these sequences? http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/catalog_manager.cc@231 PS3, Line 231: "Addresses of Hive Metastore"); a bit more docs here, eg what the multiple addrs mean, etc. Also does the HMS expose more than one port? (eg an HTTP interface?) If so we should clarify that it's the Thrift addr http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/catalog_manager.cc@1419 PS3, Line 1419: SetupError(resp->mutable_error(), MasterErrorPB::HIVE_METASTORE_ERROR, s); I think it's worth logging such errors as well, since it may be an end user that gets the response, and then they'll probably complain to the admin who will look in the logs for more details. We should probably be especially verbose in these logs because some failures at this point can cause orphaned pointers in the HMS (eg a timeout may be thrown even if the op succeeded) (same below) http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.cc File src/kudu/master/hms_catalog.cc: http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.cc@88 PS3, Line 88: const Schema& schema) { worth a CHECK(running_) in these functions? otherwise it seems like they would just hang forever http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.cc@198 PS3, Line 198: "Kudu table name must contain only lower-case ASCII characters, " no digits allowed? that's surprising http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.cc@203 PS3, Line 203: for (int idx = 0; idx < table.size(); idx++) { I feel like this code could be written more concisely using some of the stuff from gutil/strings/util.h. For example, you could use Split() on '.' and then ensure that it has 1 or 2 results, and IsIdentifier() returns true for all parts. Or use AdvanceIdentifier up to twice? Or match the whole string against a class that includes '.' and then use strcount('.') to see that there is at most one dot? I think the 'split' is probably most straight forward http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.cc@228 PS3, Line 228: for (const HostPort& address : addresses_) { I'd think we might need to store an index in the class so that we swap to the new HMS when one has an issue. Otherwise what if an HMS is "up" (ie accepts thrift connections) but times out all calls, or "up" but for some reason is unable to talk to its backing RDBMS or whatever? I'd guess we'd want to reconnect to the other one. http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.cc@289 PS3, Line 289: cond_.TimedWait(MonoDelta::FromMilliseconds(wait)); worth some logging here? http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.cc@304 PS3, Line 304: Status s = rpc.run(&client_.get()); I think wrapping this in a TRACE_EVENT scope and a LOG_IF_SLOW and such would be helpful. Perhaps a TRACE message and/or a TRACE_COUNTER_INCREMENT too to keep track of how much time is spent in calls to HMS for a given CreateTable call. That will also require propagating the current Trace object into the 'Rpc' object. If you'd rather defer this work please add a TODO. I'm happy to do it if you want to assign me the todo http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.cc@308 PS3, Line 308: client_ = boost::none; is there a more explicit Shutdown we should do on the client? or does calling the dtor like this suffice? http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/util/net/net_util.h File src/kudu/util/net/net_util.h: http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/util/net/net_util.h@133 PS3, Line 133: bool ValidateAddressListFlag(const char* flag_name, const std::string& addr_list); doc this? specifically I think this is meant to be used as a gflag VALIDATOR and thus the somewhat odd signature -- To view, visit http://gerrit.cloudera.org:8080/8312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1b8f360408b05d6ac4f7359a8ac0dd889b196baf Gerrit-Change-Number: 8312 Gerrit-PatchSet: 3 Gerrit-Owner: Dan Burkert <danburk...@apache.org> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Comment-Date: Tue, 24 Oct 2017 21:20:47 +0000 Gerrit-HasComments: Yes