Todd Lipcon has posted comments on this change. Change subject: Add basic Hive MetaStore client ......................................................................
Patch Set 6: (22 comments) http://gerrit.cloudera.org:8080/#/c/7053/6//COMMIT_MSG Commit Message: Line 9: This patch lays the groundwork for a Hive metastore client. any progress on the design doc for this work? Would be good to be able to refer to it side-by-side with the code. (even though this is just prep work and doesn't implement much in the way of a design, I think the context would be useful for people following along) http://gerrit.cloudera.org:8080/#/c/7053/6/CMakeLists.txt File CMakeLists.txt: Line 829: find_package(JavaHome REQUIRED) do we need to update any build docs to specify that java is now a requirement even if you don't plan to build the java client? http://gerrit.cloudera.org:8080/#/c/7053/6/cmake_modules/FindJavaHome.cmake File cmake_modules/FindJavaHome.cmake: Line 64: set(JAVA_HOME_FOUND true) can we do something here to check that EXISTS ${JAVA_HOME}/bin/java just in case the user has set some garbage value? http://gerrit.cloudera.org:8080/#/c/7053/6/cmake_modules/FindThrift.cmake File cmake_modules/FindThrift.cmake: PS6, Line 105: # TODO(dan): Add the fb303 files manually. This is a complete hack. : #list(APPEND ${SRCS} "${THRIFT_CC_OUT}") : #list(APPEND ${HDRS} "${THRIFT_H_OUT}") : hrm, could we at least add this as an optional argument like REQUIRE_FB303? http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/hms/CMakeLists.txt File src/kudu/hms/CMakeLists.txt: Line 23: ADD_EXPORTABLE_LIBRARY(hms_thrift since the client won't use hms at all, I think we can use the built-in ADD_LIBRARY instead Line 35: ADD_EXPORTABLE_LIBRARY(hms same http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/hms/hive_metastore-test.cc File src/kudu/hms/hive_metastore-test.cc: Line 53: << "Databases: " << JoinStrings(databases, ", "); I think if you include stl_logging.h then you don't need to do this << part - gtest should be able to automatically log the vector<string> Line 61: ASSERT_EQ(vector<string>({ "default", "my_db" }), databases) is there a gaurantee that they come back in sorted order or is a std::sort() above a good idea? Line 62: << "Databases: " << JoinStrings(databases, ", "); same http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/hms/hive_metastore.cc File src/kudu/hms/hive_metastore.cc: PS6, Line 53: "failed to " compared to the other macros we have like RETURN_NOT_OK_PREPEND, it's odd to see "failed to" here instead of in the actual usage sites PS6, Line 64: boost::shared_ptr<TSocket> socket(new TSocket(hms_address.host(), hms_address.port())); : boost::shared_ptr<TTransport> transport(new TBufferedTransport(socket)); : boost::shared_ptr<TProtocol> protocol(new TBinaryProtocol(transport)); : client_ = Apache::Hadoop::Hive::ThriftHiveMetastoreClient(protocol); Do any of these ctors potentially throw? (aside from OOM type cases which we dont care about) http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/hms/hive_metastore.h File src/kudu/hms/hive_metastore.h: Line 40: // Starts the Hive MetaStore client. can you specify what this is doing under the hood a little bit more? eg does this actually resolve and connect? (so the caller knows whether to expect (a) it might be slow and (b) it might fail if the HMS is down) PS6, Line 51: Status CreateDatabase(const Apache::Hadoop::Hive::Database& db); I think GSG discourages namespace aliases, but maybe we should make an exception here and add something like: 'namespace hive = Apache::Hadoop::Hive' in this header file? It might get a bit tedious to retype this long namespace http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/hms/mini_hms.cc File src/kudu/hms/mini_hms.cc: Line 65: home_dir = env; nit: extra space Line 91: LOG(INFO) << "db_path: " << db_path; vlog? PS6, Line 114: Kerberos copy pasta http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/util/subprocess.cc File src/kudu/util/subprocess.cc: PS6, Line 760: We call lsof in a loop since it typically takes a long : // time for it to initialize and bind a port. this was a bit specific to the KDC case, I think could be cleaned up a bit. http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/util/subprocess.h File src/kudu/util/subprocess.h: PS6, Line 160: bind to listening TCP por "bind to ANY listening..."? PS6, Line 205: path the typo PS6, Line 206: // locations as well as the environment path. you mean the $PATH environment variable? http://gerrit.cloudera.org:8080/#/c/7053/6/thirdparty/LICENSE.txt File thirdparty/LICENSE.txt: Line 620: Source: http://www.boost.org/ wrong paste? Line 658: Source: http://www.python.org/ same -- To view, visit http://gerrit.cloudera.org:8080/7053 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I155223da912bc18a759df2f1f6bc25d1132a99ee Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan Burkert <danburk...@apache.org> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Dan Burkert <danburk...@apache.org> Gerrit-Reviewer: Jean-Daniel Cryans <jdcry...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes