Todd Lipcon has posted comments on this change.

Change subject: Add basic Hive MetaStore client
......................................................................


Patch Set 6:

(22 comments)

http://gerrit.cloudera.org:8080/#/c/7053/6//COMMIT_MSG
Commit Message:

Line 9: This patch lays the groundwork for a Hive metastore client.
any progress on the design doc for this work? Would be good to be able to refer 
to it side-by-side with the code. (even though this is just prep work and 
doesn't implement much in the way of a design, I think the context would be 
useful for people following along)


http://gerrit.cloudera.org:8080/#/c/7053/6/CMakeLists.txt
File CMakeLists.txt:

Line 829: find_package(JavaHome REQUIRED)
do we need to update any build docs to specify that java is now a requirement 
even if you don't plan to build the java client?


http://gerrit.cloudera.org:8080/#/c/7053/6/cmake_modules/FindJavaHome.cmake
File cmake_modules/FindJavaHome.cmake:

Line 64:   set(JAVA_HOME_FOUND true)
can we do something here to check that EXISTS ${JAVA_HOME}/bin/java just in 
case the user has set some garbage value?


http://gerrit.cloudera.org:8080/#/c/7053/6/cmake_modules/FindThrift.cmake
File cmake_modules/FindThrift.cmake:

PS6, Line 105:     # TODO(dan): Add the fb303 files manually. This is a 
complete hack.
             :     #list(APPEND ${SRCS} "${THRIFT_CC_OUT}")
             :     #list(APPEND ${HDRS} "${THRIFT_H_OUT}")
             :  
hrm, could we at least add this as an optional argument like REQUIRE_FB303?


http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/hms/CMakeLists.txt
File src/kudu/hms/CMakeLists.txt:

Line 23: ADD_EXPORTABLE_LIBRARY(hms_thrift
since the client won't use hms at all, I think we can use the built-in 
ADD_LIBRARY instead


Line 35: ADD_EXPORTABLE_LIBRARY(hms
same


http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/hms/hive_metastore-test.cc
File src/kudu/hms/hive_metastore-test.cc:

Line 53:       << "Databases: " << JoinStrings(databases, ", ");
I think if you include stl_logging.h then you don't need to do this << part - 
gtest should be able to automatically log the vector<string>


Line 61:   ASSERT_EQ(vector<string>({ "default", "my_db" }), databases)
is there a gaurantee that they come back in sorted order or is a std::sort() 
above a good idea?


Line 62:       << "Databases: " << JoinStrings(databases, ", ");
same


http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/hms/hive_metastore.cc
File src/kudu/hms/hive_metastore.cc:

PS6, Line 53: "failed to "
compared to the other macros we have like RETURN_NOT_OK_PREPEND, it's odd to 
see "failed to" here instead of in the actual usage sites


PS6, Line 64:   boost::shared_ptr<TSocket> socket(new 
TSocket(hms_address.host(), hms_address.port()));
            :   boost::shared_ptr<TTransport> transport(new 
TBufferedTransport(socket));
            :   boost::shared_ptr<TProtocol> protocol(new 
TBinaryProtocol(transport));
            :   client_ = 
Apache::Hadoop::Hive::ThriftHiveMetastoreClient(protocol);
Do any of these ctors potentially throw? (aside from OOM type cases which we 
dont care about)


http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/hms/hive_metastore.h
File src/kudu/hms/hive_metastore.h:

Line 40:   // Starts the Hive MetaStore client.
can you specify what this is doing under the hood a little bit more? eg does 
this actually resolve and connect? (so the caller knows whether to expect (a) 
it might be slow and (b) it might fail if the HMS is down)


PS6, Line 51:   Status CreateDatabase(const Apache::Hadoop::Hive::Database& db);
I think GSG discourages namespace aliases, but maybe we should make an 
exception here and add something like: 'namespace hive = Apache::Hadoop::Hive' 
in this header file? It might get a bit tedious to retype this long namespace


http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/hms/mini_hms.cc
File src/kudu/hms/mini_hms.cc:

Line 65:     home_dir =  env;
nit: extra space


Line 91:   LOG(INFO) << "db_path: " << db_path;
vlog?


PS6, Line 114: Kerberos
copy pasta


http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/util/subprocess.cc
File src/kudu/util/subprocess.cc:

PS6, Line 760: We call lsof in a loop since it typically takes a long
             :   // time for it to initialize and bind a port.
this was a bit specific to the KDC case, I think could be cleaned up a bit.


http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/util/subprocess.h
File src/kudu/util/subprocess.h:

PS6, Line 160: bind to listening TCP por
"bind to ANY listening..."?


PS6, Line 205: path the
typo


PS6, Line 206: // locations as well as the environment path.
you mean the $PATH environment variable?


http://gerrit.cloudera.org:8080/#/c/7053/6/thirdparty/LICENSE.txt
File thirdparty/LICENSE.txt:

Line 620: Source: http://www.boost.org/
wrong paste?


Line 658: Source: http://www.python.org/
same


-- 
To view, visit http://gerrit.cloudera.org:8080/7053
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I155223da912bc18a759df2f1f6bc25d1132a99ee
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jdcry...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to