Adar Dembo has posted comments on this change.

Change subject: Add basic Hive MetaStore client
......................................................................


Patch Set 5:

(52 comments)

http://gerrit.cloudera.org:8080/#/c/7053/5/build-support/dist_test.py
File build-support/dist_test.py:

Line 82:      # Hive MetaStore tests require hadoop and hive libraries.
These directories will be recursively copied? Or are these just symlinks? Can 
you add to the comment?


http://gerrit.cloudera.org:8080/#/c/7053/5/build-support/run_dist_test.py
File build-support/run_dist_test.py:

Line 127:   env['HIVE_HOME'] = 
os.path.abspath(glob.glob("../../thirdparty/src/apache-hive-*-bin")[0])
Can you use ROOT to derive the paths instead of relying on the cwd?


Line 129:   env['JAVA_HOME'] = glob.glob("/usr/lib/jvm/java-1.8.0-*")[0]
Can we run detect-java-home.sh from bigtop instead? This seems rather fragile.


http://gerrit.cloudera.org:8080/#/c/7053/5/cmake_modules/FindJavaHome.cmake
File cmake_modules/FindJavaHome.cmake:

Line 19: # 
https://github.com/apache/bigtop/blob/38e1571b2f73bbfa6ab0c01a689fae967b8399d9/bigtop-packages/src/common/bigtop-utils/bigtop-detect-javahome
If this is a specific version for a reason, please explain why. IIRC, it's 
because this was the last version to support JDK7?


http://gerrit.cloudera.org:8080/#/c/7053/5/cmake_modules/FindThrift.cmake
File cmake_modules/FindThrift.cmake:

PS5, Line 106:     #list(APPEND ${SRCS} "${THRIFT_CC_OUT}")
             :     #list(APPEND ${HDRS} "${THRIFT_H_OUT}")
Remove this?


http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/hms/CMakeLists.txt
File src/kudu/hms/CMakeLists.txt:

PS5, Line 21: set(HMS_THRIFT_LIBS
            :   thrift)
This is only used in one place, just inline it there?


Line 23: ADD_EXPORTABLE_LIBRARY(hms_thrift
Is hms_thrift going to be linked into the Kudu client library? If not, it 
doesn't need to be an exportable library; add_library() and 
target_link_library() should suffice.


Line 35: ADD_EXPORTABLE_LIBRARY(hms
Likewise.


Line 44:   ADD_CUSTOM_TARGET(hive_link_target ALL
Nit: use lowercase for cmake built-ins (i.e. add_custom_target(...)).


Line 49:   ADD_CUSTOM_TARGET(hadoop_link_target ALL
So this means the symlinks will be recreated on every invocation of "make" that 
doesn't include a specific target, or an invocation of "make all".

Isn't that overkill though? Wouldn't it be sufficient to create the symlink on 
the first run of cmake? You could do that via execute_process().


http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/hms/hive_metastore-test.cc
File src/kudu/hms/hive_metastore-test.cc:

Line 23: // See 
https://stackoverflow.com/questions/1156003/c-namespace-collision-with-gtest-and-boost
But we're not explicitly including boost here. Is this from 
ThriftHiveMetastore.h? If so, could we add the workaround directly to the 
generated header?


Line 42: TEST_F(HiveMetastoreTest, TestCreateDatabase) {
Also test that Stop() works?


Line 52:   ASSERT_EQ(vector<string>({ "default" }), databases)
Surprised the compiler needs the vector<string> part and isn't able to figure 
out the type of the initializer list itself.


http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/hms/hive_metastore.cc
File src/kudu/hms/hive_metastore.cc:

Line 49: #define HMS_RET_NOT_OK(call, msg) \
Are these all of the exception classes that may be thrown? Is there some way to 
specify a catch-all in case we're wrong?


Line 63:     : client_(nullptr) {
So client_ is actually a pointer? Is the pointer hidden behind a typedef or 
something?


Line 71:   CHECK_OK(Stop());
Maybe WARN_NOT_OK() instead? I think that's the convention we use.


http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/hms/hive_metastore.h
File src/kudu/hms/hive_metastore.h:

Line 33: // A client for the Hive MetaStore.
Maybe HiveMetastoreClient then? Or HMSClient? HiveMetastore suggests the actual 
server-side component.


Line 39: 
What happens if Start or Stop are called multiple times, or if Stop is called 
before Start? Is that something we need to protect against?


Line 51:   Status CreateDatabase(const Apache::Hadoop::Hive::Database& db);
Forward declare and maybe avoid some includes?


Line 57:   Apache::Hadoop::Hive::ThriftHiveMetastoreClient client_;
Guess we can't forward declare this, so maybe forward declaring the above isn't 
useful.


http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/hms/mini_hms-test.cc
File src/kudu/hms/mini_hms-test.cc:

Line 24: class MiniKdcTest : public KuduTest {};
MiniHmsTest


Line 26: TEST_F(MiniKdcTest, TestBasicOperation) {
Shouldn't we also test some sort of RPC?

Oh I see, that's what hive_metastore-test.cc is for. But, I still think we 
should do something here to verify that the HMS is indeed running properly. 
Maybe we can look for the presence of a file? Or read its contents?


http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/hms/mini_hms.cc
File src/kudu/hms/mini_hms.cc:

Line 20: #include <csignal>
Don't need?


Line 27: #include "kudu/gutil/strings/split.h"
Don't need?


Line 28: #include "kudu/gutil/strings/strip.h"
Don't need?


Line 33: #include "kudu/util/stopwatch.h"
Don't need?


Line 47:     proc->KillAndWait(SIGKILL);
WARN_NOT_OK() here?


Line 60:   if (env == nullptr) {
Use a ternary?


Line 63:     home_dir =  env;
Extra space here.


Line 65:   CHECK(Env::Default()->FileExists(home_dir)) << env_var << " not 
found at " << home_dir;
Shouldn't we gracefully return this sort of error via Status?

Also, if this is a symlink, could we make sure it's not broken? For example, if 
I upgrade my JVM so my java-home symlink is now broken, would be good to catch 
that before trying to run Hive.


Line 72:   //SCOPED_LOG_SLOW_EXECUTION(WARNING, 10000, "Starting HMS");
Remove? Or uncomment?


Line 77:   Env* env = Env::Default();
How about asking users to supply this in the class constructor?


Line 106:   int rc = 0;
Nit: do you actually need to initialize the value? Would prefer to adhere to 
the calling convention of "value is guaranteed to be set if function does not 
fail".


Line 109:     return Status::RuntimeError("schematool failed");
Maybe include the exit status? Should use GetExitStatus() anyway.


Line 112:   // Start the Kerberos HMS.
Kerberos HMS?


http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/security/test/mini_kdc.cc
File src/kudu/security/test/mini_kdc.cc:

Line 151:   RETURN_NOT_OK(kdc_process_->WaitForUdpBind(&options_.port, 
MonoDelta::FromSeconds(1)));
This seems like a rather short timeout. The old code was even more aggressive; 
any reason not to wait longer (and perhaps avoid some future flakiness in doing 
so)?


http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/util/subprocess.cc
File src/kudu/util/subprocess.cc:

Line 749:   return this->WaitForBind(port, "4TCP", timeout);
Nit: remove unnecessary this prefixes, here and below.


Line 756: Status Subprocess::WaitForBind(uint16_t* port, const char* kind, 
MonoDelta timeout) {
I don't really think this or GetExecutablePath belong in util/subprocess. They 
don't strike me as functionality we'd want in production, so test_util feels 
like a better home. AFAICT the only reason to put WaitForBind() in 
util/subprocess is to get access to the pid, but pid() is already exposed in 
the subprocess API.


http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/util/subprocess.h
File src/kudu/util/subprocess.h:

Line 205: // Attempts to find the path the specified executable, searching the 
provided
"to the specified executable"


http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/util/test_util.cc
File src/kudu/util/test_util.cc:

Line 27: 
New includes not needed?


http://gerrit.cloudera.org:8080/#/c/7053/5/thirdparty/LICENSE.txt
File thirdparty/LICENSE.txt:

Line 620: Source: http://www.boost.org/
Copy/pasta


Line 658: Source: http://www.python.org/
Copy/pasta?


http://gerrit.cloudera.org:8080/#/c/7053/5/thirdparty/build-definitions.sh
File thirdparty/build-definitions.sh:

Line 501:     CXXFLAGS="$EXTRA_CPPFLAGS $OPENSSL_CFLAGS" \
Isn't this wrong though? Curl is a C program so I'd expect CXXFLAGS has no 
effect.


Line 670:   # Build the date_time boost lib.
Update/remove this.


Line 671:   ./bootstrap.sh --prefix=$PREFIX threading=multi 
--with-libraries=date_time,thread
Is this actually needed? I don't see a corresponding change to 
FindKuduBoost.cmake.

Oh, is it only for Thrift? If so, could you add a comment? Could you double 
check that it's indeed needed?


PS5, Line 698: -I$PREFIX/include -I$THRIFT_BDIR/lib/cpp/src
Why do we need these two additions? I can take my answer in the form of a code 
comment.


Line 732:   cp $THRIFT_SOURCE/contrib/fb303/if/fb303.thrift 
$PREFIX/share/fb303/if
Should be cp -f, right? Otherwise will it overwrite?


http://gerrit.cloudera.org:8080/#/c/7053/5/thirdparty/download-thirdparty.sh
File thirdparty/download-thirdparty.sh:

Line 323: if [ ! -d "$THRIFT_SOURCE" ]; then
No autoreconf?


Line 331: if [ ! -d "$BISON_SOURCE" ]; then
No autoreconf?


Line 342:   # Remove the share/doc dir which contains 2GiB of HTML project 
documentation.
Perhaps we should follow our LLVM tarball's lead and remove it a priori, to 
avoid downloading this unnecessary stuff in the first place?


http://gerrit.cloudera.org:8080/#/c/7053/5/thirdparty/vars.sh
File thirdparty/vars.sh:

Line 180: BISON_VERSION=2.7.1
Why use this version of bison? The latest is 3.0.4 and it's quite old already 
(~2 years).


Line 185: HIVE_NAME=apache-hive-$HIVE_VERSION-bin
Why not just hive-$HIVE_VERSION?


-- 
To view, visit http://gerrit.cloudera.org:8080/7053
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I155223da912bc18a759df2f1f6bc25d1132a99ee
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jdcry...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to