Dan Burkert has posted comments on this change.

Change subject: Add basic Hive MetaStore client
......................................................................


Patch Set 7:

(69 comments)

http://gerrit.cloudera.org:8080/#/c/7053/6//COMMIT_MSG
Commit Message:

Line 9: This patch lays the groundwork for a Hive metastore client.
> any progress on the design doc for this work? Would be good to be able to r
Not yet, but I plan to delay landing this until the design work is more 
concrete, so I'll make sure to add a link then.


http://gerrit.cloudera.org:8080/#/c/7053/6/CMakeLists.txt
File CMakeLists.txt:

Line 829:   # We rely on the JVM for running the HMS during tests.
> do we need to update any build docs to specify that java is now a requireme
I'm not sure.  I've changed this so that these dependencies are only required 
if !NO_TESTS.   So our options are to add Java as a dependency when building 
from source, or to add NO_TESTS to the from-source build instructions.  Not 
sure which I prefer, I'll try and take an informal poll.


http://gerrit.cloudera.org:8080/#/c/7053/5/build-support/dist_test.py
File build-support/dist_test.py:

Line 82:      # Hive MetaStore tests require hadoop and hive libraries. These 
files are
> These directories will be recursively copied? Or are these just symlinks? C
Done


http://gerrit.cloudera.org:8080/#/c/7053/5/build-support/run_dist_test.py
File build-support/run_dist_test.py:

Line 127:   # are used in mini_hms.cc.
> Can you use ROOT to derive the paths instead of relying on the cwd?
Done


Line 129:   env['HADOOP_HOME'] = glob.glob(os.path.join(ROOT, 
"thirdparty/src/hadoop-*"))[0]
> Can we run detect-java-home.sh from bigtop instead? This seems rather fragi
This is how other dist-test projects do it: 
https://github.com/cloudera/dist_test/blob/master/grind/python/disttest/isolate.py#L114


http://gerrit.cloudera.org:8080/#/c/7053/5/cmake_modules/FindJavaHome.cmake
File cmake_modules/FindJavaHome.cmake:

Line 19: # 
https://github.com/apache/bigtop/blob/38e1571b2f73bbfa6ab0c01a689fae967b8399d9/bigtop-packages/src/common/bigtop-utils/bigtop-detect-javahome
> If this is a specific version for a reason, please explain why. IIRC, it's 
Done


http://gerrit.cloudera.org:8080/#/c/7053/6/cmake_modules/FindJavaHome.cmake
File cmake_modules/FindJavaHome.cmake:

Line 64:   set(JAVA_HOME $ENV{JAVA_HOME})
> can we do something here to check that EXISTS ${JAVA_HOME}/bin/java just in
Done


http://gerrit.cloudera.org:8080/#/c/7053/5/cmake_modules/FindThrift.cmake
File cmake_modules/FindThrift.cmake:

PS5, Line 106:     list(APPEND ${SRCS} "${THRIFT_CC_OUT}" "fb303_types.cpp" 
"fb303_constants.cpp" "FacebookService.cpp")
             :     list(APPEND ${HDRS} "${THRIFT_H_OUT}" "
> Remove this?
Done


http://gerrit.cloudera.org:8080/#/c/7053/6/cmake_modules/FindThrift.cmake
File cmake_modules/FindThrift.cmake:

PS6, Line 105:     # TODO(dan): Add the fb303 files manually. This is a 
complete hack.
             :     list(APPEND ${SRCS} "${THRIFT_CC_OUT}" "fb303_types.cpp" 
"fb303_constants.cpp" "FacebookService.cpp")
             :     list(APPEND ${HDRS} "${THRIFT_H_OUT}" "fb303_types.h" 
"fb303_constants.h" "FacebookService.h")
             : 
> hrm, could we at least add this as an optional argument like REQUIRE_FB303?
Yah perhaps.  Right now we only have one consumer, so I'd like to punt on this 
(I already find this script complex).


http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/hms/CMakeLists.txt
File src/kudu/hms/CMakeLists.txt:

PS5, Line 21: 
            : add_libra
> This is only used in one place, just inline it there?
Done


Line 23: target_link_libraries(hms_thrift thrift)
> Is hms_thrift going to be linked into the Kudu client library? If not, it d
Done


Line 35: 
> Likewise.
Done


Line 44:   execute_process(COMMAND ln -sf
> Nit: use lowercase for cmake built-ins (i.e. add_custom_target(...)).
Done


Line 49:                   "${EXECUTABLE_OUTPUT_PATH}/java-home")
> So this means the symlinks will be recreated on every invocation of "make" 
Done


http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/hms/CMakeLists.txt
File src/kudu/hms/CMakeLists.txt:

Line 23: target_link_libraries(hms_thrift thrift)
> since the client won't use hms at all, I think we can use the built-in ADD_
Done


Line 35: 
> same
Done


http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/hms/hive_metastore-test.cc
File src/kudu/hms/hive_metastore-test.cc:

Line 23
> But we're not explicitly including boost here. Is this from ThriftHiveMetas
Then it would be leaking into non-test code.  It's also harder because it's a 
generated header.


Line 42
> Also test that Stop() works?
Done


Line 52
> Surprised the compiler needs the vector<string> part and isn't able to figu
It doesn't compile without it, not sure why.


http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/hms/hive_metastore-test.cc
File src/kudu/hms/hive_metastore-test.cc:

Line 53
> I think if you include stl_logging.h then you don't need to do this << part
Done


Line 61
> is there a gaurantee that they come back in sorted order or is a std::sort(
Done


Line 62
> same
Done


http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/hms/hive_metastore.cc
File src/kudu/hms/hive_metastore.cc:

Line 49
> Are these all of the exception classes that may be thrown? Is there some wa
TException is something of a catch-all case, although it's not unexepected (IO 
errors fall under TException).   My understanding is that will catch anything 
thrown from thrift.


Line 63
> So client_ is actually a pointer? Is the pointer hidden behind a typedef or
No I think this is just a ctor argument (boost::shared_ptr<TProtocol>).


Line 71
> Maybe WARN_NOT_OK() instead? I think that's the convention we use.
Done


http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/hms/hive_metastore.cc
File src/kudu/hms/hive_metastore.cc:

PS6, Line 53: 
> compared to the other macros we have like RETURN_NOT_OK_PREPEND, it's odd t
Done


http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/hms/hive_metastore.h
File src/kudu/hms/hive_metastore.h:

Line 33
> Maybe HiveMetastoreClient then? Or HMSClient? HiveMetastore suggests the ac
Done


Line 39
> What happens if Start or Stop are called multiple times, or if Stop is call
They can be called multiple times, but the bound port will change.  I don't 
think it's worth worrying about - we can cross that bridge if we need fault 
tests.


Line 51
> Forward declare and maybe avoid some includes?
Done


Line 57
> Guess we can't forward declare this, so maybe forward declaring the above i
Done


http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/hms/hive_metastore.h
File src/kudu/hms/hive_metastore.h:

PS6, Line 51: 
> I think GSG discourages namespace aliases, but maybe we should make an exce
SGTM.


http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/hms/mini_hms-test.cc
File src/kudu/hms/mini_hms-test.cc:

Line 24
> MiniHmsTest
Done


Line 26
> Shouldn't we also test some sort of RPC?
I'm just going to get rid of this test since it's duplicated by the 
hms_client-test entirely.  I originally wrote it because I created mini_hms 
before hms_client.


http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/hms/mini_hms.cc
File src/kudu/hms/mini_hms.cc:

Line 20: #include <csignal>
> Don't need?
It's needed for SIGKILL.


Line 27: #include <glog/logging.h>
> Don't need?
Done


Line 28: 
> Don't need?
Done


Line 33: #include "kudu/util/status.h"
> Don't need?
needed for SCOPED_LOG_SLOW_EXECUTION


Line 47:     VLOG(1) << "Stopping HMS";
> WARN_NOT_OK() here?
Done


Line 60:   const char* env = std::getenv(env_var.c_str());
> Use a ternary?
Done


Line 63:   if (!Env::Default()->FileExists(*home_dir)) {
> Extra space here.
Done


Line 65:   }
> Shouldn't we gracefully return this sort of error via Status?
First part is done.  Second part - I don't really think that's worth the 
trouble, hive should complain about it appropriately.


Line 72:   SCOPED_LOG_SLOW_EXECUTION(WARNING, 20000, "Starting HMS");
> Remove? Or uncomment?
Done


Line 77:   Env* env = Env::Default();
> How about asking users to supply this in the class constructor?
Why?  This isn't how mini_kdc works.


Line 106:   schematool.SetCurrentDir(tmp_dir);
> Nit: do you actually need to initialize the value? Would prefer to adhere t
I see you like to tempt the UB demons.


Line 109:   int rc;
> Maybe include the exit status? Should use GetExitStatus() anyway.
I'm skeptical that anyone can glean information from the exit status of 
schemaTool.


Line 112:     return Status::RuntimeError("schematool failed");
> Kerberos HMS?
Done


http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/hms/mini_hms.cc
File src/kudu/hms/mini_hms.cc:

Line 65:   }
> nit: extra space
Done


Line 91:   auto db_path = JoinPathSegments(tmp_dir, "metastore_db");
> vlog?
Done


PS6, Line 114: 
> copy pasta
Done


http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/security/test/mini_kdc.cc
File src/kudu/security/test/mini_kdc.cc:

Line 151:   RETURN_NOT_OK(WaitForUdpBind(kdc_process_->pid(), &options_.port, 
MonoDelta::FromSeconds(1)));
> This seems like a rather short timeout. The old code was even more aggressi
It hasn't seemed to be a problem.  The kdc should be fast.


http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/util/subprocess.cc
File src/kudu/util/subprocess.cc:

Line 749
> Nit: remove unnecessary this prefixes, here and below.
Done


http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/util/subprocess.cc
File src/kudu/util/subprocess.cc:

PS6, Line 760: 
             : 
> this was a bit specific to the KDC case, I think could be cleaned up a bit.
I don't think it's specific to the KDC, it's going to take some time for any 
process to start up.  The HMS, for instance, takes on the  order of 5 seconds.


http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/util/subprocess.h
File src/kudu/util/subprocess.h:

Line 205
> "to the specified executable"
Done


http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/util/subprocess.h
File src/kudu/util/subprocess.h:

PS6, Line 160: t { return argv_[0]; }
> "bind to ANY listening..."?
Done


PS6, Line 205: 
> typo
Done


PS6, Line 206: 
> you mean the $PATH environment variable?
Done


http://gerrit.cloudera.org:8080/#/c/7053/5/thirdparty/LICENSE.txt
File thirdparty/LICENSE.txt:

Line 620: Source: https://thrift.apache.org
> Copy/pasta
Done


Line 658: Source: https://www.gnu.org/software/bison
> Copy/pasta?
Done


http://gerrit.cloudera.org:8080/#/c/7053/6/thirdparty/LICENSE.txt
File thirdparty/LICENSE.txt:

Line 620: Source: https://thrift.apache.org
> wrong paste?
Done


Line 658: Source: https://www.gnu.org/software/bison
> same
Done


http://gerrit.cloudera.org:8080/#/c/7053/5/thirdparty/build-definitions.sh
File thirdparty/build-definitions.sh:

Line 501:     CPPFLAGS="$EXTRA_CPPFLAGS $OPENSSL_CFLAGS" \
> Isn't this wrong though? Curl is a C program so I'd expect CXXFLAGS has no 
Done


Line 671:   ./bootstrap.sh --prefix=$PREFIX threading=multi 
--with-libraries=date_time
> Is this actually needed? I don't see a corresponding change to FindKuduBoos
Done


PS5, Line 698: 
> Why do we need these two additions? I can take my answer in the form of a c
Done


Line 732:     LDFLAGS="$EXTRA_LDFLAGS" \
> Should be cp -f, right? Otherwise will it overwrite?
We don't use -f anywhere else in this file.  Could you elaborate on why it's 
needed?


http://gerrit.cloudera.org:8080/#/c/7053/5/thirdparty/download-thirdparty.sh
File thirdparty/download-thirdparty.sh:

Line 323: fi
> No autoreconf?
autoreconf is failing on OS X.


Line 331: fi
> No autoreconf?
Yah, it breaks on OS X, and this is a build-only dependency.


Line 342
> Perhaps we should follow our LLVM tarball's lead and remove it a priori, to
We don't do that for LLVM.  
https://github.com/apache/kudu/blob/master/thirdparty/package-llvm.sh


http://gerrit.cloudera.org:8080/#/c/7053/5/thirdparty/vars.sh
File thirdparty/vars.sh:

Line 180: BISON_VERSION=3.0.4
> Why use this version of bison? The latest is 3.0.4 and it's quite old alrea
Done


Line 185: HIVE_NAME=apache-hive-$HIVE_VERSION-bin
> Why not just hive-$HIVE_VERSION?
Because they name their tarball contents in different way.


-- 
To view, visit http://gerrit.cloudera.org:8080/7053
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I155223da912bc18a759df2f1f6bc25d1132a99ee
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jdcry...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to