Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8312 )

Change subject: KUDU-2191 (6/n): Hive Metastore catalog manager integration
......................................................................


Patch Set 9:

(55 comments)

http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/hms/hms_client.cc
File src/kudu/hms/hms_client.cc:

http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/hms/hms_client.cc@193
PS8, Line 193:   // Also check that the HMS is configured to allow changing the 
type of
             :   // columns, which is required to support dropping columns.
Could you elaborate on this a bit? It's not intuitive why it needs to be legal 
to change a column's type in order to drop it.


http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/hms/hms_client.cc@202
PS8, Line 202:   if (boost::iequals(disallow_incompatible_column_type_changes, 
"true")) {
Hmm, is it really possible for the config's value to have a mixed case? Asking 
because L185 doesn't do a case-insensitive comparison when verifying that the 
required listeners are in hive.metastore.transactional.event.listeners.


http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/hms/hms_client.cc@218
PS8, Line 218: bool HmsClient::IsConnected() {
Maybe add a bit of test coverage for this new method?


http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/hms/mini_hms.cc
File src/kudu/hms/mini_hms.cc:

http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/hms/mini_hms.cc@186
PS8, Line 186:   //    Configures the HMS to allow altering and dropping 
columns.
Nit: indentation is off by one character.


http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/hms/mini_hms.cc@214
PS8, Line 214:     <value>jdbc:derby:$1/metadb;create=true</value>
Why the change from 'memory' back to 'directory'?


http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/integration-tests/CMakeLists.txt
File src/kudu/integration-tests/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/integration-tests/CMakeLists.txt@87
PS8, Line 87: ADD_KUDU_TEST(master_hms-itest)
Could you run this new test with KUDU_MEASURE_TEST_CPU_CONSUMPTION=1 (see 
run-test.sh) to figure out whether to set RUN_SERIAL or to enumerate the number 
of PROCESSORS?


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/integration-tests/master-stress-test.cc
File src/kudu/integration-tests/master-stress-test.cc:

http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/integration-tests/master-stress-test.cc@343
PS9, Line 343:     return Substitute("default.table_$0", oid_generator_.Next());
Not necessary yet?


http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/integration-tests/master_hms-itest.cc
File src/kudu/integration-tests/master_hms-itest.cc:

http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/integration-tests/master_hms-itest.cc@122
PS8, Line 122:
Could RETURN_NOT_OK instead.


http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/integration-tests/master_hms-itest.cc@162
PS8, Line 162:   string table_name = Substitute("$0.$1", hms_database_name, 
hms_table_name);
             :
             :   // Attempt to create the table before the database is created.
             :   Status s = CreateTable(hms_database_name, hms_table
Could be its own test (i.e. TestCreateTableWithoutDatabase)


http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/integration-tests/master_hms-itest.cc@190
PS8, Line 190:
             :   // Shutdown the HMS and try to create a table.
             :   ASSERT_OK(StopHms());
             :
             :   s = CreateTable(hms_database_name, "foo");
             :   ASSERT_TRUE(s.IsNetworkError()) << s.ToString();
             :
             :   // Start the HMS and try again.
             :   ASSERT_OK(StartHms());
             :   ASSERT_OK(CreateTable(hms_database_name, "foo"));
Could break out into a separate test since it only shares HMS database state 
with the previous test.


http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/integration-tests/master_hms-itest.cc@202
PS8, Line 202:
Maybe add a test that if the HMS is down renaming fails?


http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/integration-tests/master_hms-itest.cc@207
PS8, Line 207:
             :   // Create the database.
             :   hive::Database db;
             :   db.name = hms_database_name;
             :   ASSERT_OK(hms_client_->CreateDatabase(db));
Every test does this; consider doing it in the test fixture.


http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/integration-tests/master_hms-itest.cc@243
PS8, Line 243:   // TODO(HIVE-18852): match on the error message.
             :
             :   // Attempt to rename the Kudu table to an invalid table name.
             :   table_alterer.reset(client_->NewTableAlterer(table_name));
             :   s = table_alterer->RenameTo(Substitute("$0.☃", 
hms_database_name))->Alter();
             :   ASSERT_TRUE(s.IsIllegalState()) << s.ToString();
Would be a good negative test for creation too, right?


http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/integration-tests/master_hms-itest.cc@255
PS8, Line 255:   hive::EnvironmentContext env_ctx;
Why is it necessary to provide the table's ID here, on L270, and on L371, but 
not when dropping on L331?


http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/integration-tests/master_hms-itest.cc@268
PS8, Line 268:
             :   // Drop the HMS table entry, then c
I was confused when I first read this, but now I think I understand: it's legal 
to post facto create an HMS external table for an existing Kudu table, and 
doing so causes the two to become connected such that future operations will 
affect both. Is that right? May want to elaborate the comment a bit to make 
that behavior clear.


http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/integration-tests/master_hms-itest.cc@285
PS8, Line 285:       hms_external_table_name,
Why does this one exist? It was an external table created on L274, but didn't 
it get renamed to hms_rename_table_name on L276?


http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/integration-tests/master_hms-itest.cc@309
PS8, Line 309:   hms_table.sd.cols.clear();
After this step could you use CheckCatalogs (or a variant thereof) to assert 
that the column entries do NOT match across HMS and Kudu?


http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/integration-tests/master_hms-itest.cc@339
PS8, Line 339:   s = table_alterer->DropColumn("int32_val")->Alter();
Why does this fail? The non-Kudu HMS table entry has the same database and name 
as the Kudu table, so shouldn't they be "connected" to one another?


http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/integration-tests/master_hms-itest.cc@354
PS8, Line 354:
Maybe ensure it's gone from Kudu too, just to make sure that the HMS 
integration doesn't do something silly like drop the table from HMS but not 
Kudu?


http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/integration-tests/master_hms-itest.cc@374
PS8, Line 374:   ASSERT_TRUE(s.IsNotFound()) << s.ToString();
Likewise, make sure it was really dropped from Kudu.


http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/integration-tests/master_hms-itest.cc@384
PS8, Line 384:   NO_FATALS(CheckCatalogs(hms_database_name, hms_table_name));
Check that it's gone from both Kudu and HMS?


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/CMakeLists.txt
File src/kudu/master/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/CMakeLists.txt@76
PS9, Line 76: ADD_KUDU_TEST(hms_catalog-test)
Likewise, please assess the perf impact of this test and annotate it 
accordingly.


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/catalog_manager.cc@697
PS9, Line 697:     
CHECK_OK(HostPort::ParseStrings(FLAGS_hive_metastore_addresses,
Since you're inside Init, may as well convert these CHECK_OKs to RETURN_NOT_OK. 
The failure will be fatal either way; it'll just be more graceful.


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/catalog_manager.cc@1548
PS9, Line 1548:   // Create the table in the HMS.
Can you expand this comment a bit to explain why this should happen before step 
e and not after?

This is a meaty enough operation that I'd also give it its own step letter and 
reletter the others.


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/catalog_manager.cc@1743
PS9, Line 1743:   // Drop the HMS table entry.
As in Create, could you expand this to explain why this should happen here, and 
not after step 3?

Also, please give it its own number and renumber the others.


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/catalog_manager.cc@1755
PS9, Line 1755:   auto abort_hms = MakeScopedCleanup([&] {
What does it mean for a ScopedCleanup to return something? It's run on scope 
exit, so where is it returning that value to exactly?


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/catalog_manager.cc@2254
PS9, Line 2254:   if (hms_catalog_ != nullptr && has_metadata_changes) {
Add a comment explaining why we do this here and not after the catalog write, 
give it a number, renumber the rest, etc.


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/catalog_manager.cc@2272
PS9, Line 2272:         RETURN_NOT_OK_LOG(SchemaFromPB(l.data().pb.schema(), 
&schema), WARNING,
See above comment re: returning a value from a ScopedCleanup.


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog-test.cc
File src/kudu/master/hms_catalog-test.cc:

http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog-test.cc@38
PS9, Line 38:
Even though master_hms-test provides integration testing between CatalogManager 
and HmsCatalog, I think it'd still be worthwhile to have standalone HmsCatalog 
tests here to verify basic functionality. It'd certainly make it easier to test 
future changes to HmsCatalog.


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.h
File src/kudu/master/hms_catalog.h:

http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.h@61
PS9, Line 61:   // Create a new table entry in the HMS.
Doc that this method fails if HMS is unreachable?


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.h@66
PS9, Line 66:   // Drops a table entry from the HMS, if it exists.
Nit: it would be good to use the same tense when describing the functionality 
of each method. Here you're using present tense "[DropTable()] drops a table 
entry from the HMS..." but elsewhere you use future tense "[Start()] will start 
the HmsCatalog instance." I happen to prefer present tense (I think that's the 
Javadoc standard); could you apply it consistently in your method documentation?


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.h@91
PS9, Line 91:   Status Reconnect();
Doc this?


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.h@103
PS9, Line 103:   static Status ParseTableName(const std::string& table,
And this?


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.h@110
PS9, Line 110:   gscoped_ptr<ThreadPool> threadpool_;
unique_ptr


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.cc
File src/kudu/master/hms_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.cc@52
PS9, Line 52: const int HmsCatalog::kNumRetries = 1;
Do you think it would it be useful to convert this into an advanced gflag?


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.cc@76
PS9, Line 76:       .set_min_threads(0)
This is the default; it can be omitted.


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.cc@77
PS9, Line 77:       .set_max_threads(1)
Doc why capping at one thread is important.


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.cc@95
PS9, Line 95:   return Execute([=] (hms::HmsClient* client) {
Execute() waits for the provided functor to run, so why not capture variables 
by reference in all of these calls?


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.cc@162
PS9, Line 162:       if (!s.ok()) {
I'm going to claim laziness and ask whether all of these edge cases are covered 
by hms_catalog-test rather than checking myself.


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.cc@180
PS9, Line 180:       // Handle all other errors.
Nit: not really "handling" per se; it's just that all other errors are fatal.


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.cc@201
PS9, Line 201:  If
             :       // the original table object and the new table object 
match exactly then
             :       // we don't need to alter the table in the HMS.
Does this short circuit make us vulnerable to a TOCTOU race, where we return OK 
to the client because we think orig_table and table are identical even though 
someone else may have altered the Hive table in the meantime?


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.cc@222
PS9, Line 222:   // TODO(Todd): wrapping this in a TRACE_EVENT scope and a 
LOG_IF_SLOW and such
Nit:

  $ git grep TODO\([a-z] | wc -l
  278
  $ git grep TODO\([A-Z] | grep -v TODO\(KUDU- | wc -l
  5


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.cc@238
PS9, Line 238:     // Since every task submitted to the (single thread) pool 
runs this, it's
             :     // essentially a single iteration of a run loop which 
handles HMS client
             :     // reconnection and task processing.
Do you think the pool should have a min_thread count of 1 so that we don't pay 
the thread startup cost for HMS operations that are spaced out by the pool's 
idle timeout?


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.cc@253
PS9, Line 253:     // network or IO fault (not an applicaiton level failure). 
The HMS client
application


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.cc@258
PS9, Line 258:     for (int i = 0; i <= kNumRetries; i++) {
Seems like kNumRetries should depend at least somewhat on the number of HMSes. 
If we have three HMS instances and the first two are down, the very first 
submitted task will fail while the rest will succeed.


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.cc@291
PS9, Line 291:       // A fatal error occurred. Tear down the connection, and 
try again. We
Is it possible for reconnect to succeed, for the task to exhibit a fatal error, 
and for this to repeat over and over? If so, we'll perform the retries in a 
tight loop; shouldn't we wait in between? I can see the tight loop being a 
"feature" rather than a bug if we're retrying the task on a different HMS each 
time, but not if it's on the same HMS over and over.


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.cc@300
PS9, Line 300:     return callback(s);
This passes in the last failure; should we instead pass in the first?


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.cc@316
PS9, Line 316:     hms_client_ = hms::HmsClient(address, 
hms::HmsClientOptions());
Nit: might consider making HmsClientOptions() the default value of the 
constructor's second argument, for better ergonomics.


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.cc@324
PS9, Line 324:     LOG(WARNING) << "Failed to connect to Hive Metastore ("
             :                  << address.ToString() << "): " << s.ToString();
More terse:

  WARN_NOT_OK(s, Substitute("Failed to connect to Hive Metastore ($0)", 
address.ToString());


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.cc@330
PS9, Line 330:   return s;
If we can't connect to any of the HMSs, should we return the first failure? 
Currently we're returning the last one.


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.cc@368
PS9, Line 368:       return string();
Hmm, so what would happen at runtime if we returned an empty string here? Any 
reason not to promote the DFATAL above into a FATAL?


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.cc@387
PS9, Line 387:   table->tableType = "MANAGED_TABLE";
Should this constant be given a little more visibility? What's its significance?


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.cc@420
PS9, Line 420:         "When the Hive Metastore integration is enabled, Kudu 
table names must be a "
Nit: as a convention we should start a status message with a lower case letter, 
since they're almost always prepended with other colon-delimited messages on 
the return path.


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/mini-cluster/external_mini_cluster-test.cc
File src/kudu/mini-cluster/external_mini_cluster-test.cc:

http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/mini-cluster/external_mini_cluster-test.cc@19
PS9, Line 19: #include <ostream>
Revert?


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/util/net/net_util.cc
File src/kudu/util/net/net_util.cc:

http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/util/net/net_util.cc@18
PS9, Line 18: #include <cstdint>
Nit: this belongs in the next group of includes.



--
To view, visit http://gerrit.cloudera.org:8080/8312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1b8f360408b05d6ac4f7359a8ac0dd889b196baf
Gerrit-Change-Number: 8312
Gerrit-PatchSet: 9
Gerrit-Owner: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Hao Hao <hao....@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Comment-Date: Fri, 09 Mar 2018 00:36:36 +0000
Gerrit-HasComments: Yes

Reply via email to