Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12118 )

Change subject: IMPALA-7970 : Add support for metastore event based automatic 
invalidate
......................................................................


Patch Set 9:

(20 comments)

Remaining comments, mostly about the tests.

High level, seems the open questions are:

* That bit about replacing the guts of the DB object.
* Error handling and reporting
* Whether the event handler should attempt to partially load DBs and tables, or 
just do an invalidate.
* Invalidation of specific partitions rather than all of them.

http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@64
PS9, Line 64:   private static MetastoreEventsProcessor eventsProcessor;
Not thrilled with the convention, but the team likes an underscore after each 
field name.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@109
PS9, Line 109:     assertNotNull(catalog.getDb(TEST_DB_NAME));
Per prior review comments, should not add the new DB.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@120
PS9, Line 120:     createDatabase("database_to_be_dropped");
If we don't add the DB automatically, should reference it and assert it is in 
the cache.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@122
PS9, Line 122:     assertTrue(catalog.getDb("database_to_be_dropped") != null);
assertNotNull(...)


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@125
PS9, Line 125:     assertTrue("Database should not be found after processing 
drop_database event",
assertNull(...)

Oddly, later tests do use assertNull(). So, modify the code here, and the other 
cases where assertTrue is used to check for null/not null.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@136
PS9, Line 136:   @Test(expected = DatabaseNotFoundException.class)
This is confusing. If we expect an exception, then the exception terminates the 
execution. Logically, it should come from the last statement in the body. But, 
the last statement is a Unit assertion. So, something is strange.

Often easier to do this:

try {
  doSomethingThatShoudFail();
  fail();
} catch (ExpectedException e) {
  // Expected
}


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@139
PS9, Line 139:     final String TBL_TO_BE_DROPPED = "tbl_to_be_dropped";
Looks like a constant. Move outside the method and make static.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@148
PS9, Line 148:     loadTable(TBL_TO_BE_DROPPED);
Does this assert that the table exists in HMS, was loaded in the cache, and has 
the expected contents? Else, when we drop it, we can't tell if the drop code is 
broken if we don't know if the table was actually in the cache.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@155
PS9, Line 155:     // throws DatabaseNotFoundException
If it throws, we won't get to the assertTrue.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@161
PS9, Line 161:   @Ignore("Disabled until we fix Hive bug to deserialize 
alter_database event messages")
Reference HIVE JIRA ticket number.

Also, should reference this bug in the event processor code.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@217
PS9, Line 217:     createTable(TEST_TABLE_NAME_NONPARTITIONED, false);
Start by asserting that the table is not yet in the cache?


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@255
PS9, Line 255:     List<List<String>> partVals = new ArrayList<>(1);
Allocating just 1 member up front, then adding four items. Maybe just omit the 
initial size arg; not helping us much here.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@277
PS9, Line 277:     // change some partitions
There is a recent fix that does specialized calls to HDFS to check for updated 
vs. new partitions. It will be tricky to combine that test with this one, but 
we should ensure that the automated way of updating a partition uses the call 
optimizations from that fix.

I already noted in previous comments that we might not be using that path.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@296
PS9, Line 296:     eventsProcessor.processHMSNotificationEvents();
Any race conditions here? Are the events immediately available after the 
mutation call? Will all events come in a single message?


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@299
PS9, Line 299:         "Old named table is still existing", 
catalog.getTable(TEST_DB_NAME, "old_name"));
Nit: is still existing --> still exists


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@307
PS9, Line 307:     // Hive does not seem to create the events when table 
parameters are updated
Do we need them? Hive JIRA ticket number?


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@311
PS9, Line 311:     assertTrue("Table should be incomplete after alter table add 
parameter",
Seems we need more granularity here.

The catalog needs a method to report if a table is cached, used just for 
testing. If an event causes the table to become invalidated, then the best way 
to tell is to check that the table is no longer cached.

Then, we should retrieve the metadata in the normal way to ensure that the 
update was, in fact applied. This second path mimics what Impala will actually 
do and validates that the goal (Impala sees the change) did occur.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@320
PS9, Line 320:         catalog.getTable(TEST_DB_NAME, 
TEST_TABLE_NAME_NONPARTITIONED)
Again, load table, verify that the new column appears.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@331
PS9, Line 331:     // check invalidate after alter table remove column type
remove column *type*? Or, remove column?


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@430
PS9, Line 430:   private void alterTableRename(String tblName, String 
newTblName) throws TException {
Beyond the scope of this project, but the fact that you had to write these 
suggests we could do some refactoring. Impala already does these operations to 
handle DDL statements. We should not have to code up the same stanzas here. A 
future improvement is to create an Impala-specific client that wraps the Hive 
client and handles the boilerplate.



--
To view, visit http://gerrit.cloudera.org:8080/12118
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic70b27780560b7ac9b33418d132b36cd0ca4abf7
Gerrit-Change-Number: 12118
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward <vih...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <vih...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <prog...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Thu, 27 Dec 2018 21:12:53 +0000
Gerrit-HasComments: Yes

Reply via email to