Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/12118 )
Change subject: IMPALA-7970 : Add support for metastore event based automatic invalidate ...................................................................... Patch Set 9: (20 comments) Remaining comments, mostly about the tests. High level, seems the open questions are: * That bit about replacing the guts of the DB object. * Error handling and reporting * Whether the event handler should attempt to partially load DBs and tables, or just do an invalidate. * Invalidation of specific partitions rather than all of them. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@64 PS9, Line 64: private static MetastoreEventsProcessor eventsProcessor; Not thrilled with the convention, but the team likes an underscore after each field name. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@109 PS9, Line 109: assertNotNull(catalog.getDb(TEST_DB_NAME)); Per prior review comments, should not add the new DB. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@120 PS9, Line 120: createDatabase("database_to_be_dropped"); If we don't add the DB automatically, should reference it and assert it is in the cache. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@122 PS9, Line 122: assertTrue(catalog.getDb("database_to_be_dropped") != null); assertNotNull(...) http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@125 PS9, Line 125: assertTrue("Database should not be found after processing drop_database event", assertNull(...) Oddly, later tests do use assertNull(). So, modify the code here, and the other cases where assertTrue is used to check for null/not null. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@136 PS9, Line 136: @Test(expected = DatabaseNotFoundException.class) This is confusing. If we expect an exception, then the exception terminates the execution. Logically, it should come from the last statement in the body. But, the last statement is a Unit assertion. So, something is strange. Often easier to do this: try { doSomethingThatShoudFail(); fail(); } catch (ExpectedException e) { // Expected } http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@139 PS9, Line 139: final String TBL_TO_BE_DROPPED = "tbl_to_be_dropped"; Looks like a constant. Move outside the method and make static. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@148 PS9, Line 148: loadTable(TBL_TO_BE_DROPPED); Does this assert that the table exists in HMS, was loaded in the cache, and has the expected contents? Else, when we drop it, we can't tell if the drop code is broken if we don't know if the table was actually in the cache. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@155 PS9, Line 155: // throws DatabaseNotFoundException If it throws, we won't get to the assertTrue. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@161 PS9, Line 161: @Ignore("Disabled until we fix Hive bug to deserialize alter_database event messages") Reference HIVE JIRA ticket number. Also, should reference this bug in the event processor code. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@217 PS9, Line 217: createTable(TEST_TABLE_NAME_NONPARTITIONED, false); Start by asserting that the table is not yet in the cache? http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@255 PS9, Line 255: List<List<String>> partVals = new ArrayList<>(1); Allocating just 1 member up front, then adding four items. Maybe just omit the initial size arg; not helping us much here. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@277 PS9, Line 277: // change some partitions There is a recent fix that does specialized calls to HDFS to check for updated vs. new partitions. It will be tricky to combine that test with this one, but we should ensure that the automated way of updating a partition uses the call optimizations from that fix. I already noted in previous comments that we might not be using that path. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@296 PS9, Line 296: eventsProcessor.processHMSNotificationEvents(); Any race conditions here? Are the events immediately available after the mutation call? Will all events come in a single message? http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@299 PS9, Line 299: "Old named table is still existing", catalog.getTable(TEST_DB_NAME, "old_name")); Nit: is still existing --> still exists http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@307 PS9, Line 307: // Hive does not seem to create the events when table parameters are updated Do we need them? Hive JIRA ticket number? http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@311 PS9, Line 311: assertTrue("Table should be incomplete after alter table add parameter", Seems we need more granularity here. The catalog needs a method to report if a table is cached, used just for testing. If an event causes the table to become invalidated, then the best way to tell is to check that the table is no longer cached. Then, we should retrieve the metadata in the normal way to ensure that the update was, in fact applied. This second path mimics what Impala will actually do and validates that the goal (Impala sees the change) did occur. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@320 PS9, Line 320: catalog.getTable(TEST_DB_NAME, TEST_TABLE_NAME_NONPARTITIONED) Again, load table, verify that the new column appears. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@331 PS9, Line 331: // check invalidate after alter table remove column type remove column *type*? Or, remove column? http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@430 PS9, Line 430: private void alterTableRename(String tblName, String newTblName) throws TException { Beyond the scope of this project, but the fact that you had to write these suggests we could do some refactoring. Impala already does these operations to handle DDL statements. We should not have to code up the same stanzas here. A future improvement is to create an Impala-specific client that wraps the Hive client and handles the boilerplate. -- To view, visit http://gerrit.cloudera.org:8080/12118 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic70b27780560b7ac9b33418d132b36cd0ca4abf7 Gerrit-Change-Number: 12118 Gerrit-PatchSet: 9 Gerrit-Owner: Anonymous Coward <vih...@cloudera.com> Gerrit-Reviewer: Anonymous Coward <vih...@cloudera.com> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Paul Rogers <prog...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Thu, 27 Dec 2018 21:12:53 +0000 Gerrit-HasComments: Yes