Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/12889 )
Change subject: IMPALA-7971: Add support for insert events in event processor. ...................................................................... Patch Set 5: (15 comments) http://gerrit.cloudera.org:8080/#/c/12889/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12889/5//COMMIT_MSG@14 PS5, Line 14: Also, renamed : TableInvalidatingEvent class to TableInvalidatingOrRefreshingEvent : to reflect new behaviour. This may not be applicable anymore http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@413 PS5, Line 413: getOrLoadTable I think this call is not correct since this will be a no-op if the table is loaded already. You should probably use reloadTable() method here to force the refresh. I also see that there is a reloadPartition() method in CatalogServiceCatalog. Can we use that method to refresh only the Partition when the insert_event is for a given partition in case of partitioned tables. http://gerrit.cloudera.org:8080/#/c/12889/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/12889/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3575 PS4, Line 3575: // Attempt to remove this partition name from from partsToCreate. If remove > This is a good suggestion for partitioned tables. However, for unpartitione I see. Thanks for that info. using nulls are keys is error-prone since the methods where this collection is passed need to be aware of this to avoid NPE. Its better to just use a separate collection like you seemed to have started doing that by having filesForUnpartitionedTable before. http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@38 PS5, Line 38: import org.apache.hadoop.hive.conf.HiveConf.ConfVars; unused? http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3557 PS5, Line 3557: existingFilesForInsertedPartitions may be a better name would be to suggest that this contains files before insert.. so may something like filesBeforeInsertForPartitions? http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3558 PS5, Line 3558: filesForUnpartitionedTable is this unused? http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3580 PS5, Line 3580: HdfsPartition) partition) Is there a concern here of running into CastException? I see that FeFsPartition has two implementations HdfsPartition and LocalFsPartition. Do we need to handle LocalFsPartition as well? Also, can we do this only when event processing is enabled? http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3673 PS5, Line 3673: else { may be do a else if(catalog.isExternalEventProcessingEnabled()) here so that this code is only triggered when event processing is enabled. http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3674 PS5, Line 3674: FeFsPartition singlePart = Iterables.getOnlyElement((List<FeFsPartition>) parts); May be add a Preconditions.checkState(parts.size == 1); here to make sure that the assumption in the code below are always true http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3676 PS5, Line 3676: (HdfsPartition) singlePart same as above, Do we need to handle LocalFsPartition as well? http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3699 PS5, Line 3699: createInsertEvent May be you can create 2 methods, one for partitioned case and another for non-partitioned case. The former will take in Map<Long, Set<String>> as an argument for filesBeforeInsertInPartitions and the later will take Set<String> as an argument for filesBeforeInsertInTable you can then call them using the if (table is partitioned) createInsertEventForTable() else createInsertEventsForPartitions() http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3716 PS5, Line 3716: * fireInsertEvent() if external event processing is enabled. Add to the description that this method is a no-op if event processing is disabled. http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3719 PS5, Line 3719: is_overwrite nit, change the name to isInsertOverwrite http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3736 PS5, Line 3736: existingFilesForInsertedPartitions.get(part.getId())); I think it would be helpful to add info log here which says how many new files where detected for a given partition http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3750 PS5, Line 3750: deltaFiles.removeAll(existingFilesForInsertedPartitions.get(null)); add a info level log here which prints how many new files were added into the table. -- To view, visit http://gerrit.cloudera.org:8080/12889 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7c48c5ca4bde18d532c582980aebbc25f1bf1c52 Gerrit-Change-Number: 12889 Gerrit-PatchSet: 5 Gerrit-Owner: Anurag Mantripragada <anu...@cloudera.com> Gerrit-Reviewer: Anurag Mantripragada <anu...@cloudera.com> Gerrit-Reviewer: Bharath Krishna <bhar...@cloudera.com> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Paul Rogers <prog...@cloudera.com> Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com> Gerrit-Comment-Date: Wed, 03 Apr 2019 18:49:46 +0000 Gerrit-HasComments: Yes