Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12889 )

Change subject: IMPALA-7971: Add support for insert events in event processor.
......................................................................


Patch Set 5:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/12889/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12889/5//COMMIT_MSG@14
PS5, Line 14: Also, renamed
            : TableInvalidatingEvent class to TableInvalidatingOrRefreshingEvent
            : to reflect new behaviour.
This may not be applicable anymore


http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@413
PS5, Line 413: getOrLoadTable
I think this call is not correct since this will be a no-op if the table is 
loaded already. You should probably use reloadTable() method here to force the 
refresh.

I also see that there is a reloadPartition() method in CatalogServiceCatalog. 
Can we use that method to refresh only the Partition when the insert_event is 
for a given partition in case of partitioned tables.


http://gerrit.cloudera.org:8080/#/c/12889/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/12889/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3575
PS4, Line 3575:           // Attempt to remove this partition name from from 
partsToCreate. If remove
> This is a good suggestion for partitioned tables. However, for unpartitione
I see. Thanks for that info. using nulls are keys is error-prone since the 
methods where this collection is passed need to be aware of this to avoid NPE. 
Its better to just use a separate collection like you seemed to have started 
doing that by having filesForUnpartitionedTable before.


http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@38
PS5, Line 38: import org.apache.hadoop.hive.conf.HiveConf.ConfVars;
unused?


http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3557
PS5, Line 3557: existingFilesForInsertedPartitions
may be a better name would be to suggest that this contains files before 
insert.. so may something like filesBeforeInsertForPartitions?


http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3558
PS5, Line 3558: filesForUnpartitionedTable
is this unused?


http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3580
PS5, Line 3580: HdfsPartition) partition)
Is there a concern here of running into CastException? I see that FeFsPartition 
has two implementations HdfsPartition and LocalFsPartition. Do we need to 
handle LocalFsPartition as well?

Also, can we do this only when event processing is enabled?


http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3673
PS5, Line 3673: else {
may be do a else if(catalog.isExternalEventProcessingEnabled()) here so that 
this code is only triggered when event processing is enabled.


http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3674
PS5, Line 3674:         FeFsPartition singlePart = 
Iterables.getOnlyElement((List<FeFsPartition>) parts);
May be add a Preconditions.checkState(parts.size == 1); here to make sure that 
the assumption in the code below are always true


http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3676
PS5, Line 3676: (HdfsPartition) singlePart
same as above, Do we need to handle LocalFsPartition as well?


http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3699
PS5, Line 3699: createInsertEvent
May be you can create 2 methods, one for partitioned case and another for 
non-partitioned case. The former will take in Map<Long, Set<String>> as an 
argument for filesBeforeInsertInPartitions and the later will take Set<String> 
as an argument for filesBeforeInsertInTable

you can then call them using the

if (table is partitioned)
  createInsertEventForTable()
else
  createInsertEventsForPartitions()


http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3716
PS5, Line 3716:    * fireInsertEvent() if external event processing is enabled.
Add to the description that this method is a no-op if event processing is 
disabled.


http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3719
PS5, Line 3719: is_overwrite
nit, change the name to isInsertOverwrite


http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3736
PS5, Line 3736:               
existingFilesForInsertedPartitions.get(part.getId()));
I think it would be helpful to add info log here which says how many new files 
where detected for a given partition


http://gerrit.cloudera.org:8080/#/c/12889/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3750
PS5, Line 3750:         
deltaFiles.removeAll(existingFilesForInsertedPartitions.get(null));
add a info level log here which prints how many new files were added into the 
table.



--
To view, visit http://gerrit.cloudera.org:8080/12889
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c48c5ca4bde18d532c582980aebbc25f1bf1c52
Gerrit-Change-Number: 12889
Gerrit-PatchSet: 5
Gerrit-Owner: Anurag Mantripragada <anu...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <anu...@cloudera.com>
Gerrit-Reviewer: Bharath Krishna <bhar...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <prog...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Comment-Date: Wed, 03 Apr 2019 18:49:46 +0000
Gerrit-HasComments: Yes

Reply via email to