Sourabh Goyal has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17858 )

Change subject: IMPALA-10923: Fine grained table refreshing at partition level 
events for transactional tables
......................................................................


Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java
File fe/src/main/java/org/apache/impala/catalog/Catalog.java:

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@97
PS1, Line 97: protected final ConcurrentHashMap<Long, Set<TableWriteId>> 
txnToWriteIds_ =
            :       new ConcurrentHashMap<>();
> Thanks for the clarification. "the new HMS API getAllWriteEventInfo only re
@Yu-Wen: In addition to what Vihang asked, how would we handle the following 
scenario:

Suppose the following HMS events are generated for a ddl on a partition:
1. Open txn
2. Allocate write id
3. perform ddl op
4. Commit txn

Now if event processor is started and say it starts processing events from #3. 
Assuming that write id generated in #2 is not returned in HMS API 
getAllWriteEventInfo, does this mean that event processor would skip adding 
writeId from #2 to committed writeIds list? If so, is this the right/desired 
behavior?


http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2057
PS7, Line 2057:         commitTxnMessage_.addWriteEventInfo(writeEventInfoList);
Why are we modifying commitTxnMesage? Can't we get all the required info from 
writeEventInfoList?


http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2080
PS7, Line 2080:       
Preconditions.checkNotNull(commitTxnMessage_.getPartitions());
Why are we checking for non null partitions? Wouldn't unpartitioned table have 
null partitions?


http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4160
PS7, Line 4160:       List<Partition> partsFromEvent, boolean loadFromEvent, 
boolean loadFileMetadata,
Looks like if loadFromEvent is true, then in HdfsTable we avoid making HMS api 
getPartitionsByNames().
I don't think adding loadFromEvent flag is the right way to do this. We should 
have clear apis in catalogOpExecutor, HdfsTable etc to reload partitions based 
on partition names and partition objects.


http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4233
PS7, Line 4233:   public int 
addCommittedWriteIdsAndReloadPartitionsIfExist(long eventId, String dbName,
This api is not making much sense to me here. More so since it is public.
Should we move this inside CommitTxnEvent? Some parts like getting table from 
cache can be moved to a common place in catalogOpExecutor.



--
To view, visit http://gerrit.cloudera.org:8080/17858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 7
Gerrit-Owner: Yu-Wen Lai <yu-wen....@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <kis...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <chufu...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <soura...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu-wen....@cloudera.com>
Gerrit-Comment-Date: Wed, 20 Oct 2021 17:27:10 +0000
Gerrit-HasComments: Yes

Reply via email to