Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )
Change subject: IMPALA-9857: Batching of consecutive partition events ...................................................................... Patch Set 6: (7 comments) http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@955 PS5, Line 955: > nit: too much indent Done http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1027 PS5, Line 1027: > nit: +2 indent Done http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2713 PS5, Line 2713: return hmsPartToHdfsPart.size(); > Shouldn't we return hmsPartToHdfsPart.size() since those are the actual par yes, thanks for spotting that. Fixed. http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@279 PS2, Line 279: if (!current.canBeBatched(next)) { > Currently one of the conditions of batching two events together is - event Yes, the events definitely can be intermingled from multiple tables. In case of ALTER_PARTITION event it is less likely to happen since these events are transactional in nature (publish all or none) and they hold a eventId lock in HMS when being generated. However, this is highly dependent on the source application which generates the events. For example if Hive makes 10 alterPartition calls instead of 1 alterPartitions() for 10 partitions, the event stream may include events which are inter-mingled between different tables. The batching logic currently optimizes for the common case of consecutive ALTER_PARTITION or INSERT events being generated by a single query. More sophisticated batching schemes like you mentioned can be explored but may increase the scope of the patch and I am inclined to commit this first and then incrementally improve it if needed. My primary concern about batching non-consecutive events is that we need to make sure that we are not introducing any subtle bugs because of the the fact that we are refreshing the partitions out of order then event stream. http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1641 PS2, Line 1641: MetastoreEventPropertyKey.CATALOG_VERSION.getKey(), "-1")); > I understand that there may not be significant performance gain. I still fe Okay. Let me look into this and see how much additional changes will need to made for this. http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1811 PS5, Line 1811: Ref > nit: indentation is off My IDE seems to like doing this. Not sure how to fix it. Manually edited it. http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java File fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java: http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java@100 PS5, Line 100: dx); > This precondition is checked by the Java runtime. makes sense. Removed it. -- To view, visit http://gerrit.cloudera.org:8080/17848 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73 Gerrit-Change-Number: 17848 Gerrit-PatchSet: 6 Gerrit-Owner: Vihang Karajgaonkar <vih...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Sourabh Goyal <soura...@cloudera.com> Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Tue, 21 Sep 2021 19:46:35 +0000 Gerrit-HasComments: Yes