Vihang Karajgaonkar has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/17848 )
Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................

IMPALA-9857: Batching of consecutive partition events

This patch improves the performance of the events processor by batching
together consecutive ALTER_PARTITION or INSERT events. Currently, without
this patch, if the event stream contains many consecutive ALTER_PARTITION
events which cannot be skipped, the events processor refreshes the
partition from each event one by one. Similarly, for INSERT events on
partitions, the events processor refreshes one partition at a time.

By batching together such consecutive ALTER_PARTITION or INSERT events,
the events processor needs to take the lock on the table only once per
batch and can refresh all the partitions from the events using multiple
threads. For transactional (ACID) tables, this provides an even more
significant performance gain, since we currently refresh the whole table
in case of ALTER_PARTITION or INSERT partition events. By batching them
together, the events processor refreshes the table once per batch.

The batch of eligible ALTER_PARTITION and INSERT events will be
processed as an ALTER_PARTITIONS and INSERT_PARTITIONS event respectively.

Performance tests:
To simulate a burst of ALTER_PARTITION and INSERT events, a simple test
was performed by running the following query from Hive:

  insert into store_sales_copy partition(ss_sold_date_sk)
  select * from store_sales;

This query generates 1824 ALTER_PARTITION and 1824 INSERT events. The
time taken to process all the generated events was measured before and
after the patch for an external table and an ACID table.

Table Type              Before     After
======================================================
External table          75 sec     25 sec
ACID tables             313 sec    47 sec

Additionally, the patch also fixes a minor bug in the evaluateSelfEvent()
method, which should return false when the serviceId does not match.

Testing Done:
1. Added new tests which cover the batching logic of events.
2. Exhaustive tests.

Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M tests/custom_cluster/test_events_custom_configs.py
M tests/metadata/test_event_processing.py
M tests/util/event_processor_utils.py
11 files changed, 939 insertions(+), 214 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17848/11
--
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 11
Gerrit-Owner: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <soura...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu-wen....@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
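
Illustration of the batching idea described in the commit message of this change. This is only a minimal sketch in Python, not the code in MetastoreEvents.java: the Event class and batch_consecutive_events() function are hypothetical names, and the rule that a batch only spans events on the same table is an assumption inferred from the "lock on the table only once per batch" wording.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Tuple

# Event types the commit message names as eligible for batching,
# mapped to the batched event type they are processed as.
BATCHABLE = {"ALTER_PARTITION": "ALTER_PARTITIONS", "INSERT": "INSERT_PARTITIONS"}


@dataclass(frozen=True)
class Event:
    """Hypothetical, simplified stand-in for a metastore event."""
    event_id: int
    event_type: str
    table: str


def batch_consecutive_events(events: List[Event]) -> List[Tuple[str, List[Event]]]:
    """Group runs of consecutive same-type, same-table batchable events.

    Returns (type, events) pairs in stream order. A run of two or more
    batchable events is reported under the batched type name; everything
    else is passed through one event at a time.
    """
    result: List[Tuple[str, List[Event]]] = []
    # groupby only merges adjacent items, which matches "consecutive" events.
    for (etype, _table), run in groupby(events, key=lambda e: (e.event_type, e.table)):
        run = list(run)
        if etype in BATCHABLE and len(run) > 1:
            result.append((BATCHABLE[etype], run))
        else:
            result.extend((e.event_type, [e]) for e in run)
    return result


stream = [
    Event(1, "ALTER_PARTITION", "db.t1"),
    Event(2, "ALTER_PARTITION", "db.t1"),
    Event(3, "INSERT", "db.t1"),
    Event(4, "INSERT", "db.t1"),
    Event(5, "ALTER_TABLE", "db.t1"),
    Event(6, "INSERT", "db.t2"),
]
batches = batch_consecutive_events(stream)
for batch_type, batch in batches:
    print(batch_type, [e.event_id for e in batch])
```

In this sketch, events 1-2 and 3-4 each collapse into one batch, so a consumer would take the table lock twice instead of four times; the ALTER_TABLE event and the lone INSERT on a different table are left unbatched.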