[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 ) Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130 Gerrit-Change-Number: 12968 Gerrit-PatchSet: 6 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sun, 14 Apr 2019 02:50:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 ) Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4019/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/12968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130 Gerrit-Change-Number: 12968 Gerrit-PatchSet: 6 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sun, 14 Apr 2019 02:50:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 ) Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130 Gerrit-Change-Number: 12968 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 13 Apr 2019 20:27:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7971: Add support for insert events in event processor.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12889 ) Change subject: IMPALA-7971: Add support for insert events in event processor. .. Patch Set 15: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2772/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12889 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7c48c5ca4bde18d532c582980aebbc25f1bf1c52 Gerrit-Change-Number: 12889 Gerrit-PatchSet: 15 Gerrit-Owner: Anurag Mantripragada Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Sat, 13 Apr 2019 09:00:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7971: Add support for insert events in event processor.
Anurag Mantripragada has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/12889 ) Change subject: IMPALA-7971: Add support for insert events in event processor. .. IMPALA-7971: Add support for insert events in event processor. This patch adds support for detecting and processing insert events triggered by impala as well as external engines (eg.Hive). Inserts from Impala will fire an insert event notification. Using this event, event-processor will refresh table/partition. Both insert into and overwrite are supported for tables/partitions. Known Issues: 1. Inserts into tables from Hive are ignored by the event processor as these inserts create an ALTER event first followed by an INSERT event. The alter will invalidate table making the refresh a no-op. Insert into partitions from hive will create an INSERT event first followed by an ALTER event. In this case, there is an unnecessary table invalidate after a refresh. 2. Existing self-events logic cannot be used for insert events since firing insert event does not allow us to modify table parameters in HMS. This means we cannot get the CatalogServiceIdentifiers in insert events. Therefore, the event-processor will also refresh the tables for which insert operation is performed through Impala. Testing: 1. Added new custom cluster tests to run different insert commands from hive and verified new data is available in Impala without invalidate metadata. 2. Added a test in MetastoreEventsProcessor for testing insert events. Change-Id: I7c48c5ca4bde18d532c582980aebbc25f1bf1c52 --- M be/src/service/client-request-state.cc M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java A tests/custom_cluster/test_event_processing.py 9 files changed, 548 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/12889/15 -- To view, visit http://gerrit.cloudera.org:8080/12889 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7c48c5ca4bde18d532c582980aebbc25f1bf1c52 Gerrit-Change-Number: 12889 Gerrit-PatchSet: 15 Gerrit-Owner: Anurag Mantripragada Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-7971: Add support for insert events in event processor.
Anurag Mantripragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/12889 ) Change subject: IMPALA-7971: Add support for insert events in event processor. .. Patch Set 15: (9 comments) http://gerrit.cloudera.org:8080/#/c/12889/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/12889/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3587 PS14, Line 3587: Map> filesBeforeInsertForPartitions = new HashMap<>(); : Set filesBeforeInsertForTable = new HashSet<>(); > Both of them can be handled in a single map? Also, rename to filesBeforeIns Here we track fileNames of all affectedPartitions using a Map. Later, this map is used to retrieve and compare fileNames to calculate deltas using partitionIds. However, partitionId cannot be used for a non-partitioned table because they change during load(). One solution was to use partitionNames instead of partitionIds. We decided to go with Ids because using partitionNames was too error prone. http://gerrit.cloudera.org:8080/#/c/12889/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3589 PS14, Line 3589: affectedExistingPartitions = new > just say affectedPartitions ? Changed it to affectedExistingPartitions. Want to stress on 'existing' as these are existing partitions that are involved in this insert event. There could be some new partitions created in the insert which are not stored here. http://gerrit.cloudera.org:8080/#/c/12889/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3608 PS14, Line 3608: if (catalog_.isExternalEventProcessingEnabled()) { > not needed. We don't wish to do this if event processing is disabled right? http://gerrit.cloudera.org:8080/#/c/12889/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3704 PS14, Line 3704: if (catalog_.isExternalEventProcessingEnabled()) { > not needed. We don't wish this path be taken if event processor is disabled. Right? http://gerrit.cloudera.org:8080/#/c/12889/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3736 PS14, Line 3736: filesBeforeInsertForTable = ((HdfsPartition) Iterables > Like I mentioned above, I don't see why we should use two different datastr As mentioned before, we cannot keep track of filenames using partitionIds for non-partitioned tables as we drop and create the single partition during load. (partitionId changes.). Hence two datastructures. http://gerrit.cloudera.org:8080/#/c/12889/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3743 PS14, Line 3743: // After loading metadata, fire insert event if external event processing is : // enabled. : if (table.getNumClusteringCols() > 0) { : createInsertEventsForPartitions(table, filesBeforeInsertForPartitions, : update.is_overwrite); : } else { : createInsertEventForTable(table, filesBeforeInsertForTable, update.is_overwrite); : } > How about changing the signature to As mentioned above this is not possible because of two different data structures for filesBeforeInsert for partition case and non-partitioned case http://gerrit.cloudera.org:8080/#/c/12889/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3787 PS14, Line 3787: deltaFiles = ((HdfsPartition)part).getFileNames(); > shouldn't this go before if? tests didn't catch this? This looks right to me. We want deltaFiles to be empty when it is an overwrite. We only calculate the new files if its not an overwrite. http://gerrit.cloudera.org:8080/#/c/12889/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3791 PS14, Line 3791: deltaFiles.size > need this? Removed. http://gerrit.cloudera.org:8080/#/c/12889/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3813 PS14, Line 3813: deltaFiles = ((HdfsPartition)singlePart).getFileNames(); > same question, shouldn't deltaFiles go before if? We want delta files to be empty if it is an overwrite. -- To view, visit http://gerrit.cloudera.org:8080/12889 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7c48c5ca4bde18d532c582980aebbc25f1bf1c52 Gerrit-Change-Number: 12889 Gerrit-PatchSet: 15 Gerrit-Owner: Anurag Mantripragada Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Sat, 13 Apr 2019 08:19:05 + Gerrit-HasComments: Yes