[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. IMPALA-12463: Batch non-consecutive table events in the event processor The current batching of events requires events to be consecutive. When there are multiple tables being modified, events can be interleaved such that each batch is very small. If the batching criteria can be relaxed, the non-consecutive events could be batched and processed more efficiently. This implements batching for non-consecutive events by keeping state on each table individually. Different tables can continue to accumulate batchable events independently unless they hit a condition that cuts the batch. The batching can ignore some events on unrelated tables, but the same rules apply about the batching of events on an individual table. For example, for a particular table, any non-INSERT event between two INSERT events on that table continues to cut the batching. In addition, there are certain cross-table events that need to cut batches across multiple tables: 1. Drop database / alter database cuts any batches on tables in the affected database. 2. Alter table rename cuts any batches on the source or destination table. This emits events in monotonically increasing order by maintaining the resulting events in a sorted map. All non-batchable events will be emitted in the original order. Batchable events are emitted based on the ending Event ID of the batch. This means that batchable events can move later in the sequence, but they cannot move earlier. This is based on the original design by Wenzhe Zhou. Testing: - MetastoreEventsProcessorTest has new tests for interleaved events on two tables as well as tests for events that cut batches across tables (alter table, drop database, alter database). - A core job showed no other test failures. Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Reviewed-on: http://gerrit.cloudera.org:8080/20533 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 2 files changed, 379 insertions(+), 66 deletions(-) Approvals: Joe McDonnell: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 12 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 11: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 11 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 29 Jan 2024 23:07:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 11: Code-Review+2 Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 11 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 29 Jan 2024 18:53:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15093/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 11 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 29 Jan 2024 18:56:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Hello Quanlong Huang, Daniel Becker, Zoltan Borok-Nagy, Sai Hemanth Gantasala, Csaba Ringhofer, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20533 to look at the new patch set (#11). Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. IMPALA-12463: Batch non-consecutive table events in the event processor The current batching of events requires events to be consecutive. When there are multiple tables being modified, events can be interleaved such that each batch is very small. If the batching criteria can be relaxed, the non-consecutive events could be batched and processed more efficiently. This implements batching for non-consecutive events by keeping state on each table individually. Different tables can continue to accumulate batchable events independently unless they hit a condition that cuts the batch. The batching can ignore some events on unrelated tables, but the same rules apply about the batching of events on an individual table. For example, for a particular table, any non-INSERT event between two INSERT events on that table continues to cut the batching. In addition, there are certain cross-table events that need to cut batches across multiple tables: 1. Drop database / alter database cuts any batches on tables in the affected database. 2. Alter table rename cuts any batches on the source or destination table. This emits events in monotonically increasing order by maintaining the resulting events in a sorted map. All non-batchable events will be emitted in the original order. Batchable events are emitted based on the ending Event ID of the batch. This means that batchable events can move later in the sequence, but they cannot move earlier. This is based on the original design by Wenzhe Zhou. Testing: - MetastoreEventsProcessorTest has new tests for interleaved events on two tables as well as tests for events that cut batches across tables (alter table, drop database, alter database). - A core job showed no other test failures. Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 2 files changed, 379 insertions(+), 66 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20533/11 -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 11 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 10: (2 comments) http://gerrit.cloudera.org:8080/#/c/20533/10/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/20533/10/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@31 PS10, Line 31: import java.util.Iterator; > nit: unused import Done http://gerrit.cloudera.org:8080/#/c/20533/10/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@440 PS10, Line 440: Map dbMap = pendingTableEventsMap.get(dbName); : if (dbMap == null) { : dbMap = new HashMap<>(); : pendingTableEventsMap.put(dbName, dbMap); : } > nit: Can be replaced with single 'Map.computeIfAbsent' method call Done -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 10 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 29 Jan 2024 18:34:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 11: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10207/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 11 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 29 Jan 2024 18:34:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 10: Code-Review+2 (2 comments) The patch looks pretty good! Carry +1 from Csaba, Wenzhe and Zoltan. http://gerrit.cloudera.org:8080/#/c/20533/10/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/20533/10/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@31 PS10, Line 31: import java.util.Iterator; nit: unused import http://gerrit.cloudera.org:8080/#/c/20533/10/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@440 PS10, Line 440: Map dbMap = pendingTableEventsMap.get(dbName); : if (dbMap == null) { : dbMap = new HashMap<>(); : pendingTableEventsMap.put(dbName, dbMap); : } nit: Can be replaced with single 'Map.computeIfAbsent' method call Map dbMap = pendingTableEventsMap.computeIfAbsent( dbName, k -> new HashMap<>()); -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 10 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 29 Jan 2024 01:05:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15069/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 10 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 26 Jan 2024 21:04:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 10: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 10 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 26 Jan 2024 20:53:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/20533/9/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/20533/9/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2453 PS9, Line 2453: catalog_.getDb("database_to_be_dropped").getMetaStoreDb().getLocationUri(); > nit: +2 indentation Done -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 10 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 26 Jan 2024 20:36:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Hello Daniel Becker, Zoltan Borok-Nagy, Sai Hemanth Gantasala, Csaba Ringhofer, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20533 to look at the new patch set (#10). Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. IMPALA-12463: Batch non-consecutive table events in the event processor The current batching of events requires events to be consecutive. When there are multiple tables being modified, events can be interleaved such that each batch is very small. If the batching criteria can be relaxed, the non-consecutive events could be batched and processed more efficiently. This implements batching for non-consecutive events by keeping state on each table individually. Different tables can continue to accumulate batchable events independently unless they hit a condition that cuts the batch. The batching can ignore some events on unrelated tables, but the same rules apply about the batching of events on an individual table. For example, for a particular table, any non-INSERT event between two INSERT events on that table continues to cut the batching. In addition, there are certain cross-table events that need to cut batches across multiple tables: 1. Drop database / alter database cuts any batches on tables in the affected database. 2. Alter table rename cuts any batches on the source or destination table. This emits events in monotonically increasing order by maintaining the resulting events in a sorted map. All non-batchable events will be emitted in the original order. Batchable events are emitted based on the ending Event ID of the batch. This means that batchable events can move later in the sequence, but they cannot move earlier. This is based on the original design by Wenzhe Zhou. Testing: - MetastoreEventsProcessorTest has new tests for interleaved events on two tables as well as tests for events that cut batches across tables (alter table, drop database, alter database). - A core job showed no other test failures. Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 2 files changed, 383 insertions(+), 66 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20533/10 -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 10 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 9: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/20533/9/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/20533/9/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2453 PS9, Line 2453: catalog_.getDb("database_to_be_dropped").getMetaStoreDb().getLocationUri(); nit: +2 indentation -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 9 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 26 Jan 2024 20:23:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 9: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 9 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 24 Jan 2024 23:14:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10195/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 9 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 24 Jan 2024 18:49:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15048/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 9 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 24 Jan 2024 18:47:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/20533/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/20533/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@391 PS8, Line 391: // drop database - cuts any existing batches for tables in that database : // alter table rename - cuts any existing batches for the source or destination : // table : // alter database - cuts any existing batches for tables in the database : // This should be rare, so the performance difference is minimal. > nit: the comments could be in the order as they appear in the code, i.e.: Good idea, done http://gerrit.cloudera.org:8080/#/c/20533/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@396 PS8, Line 396: if (next instanceof DropDatabaseEvent || next instanceof AlterDatabaseEvent) { : // Any batched events for tables from this database need to be flushed : // before emitting the AlterDatabaseEvent or DropDatabaseEvent. : flushBatchesForDb(pendingTableEventsMap, sortedFinalBatches, next.getDbName()); : } else if (next instanceof AlterTableEvent) { : AlterTableEvent alterTable = (AlterTableEvent) next; : if (alterTable.isRename()) { : // Flush any batched events for the source table. : Table beforeTable = alterTable.getBeforeTable(); : flushBatchForTable(pendingTableEventsMap, sortedFinalBatches, beforeTable); : : // Flush any batched events for the destination table. Given that the : // destination table must not exist when doing this rename, valid sequences : // are already handled implicitly by the existing batch-breaking logic : // (combined with the sorting of the final batches). This does the flushing : // explicitly in case there are any edge cases not handled by the existing : // mechanisms. : Table afterTable = alterTable.getAfterTable(); : flushBatchForTable(pendingTableEventsMap, sortedFinalBatches, afterTable); : } : } > nit: for readability, would it make sense to put these if statements into t It does make sense to split this logic out into its own function. That provides a better place to describe why we are doing this. I split this out to be its own "cutBatchesForCrossTableEvents" function. -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 8 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 24 Jan 2024 18:20:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Hello Daniel Becker, Zoltan Borok-Nagy, Sai Hemanth Gantasala, Csaba Ringhofer, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20533 to look at the new patch set (#9). Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. IMPALA-12463: Batch non-consecutive table events in the event processor The current batching of events requires events to be consecutive. When there are multiple tables being modified, events can be interleaved such that each batch is very small. If the batching criteria can be relaxed, the non-consecutive events could be batched and processed more efficiently. This implements batching for non-consecutive events by keeping state on each table individually. Different tables can continue to accumulate batchable events independently unless they hit a condition that cuts the batch. The batching can ignore some events on unrelated tables, but the same rules apply about the batching of events on an individual table. For example, for a particular table, any non-INSERT event between two INSERT events on that table continues to cut the batching. In addition, there are certain cross-table events that need to cut batches across multiple tables: 1. Drop database / alter database cuts any batches on tables in the affected database. 2. Alter table rename cuts any batches on the source or destination table. This emits events in monotonically increasing order by maintaining the resulting events in a sorted map. All non-batchable events will be emitted in the original order. Batchable events are emitted based on the ending Event ID of the batch. This means that batchable events can move later in the sequence, but they cannot move earlier. This is based on the original design by Wenzhe Zhou. Testing: - MetastoreEventsProcessorTest has new tests for interleaved events on two tables as well as tests for events that cut batches across tables (alter table, drop database, alter database). - A core job showed no other test failures. Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 2 files changed, 383 insertions(+), 66 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20533/9 -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 9 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 8: Code-Review+1 (2 comments) Left some nitpicks, otherwise the code LGTM! http://gerrit.cloudera.org:8080/#/c/20533/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/20533/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@391 PS8, Line 391: // drop database - cuts any existing batches for tables in that database : // alter table rename - cuts any existing batches for the source or destination : // table : // alter database - cuts any existing batches for tables in the database : // This should be rare, so the performance difference is minimal. nit: the comments could be in the order as they appear in the code, i.e.: * drop database ... * alter database ... * alter table rename ... http://gerrit.cloudera.org:8080/#/c/20533/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@396 PS8, Line 396: if (next instanceof DropDatabaseEvent || next instanceof AlterDatabaseEvent) { : // Any batched events for tables from this database need to be flushed : // before emitting the AlterDatabaseEvent or DropDatabaseEvent. : flushBatchesForDb(pendingTableEventsMap, sortedFinalBatches, next.getDbName()); : } else if (next instanceof AlterTableEvent) { : AlterTableEvent alterTable = (AlterTableEvent) next; : if (alterTable.isRename()) { : // Flush any batched events for the source table. : Table beforeTable = alterTable.getBeforeTable(); : flushBatchForTable(pendingTableEventsMap, sortedFinalBatches, beforeTable); : : // Flush any batched events for the destination table. Given that the : // destination table must not exist when doing this rename, valid sequences : // are already handled implicitly by the existing batch-breaking logic : // (combined with the sorting of the final batches). This does the flushing : // explicitly in case there are any edge cases not handled by the existing : // mechanisms. : Table afterTable = alterTable.getAfterTable(); : flushBatchForTable(pendingTableEventsMap, sortedFinalBatches, afterTable); : } : } nit: for readability, would it make sense to put these if statements into their own methods? E.g. what we could have here is just two method calls: cutDbEventsIfNeeded(next, pendingTableEventsMap, sortedFinalBatches); cutTableEventsIfNeeded(next, pendingTableEventsMap, sortedFinalBatches); -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 8 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 22 Dec 2023 18:03:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 8 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 18 Dec 2023 12:33:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 8: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10057/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 8 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 18 Dec 2023 08:05:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14765/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 8 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 18 Dec 2023 08:01:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Hello Daniel Becker, Zoltan Borok-Nagy, Sai Hemanth Gantasala, Csaba Ringhofer, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20533 to look at the new patch set (#8). Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. IMPALA-12463: Batch non-consecutive table events in the event processor The current batching of events requires events to be consecutive. When there are multiple tables being modified, events can be interleaved such that each batch is very small. If the batching criteria can be relaxed, the non-consecutive events could be batched and processed more efficiently. This implements batching for non-consecutive events by keeping state on each table individually. Different tables can continue to accumulate batchable events independently unless they hit a condition that cuts the batch. The batching can ignore some events on unrelated tables, but the same rules apply about the batching of events on an individual table. For example, for a particular table, any non-INSERT event between two INSERT events on that table continues to cut the batching. In addition, there are certain cross-table events that need to cut batches across multiple tables: 1. Drop database / alter database cuts any batches on tables in the affected database. 2. Alter table rename cuts any batches on the source or destination table. This emits events in monotonically increasing order by maintaining the resulting events in a sorted map. All non-batchable events will be emitted in the original order. Batchable events are emitted based on the ending Event ID of the batch. This means that batchable events can move later in the sequence, but they cannot move earlier. This is based on the original design by Wenzhe Zhou. Testing: - MetastoreEventsProcessorTest has new tests for interleaved events on two tables as well as tests for events that cut batches across tables (alter table, drop database, alter database). - A core job showed no other test failures. Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 2 files changed, 371 insertions(+), 65 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20533/8 -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 8 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 7: (4 comments) I looped the test that was failing for a few hours (a few hundred iterations) and I wasn't able to reproduce the test failure. The logs show that there is no batching involved for that test. For that test, all of the events are on a single table. If there is no batching, then the sorting means that everything stays in the same order in came in. Rebased and adapted the new test cases to the changes in IMPALA-10949. http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@400 PS7, Line 400: } else if (next instanceof AlterTableEvent) { > Shouldn't we consider create/drop table events as batch-breaking events? There is already batch-breaking logic via the canBeBatched() method. Only InsertEvent and AlterPartitionEvent implement it in a way that can return true. As long as events for an individual table hash to the same place, the existing batch-breaking logic will apply on the sequence of events for that table. So, create/drop table behaves the same way it does today, breaking any existing batch for the table involved. This section is meant to handle additional batch breaking that the regular batch breaking may not handle, e.g. events that apply across multiple tables or across a table that would not be the one we hash to. http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@405 PS7, Line 405: flushBatchForTable(pendingTableEventsMap, sortedFinalBatches, beforeTable); > IMO, we should also consider the scenario where the rename table happens ac beforeTable and afterTable are org.apache.hadoop.hive.metastore.api.Table objects with the both database and table included, so they are treated as fully qualified table names. Was there additional flushing that you had in mind? http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@408 PS7, Line 408: // an invalid scenario, because the destination must not exist at the time > This is a possible scenario, In the current queue, There are table events f There definitely needs to be batch breaking, but the combination of the preexisting batch-breaking logic and the fact that events are sorted by end Event ID makes it hard to construct a valid sequence that isn't already handled by other code. To take your example, the drop event on t1 breaks the t1 batch. The alter table rename hashes to the start table t2 and breaks the t2 batch. I still think it is useful to make the handling explicit to be safe. I will reword the comment to make this clearer. http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@426 PS7, Line 426: dbMap = pendingTableEventsMap.get(dbName); > Shouldn't we just assign a new HashMap<>() directly to the dbMap variable? Good point, fixed -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 7 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 18 Dec 2023 07:34:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 7: (4 comments) http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@400 PS7, Line 400: } else if (next instanceof AlterTableEvent) { Shouldn't we consider create/drop table events as batch-breaking events? http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@405 PS7, Line 405: flushBatchForTable(pendingTableEventsMap, sortedFinalBatches, beforeTable); IMO, we should also consider the scenario where the rename table happens across different databases also and flush the corresponding events. Eg: Alter table rename db1.tb1 to db2.tb2; http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@408 PS7, Line 408: // an invalid scenario, because the destination must not exist at the time This is a possible scenario, In the current queue, There are table events for t1, table events for t2, drop event for t1, rename event from t2 to t1. http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@426 PS7, Line 426: dbMap = pendingTableEventsMap.get(dbName); Shouldn't we just assign a new HashMap<>() directly to the dbMap variable? -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 7 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 14 Dec 2023 01:15:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 7: the latest changes look good to me, but the failed test looks relevant: https://jenkins.impala.io/job/ubuntu-20.04-from-scratch/1108/testReport/junit/custom_cluster.test_events_custom_configs/TestEventProcessingCustomConfigs/test_drop_table_events/ -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 7 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 13 Dec 2023 10:30:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 7: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10012/ -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 7 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 13 Dec 2023 04:26:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10012/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 7 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 12 Dec 2023 23:54:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14682/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 7 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 12 Dec 2023 22:45:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 7: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 7 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 12 Dec 2023 22:28:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Hello Daniel Becker, Zoltan Borok-Nagy, Sai Hemanth Gantasala, Csaba Ringhofer, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20533 to look at the new patch set (#7). Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. IMPALA-12463: Batch non-consecutive table events in the event processor The current batching of events requires events to be consecutive. When there are multiple tables being modified, events can be interleaved such that each batch is very small. If the batching criteria can be relaxed, the non-consecutive events could be batched and processed more efficiently. This implements batching for non-consecutive events by keeping state on each table individually. Different tables can continue to accumulate batchable events independently unless they hit a condition that cuts the batch. The batching can ignore some events on unrelated tables, but the same rules apply about the batching of events on an individual table. For example, for a particular table, any non-INSERT event between two INSERT events on that table continues to cut the batching. In addition, there are certain cross-table events that need to cut batches across multiple tables: 1. Drop database / alter database cuts any batches on tables in the affected database. 2. Alter table rename cuts any batches on the source or destination table. This emits events in monotonically increasing order by maintaining the resulting events in a sorted map. All non-batchable events will be emitted in the original order. Batchable events are emitted based on the ending Event ID of the batch. This means that batchable events can move later in the sequence, but they cannot move earlier. This is based on the original design by Wenzhe Zhou. Testing: - MetastoreEventsProcessorTest has new tests for interleaved events on two tables as well as tests for events that cut batches across tables (alter table, drop database, alter database). - A core job showed no other test failures. Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 2 files changed, 366 insertions(+), 65 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20533/7 -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 7 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/20533/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/20533/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@347 PS6, Line 347: must > nit: duplicated Thanks for pointing that out, fixed -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 6 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 12 Dec 2023 22:22:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 6: Code-Review+1 (1 comment) Thanks for working on this. http://gerrit.cloudera.org:8080/#/c/20533/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/20533/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@347 PS6, Line 347: must nit: duplicated -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 6 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 12 Dec 2023 22:05:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14680/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 6 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 12 Dec 2023 20:51:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Hello Daniel Becker, Zoltan Borok-Nagy, Sai Hemanth Gantasala, Csaba Ringhofer, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20533 to look at the new patch set (#6). Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. IMPALA-12463: Batch non-consecutive table events in the event processor The current batching of events requires events to be consecutive. When there are multiple tables being modified, events can be interleaved such that each batch is very small. If the batching criteria can be relaxed, the non-consecutive events could be batched and processed more efficiently. This implements batching for non-consecutive events by keeping state on each table individually. Different tables can continue to accumulate batchable events independently unless they hit a condition that cuts the batch. The batching can ignore some events on unrelated tables, but the same rules apply about the batching of events on an individual table. For example, for a particular table, any non-INSERT event between two INSERT events on that table continues to cut the batching. In addition, there are certain cross-table events that need to cut batches across multiple tables: 1. Drop database / alter database cuts any batches on tables in the affected database. 2. Alter table rename cuts any batches on the source or destination table. This emits events in monotonically increasing order by maintaining the resulting events in a sorted map. All non-batchable events will be emitted in the original order. Batchable events are emitted based on the ending Event ID of the batch. This means that batchable events can move later in the sequence, but they cannot move earlier. This is based on the original design by Wenzhe Zhou. Testing: - MetastoreEventsProcessorTest has new tests for interleaved events on two tables as well as tests for events that cut batches across tables (alter table, drop database, alter database). - A core job showed no other test failures. Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 2 files changed, 366 insertions(+), 65 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20533/6 -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 6 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/20533/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/20533/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@310 PS5, Line 310: while > optional: Not sure if this worth the effort, but it would be nice to make t Good idea, I switched to make this a two-layer map. http://gerrit.cloudera.org:8080/#/c/20533/5/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/20533/5/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2610 PS5, Line 2610: anways > typo Done -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 12 Dec 2023 20:22:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 5: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/20533/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/20533/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@310 PS5, Line 310: while optional: Not sure if this worth the effort, but it would be nice to make this better than O(N) - it doesn't seem impossible that alter database or create/drop database pairs are common in some workload. It could also make sense to increase batch sizes in the future, making this search quite costly. e.g. pendingTableEventsMap could be Map>, with db/table keys, and the first time a db is met could add the db entries http://gerrit.cloudera.org:8080/#/c/20533/5/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/20533/5/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2610 PS5, Line 2610: anways typo -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 12 Dec 2023 14:26:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 5: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 12 Dec 2023 12:05:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10001/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 12 Dec 2023 07:37:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14658/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 12 Dec 2023 06:33:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14656/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 12 Dec 2023 06:27:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/20533/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/20533/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2422 PS4, Line 2422: Database alteredDb = catalog_.getDb("database_to_be_dropped").getMetaStoreDb().deepCopy(); > line too long (94 > 90) Done http://gerrit.cloudera.org:8080/#/c/20533/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3535 PS4, Line 3535: private void addDatabaseParameters(String db, String key, String val) throws TException { > line too long (91 > 90) This wasn't needed anymore, removed this part of the change. -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 12 Dec 2023 06:08:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Hello Daniel Becker, Zoltan Borok-Nagy, Sai Hemanth Gantasala, Csaba Ringhofer, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20533 to look at the new patch set (#5). Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. IMPALA-12463: Batch non-consecutive table events in the event processor The current batching of events requires events to be consecutive. When there are multiple tables being modified, events can be interleaved such that each batch is very small. If the batching criteria can be relaxed, the non-consecutive events could be batched and processed more efficiently. This implements batching for non-consecutive events by keeping state on each table individually. Different tables can continue to accumulate batchable events independently unless they hit a condition that cuts the batch. The batching can ignore some events on unrelated tables, but the same rules apply about the batching of events on an individual table. For example, for a particular table, any non-INSERT event between two INSERT events on that table continues to cut the batching. In addition, there are certain cross-table events that need to cut batches across multiple tables: 1. Drop database / alter database cuts any batches on tables in the affected database. 2. Alter table rename cuts any batches on the source or destination table. This emits events in monotonically increasing order by maintaining the resulting events in a sorted map. All non-batchable events will be emitted in the original order. Batchable events are emitted based on the ending Event ID of the batch. This means that batchable events can move later in the sequence, but they cannot move earlier. This is based on the original design by Wenzhe Zhou. Testing: - MetastoreEventsProcessorTest has new tests for interleaved events on two tables as well as tests for events that cut batches across tables (alter table, drop database, alter database). - A core job showed no other test failures. Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 2 files changed, 349 insertions(+), 65 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20533/5 -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Hello Daniel Becker, Zoltan Borok-Nagy, Sai Hemanth Gantasala, Csaba Ringhofer, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20533 to look at the new patch set (#4). Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. IMPALA-12463: Batch non-consecutive table events in the event processor The current batching of events requires events to be consecutive. When there are multiple tables being modified, events can be interleaved such that each batch is very small. If the batching criteria can be relaxed, the non-consecutive events could be batched and processed more efficiently. This implements batching for non-consecutive events by keeping state on each table individually. Different tables can continue to accumulate batchable events independently unless they hit a condition that cuts the batch. The batching can ignore some events on unrelated tables, but the same rules apply about the batching of events on an individual table. For example, for a particular table, any non-INSERT event between two INSERT events on that table continues to cut the batching. In addition, there are certain cross-table events that need to cut batches across multiple tables: 1. Drop database / alter database cuts any batches on tables in the affected database. 2. Alter table rename cuts any batches on the source or destination table. This emits events in monotonically increasing order by maintaining the resulting events in a sorted map. All non-batchable events will be emitted in the original order. Batchable events are emitted based on the ending Event ID of the batch. This means that batchable events can move later in the sequence, but they cannot move earlier. This is based on the original design by Wenzhe Zhou. Testing: - MetastoreEventsProcessorTest has new tests for interleaved events on two tables as well as tests for events that cut batches across tables (alter table, drop database, alter database). - A core job showed no other test failures. Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 2 files changed, 353 insertions(+), 67 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20533/4 -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/20533/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/20533/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2422 PS4, Line 2422: Database alteredDb = catalog_.getDb("database_to_be_dropped").getMetaStoreDb().deepCopy(); line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/20533/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3535 PS4, Line 3535: private void addDatabaseParameters(String db, String key, String val) throws TException { line too long (91 > 90) -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 12 Dec 2023 06:01:03 + Gerrit-HasComments: Yes