[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2024-01-29 Thread Joe McDonnell (Code Review)
Joe McDonnell has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..

IMPALA-12463: Batch non-consecutive table events in the event processor

The current batching of events requires events to be
consecutive. When there are multiple tables being modified,
events can be interleaved such that each batch is very
small. If the batching criteria can be relaxed, the
non-consecutive events could be batched and processed more
efficiently.

This implements batching for non-consecutive events by
keeping state on each table individually. Different tables
can continue to accumulate batchable events independently
unless they hit a condition that cuts the batch. The batching
can ignore some events on unrelated tables, but the same
rules apply about the batching of events on an individual table.
For example, for a particular table, any non-INSERT event
between two INSERT events on that table continues to cut the
batching.

In addition, there are certain cross-table events that need to
cut batches across multiple tables:
 1. Drop database / alter database cuts any batches on tables
in the affected database.
 2. Alter table rename cuts any batches on the source or destination
table.

This emits events in monotonically increasing order by
maintaining the resulting events in a sorted map. All
non-batchable events will be emitted in the original order.
Batchable events are emitted based on the ending Event ID
of the batch. This means that batchable events can move
later in the sequence, but they cannot move earlier.

This is based on the original design by Wenzhe Zhou.

Testing:
 - MetastoreEventsProcessorTest has new tests for interleaved
   events on two tables as well as tests for events that
   cut batches across tables (alter table, drop database,
   alter database).
 - A core job showed no other test failures.

Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Reviewed-on: http://gerrit.cloudera.org:8080/20533
Reviewed-by: Joe McDonnell 
Tested-by: Impala Public Jenkins 
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
2 files changed, 379 insertions(+), 66 deletions(-)

Approvals:
  Joe McDonnell: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 12
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 11: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 11
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 29 Jan 2024 23:07:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2024-01-29 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 11: Code-Review+2

Carry +2


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 11
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 29 Jan 2024 18:53:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15093/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 11
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 29 Jan 2024 18:56:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2024-01-29 Thread Joe McDonnell (Code Review)
Hello Quanlong Huang, Daniel Becker, Zoltan Borok-Nagy, Sai Hemanth Gantasala, 
Csaba Ringhofer, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20533

to look at the new patch set (#11).

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..

IMPALA-12463: Batch non-consecutive table events in the event processor

The current batching of events requires events to be
consecutive. When there are multiple tables being modified,
events can be interleaved such that each batch is very
small. If the batching criteria can be relaxed, the
non-consecutive events could be batched and processed more
efficiently.

This implements batching for non-consecutive events by
keeping state on each table individually. Different tables
can continue to accumulate batchable events independently
unless they hit a condition that cuts the batch. The batching
can ignore some events on unrelated tables, but the same
rules apply about the batching of events on an individual table.
For example, for a particular table, any non-INSERT event
between two INSERT events on that table continues to cut the
batching.

In addition, there are certain cross-table events that need to
cut batches across multiple tables:
 1. Drop database / alter database cuts any batches on tables
in the affected database.
 2. Alter table rename cuts any batches on the source or destination
table.

This emits events in monotonically increasing order by
maintaining the resulting events in a sorted map. All
non-batchable events will be emitted in the original order.
Batchable events are emitted based on the ending Event ID
of the batch. This means that batchable events can move
later in the sequence, but they cannot move earlier.

This is based on the original design by Wenzhe Zhou.

Testing:
 - MetastoreEventsProcessorTest has new tests for interleaved
   events on two tables as well as tests for events that
   cut batches across tables (alter table, drop database,
   alter database).
 - A core job showed no other test failures.

Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
2 files changed, 379 insertions(+), 66 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20533/11
--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 11
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2024-01-29 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 10:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20533/10/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/20533/10/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@31
PS10, Line 31: import java.util.Iterator;
> nit: unused import
Done


http://gerrit.cloudera.org:8080/#/c/20533/10/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@440
PS10, Line 440: Map dbMap = 
pendingTableEventsMap.get(dbName);
  : if (dbMap == null) {
  :   dbMap = new HashMap<>();
  :   pendingTableEventsMap.put(dbName, dbMap);
  : }
> nit: Can be replaced with single 'Map.computeIfAbsent' method call
Done



--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 10
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 29 Jan 2024 18:34:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10207/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 11
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 29 Jan 2024 18:34:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2024-01-28 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 10: Code-Review+2

(2 comments)

The patch looks pretty good! Carry +1 from Csaba, Wenzhe and Zoltan.

http://gerrit.cloudera.org:8080/#/c/20533/10/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/20533/10/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@31
PS10, Line 31: import java.util.Iterator;
nit: unused import


http://gerrit.cloudera.org:8080/#/c/20533/10/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@440
PS10, Line 440: Map dbMap = 
pendingTableEventsMap.get(dbName);
  : if (dbMap == null) {
  :   dbMap = new HashMap<>();
  :   pendingTableEventsMap.put(dbName, dbMap);
  : }
nit: Can be replaced with single 'Map.computeIfAbsent' method call

 Map dbMap = 
pendingTableEventsMap.computeIfAbsent(
dbName, k -> new HashMap<>());



--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 10
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 29 Jan 2024 01:05:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2024-01-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15069/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 10
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 26 Jan 2024 21:04:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2024-01-26 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 10: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 10
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 26 Jan 2024 20:53:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2024-01-26 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20533/9/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/20533/9/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2453
PS9, Line 2453: 
catalog_.getDb("database_to_be_dropped").getMetaStoreDb().getLocationUri();
> nit: +2 indentation
Done



--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 10
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 26 Jan 2024 20:36:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2024-01-26 Thread Joe McDonnell (Code Review)
Hello Daniel Becker, Zoltan Borok-Nagy, Sai Hemanth Gantasala, Csaba Ringhofer, 
Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20533

to look at the new patch set (#10).

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..

IMPALA-12463: Batch non-consecutive table events in the event processor

The current batching of events requires events to be
consecutive. When there are multiple tables being modified,
events can be interleaved such that each batch is very
small. If the batching criteria can be relaxed, the
non-consecutive events could be batched and processed more
efficiently.

This implements batching for non-consecutive events by
keeping state on each table individually. Different tables
can continue to accumulate batchable events independently
unless they hit a condition that cuts the batch. The batching
can ignore some events on unrelated tables, but the same
rules apply about the batching of events on an individual table.
For example, for a particular table, any non-INSERT event
between two INSERT events on that table continues to cut the
batching.

In addition, there are certain cross-table events that need to
cut batches across multiple tables:
 1. Drop database / alter database cuts any batches on tables
in the affected database.
 2. Alter table rename cuts any batches on the source or destination
table.

This emits events in monotonically increasing order by
maintaining the resulting events in a sorted map. All
non-batchable events will be emitted in the original order.
Batchable events are emitted based on the ending Event ID
of the batch. This means that batchable events can move
later in the sequence, but they cannot move earlier.

This is based on the original design by Wenzhe Zhou.

Testing:
 - MetastoreEventsProcessorTest has new tests for interleaved
   events on two tables as well as tests for events that
   cut batches across tables (alter table, drop database,
   alter database).
 - A core job showed no other test failures.

Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
2 files changed, 383 insertions(+), 66 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20533/10
--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 10
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2024-01-26 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 9: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20533/9/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/20533/9/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2453
PS9, Line 2453:   
catalog_.getDb("database_to_be_dropped").getMetaStoreDb().getLocationUri();
nit: +2 indentation



--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 9
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 26 Jan 2024 20:23:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2024-01-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 9: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 9
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 24 Jan 2024 23:14:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2024-01-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10195/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 9
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 24 Jan 2024 18:49:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2024-01-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15048/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 9
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 24 Jan 2024 18:47:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2024-01-24 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20533/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/20533/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@391
PS8, Line 391: // drop database - cuts any existing batches for tables 
in that database
 : // alter table rename - cuts any existing batches for 
the source or destination
 : //   table
 : // alter database - cuts any existing batches for tables 
in the database
 : //   This should be rare, so the performance difference 
is minimal.
> nit: the comments could be in the order as they appear in the code, i.e.:
Good idea, done


http://gerrit.cloudera.org:8080/#/c/20533/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@396
PS8, Line 396: if (next instanceof DropDatabaseEvent || next instanceof 
AlterDatabaseEvent) {
 :   // Any batched events for tables from this database 
need to be flushed
 :   // before emitting the AlterDatabaseEvent or 
DropDatabaseEvent.
 :   flushBatchesForDb(pendingTableEventsMap, 
sortedFinalBatches, next.getDbName());
 : } else if (next instanceof AlterTableEvent) {
 :   AlterTableEvent alterTable = (AlterTableEvent) next;
 :   if (alterTable.isRename()) {
 : // Flush any batched events for the source table.
 : Table beforeTable = alterTable.getBeforeTable();
 : flushBatchForTable(pendingTableEventsMap, 
sortedFinalBatches, beforeTable);
 :
 : // Flush any batched events for the destination 
table. Given that the
 : // destination table must not exist when doing this 
rename, valid sequences
 : // are already handled implicitly by the existing 
batch-breaking logic
 : // (combined with the sorting of the final batches). 
This does the flushing
 : // explicitly in case there are any edge cases not 
handled by the existing
 : // mechanisms.
 : Table afterTable = alterTable.getAfterTable();
 : flushBatchForTable(pendingTableEventsMap, 
sortedFinalBatches, afterTable);
 :   }
 : }
> nit: for readability, would it make sense to put these if statements into t
It does make sense to split this logic out into its own function. That provides 
a better place to describe why we are doing this. I split this out to be its 
own "cutBatchesForCrossTableEvents" function.



--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 24 Jan 2024 18:20:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2024-01-24 Thread Joe McDonnell (Code Review)
Hello Daniel Becker, Zoltan Borok-Nagy, Sai Hemanth Gantasala, Csaba Ringhofer, 
Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20533

to look at the new patch set (#9).

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..

IMPALA-12463: Batch non-consecutive table events in the event processor

The current batching of events requires events to be
consecutive. When there are multiple tables being modified,
events can be interleaved such that each batch is very
small. If the batching criteria can be relaxed, the
non-consecutive events could be batched and processed more
efficiently.

This implements batching for non-consecutive events by
keeping state on each table individually. Different tables
can continue to accumulate batchable events independently
unless they hit a condition that cuts the batch. The batching
can ignore some events on unrelated tables, but the same
rules apply about the batching of events on an individual table.
For example, for a particular table, any non-INSERT event
between two INSERT events on that table continues to cut the
batching.

In addition, there are certain cross-table events that need to
cut batches across multiple tables:
 1. Drop database / alter database cuts any batches on tables
in the affected database.
 2. Alter table rename cuts any batches on the source or destination
table.

This emits events in monotonically increasing order by
maintaining the resulting events in a sorted map. All
non-batchable events will be emitted in the original order.
Batchable events are emitted based on the ending Event ID
of the batch. This means that batchable events can move
later in the sequence, but they cannot move earlier.

This is based on the original design by Wenzhe Zhou.

Testing:
 - MetastoreEventsProcessorTest has new tests for interleaved
   events on two tables as well as tests for events that
   cut batches across tables (alter table, drop database,
   alter database).
 - A core job showed no other test failures.

Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
2 files changed, 383 insertions(+), 66 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20533/9
--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 9
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-22 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 8: Code-Review+1

(2 comments)

Left some nitpicks, otherwise the code LGTM!

http://gerrit.cloudera.org:8080/#/c/20533/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/20533/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@391
PS8, Line 391: // drop database - cuts any existing batches for tables 
in that database
 : // alter table rename - cuts any existing batches for 
the source or destination
 : //   table
 : // alter database - cuts any existing batches for tables 
in the database
 : //   This should be rare, so the performance difference 
is minimal.
nit: the comments could be in the order as they appear in the code, i.e.:
 * drop database ...
 * alter database ...
 * alter table rename ...


http://gerrit.cloudera.org:8080/#/c/20533/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@396
PS8, Line 396: if (next instanceof DropDatabaseEvent || next instanceof 
AlterDatabaseEvent) {
 :   // Any batched events for tables from this database 
need to be flushed
 :   // before emitting the AlterDatabaseEvent or 
DropDatabaseEvent.
 :   flushBatchesForDb(pendingTableEventsMap, 
sortedFinalBatches, next.getDbName());
 : } else if (next instanceof AlterTableEvent) {
 :   AlterTableEvent alterTable = (AlterTableEvent) next;
 :   if (alterTable.isRename()) {
 : // Flush any batched events for the source table.
 : Table beforeTable = alterTable.getBeforeTable();
 : flushBatchForTable(pendingTableEventsMap, 
sortedFinalBatches, beforeTable);
 :
 : // Flush any batched events for the destination 
table. Given that the
 : // destination table must not exist when doing this 
rename, valid sequences
 : // are already handled implicitly by the existing 
batch-breaking logic
 : // (combined with the sorting of the final batches). 
This does the flushing
 : // explicitly in case there are any edge cases not 
handled by the existing
 : // mechanisms.
 : Table afterTable = alterTable.getAfterTable();
 : flushBatchForTable(pendingTableEventsMap, 
sortedFinalBatches, afterTable);
 :   }
 : }
nit: for readability, would it make sense to put these if statements into their 
own methods? E.g. what we could have here is just two method calls:

 cutDbEventsIfNeeded(next, pendingTableEventsMap, sortedFinalBatches);
 cutTableEventsIfNeeded(next, pendingTableEventsMap, sortedFinalBatches);



--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 22 Dec 2023 18:03:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 8: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 18 Dec 2023 12:33:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10057/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 18 Dec 2023 08:05:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14765/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 18 Dec 2023 08:01:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-17 Thread Joe McDonnell (Code Review)
Hello Daniel Becker, Zoltan Borok-Nagy, Sai Hemanth Gantasala, Csaba Ringhofer, 
Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20533

to look at the new patch set (#8).

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..

IMPALA-12463: Batch non-consecutive table events in the event processor

The current batching of events requires events to be
consecutive. When there are multiple tables being modified,
events can be interleaved such that each batch is very
small. If the batching criteria can be relaxed, the
non-consecutive events could be batched and processed more
efficiently.

This implements batching for non-consecutive events by
keeping state on each table individually. Different tables
can continue to accumulate batchable events independently
unless they hit a condition that cuts the batch. The batching
can ignore some events on unrelated tables, but the same
rules apply about the batching of events on an individual table.
For example, for a particular table, any non-INSERT event
between two INSERT events on that table continues to cut the
batching.

In addition, there are certain cross-table events that need to
cut batches across multiple tables:
 1. Drop database / alter database cuts any batches on tables
in the affected database.
 2. Alter table rename cuts any batches on the source or destination
table.

This emits events in monotonically increasing order by
maintaining the resulting events in a sorted map. All
non-batchable events will be emitted in the original order.
Batchable events are emitted based on the ending Event ID
of the batch. This means that batchable events can move
later in the sequence, but they cannot move earlier.

This is based on the original design by Wenzhe Zhou.

Testing:
 - MetastoreEventsProcessorTest has new tests for interleaved
   events on two tables as well as tests for events that
   cut batches across tables (alter table, drop database,
   alter database).
 - A core job showed no other test failures.

Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
2 files changed, 371 insertions(+), 65 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20533/8
--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-17 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 7:

(4 comments)

I looped the test that was failing for a few hours (a few hundred iterations) 
and I wasn't able to reproduce the test failure. The logs show that there is no 
batching involved for that test. For that test, all of the events are on a 
single table. If there is no batching, then the sorting means that everything 
stays in the same order in came in.

Rebased and adapted the new test cases to the changes in IMPALA-10949.

http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@400
PS7, Line 400: } else if (next instanceof AlterTableEvent) {
> Shouldn't we consider create/drop table events as batch-breaking events?
There is already batch-breaking logic via the canBeBatched() method. Only 
InsertEvent and AlterPartitionEvent implement it in a way that can return true. 
As long as events for an individual table hash to the same place, the existing 
batch-breaking logic will apply on the sequence of events for that table. So, 
create/drop table behaves the same way it does today, breaking any existing 
batch for the table involved.

This section is meant to handle additional batch breaking that the regular 
batch breaking may not handle, e.g. events that apply across multiple tables or 
across a table that would not be the one we hash to.


http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@405
PS7, Line 405: flushBatchForTable(pendingTableEventsMap, 
sortedFinalBatches, beforeTable);
> IMO, we should also consider the scenario where the rename table happens ac
beforeTable and afterTable are org.apache.hadoop.hive.metastore.api.Table 
objects with the both database and table included, so they are treated as fully 
qualified table names.

Was there additional flushing that you had in mind?


http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@408
PS7, Line 408: // an invalid scenario, because the destination must 
not exist at the time
> This is a possible scenario, In the current queue, There are table events f
There definitely needs to be batch breaking, but the combination of the 
preexisting batch-breaking logic and the fact that events are sorted by end 
Event ID makes it hard to construct a valid sequence that isn't already handled 
by other code.

To take your example, the drop event on t1 breaks the t1 batch. The alter table 
rename hashes to the start table t2 and breaks the t2 batch.

I still think it is useful to make the handling explicit to be safe.

I will reword the comment to make this clearer.


http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@426
PS7, Line 426:   dbMap = pendingTableEventsMap.get(dbName);
> Shouldn't we just assign a new HashMap<>() directly to the dbMap variable?
Good point, fixed



--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 18 Dec 2023 07:34:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-13 Thread Sai Hemanth Gantasala (Code Review)
Sai Hemanth Gantasala has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@400
PS7, Line 400: } else if (next instanceof AlterTableEvent) {
Shouldn't we consider create/drop table events as batch-breaking events?


http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@405
PS7, Line 405: flushBatchForTable(pendingTableEventsMap, 
sortedFinalBatches, beforeTable);
IMO, we should also consider the scenario where the rename table happens across 
different databases also and flush the corresponding events.
Eg: Alter table rename db1.tb1 to db2.tb2;


http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@408
PS7, Line 408: // an invalid scenario, because the destination must 
not exist at the time
This is a possible scenario, In the current queue, There are table events for 
t1, table events for t2, drop event for t1, rename event from t2 to t1.


http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@426
PS7, Line 426:   dbMap = pendingTableEventsMap.get(dbName);
Shouldn't we just assign a new HashMap<>() directly to the dbMap variable?



--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 14 Dec 2023 01:15:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-13 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 7:

the latest changes look good to me, but the failed test looks relevant:
https://jenkins.impala.io/job/ubuntu-20.04-from-scratch/1108/testReport/junit/custom_cluster.test_events_custom_configs/TestEventProcessingCustomConfigs/test_drop_table_events/


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Dec 2023 10:30:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 7: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10012/


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Dec 2023 04:26:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10012/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 23:54:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14682/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 22:45:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-12 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 7: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 22:28:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-12 Thread Joe McDonnell (Code Review)
Hello Daniel Becker, Zoltan Borok-Nagy, Sai Hemanth Gantasala, Csaba Ringhofer, 
Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20533

to look at the new patch set (#7).

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..

IMPALA-12463: Batch non-consecutive table events in the event processor

The current batching of events requires events to be
consecutive. When there are multiple tables being modified,
events can be interleaved such that each batch is very
small. If the batching criteria can be relaxed, the
non-consecutive events could be batched and processed more
efficiently.

This implements batching for non-consecutive events by
keeping state on each table individually. Different tables
can continue to accumulate batchable events independently
unless they hit a condition that cuts the batch. The batching
can ignore some events on unrelated tables, but the same
rules apply about the batching of events on an individual table.
For example, for a particular table, any non-INSERT event
between two INSERT events on that table continues to cut the
batching.

In addition, there are certain cross-table events that need to
cut batches across multiple tables:
 1. Drop database / alter database cuts any batches on tables
in the affected database.
 2. Alter table rename cuts any batches on the source or destination
table.

This emits events in monotonically increasing order by
maintaining the resulting events in a sorted map. All
non-batchable events will be emitted in the original order.
Batchable events are emitted based on the ending Event ID
of the batch. This means that batchable events can move
later in the sequence, but they cannot move earlier.

This is based on the original design by Wenzhe Zhou.

Testing:
 - MetastoreEventsProcessorTest has new tests for interleaved
   events on two tables as well as tests for events that
   cut batches across tables (alter table, drop database,
   alter database).
 - A core job showed no other test failures.

Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
2 files changed, 366 insertions(+), 65 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20533/7
--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-12 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20533/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/20533/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@347
PS6, Line 347: must
> nit: duplicated
Thanks for pointing that out, fixed



--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 22:22:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-12 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 6: Code-Review+1

(1 comment)

Thanks for working on this.

http://gerrit.cloudera.org:8080/#/c/20533/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/20533/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@347
PS6, Line 347: must
nit: duplicated



--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 22:05:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14680/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 20:51:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-12 Thread Joe McDonnell (Code Review)
Hello Daniel Becker, Zoltan Borok-Nagy, Sai Hemanth Gantasala, Csaba Ringhofer, 
Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20533

to look at the new patch set (#6).

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..

IMPALA-12463: Batch non-consecutive table events in the event processor

The current batching of events requires events to be
consecutive. When there are multiple tables being modified,
events can be interleaved such that each batch is very
small. If the batching criteria can be relaxed, the
non-consecutive events could be batched and processed more
efficiently.

This implements batching for non-consecutive events by
keeping state on each table individually. Different tables
can continue to accumulate batchable events independently
unless they hit a condition that cuts the batch. The batching
can ignore some events on unrelated tables, but the same
rules apply about the batching of events on an individual table.
For example, for a particular table, any non-INSERT event
between two INSERT events on that table continues to cut the
batching.

In addition, there are certain cross-table events that need to
cut batches across multiple tables:
 1. Drop database / alter database cuts any batches on tables
in the affected database.
 2. Alter table rename cuts any batches on the source or destination
table.

This emits events in monotonically increasing order by
maintaining the resulting events in a sorted map. All
non-batchable events will be emitted in the original order.
Batchable events are emitted based on the ending Event ID
of the batch. This means that batchable events can move
later in the sequence, but they cannot move earlier.

This is based on the original design by Wenzhe Zhou.

Testing:
 - MetastoreEventsProcessorTest has new tests for interleaved
   events on two tables as well as tests for events that
   cut batches across tables (alter table, drop database,
   alter database).
 - A core job showed no other test failures.

Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
2 files changed, 366 insertions(+), 65 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20533/6
--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-12 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20533/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/20533/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@310
PS5, Line 310: while
> optional: Not sure if this worth the effort, but it would be nice to make t
Good idea, I switched to make this a two-layer map.


http://gerrit.cloudera.org:8080/#/c/20533/5/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/20533/5/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2610
PS5, Line 2610: anways
> typo
Done



--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 20:22:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-12 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 5: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20533/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/20533/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@310
PS5, Line 310: while
optional: Not sure if this worth the effort, but it would be nice to make this 
better than O(N) - it doesn't seem impossible that alter database or 
create/drop database pairs are common in some workload. It could also make 
sense to increase batch sizes in the future, making this search quite costly.

e.g. pendingTableEventsMap could be Map>, 
with db/table keys, and the first time a db is met could add the db entries


http://gerrit.cloudera.org:8080/#/c/20533/5/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/20533/5/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2610
PS5, Line 2610: anways
typo



--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 14:26:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 5: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 12:05:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10001/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 07:37:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14658/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 06:33:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14656/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 06:27:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-11 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20533/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/20533/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2422
PS4, Line 2422: Database alteredDb = 
catalog_.getDb("database_to_be_dropped").getMetaStoreDb().deepCopy();
> line too long (94 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/20533/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3535
PS4, Line 3535:   private void addDatabaseParameters(String db, String key, 
String val) throws TException {
> line too long (91 > 90)
This wasn't needed anymore, removed this part of the change.



--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 06:08:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-11 Thread Joe McDonnell (Code Review)
Hello Daniel Becker, Zoltan Borok-Nagy, Sai Hemanth Gantasala, Csaba Ringhofer, 
Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20533

to look at the new patch set (#5).

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..

IMPALA-12463: Batch non-consecutive table events in the event processor

The current batching of events requires events to be
consecutive. When there are multiple tables being modified,
events can be interleaved such that each batch is very
small. If the batching criteria can be relaxed, the
non-consecutive events could be batched and processed more
efficiently.

This implements batching for non-consecutive events by
keeping state on each table individually. Different tables
can continue to accumulate batchable events independently
unless they hit a condition that cuts the batch. The batching
can ignore some events on unrelated tables, but the same
rules apply about the batching of events on an individual table.
For example, for a particular table, any non-INSERT event
between two INSERT events on that table continues to cut the
batching.

In addition, there are certain cross-table events that need to
cut batches across multiple tables:
 1. Drop database / alter database cuts any batches on tables
in the affected database.
 2. Alter table rename cuts any batches on the source or destination
table.

This emits events in monotonically increasing order by
maintaining the resulting events in a sorted map. All
non-batchable events will be emitted in the original order.
Batchable events are emitted based on the ending Event ID
of the batch. This means that batchable events can move
later in the sequence, but they cannot move earlier.

This is based on the original design by Wenzhe Zhou.

Testing:
 - MetastoreEventsProcessorTest has new tests for interleaved
   events on two tables as well as tests for events that
   cut batches across tables (alter table, drop database,
   alter database).
 - A core job showed no other test failures.

Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
2 files changed, 349 insertions(+), 65 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20533/5
--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-11 Thread Joe McDonnell (Code Review)
Hello Daniel Becker, Zoltan Borok-Nagy, Sai Hemanth Gantasala, Csaba Ringhofer, 
Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20533

to look at the new patch set (#4).

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..

IMPALA-12463: Batch non-consecutive table events in the event processor

The current batching of events requires events to be
consecutive. When there are multiple tables being modified,
events can be interleaved such that each batch is very
small. If the batching criteria can be relaxed, the
non-consecutive events could be batched and processed more
efficiently.

This implements batching for non-consecutive events by
keeping state on each table individually. Different tables
can continue to accumulate batchable events independently
unless they hit a condition that cuts the batch. The batching
can ignore some events on unrelated tables, but the same
rules apply about the batching of events on an individual table.
For example, for a particular table, any non-INSERT event
between two INSERT events on that table continues to cut the
batching.

In addition, there are certain cross-table events that need to
cut batches across multiple tables:
 1. Drop database / alter database cuts any batches on tables
in the affected database.
 2. Alter table rename cuts any batches on the source or destination
table.

This emits events in monotonically increasing order by
maintaining the resulting events in a sorted map. All
non-batchable events will be emitted in the original order.
Batchable events are emitted based on the ending Event ID
of the batch. This means that batchable events can move
later in the sequence, but they cannot move earlier.

This is based on the original design by Wenzhe Zhou.

Testing:
 - MetastoreEventsProcessorTest has new tests for interleaved
   events on two tables as well as tests for events that
   cut batches across tables (alter table, drop database,
   alter database).
 - A core job showed no other test failures.

Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
2 files changed, 353 insertions(+), 67 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20533/4
--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20533/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/20533/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2422
PS4, Line 2422: Database alteredDb = 
catalog_.getDb("database_to_be_dropped").getMetaStoreDb().deepCopy();
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/20533/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3535
PS4, Line 3535:   private void addDatabaseParameters(String db, String key, 
String val) throws TException {
line too long (91 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 06:01:03 +
Gerrit-HasComments: Yes