[ 
https://issues.apache.org/jira/browse/IMPALA-12463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770223#comment-17770223
 ] 

Joe McDonnell edited comment on IMPALA-12463 at 9/28/23 10:07 PM:
------------------------------------------------------------------

I was able to reproduce this interleaving by doing the following:
{noformat}
# SETUP STEPS
# Run these from beeline
beeline -n ${USER} -u "jdbc:hive2://localhost:11050/default;auth=none"

# We will set up two partitioned tables with many partitions (in this case 7300)

# Create partitioned table
create table whatever (i int) partitioned by (partition_col int);
create table whatever2 (i int) partitioned by (partition_col int);

# Set partition options to allow inserting many partitions
set hive.hive.exec.max.dynamic.partitions=10000;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.dynamic.partitions=10000;

# Insert that creates 7300 partitions
insert into whatever (partition_col, i) select id, id from functional.alltypes;
insert into whatever2 (partition_col, i) select id, id from functional.alltypes;

# Run this from impala-shell
# This loads the tables on the Impala side and forces the catalog to process
# events on these tables.
select count(*) from whatever;
select count(*) from whatever2;

# REPRODUCING STEPS
# Run concurrently in two beeline sessions
# The INSERT events on the catalogd side should be interleaved
# Both need:
set hive.hive.exec.max.dynamic.partitions=10000;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.dynamic.partitions=10000;
# Beeline #1
insert into whatever (partition_col, i) select id, id from functional.alltypes;
# Beeline #2
insert into whatever2 (partition_col, i) select id, id from 
functional.alltypes;{noformat}
On the catalog side, I see:
{noformat}
I0928 13:51:41.488373 679621 MetastoreEvents.java:628] EventId: 390796 
EventType: INSERT_PARTITIONS Created a batch event for 2 events from 390795 to 
390796
I0928 13:51:41.488386 679621 MetastoreEvents.java:628] EventId: 390800 
EventType: INSERT_PARTITIONS Created a batch event for 2 events from 390799 to 
390800
I0928 13:51:41.488399 679621 MetastoreEvents.java:628] EventId: 390804 
EventType: INSERT_PARTITIONS Created a batch event for 2 events from 390803 to 
390804
I0928 13:51:41.488425 679621 MetastoreEvents.java:628] EventId: 390811 
EventType: INSERT_PARTITIONS Created a batch event for 2 events from 390810 to 
390811
I0928 13:51:41.488437 679621 MetastoreEvents.java:628] EventId: 390819 
EventType: INSERT_PARTITIONS Created a batch event for 2 events from 390818 to 
390819
I0928 13:51:41.488451 679621 MetastoreEvents.java:628] EventId: 390834 
EventType: INSERT_PARTITIONS Created a batch event for 2 events from 390833 to 
390834{noformat}


was (Author: joemcdonnell):
I was able to reproduce this interleaving by doing the following:
{noformat}
# SETUP STEPS
# Run these from beeline
beeline -n ${USER} -u "jdbc:hive2://localhost:11050/default;auth=none"

# We will set up two partitioned tables with many partitions (in this case 7300)

# Create partitioned table
create table whatever (i int) partitioned by (partition_col int);
create table whatever2 (i int) partitioned by (partition_col int);

# Set partition options to allow inserting many partitions
set hive.hive.exec.max.dynamic.partitions=10000;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.dynamic.partitions=10000;

# Insert that creates 7300 partitions
insert into whatever (partition_col, i) select id, id from functional.alltypes;
insert into whatever2 (partition_col, i) select id, id from functional.alltypes;

# Run this from impala-shell
# This loads the tables on the Impala side and forces the catalog to process
# events on these tables.
select count(*) from whatever;
select count(*) from whatever2;

# REPRODUCING STEPS
# Run concurrently in two beeline sessions
# The INSERT events on the catalogd side should be interleaved
# Beeline #1
insert into whatever (partition_col, i) select id, id from functional.alltypes;
# Beeline #2
insert into whatever2 (partition_col, i) select id, id from 
functional.alltypes;{noformat}
On the catalog side, I see:
{noformat}
I0928 13:51:41.488373 679621 MetastoreEvents.java:628] EventId: 390796 
EventType: INSERT_PARTITIONS Created a batch event for 2 events from 390795 to 
390796
I0928 13:51:41.488386 679621 MetastoreEvents.java:628] EventId: 390800 
EventType: INSERT_PARTITIONS Created a batch event for 2 events from 390799 to 
390800
I0928 13:51:41.488399 679621 MetastoreEvents.java:628] EventId: 390804 
EventType: INSERT_PARTITIONS Created a batch event for 2 events from 390803 to 
390804
I0928 13:51:41.488425 679621 MetastoreEvents.java:628] EventId: 390811 
EventType: INSERT_PARTITIONS Created a batch event for 2 events from 390810 to 
390811
I0928 13:51:41.488437 679621 MetastoreEvents.java:628] EventId: 390819 
EventType: INSERT_PARTITIONS Created a batch event for 2 events from 390818 to 
390819
I0928 13:51:41.488451 679621 MetastoreEvents.java:628] EventId: 390834 
EventType: INSERT_PARTITIONS Created a batch event for 2 events from 390833 to 
390834{noformat}

> Allow batching of non consecutive metastore events
> --------------------------------------------------
>
>                 Key: IMPALA-12463
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12463
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>            Reporter: Csaba Ringhofer
>            Priority: Major
>         Attachments: concurrent_metadata_load.py
>
>
> Currently Impala tries to batch events like partition insert/creation only if:
> 1. the next event is for the same table as the previous one
> 2. the next event's id is the previous one's + 1
> 3. the next event has the same type as the previous one
> (2 can be stricter than 1 if some events were filtered between the two)
> See 
> https://github.com/apache/impala/blob/94f4f1d82461d8f71fbd0d2e9082aa29b5f53a89/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L315
> Another limit is that only events in the same batch from HMS can be merged. 
> Currently 1000 events are polled at the same time: 
> https://github.com/apache/impala/blob/94f4f1d82461d8f71fbd0d2e9082aa29b5f53a89/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L218
> Making this configurable could be also useful.
> Event batching could be improved by batching all events to the current one if 
> they modify the same table, unless they are "cut" by:
> a. an event on the same table but with a different type
> b. a rename table event where the original or the new name is the same as the 
> current event
> If such an event occurs, the events after that can be only merged to a newer 
> event.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to