[ https://issues.apache.org/jira/browse/IMPALA-12463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770223#comment-17770223 ]
Joe McDonnell edited comment on IMPALA-12463 at 9/28/23 10:07 PM: ------------------------------------------------------------------ I was able to reproduce this interleaving by doing the following: {noformat} # SETUP STEPS # Run these from beeline beeline -n ${USER} -u "jdbc:hive2://localhost:11050/default;auth=none" # We will set up two partitioned tables with many partitions (in this case 7300) # Create partitioned table create table whatever (i int) partitioned by (partition_col int); create table whatever2 (i int) partitioned by (partition_col int); # Set partition options to allow inserting many partitions set hive.hive.exec.max.dynamic.partitions=10000; set hive.exec.max.dynamic.partitions.pernode=10000; set hive.exec.max.dynamic.partitions=10000; # Insert that creates 7300 partitions insert into whatever (partition_col, i) select id, id from functional.alltypes; insert into whatever2 (partition_col, i) select id, id from functional.alltypes; # Run this from impala-shell # This loads the tables on the Impala side and forces the catalog to process # events on these tables. select count(*) from whatever; select count(*) from whatever2; # REPRODUCING STEPS # Run concurrently in two beeline sessions # The INSERT events on the catalogd side should be interleaved # Both need: set hive.hive.exec.max.dynamic.partitions=10000; set hive.exec.max.dynamic.partitions.pernode=10000; set hive.exec.max.dynamic.partitions=10000; # Beeline #1 insert into whatever (partition_col, i) select id, id from functional.alltypes; # Beeline #2 insert into whatever2 (partition_col, i) select id, id from functional.alltypes;{noformat} On the catalog side, I see: {noformat} I0928 13:51:41.488373 679621 MetastoreEvents.java:628] EventId: 390796 EventType: INSERT_PARTITIONS Created a batch event for 2 events from 390795 to 390796 I0928 13:51:41.488386 679621 MetastoreEvents.java:628] EventId: 390800 EventType: INSERT_PARTITIONS Created a batch event for 2 events from 390799 to 390800 I0928 13:51:41.488399 679621 MetastoreEvents.java:628] EventId: 390804 EventType: INSERT_PARTITIONS Created a batch event for 2 events from 390803 to 390804 I0928 13:51:41.488425 679621 MetastoreEvents.java:628] EventId: 390811 EventType: INSERT_PARTITIONS Created a batch event for 2 events from 390810 to 390811 I0928 13:51:41.488437 679621 MetastoreEvents.java:628] EventId: 390819 EventType: INSERT_PARTITIONS Created a batch event for 2 events from 390818 to 390819 I0928 13:51:41.488451 679621 MetastoreEvents.java:628] EventId: 390834 EventType: INSERT_PARTITIONS Created a batch event for 2 events from 390833 to 390834{noformat} was (Author: joemcdonnell): I was able to reproduce this interleaving by doing the following: {noformat} # SETUP STEPS # Run these from beeline beeline -n ${USER} -u "jdbc:hive2://localhost:11050/default;auth=none" # We will set up two partitioned tables with many partitions (in this case 7300) # Create partitioned table create table whatever (i int) partitioned by (partition_col int); create table whatever2 (i int) partitioned by (partition_col int); # Set partition options to allow inserting many partitions set hive.hive.exec.max.dynamic.partitions=10000; set hive.exec.max.dynamic.partitions.pernode=10000; set hive.exec.max.dynamic.partitions=10000; # Insert that creates 7300 partitions insert into whatever (partition_col, i) select id, id from functional.alltypes; insert into whatever2 (partition_col, i) select id, id from functional.alltypes; # Run this from impala-shell # This loads the tables on the Impala side and forces the catalog to process # events on these tables. select count(*) from whatever; select count(*) from whatever2; # REPRODUCING STEPS # Run concurrently in two beeline sessions # The INSERT events on the catalogd side should be interleaved # Beeline #1 insert into whatever (partition_col, i) select id, id from functional.alltypes; # Beeline #2 insert into whatever2 (partition_col, i) select id, id from functional.alltypes;{noformat} On the catalog side, I see: {noformat} I0928 13:51:41.488373 679621 MetastoreEvents.java:628] EventId: 390796 EventType: INSERT_PARTITIONS Created a batch event for 2 events from 390795 to 390796 I0928 13:51:41.488386 679621 MetastoreEvents.java:628] EventId: 390800 EventType: INSERT_PARTITIONS Created a batch event for 2 events from 390799 to 390800 I0928 13:51:41.488399 679621 MetastoreEvents.java:628] EventId: 390804 EventType: INSERT_PARTITIONS Created a batch event for 2 events from 390803 to 390804 I0928 13:51:41.488425 679621 MetastoreEvents.java:628] EventId: 390811 EventType: INSERT_PARTITIONS Created a batch event for 2 events from 390810 to 390811 I0928 13:51:41.488437 679621 MetastoreEvents.java:628] EventId: 390819 EventType: INSERT_PARTITIONS Created a batch event for 2 events from 390818 to 390819 I0928 13:51:41.488451 679621 MetastoreEvents.java:628] EventId: 390834 EventType: INSERT_PARTITIONS Created a batch event for 2 events from 390833 to 390834{noformat} > Allow batching of non consecutive metastore events > -------------------------------------------------- > > Key: IMPALA-12463 > URL: https://issues.apache.org/jira/browse/IMPALA-12463 > Project: IMPALA > Issue Type: Improvement > Components: Catalog > Reporter: Csaba Ringhofer > Priority: Major > Attachments: concurrent_metadata_load.py > > > Currently Impala tries to batch events like partition insert/creation only if: > 1. the next event is for the same table as the previous one > 2. the next event's id is the previous one's + 1 > 3. the next event has the same type as the previous one > (2 can be stricter than 1 if some events were filtered between the two) > See > https://github.com/apache/impala/blob/94f4f1d82461d8f71fbd0d2e9082aa29b5f53a89/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L315 > Another limit is that only events in the same batch from HMS can be merged. > Currently 1000 events are polled at the same time: > https://github.com/apache/impala/blob/94f4f1d82461d8f71fbd0d2e9082aa29b5f53a89/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L218 > Making this configurable could be also useful. > Event batching could be improved by batching all events to the current one if > they modify the same table, unless they are "cut" by: > a. an event on the same table but with a different type > b. a rename table event where the original or the new name is the same as the > current event > If such an event occurs, the events after that can be only merged to a newer > event. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org