Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/20367 )
Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
......................................................................

Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20367/14/tests/custom_cluster/test_sync_to_latest_hms_events.py
File tests/custom_cluster/test_sync_to_latest_hms_events.py:

http://gerrit.cloudera.org:8080/#/c/20367/14/tests/custom_cluster/test_sync_to_latest_hms_events.py@37
PS14, Line 37: --file_metadata_reload_properties=''
> I'm still understanding why we need this in some tests. Do those tests depe

This is a real problem with queries involving the 'insert' or 'insert overwrite' command. This command generates an alter table event followed by an insert event. If numRows does not change, we cannot detect whether we need to reload file metadata. We need to detect that an alter table event was generated by an insert query and reload file metadata accordingly.

Below is an example where we cannot detect whether to reload file metadata or not:

  create table tb1(i int); (Query run in Impala)
  insert into tb1 values (1); (Query run in Hive)
  insert overwrite table tb1 values (2); (Query run in Hive)
  select * from tb1; (Query run in Impala)
  -- The output comes out as '1' instead of '2'.

Reason:
-> For the first insert query, we get 2 events: an alter table event and an insert event. The alter table event has the numRows property changed, so we reload file metadata and update the lastSyncEventId on the table; the insert event then gets skipped.
-> For the second insert overwrite query, we again get 2 events: an alter table event and an insert event. Since numRows is unchanged (even though the underlying data changed), we cannot detect that file metadata needs to be reloaded, so we process this event without reloading file metadata and update the lastSyncEventId on the table; the insert event then gets skipped.

As a result, we get data correctness issues.
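To make the sequence above concrete, here is a toy Python model of it. This is only a sketch and not Impala code: `Event`, `CachedTable`, `process_event`, and `run_scenario` are hypothetical names, and the model assumes (as described above) that a reload on an alter table event is triggered only when numRows differs, and that handling the alter table event advances lastSyncEventId past the insert event that follows it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    event_id: int
    kind: str                       # "ALTER_TABLE" or "INSERT"
    num_rows: Optional[int] = None  # numRows carried by an ALTER_TABLE event


@dataclass
class CachedTable:
    """Hypothetical stand-in for the catalog's cached view of a table."""
    num_rows: int = 0
    rows: List[int] = field(default_factory=list)  # data seen via cached file metadata
    last_sync_event_id: int = 0


def process_event(table, event, storage, latest_event_id):
    """Apply one event to the cached table; return True if files were reloaded."""
    if event.event_id <= table.last_sync_event_id:
        return False  # already covered by an earlier sync, so the event is skipped
    reloaded = False
    if event.kind == "INSERT":
        table.rows = list(storage)
        table.last_sync_event_id = event.event_id
        reloaded = True
    elif event.kind == "ALTER_TABLE":
        if event.num_rows != table.num_rows:
            # Assumed heuristic: reload file metadata only when numRows changed.
            table.rows = list(storage)
            table.num_rows = event.num_rows
            reloaded = True
        # Assumption: the table is synced to the latest event id, which is
        # why the insert event that follows gets skipped.
        table.last_sync_event_id = latest_event_id
    return reloaded


def run_scenario():
    table = CachedTable()
    # insert into tb1 values (1): alter table (numRows 0 -> 1), then insert.
    storage = [1]
    first = [process_event(table, e, storage, 2)
             for e in (Event(1, "ALTER_TABLE", 1), Event(2, "INSERT"))]
    # insert overwrite table tb1 values (2): numRows stays at 1.
    storage = [2]
    second = [process_event(table, e, storage, 4)
              for e in (Event(3, "ALTER_TABLE", 1), Event(4, "INSERT"))]
    return table.rows, storage, first, second
```

In this model, `run_scenario()` leaves the cached rows at `[1]` while storage holds `[2]`: no reload happens for either event of the insert overwrite, which matches the stale result described above.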
I believe the solution to this issue is to fix the alter table event in the metastore so that it indicates it was triggered by an insert; we can then simply reload file metadata.

-- 
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 15
Gerrit-Owner: Sai Hemanth Gantasala <saihema...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <k.venureddy2...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <saihema...@cloudera.com>
Gerrit-Comment-Date: Thu, 14 Dec 2023 02:44:52 +0000
Gerrit-HasComments: Yes