Sai Hemanth Gantasala has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
......................................................................


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20367/14/tests/custom_cluster/test_sync_to_latest_hms_events.py
File tests/custom_cluster/test_sync_to_latest_hms_events.py:

http://gerrit.cloudera.org:8080/#/c/20367/14/tests/custom_cluster/test_sync_to_latest_hms_events.py@37
PS14, Line 37: --file_metadata_reload_properties=''
> I'm still understanding why we need this in some tests. Do those tests depe
This is a real problem with queries that run an 'INSERT' or 'INSERT OVERWRITE'
command. Such a command generates an alter table event followed by an insert
event. If numRows doesn't change, we cannot detect whether we need to reload
file metadata. We need to detect that an alter table event was generated by an
insert query and reload file metadata accordingly.
Below is an example where we cannot detect whether to reload file metadata or 
not:
create table tb1(i int);                 (Query run in Impala)
insert into tb1 values (1);              (Query run in Hive)
insert overwrite table tb1 values (2);   (Query run in Hive)
select * from tb1;                       (Query run in Impala)
-- The output is '1' instead of '2'.

Reason:
-> For the first insert query, we get 2 events: an alter table event and an
insert event. The alter table event has the numRows property changed, so we
reload file metadata and update the lastSyncEventId on the table; the insert
event then gets skipped.
-> For the second (insert overwrite) query, we again get an alter table event
and an insert event. Since numRows is unchanged (even though the underlying
data changed), we cannot detect that file metadata needs to be reloaded, so we
process this event without reloading file metadata and update the
lastSyncEventId on the table; the insert event then gets skipped. As a result,
Impala returns stale data, which is a data correctness issue.
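To make the sequence above concrete, here is a minimal Python sketch of the
decision logic as described. It is a simplified model, not catalogd's actual
code: the names AlterTableEvent, InsertEvent, CachedTable and
apply_query_events are hypothetical, cached file metadata is reduced to a list
of row values, and it assumes (as described above) that processing the alter
table event advances lastSyncEventId past the paired insert event.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AlterTableEvent:
    event_id: int
    old_num_rows: int
    new_num_rows: int


@dataclass
class InsertEvent:
    event_id: int


@dataclass
class CachedTable:
    # Cached file metadata, simplified to the row values it would expose.
    rows: List[int] = field(default_factory=list)
    last_sync_event_id: int = 0


def apply_query_events(table, alter, insert, rows_on_disk):
    """Apply the alter table + insert event pair produced by one Hive insert (hypothetical model)."""
    if alter.event_id > table.last_sync_event_id:
        # Heuristic under discussion: reload file metadata only if numRows changed.
        if alter.old_num_rows != alter.new_num_rows:
            table.rows = list(rows_on_disk)
        # Assumption: lastSyncEventId ends up covering the paired insert event too.
        table.last_sync_event_id = insert.event_id
    if insert.event_id > table.last_sync_event_id:
        # Not reached in this flow: the insert event is skipped.
        table.rows = list(rows_on_disk)
        table.last_sync_event_id = insert.event_id


tb1 = CachedTable()
# insert into tb1 values (1): numRows 0 -> 1, so file metadata is reloaded.
apply_query_events(tb1, AlterTableEvent(1, 0, 1), InsertEvent(2), rows_on_disk=[1])
# insert overwrite table tb1 values (2): numRows stays 1, so nothing is reloaded.
apply_query_events(tb1, AlterTableEvent(3, 1, 1), InsertEvent(4), rows_on_disk=[2])
print(tb1.rows)  # prints [1] although the data on disk is [2]
```

The stale [1] matches the '1' returned by the select in Impala in the example.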

I believe the solution is to fix the alter table event in the metastore so that
it indicates the event was triggered by an insert. Then we can simply reload
file metadata.
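A minimal sketch of the proposed fix, assuming the metastore adds some marker
to the alter table event. The field name caused_by_insert and the helpers
needs_file_metadata_reload and apply_alter_event are hypothetical, not an
existing HMS or Impala API. With the marker set, the insert overwrite case from
the example reloads file metadata even though numRows stays at 1.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AlterTableEvent:
    event_id: int
    old_num_rows: int
    new_num_rows: int
    # Hypothetical marker the metastore would set when the alter table event
    # was generated as part of an INSERT / INSERT OVERWRITE.
    caused_by_insert: bool = False


@dataclass
class CachedTable:
    # Cached file metadata, simplified to the row values it would expose.
    rows: List[int] = field(default_factory=list)
    last_sync_event_id: int = 0


def needs_file_metadata_reload(event):
    # With the marker, the numRows comparison is no longer the only signal.
    return event.caused_by_insert or event.old_num_rows != event.new_num_rows


def apply_alter_event(table, event, paired_insert_id, rows_on_disk):
    if event.event_id <= table.last_sync_event_id:
        return  # already synced past this event
    if needs_file_metadata_reload(event):
        table.rows = list(rows_on_disk)
    # Simplification: lastSyncEventId also covers the paired insert event,
    # so that event is skipped afterwards.
    table.last_sync_event_id = paired_insert_id


# State after the first insert: rows [1], synced up to event 2.
tb1 = CachedTable(rows=[1], last_sync_event_id=2)
# insert overwrite: numRows stays 1, but the marker forces a reload.
apply_alter_event(tb1, AlterTableEvent(3, 1, 1, caused_by_insert=True), 4, [2])
print(tb1.rows)  # prints [2]
```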



--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 15
Gerrit-Owner: Sai Hemanth Gantasala <saihema...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <k.venureddy2...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <saihema...@cloudera.com>
Gerrit-Comment-Date: Thu, 14 Dec 2023 02:44:52 +0000
Gerrit-HasComments: Yes
