[
https://issues.apache.org/jira/browse/IMPALA-10656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17333267#comment-17333267
]
ASF subversion and git services commented on IMPALA-10656:
----------------------------------------------------------
Commit c65d7861d9ae28f6fc592727ff699a8155dcda2c in impala's branch
refs/heads/dependabot/pip/infra/python/deps/py-1.10.0 from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c65d786 ]
IMPALA-10656: Fire insert events before commit
Before this fix, Impala committed an insert first, then reloaded the
table from HMS, and generated the insert events from the difference
between the two snapshots (i.e. the files that were not present in the
old snapshot but are present in the new one).
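The snapshot-difference step described above amounts to a set difference over file lists. The following is a minimal Python sketch for illustration only, not Impala's code; the function name `files_added_between_snapshots` and the sample paths are hypothetical.

```python
def files_added_between_snapshots(old_files, new_files):
    """Return the files present in the new snapshot but absent from the old one."""
    # Hypothetical helper: snapshots are modelled as plain lists of file paths.
    return sorted(set(new_files) - set(old_files))


# Hypothetical file lists from before and after the insert was committed.
old_snapshot = ["/warehouse/t/a.parq"]
new_snapshot = ["/warehouse/t/a.parq", "/warehouse/t/b.parq"]

added = files_added_between_snapshots(old_snapshot, new_snapshot)
```

Because the new snapshot only exists after the commit, events derived this way can only be sent after the commit.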
Hive replication expects the insert events before the commit, so this
ordering may lead to issues there.
The solution is to collect the new files during the insert in the
backend, and send the insert events based on this file set. This wasn't
very hard to do as we were already collecting the files in some cases:
- to move them from staging dir to their final location in case of
  non-partitioned tables
- to write the file list to snapshot files in case of Iceberg tables
This patch unifies the paths above and collects all information about
the created files regardless of the table type.
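The ordering this patch aims for (collect files while writing, fire insert events from that set, then commit) could be sketched as below. This is a simplified Python illustration under stated assumptions, not the actual backend or catalog code; the class `InsertSketch` and its methods are hypothetical names.

```python
class InsertSketch:
    """Hypothetical model of an insert that records the files it creates."""

    def __init__(self):
        self.created_files = []  # collected during the insert, for any table type
        self.steps = []          # records the order in which operations happen

    def write_file(self, path):
        # Remember every file created by the insert as it is written.
        self.created_files.append(path)

    def fire_insert_events(self):
        # Events are built from the collected file set, not from a snapshot diff.
        self.steps.append(("insert_event", tuple(self.created_files)))

    def commit(self):
        self.steps.append(("commit",))

    def finish(self):
        # Insert events are sent first; the commit happens afterwards.
        self.fire_insert_events()
        self.commit()


insert = InsertSketch()
insert.write_file("/warehouse/t/b.parq")
insert.write_file("/warehouse/t/c.parq")
insert.finish()
```

Since the file set is known before the commit, no table reload is needed to work out which files the events should describe.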
Testing:
- no new tests, insert events were already covered in
  test_event_processing.py and MetastoreEventsProcessorTest.java
- ran core tests
Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Reviewed-on: http://gerrit.cloudera.org:8080/17313
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Fire insert events before commit
> --------------------------------
>
> Key: IMPALA-10656
> URL: https://issues.apache.org/jira/browse/IMPALA-10656
> Project: IMPALA
> Issue Type: Bug
> Components: Backend, Frontend
> Reporter: Csaba Ringhofer
> Assignee: Csaba Ringhofer
> Priority: Major
>
> Currently Impala commits an insert first, then reloads the table from HMS,
> and generates the insert events from the difference between the two
> snapshots (i.e. the files that were not present in the old snapshot but are
> present in the new one). Hive replication expects the insert events before
> the commit, so this ordering may lead to issues there.
> The solution is to collect the new files during the insert in the backend,
> and send the insert events based on this file set.