jhungund commented on PR #3435:
URL: https://github.com/apache/hive/pull/3435#issuecomment-1197938266
Summary of the change:
While setting up the tasks during the repl-load phase of replication,
delay access to the metadata until task execution. This prevents the tasks
from working with an inconsistent metadata state.
**Root Cause Analysis**
Background:
During the incremental load phase of replication, all event logs are
processed sequentially.
One or more tasks are additionally spawned/created during the processing of
each event.
All the spawned tasks are also, subsequently, executed sequentially.
**Scenario of the issue:**
The issue is seen in the following scenario:
1. An external table (e.g. T1) has already been replicated to the target cluster from
the source cluster during earlier replication cycles.
2. This external table is dropped.
3. A new managed table with the same name (T1) is created.
**Root cause:**
1. The above-mentioned operations (table drop and recreation) are propagated
to the target cluster
via event logs during the subsequent incremental phase of replication.
2. For the drop-table event, we create tasks to drop the old external table.
3. We also create tasks to create and load the new table.
4. Further events are logged, which create additional tasks to
load the table.
5. During the creation of these load-table tasks, we try to access the
metadata corresponding to the new table from the metadata store.
In the normal scenario of a fresh table creation, the metadata store does
not have data corresponding to the new table (it is yet to be created).
However, in this scenario the old table still exists, and hence we end
up using the metadata corresponding to the old table.
We use this metadata to create the load tasks for the new table.
These load tasks execute after the drop and recreate tasks. By then, the
metadata set in the task context is stale and inconsistent with the
newly created table. Hence, the error.
**Fix:**
Do not access the metadata during task creation for load table.
Instead, access the metadata during task execution. By that time,
the previously executed tasks have brought the metadata up to its
latest state.
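
To illustrate the difference between the two approaches, here is a minimal, self-contained sketch. It is not the code from this PR: the class `DeferredMetadataSketch`, the in-memory `metastore` map, and the method `typeSeenByLoadTask` are hypothetical stand-ins, and the "tasks" are plain `Runnable`s executed sequentially. It only models the ordering problem described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class DeferredMetadataSketch {
    // Hypothetical stand-in for the metadata store: table name -> table type.
    static final Map<String, String> metastore = new HashMap<>();

    /**
     * Creates drop, recreate and load tasks up front, runs them sequentially,
     * and returns the table type that the load task ended up working with.
     */
    static String typeSeenByLoadTask(boolean deferLookup) {
        metastore.clear();
        // State left behind by an earlier replication cycle.
        metastore.put("T1", "EXTERNAL");

        List<Runnable> tasks = new ArrayList<>();
        String[] seen = new String[1];

        // Task creation phase: nothing has executed yet.
        tasks.add(() -> metastore.remove("T1"));          // drop old external table
        tasks.add(() -> metastore.put("T1", "MANAGED"));  // create new managed table

        final Supplier<String> lookup;
        if (deferLookup) {
            // Fixed behaviour: read the metadata only when the task executes.
            lookup = () -> metastore.get("T1");
        } else {
            // Buggy behaviour: read the metadata now and keep it in the task context.
            String snapshot = metastore.get("T1");
            lookup = () -> snapshot;
        }
        tasks.add(() -> seen[0] = lookup.get());          // load-table task

        // Task execution phase: strictly sequential.
        for (Runnable task : tasks) {
            task.run();
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("eager:    " + typeSeenByLoadTask(false)); // EXTERNAL (stale)
        System.out.println("deferred: " + typeSeenByLoadTask(true));  // MANAGED
    }
}
```

With the eager lookup, the load task carries the type of the old external table, even though the table has been dropped and recreated by the time it runs. With the deferred lookup, it sees the state produced by the preceding tasks.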
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]