[
https://issues.apache.org/jira/browse/HIVE-24933?focusedWorklogId=796014&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796014
]
ASF GitHub Bot logged work on HIVE-24933:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 28/Jul/22 10:07
Start Date: 28/Jul/22 10:07
Worklog Time Spent: 10m
Work Description: jhungund commented on PR #3435:
URL: https://github.com/apache/hive/pull/3435#issuecomment-1197938266
Summary of the change:
While setting up the tasks during the repl-load phase of replication,
delay access to the metadata until task execution. This prevents the
tasks from using an inconsistent metadata state.
**Root Cause Analysis**
Background:
During the incremental load phase of replication, all event logs are
processed sequentially.
One or more tasks are additionally spawned/created during the processing of
each event.
All the spawned tasks are subsequently executed sequentially as well.
**Scenario of the issue:**
The issue is seen in the following scenario:
1. An external table (e.g. T1) has already been replicated to the target
cluster from the source cluster during earlier replication cycles.
2. This external table is dropped.
3. A new managed table with the same name (T1) is created.
**Root cause:**
1. The above-mentioned operations (table drop and recreation) are propagated
to the target cluster
via event logs during the subsequent incremental phase of replication.
2. We create tasks to drop the old external table for the drop-table event.
3. We also create tasks to create and load the new table.
4. In addition, further events are logged, which create more tasks to
load the table.
5. During the creation of these load-table tasks, we try to access the
metadata corresponding to the new table from the metadata store.
In the normal scenario of a fresh table creation, the metadata store does
not have data corresponding to the new table (yet to be created).
However, in this scenario, the old table still exists and hence we end
up using the metadata corresponding to the old table.
We use this metadata to create the load tasks for the new table.
These load tasks execute after the drop and recreate tasks. By then,
the metadata set in the task context is stale and inconsistent with the
newly created table. Hence, the error.
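The sequence above can be reproduced with a toy model. This is only a sketch: the dictionary standing in for the metadata store, the `EagerLoadTask` class and the table-type strings are hypothetical and are not Hive's actual classes. It shows a load task that reads the table type at creation time and then finds it stale at execution time.

```python
class EagerLoadTask:
    """Hypothetical load task that reads metadata when it is created."""

    def __init__(self, metastore, table):
        self.table = table
        # Read at task-creation time: the old external T1 still exists here.
        self.planned_type = metastore.get(table)

    def execute(self, metastore):
        actual_type = metastore[self.table]
        if actual_type != self.planned_type:
            raise RuntimeError(
                f"stale metadata for {self.table}: "
                f"planned {self.planned_type}, found {actual_type}"
            )


def replay_incremental_load():
    # Toy metadata store on the target: T1 was replicated as an external table.
    metastore = {"T1": "EXTERNAL_TABLE"}

    # Task creation: all tasks are built before any of them runs.
    load = EagerLoadTask(metastore, "T1")
    tasks = [
        lambda: metastore.pop("T1"),                   # drop old external T1
        lambda: metastore.update(T1="MANAGED_TABLE"),  # create new managed T1
        lambda: load.execute(metastore),               # load the new T1
    ]

    # Task execution: tasks run sequentially.
    try:
        for task in tasks:
            task()
    except RuntimeError as err:
        return str(err)
    return None


print(replay_incremental_load())
```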
**Fix:**
Do not access the metadata during task creation for load table.
Instead, access the metadata during task execution. By that time,
the metadata has been brought up to date by the previously executed
tasks.
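A minimal sketch of the deferred-lookup idea, under the same caveat that this is a toy model and not the code in the PR: the dictionary metadata store, the `DeferredLoadTask` class and the table-type strings are hypothetical. The task stores only the table name when it is created and reads the metadata inside `execute`, after the drop and create tasks have already run.

```python
class DeferredLoadTask:
    """Hypothetical load task that reads metadata only when it executes."""

    def __init__(self, table):
        # Only the table name is captured at task-creation time.
        self.table = table
        self.seen_type = None

    def execute(self, metastore):
        # Read at execution time: earlier tasks have already updated the store.
        self.seen_type = metastore[self.table]


def replay_with_deferred_lookup():
    # Toy metadata store on the target: T1 was replicated as an external table.
    metastore = {"T1": "EXTERNAL_TABLE"}

    # Task creation: the load task does not touch the metadata store.
    load = DeferredLoadTask("T1")
    tasks = [
        lambda: metastore.pop("T1"),                   # drop old external T1
        lambda: metastore.update(T1="MANAGED_TABLE"),  # create new managed T1
        lambda: load.execute(metastore),               # load the new T1
    ]

    # Task execution: tasks run sequentially.
    for task in tasks:
        task()
    return load.seen_type


print(replay_with_deferred_lookup())
```

Because the lookup happens inside `execute`, the load task sees the type of the newly created table rather than that of the dropped one.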
Issue Time Tracking
-------------------
Worklog Id: (was: 796014)
Time Spent: 2h 10m (was: 2h)
> Replication fails for transactional tables having same name as dropped
> non-transactional table
> ----------------------------------------------------------------------------------------------
>
> Key: HIVE-24933
> URL: https://issues.apache.org/jira/browse/HIVE-24933
> Project: Hive
> Issue Type: Bug
> Reporter: Pratyush Madhukar
> Assignee: Pratyush Madhukar
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)