Harshal Patel created HIVE-28975:
------------------------------------

             Summary: [HiveAcidReplication] Remove dangling txns from Target 
side post incremental replication
                 Key: HIVE-28975
                 URL: https://issues.apache.org/jira/browse/HIVE-28975
             Project: Hive
          Issue Type: Improvement
          Components: repl
            Reporter: Harshal Patel
            Assignee: Harshal Patel


*Context and Problem Statement:*

Currently, due to certain inconsistencies on the Hive side, customers are 
frequently encountering the repl_incompatible error, triggered by different 
underlying issues.
 * *Current Issue:* There are missing entries in the txn_write_notification_log 
table for TRUNCATE operations. This causes problems when the Hive configuration 
property hive.repl.filter.transactions is set to true.

To improve resiliency from the replication side, we propose a mechanism to 
clean up dangling transaction entries on the Disaster Recovery (DR) cluster 
after the incremental load completes.

*Proposed Solution:*

We introduce a mechanism to capture and reconcile the state of open 
transactions during the replication process.
h3. *Steps:*
 # *Capture Initial Open Transactions:*
 * At the beginning of the incremental dump, capture the list of open 
transactions.
 * For example, this initial list might be: 1, 2, 3.


 # *Proceed with Normal Dump Process:*
 * While the dump is in progress, some transactions may complete, and new ones 
may start.
 * For instance, suppose transaction 1 completes and transaction 4 starts.


 # *Capture Final Open Transactions:*
 * After the dump completes, capture the list of open transactions again.
 * This list might now be: 2, 3, 4.
 * Append any newly opened transactions (4 in this case) to the initial list, 
giving 1, 2, 3, 4, and persist the combined list in a file.


 # *During Load on the DR Cluster:*
 * In this example, the load receives 1, 2, 3, 4 as the open transactions from 
the source.
 * After the load process completes, retrieve the transaction list from the 
repl_txn_map for the respective database.


 # *Clean Dangling Transactions:*
 * Abort the transactions on the DR cluster that are *not* present in the final 
list of transactions captured in step 3.
 * In effect: remove from repl_txn_map every entry whose source transaction is 
not in the list of open transactions received from the source.
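
The steps above can be sketched as follows. This is a minimal illustration in Python, not the Hive implementation: the function names are hypothetical, and repl_txn_map is modelled as an in-memory dict from source transaction id to target transaction id, which is an assumption made for the example.

```python
def txns_to_persist(open_at_start, open_at_end):
    """Step 3: combine the open txns seen at dump start and at dump end."""
    return sorted(set(open_at_start) | set(open_at_end))


def find_dangling(repl_txn_map, source_open_txns):
    """Step 5: entries in the target-side map whose source txn id is not
    in the list of open txns received from the source."""
    keep = set(source_open_txns)
    return {src: tgt for src, tgt in repl_txn_map.items() if src not in keep}


# Steps 1 and 3, using the numbers from the description.
persisted = txns_to_persist([1, 2, 3], [2, 3, 4])

# Hypothetical target-side state after the load: source txn 7 is still
# mapped on the DR cluster although the source no longer reports it as open.
repl_txn_map = {1: 101, 2: 102, 3: 103, 4: 104, 7: 107}

dangling = find_dangling(repl_txn_map, persisted)
print(persisted)  # list persisted in the dump
print(dangling)   # target txns that would be aborted
```

Here only the entry for source transaction 7 is reported as dangling; transactions 1 through 4 are kept because they appear in the persisted list.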

h3. *Rationale Behind Key Steps:*

*Why is Step 1 Important?*

If the initial list of open transactions is not captured, cleanup can rely 
only on the list taken after the dump completes. For example, suppose 
transaction 1 was open when the dump started and was closed while the dump was 
running. It remains open on the DR cluster after replication, but it is absent 
from the list captured at the end of the dump. Skipping this step would 
therefore result in valid transactions being incorrectly aborted during 
cleanup (step 5).

*Why is Step 3 Important?*

If a transaction (e.g., transaction 4) is opened while the dump is running 
(after step 1 and before step 3) and is replicated as part of the dump, it 
must be included in the list. Otherwise, it would be incorrectly aborted 
during the cleanup phase (step 5).
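
The effect of skipping either capture can be checked with the example numbers. The sketch below is illustrative only; the helper name and the use of plain sets in place of repl_txn_map are assumptions.

```python
def dangling(target_src_txns, persisted):
    """Source txn ids present on the target but missing from the persisted list."""
    return sorted(set(target_src_txns) - set(persisted))


open_at_start = {1, 2, 3}   # step 1
open_at_end = {2, 3, 4}     # step 3
target = {1, 2, 3, 4}       # source txn ids open on the DR cluster after load

# Step 1 skipped: only the final list is persisted, so txn 1 is aborted.
without_step1 = dangling(target, open_at_end)

# Step 3 skipped: only the initial list is persisted, so txn 4 is aborted.
without_step3 = dangling(target, open_at_start)

# Both lists combined: nothing valid is aborted.
with_both = dangling(target, open_at_start | open_at_end)

print(without_step1, without_step3, with_both)
```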

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
