[ 
https://issues.apache.org/jira/browse/HIVE-19927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-19927:
------------------------------------
    Attachment: HIVE-19927.02.patch

> Last Repl ID set by bootstrap dump is incorrect and may cause data loss if 
> have ACID/MM tables.
> -----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-19927
>                 URL: https://issues.apache.org/jira/browse/HIVE-19927
>             Project: Hive
>          Issue Type: Sub-task
>          Components: HiveServer2, Transactions
>    Affects Versions: 3.1.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: DR, pull-request-available, replication
>         Attachments: HIVE-19927.01.patch, HIVE-19927.02.patch
>
>
> During a bootstrap dump of ACID tables, consider the following sequence:
> - Current session (REPL DUMP) opens a txn (Txn1) - Event-10
> - Another session (Session-2) opens a txn (Txn2) - Event-11
> - Session-2 inserts data (T1.D1) into an ACID table - Event-12
> - REPL DUMP gets lastReplId = last event ID logged (Event-12)
> - Session-2 commits Txn2 - Event-13
> - REPL DUMP dumps the ACID tables using the validTxnList based on Txn1. This 
> step skips all data written by txns > Txn1, so T1.D1 will be missing.
> - REPL DUMP commits Txn1
> - REPL LOAD from the bootstrap dump will skip T1.D1.
> - Incremental REPL DUMP will start from Event-13 and hence lose Txn2, which 
> was opened after Txn1. So the data T1.D1 will be lost forever.
> Proposed fix: capture the lastReplId of the bootstrap dump before opening the 
> current txn (Txn1), store it in the Driver context, and use it for the dump.
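
The sequence above can be replayed in a small toy model to show why the point at which lastReplId is captured matters. This is only an illustrative sketch, not Hive code: `ToyMetastore`, `replicate`, and the event kinds are hypothetical names, the event log is assumed to start so that the first logged event is Event-10, and the incremental dump is modelled simply as "replay every write event with an ID greater than lastReplId".

```python
from dataclasses import dataclass, field


@dataclass
class ToyMetastore:
    """Hypothetical stand-in for the notification log and txn counter."""
    events: list = field(default_factory=list)  # (event_id, kind, txn_id, data)
    last_event_id: int = 9                      # assumption: first logged event is Event-10
    next_txn_id: int = 1

    def _log(self, kind, txn_id, data=None):
        self.last_event_id += 1
        self.events.append((self.last_event_id, kind, txn_id, data))

    def open_txn(self):
        txn_id = self.next_txn_id
        self.next_txn_id += 1
        self._log("OPEN_TXN", txn_id)
        return txn_id

    def insert(self, txn_id, data):
        self._log("INSERT", txn_id, data)

    def commit(self, txn_id):
        self._log("COMMIT_TXN", txn_id)


def replicate(capture_before_open_txn):
    """Replay the sequence from the description; return (lastReplId, replicated data)."""
    ms = ToyMetastore()
    repl_id_before = ms.last_event_id   # proposed fix: captured before Txn1 is opened
    txn1 = ms.open_txn()                # Event-10, REPL DUMP session
    txn2 = ms.open_txn()                # Event-11, Session-2
    ms.insert(txn2, "T1.D1")            # Event-12
    repl_id_after = ms.last_event_id    # current behaviour: captured here (Event-12)
    ms.commit(txn2)                     # Event-13
    # Bootstrap dump sees only writes of txns below Txn1 (simplified validTxnList).
    bootstrap = {d for (_, kind, t, d) in ms.events if kind == "INSERT" and t < txn1}
    ms.commit(txn1)                     # Event-14
    last_repl_id = repl_id_before if capture_before_open_txn else repl_id_after
    # Incremental dump replays write events logged after lastReplId.
    incremental = {d for (eid, kind, _, d) in ms.events
                   if kind == "INSERT" and eid > last_repl_id}
    return last_repl_id, bootstrap | incremental


if __name__ == "__main__":
    print(replicate(capture_before_open_txn=False))
    print(replicate(capture_before_open_txn=True))
```

In this model, capturing lastReplId after Txn1 is opened leaves T1.D1 out of both the bootstrap dump and the incremental replay, while capturing it before Txn1 lets the incremental replay pick up the Event-12 write.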



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
