[ https://issues.apache.org/jira/browse/HIVE-18988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449673#comment-16449673 ]
Sankar Hariappan edited comment on HIVE-18988 at 4/24/18 7:13 PM:
------------------------------------------------------------------

Added 04.patch with
* Logic to time out the open txns that were opened before triggering bootstrap.
* Replication of the write ID state in the target, based on the ValidWriteIdList of each ACID/MM table being replicated.

[~ekoifman], [~maheshk114], [~thejas]
Can you please review this patch?

> Support bootstrap replication of ACID tables
> --------------------------------------------
>
>                 Key: HIVE-18988
>                 URL: https://issues.apache.org/jira/browse/HIVE-18988
>             Project: Hive
>          Issue Type: Sub-task
>          Components: HiveServer2, repl
>    Affects Versions: 3.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: ACID, DR, pull-request-available, replication
>             Fix For: 3.1.0
>
>         Attachments: HIVE-18988.01.patch, HIVE-18988.02.patch,
> HIVE-18988.03.patch, HIVE-18988.04.patch
>
>
> Bootstrapping of ACID tables needs special handling to replicate a stable
> state of data.
> - If the ACID feature is enabled, perform the bootstrap dump for ACID tables
> within a read txn.
> -> Dump table/partition metadata.
> -> Get the list of valid data files for a table using the same logic as a
> read txn does.
> -> Dump the latest ValidWriteIdList as per the current read txn.
> - Set the valid last replication state such that it doesn't miss any open
> txn started after triggering the bootstrap dump.
> - If any txns are ongoing that were opened before triggering the bootstrap
> dump, it is not guaranteed that an open_txn event was captured for them.
> Also, if these txns were opened for streaming ingest, the dumped ACID
> table data may include data from open txns, which breaks snapshot isolation
> at the target.
> To avoid that, the bootstrap dump should wait for a timeout (new
> configuration: hive.repl.bootstrap.dump.open.txn.timeout). After the timeout,
> force-abort those txns and continue.
> - If any force-aborted txn belongs to a streaming ingest, the dumped
> ACID table data may contain aborted data too. So it is necessary to replicate
> the aborted write IDs to the target, to mark that data invalid for any readers.
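
The wait-then-abort step from the description can be sketched roughly as below. This is only an illustration in Python, not code from the patch: the function name, the `get_open_txns` / `abort_txns` callbacks, and the polling interval are all hypothetical, and `timeout_s` merely stands in for the value of hive.repl.bootstrap.dump.open.txn.timeout.

```python
import time
from typing import Callable, Iterable, Set


def wait_then_abort_open_txns(
    get_open_txns: Callable[[], Iterable[int]],
    abort_txns: Callable[[Set[int]], None],
    timeout_s: float,
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Set[int]:
    """Wait up to timeout_s for pre-bootstrap txns to end, then abort the rest.

    Hypothetical helper, not Hive's API. Returns the ids that were force-aborted.
    """
    # Snapshot the txns that were already open when the dump was triggered.
    pending = set(get_open_txns())
    deadline = clock() + timeout_s

    while pending and clock() < deadline:
        sleep(poll_s)
        # Keep only txns from the original snapshot that are still open.
        # Txns opened after the snapshot are ignored here, on the assumption
        # that their open_txn events are captured by the replication state.
        pending &= set(get_open_txns())

    if pending:
        # Timeout reached: force-abort whatever is left and continue the dump.
        abort_txns(set(pending))
    return pending
```

The returned set is what the last bullet is about: if any of those txns wrote data through streaming ingest, their write IDs would have to be replicated as aborted so that readers at the target ignore that data.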