[ https://issues.apache.org/jira/browse/HBASE-14142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317621#comment-15317621 ]
Vladimir Rodionov commented on HBASE-14142:
-------------------------------------------

Dangerous for operations that are not idempotent (Increment, Append).

> HBase Backup/Restore Phase 3: Cells deduplication during backup
> ---------------------------------------------------------------
>
>                 Key: HBASE-14142
>                 URL: https://issues.apache.org/jira/browse/HBASE-14142
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>
> Since we do not record the last backed-up sequence ids (MVCC) and do not
> restore up to that sequence id (which is kind of tricky), there will be some
> duplicate KVs in store files after the first incremental restore that follows
> a full backup. These duplicates result from how we do the full backup and the
> first incremental backup after it. During a full backup we perform a
> distributed log roll and record, for every RS, the last WAL timestamp; then we
> take a snapshot. The next WAL after the recorded one will make it into the
> next incremental backup set, but it will contain some edits (puts, deletes)
> that were already captured by the previous snapshot. During restore we first
> restore the snapshot and then replay the WALs, and this operation can create
> duplicate KVs in different store files.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
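
The overlap described above, and the concern about non-idempotent operations, can be illustrated with a small toy model. This is only a sketch under stated assumptions: the table layout, the edit tuples, and the `apply_edit`, `replay`, and `simulate` helpers are hypothetical and are not HBase APIs. In particular, the model assumes the log stores an increment as a delta. How HBase's actual WAL encodes Increment and Append is not stated in this message.

```python
from copy import deepcopy


def apply_edit(table, edit):
    """Apply one logged edit to a toy table (hypothetical model, not HBase)."""
    kind = edit[0]
    if kind == "put":
        # A put is keyed by (row, timestamp); applying it twice changes nothing.
        _, row, ts, value = edit
        table["cells"].setdefault(row, {})[ts] = value
    elif kind == "increment":
        # Assumption: the log carries the delta, so each replay adds it again.
        _, row, delta = edit
        table["counters"][row] = table["counters"].get(row, 0) + delta
    else:
        raise ValueError("unknown edit kind: %r" % (kind,))


def replay(table, edits):
    """Apply a list of edits in order and return the table."""
    for edit in edits:
        apply_edit(table, edit)
    return table


def simulate():
    live = {"cells": {}, "counters": {}}

    # Edits written after the log roll but before the snapshot is taken:
    # they end up both in the snapshot and in the next WAL.
    overlapping = [("put", "r1", 100, "a"), ("increment", "c1", 5)]
    replay(live, overlapping)
    snapshot = deepcopy(live)  # full backup

    # Edits written after the snapshot.
    later = [("put", "r2", 200, "b"), ("increment", "c1", 1)]
    replay(live, later)

    # The incremental backup set contains the whole WAL, overlap included.
    incremental_wal = overlapping + later

    # Restore: snapshot first, then replay the WAL on top of it.
    restored = replay(deepcopy(snapshot), incremental_wal)
    return live, restored


if __name__ == "__main__":
    live, restored = simulate()
    print(live["cells"] == restored["cells"])       # True
    print(live["counters"], restored["counters"])   # {'c1': 6} {'c1': 11}
```

In this model the replayed put leaves the logical cell contents unchanged, so the duplicate only costs storage, while the replayed increment is counted twice and the restored counter diverges from the live one.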