[ https://issues.apache.org/jira/browse/HBASE-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952464#comment-16952464 ]
HBase QA commented on HBASE-12125:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 8s{color} | {color:red} HBASE-12125 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/in-progress/precommit-patchnames for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-12125 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12895999/HBASE-12125.v4.master.patch |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/957/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.11.0 https://yetus.apache.org |

This message was automatically generated.

> Add Hbck option to check and fix WAL's from replication queue
> -------------------------------------------------------------
>
>                 Key: HBASE-12125
>                 URL: https://issues.apache.org/jira/browse/HBASE-12125
>             Project: HBase
>          Issue Type: Bug
>          Components: hbck, hbck2, Replication
>    Affects Versions: 3.0.0, 2.3.0, 1.6.0, hbase-operator-tools-1.1.0
>            Reporter: Virag Kothari
>            Assignee: Vincent Poon
>            Priority: Critical
>         Attachments: HBASE-12125.v1.master.patch,
> HBASE-12125.v2.master.patch, HBASE-12125.v3.master.patch,
> HBASE-12125.v4.master.patch
>
>
> The replication source will discard the WAL file in many cases when it
> encounters an exception reading it. This can cause data loss
> and the underlying reason of failed read remains hidden. Only in certain
> scenarios, the replication source should dump the current WAL and move to the
> next one.
> This JIRA aims to have an hbck option to check the WAL files of replication
> queues for any inconsistencies and also provide an option to fix it.
> The fix can be to remove the file from the replication queue in zk and from
> the memory of the replication source manager and replication sources.
> A region server endpoint call from the hbck client to the region server can
> be used to achieve this.
> Hbck can be configured with the following options:
> -softCheckReplicationWAL: Tries to open only the oldest WAL (the WAL
>   currently read by the replication source) from the replication queue. If
>   there is a position associated, it also seeks to that position and reads
>   an entry from there.
> -hardCheckReplicationWAL: Checks all WAL paths from the replication queues
>   by reading them completely to make sure they are ok.
> -fixMissingReplicationWAL: Removes the WALs from replication queues which
>   are not present on HDFS.
> -fixCorruptedReplicationWAL: Removes the WALs from replication queues which
>   are corrupted (based on the findings from softCheck/hardCheck). The WALs
>   are also moved to a quarantine dir.
> -rollAndFixCorruptedReplicationWAL: If the current WAL is corrupted, it is
>   first rolled over and then handled in the same way as with the
>   -fixCorruptedReplicationWAL option.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
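
The -fixMissingReplicationWAL behaviour described in the issue can be illustrated with a minimal sketch. It is not taken from any of the attached patches and uses no HBase API. The class ReplicationQueueCheck, its methods findMissing and fixMissing, the in-memory map standing in for the replication queues in zk, and the set of paths standing in for HDFS are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical in-memory model, NOT an HBase API.
 * queues: queue id -> ordered list of WAL paths still to be replicated.
 * filesOnFs: WAL paths assumed to exist on the file system.
 */
final class ReplicationQueueCheck {

    /** Check step: returns, per queue, the WAL paths that are absent from the file system. */
    static Map<String, List<String>> findMissing(
            Map<String, List<String>> queues, Set<String> filesOnFs) {
        Map<String, List<String>> missing = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : queues.entrySet()) {
            List<String> absent = new ArrayList<>();
            for (String wal : e.getValue()) {
                if (!filesOnFs.contains(wal)) {
                    absent.add(wal);
                }
            }
            if (!absent.isEmpty()) {
                missing.put(e.getKey(), absent);
            }
        }
        return missing;
    }

    /** Fix step: removes missing WALs from the queues in place; returns how many were removed. */
    static int fixMissing(Map<String, List<String>> queues, Set<String> filesOnFs) {
        int removed = 0;
        for (List<String> wals : queues.values()) {
            int before = wals.size();
            wals.removeIf(wal -> !filesOnFs.contains(wal));
            removed += before - wals.size();
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, List<String>> queues = new LinkedHashMap<>();
        queues.put("peer1", new ArrayList<>(Arrays.asList("wal.1", "wal.2", "wal.3")));
        Set<String> filesOnFs = new HashSet<>(Arrays.asList("wal.1", "wal.3"));

        System.out.println("missing: " + findMissing(queues, filesOnFs));
        System.out.println("removed: " + fixMissing(queues, filesOnFs));
        System.out.println("queues after fix: " + queues);
    }
}
```

The sketch keeps the check (findMissing) separate from the fix (fixMissing), mirroring the split between the check and fix options in the description. A real implementation would also have to update the in-memory state of the replication source manager and sources, which the issue suggests doing through a region server endpoint call.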