[ https://issues.apache.org/jira/browse/HBASE-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227696#comment-16227696 ]

Ted Yu commented on HBASE-12125:
--------------------------------

Can you put the patch on Review Board?

> Add Hbck option to check and fix WAL's from replication queue
> -------------------------------------------------------------
>
>                 Key: HBASE-12125
>                 URL: https://issues.apache.org/jira/browse/HBASE-12125
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>   Affects Versions: 3.0.0
>            Reporter: Virag Kothari
>            Assignee: Vincent Poon
>         Attachments: HBASE-12125.v1.master.patch
>
>
> The replication source discards the WAL file in many cases when it
> encounters an exception while reading it. This can cause data loss,
> and the underlying reason for the failed read remains hidden. Only in
> certain scenarios should the replication source dump the current WAL
> and move to the next one.
> This JIRA aims to add an hbck option to check the WAL files of
> replication queues for any inconsistencies, and also an option to fix
> them.
> The fix can be to remove the file from the replication queue in zk and
> from the memory of the replication source manager and replication
> sources.
> A region server endpoint call from the hbck client to the region
> server can be used to achieve this.
> Hbck can be configured with the following options:
> -softCheckReplicationWAL: Tries to open only the oldest WAL (the WAL
> currently read by the replication source) from the replication queue.
> If there is a position associated, it also seeks to that position and
> reads an entry from there.
> -hardCheckReplicationWAL: Checks all WAL paths from the replication
> queues by reading them completely to make sure they are ok.
> -fixMissingReplicationWAL: Removes the WALs from replication queues
> which are not present on hdfs.
> -fixCorruptedReplicationWAL: Removes the WALs from replication queues
> which are corrupted (based on the findings from softCheck/hardCheck).
> The WALs are also moved to a quarantine dir.
> -rollAndFixCorruptedReplicationWAL: If the current WAL is corrupted,
> it is first rolled over and then handled in the same way as with the
> -fixCorruptedReplicationWAL option.
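To make the difference between the soft check, the hard check, and the two fix options concrete, here is a minimal sketch in plain Java. It is not taken from the attached patch and does not use any HBase API: the class `ReplicationWalCheckSketch`, the `WalState` enum, the `WalProbe` interface, and the `check`/`fix` methods are all hypothetical names, and the replication queue is modeled as a simple ordered list of WAL paths with the oldest WAL first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative model only: none of these types or methods exist in HBase. */
final class ReplicationWalCheckSketch {

    /** Hypothetical result of probing one WAL file. */
    enum WalState { OK, MISSING, CORRUPTED }

    /** Hypothetical probe; a real tool would open the WAL on HDFS and read entries. */
    interface WalProbe {
        WalState probe(String walPath, boolean readFully);
    }

    /**
     * Soft check (hard == false): probe only the oldest WAL, i.e. the head of the queue.
     * Hard check (hard == true): probe every WAL in the queue, reading each completely.
     * Returns only the WALs that are not OK, in queue order.
     */
    static Map<String, WalState> check(List<String> queue, WalProbe probe, boolean hard) {
        Map<String, WalState> findings = new LinkedHashMap<>();
        int limit = hard ? queue.size() : Math.min(1, queue.size());
        for (String wal : queue.subList(0, limit)) {
            WalState state = probe.probe(wal, hard);
            if (state != WalState.OK) {
                findings.put(wal, state);
            }
        }
        return findings;
    }

    /** Fix step: return a copy of the queue without the WALs found in the given state. */
    static List<String> fix(List<String> queue, Map<String, WalState> findings, WalState target) {
        List<String> kept = new ArrayList<>();
        for (String wal : queue) {
            // WALs with no finding map to null and are always kept.
            if (findings.get(wal) != target) {
                kept.add(wal);
            }
        }
        return kept;
    }
}
```

In this model, calling `fix` with `WalState.MISSING` plays the role of -fixMissingReplicationWAL and calling it with `WalState.CORRUPTED` plays the role of -fixCorruptedReplicationWAL. The sketch deliberately leaves out the parts the description assigns to the region server endpoint: updating the queue in zk, updating the in-memory replication sources, seeking to a saved position, moving files to the quarantine dir, and rolling the current WAL.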