[ https://issues.apache.org/jira/browse/HBASE-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448578#comment-13448578 ]

terry zhang commented on HBASE-6719:
------------------------------------

I think we need to handle the IOException carefully and better not skip the 
HLog unless it is really corrupted. We can log this failure as fatal and skip 
the HLog (by deleting the HLog znode manually) if we have to.
                
> [replication] Data will be lost if opening an HLog fails more than 
> maxRetriesMultiplier times
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-6719
>                 URL: https://issues.apache.org/jira/browse/HBASE-6719
>             Project: HBase
>          Issue Type: Bug
>          Components: replication
>    Affects Versions: 0.94.1
>            Reporter: terry zhang
>            Assignee: terry zhang
>            Priority: Critical
>             Fix For: 0.94.2
>
>         Attachments: hbase-6719.patch
>
>
> Please take a look at the code below:
> {code:title=ReplicationSource.java|borderStyle=solid}
> protected boolean openReader(int sleepMultiplier) {
>   ...
>   catch (IOException ioe) {
>     LOG.warn(peerClusterZnode + " Got: ", ioe);
>     // TODO Need a better way to determine if a file is really gone but
>     // TODO without scanning all logs dir
>     if (sleepMultiplier == this.maxRetriesMultiplier) {
>       LOG.warn("Waited too long for this file, considering dumping");
>       // Opening the file failed more than maxRetriesMultiplier (default 10) times
>       return !processEndOfFile();
>     }
>   }
>   return true;
>   ...
> }
>
> protected boolean processEndOfFile() {
>   if (this.queue.size() != 0) {    // Skip this HLog: data loss
>     this.currentPath = null;
>     this.position = 0;
>     return true;
>   } else if (this.queueRecovered) {    // Terminate the failover replication source thread: data loss
>     this.manager.closeRecoveredQueue(this);
>     LOG.info("Finished recovering the queue");
>     this.running = false;
>     return true;
>   }
>   return false;
> }
> {code} 
> Sometimes HDFS runs into a problem while the HLog file is actually fine, so after 
> HDFS comes back, some data is lost and cannot be found in the slave cluster.
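
For reference, a hedged sketch of the "really gone" probe assumed in the snippet above (the helper name is hypothetical, and this.fs is assumed to be the FileSystem handle the source already uses): only a file the NameNode confirms missing counts as gone, while a transient HDFS error means the source keeps retrying instead of dropping data.

{code:title=fileReallyGone() sketch|borderStyle=solid}
import java.io.IOException;
import org.apache.hadoop.fs.Path;

  // Hypothetical helper: report the HLog as gone only when the NameNode
  // confirms the file is missing; if HDFS itself cannot be reached, assume
  // the file still exists so that it is never skipped by mistake.
  private boolean fileReallyGone(Path logPath) {
    try {
      return !this.fs.exists(logPath);
    } catch (IOException e) {
      LOG.warn("Cannot reach HDFS to check " + logPath + ", assuming it still exists", e);
      return false;
    }
  }
{code}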

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
