[ https://issues.apache.org/jira/browse/HBASE-15252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15142276#comment-15142276 ]

stack commented on HBASE-15252:
-------------------------------

Looking at blame, the change was done a while back:

commit 031b3d99a956998d36ba593a8b0df4801ef2c66e
Author: Jean-Daniel Cryans <jdcry...@apache.org>
Date:   Fri Aug 30 18:07:35 2013 +0000

    HBASE-9373 [replication] data loss because replication doesn't expect partial reads

Given the replication subject on the patch, I'd guess the thought was that
replication would come back and try to read this stream again later, when
there was perhaps more data available in the WAL... perhaps the seekOnFs is
what is used to throw the 'true' EOF? I don't know this code well.
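
For illustration, here is a rough sketch of that retry pattern (the Reader
interface, tailOnce, and the method shapes are simplified stand-ins, not the
actual ProtobufLogReader/replication API):

{code}
import java.io.IOException;

public class WalTailSketch {
  // Simplified stand-in for the WAL reader; not the real HBase interface.
  interface Reader {
    boolean readNext() throws IOException;  // false == "looks like EOF for now"
    long getPosition() throws IOException;
    void seekOnFs(long pos) throws IOException;
  }

  // One tailing attempt: a partial entry is not an error, just "try again later".
  static void tailOnce(Reader reader) throws IOException {
    long originalPosition = reader.getPosition();
    try {
      if (!reader.readNext()) {
        // Partial write: rewind so the next attempt re-reads the incomplete
        // record once the writer has flushed more data to the WAL.
        reader.seekOnFs(originalPosition);
      }
    } catch (IOException ioe) {
      // A genuine I/O error must not be swallowed as EOF here, or the caller
      // would conclude the WAL is fully consumed.
      throw ioe;
    }
  }
}
{code}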

> Data loss when replaying wal if HDFS timeout
> --------------------------------------------
>
>                 Key: HBASE-15252
>                 URL: https://issues.apache.org/jira/browse/HBASE-15252
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>         Attachments: HBASE-15252-testcase.patch
>
>
> This is a problem introduced by HBASE-13825, where we changed the exception
> type in the catch block of the {{readNext}} method of {{ProtobufLogReader}}.
> {code:title=ProtobufLogReader.java}
>       try {
>           ......
>           ProtobufUtil.mergeFrom(builder, new LimitInputStream(this.inputStream, size),
>             (int)size);
>         } catch (IOException ipbe) { // <------ used to be InvalidProtocolBufferException
>           throw (EOFException) new EOFException("Invalid PB, EOF? Ignoring; originalPosition=" +
>             originalPosition + ", currentPosition=" + this.inputStream.getPos() +
>             ", messageSize=" + size + ", currentAvailable=" + available).initCause(ipbe);
>         }
> {code}
> Here, if the {{inputStream}} throws an {{IOException}} due to a timeout or
> something similar, we just convert it to an {{EOFException}}, and at the bottom
> of this method we ignore {{EOFException}} and return false. This causes the
> upper layer to think we have reached the end of the file. So when replaying,
> we will treat an HDFS timeout error as a normal end of file, which causes
> data loss.
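
For illustration, a minimal sketch of the narrower catch described above: only a
protobuf parse failure (a likely partial write) is translated to EOFException,
while any other IOException, such as an HDFS read timeout, propagates. Class and
method names are placeholders; this is an illustration of the idea, not the
actual patch.

{code}
import java.io.EOFException;
import java.io.IOException;
import com.google.protobuf.InvalidProtocolBufferException;

public class NarrowCatchSketch {
  // Placeholder for the real ProtobufUtil.mergeFrom(...) call.
  static void parseEntry() throws IOException {
    // ... parse one WAL entry from the stream ...
  }

  static void readOneEntry() throws IOException {
    try {
      parseEntry();
    } catch (InvalidProtocolBufferException ipbe) {
      // Truncated PB record: safe to treat as "EOF for now"; the caller can
      // seek back and retry once more data has been flushed.
      throw (EOFException) new EOFException("Invalid PB, EOF?").initCause(ipbe);
    } catch (IOException ioe) {
      // A read timeout or other stream error must surface; converting it to
      // EOFException makes WAL replay silently stop and lose edits.
      throw ioe;
    }
  }
}
{code}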



