[jira] [Commented] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the WAL is rolled

Esteban Gutierrez (JIRA) Tue, 22 May 2018 13:36:19 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484522#comment-16484522
 ]


Esteban Gutierrez commented on HBASE-20604:
-------------------------------------------

Thanks [~Apache9]. I'm looking into injecting a failure in 
{{ProtobufUtil.mergeFrom()}}, or maybe directly into {{FSDataInputStream}}, in 
order to have a more accurate test. 

Attaching a new patch that additionally seeks back to the original position 
of the stream when no KVs are present, so that an additional read from the 
stream shouldn't trigger an unnecessary EOFException.
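
For illustration only, a minimal sketch of the seek-back idea, not the patch 
itself. {{SeekableBytes}}, {{frame()}} and the 4-byte length-prefixed record 
format are hypothetical stand-ins for {{FSDataInputStream}} and the real WAL 
entry encoding. The reader remembers the position before it starts, and if the 
entry turns out to be incomplete it seeks back so the next call retries from 
the start of the entry.

```java
import java.io.EOFException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SeekBackSketch {

    /** Hypothetical in-memory stand-in for a seekable stream that can still grow. */
    static final class SeekableBytes {
        private byte[] data = new byte[0];
        private int pos = 0;

        /** Simulates the writer appending more bytes to the file. */
        void append(byte[] more) {
            byte[] grown = Arrays.copyOf(data, data.length + more.length);
            System.arraycopy(more, 0, grown, data.length, more.length);
            data = grown;
        }

        long getPos() { return pos; }

        void seek(long newPos) { pos = (int) newPos; }

        /** Fills buf completely, or consumes what is left and throws EOFException. */
        void readFully(byte[] buf) throws EOFException {
            if (data.length - pos < buf.length) {
                pos = data.length; // the partial bytes are consumed before the failure
                throw new EOFException("short read");
            }
            System.arraycopy(data, pos, buf, 0, buf.length);
            pos += buf.length;
        }
    }

    /** Returns the next record, or null (with the position restored) if it is incomplete. */
    static String readNext(SeekableBytes in) {
        long originalPosition = in.getPos();
        try {
            byte[] header = new byte[4];
            in.readFully(header);
            byte[] payload = new byte[ByteBuffer.wrap(header).getInt()];
            in.readFully(payload);
            return new String(payload, StandardCharsets.UTF_8);
        } catch (EOFException e) {
            // Nothing usable was read: go back so the next call starts at the same entry.
            in.seek(originalPosition);
            return null;
        }
    }

    /** Encodes a record as a 4-byte big-endian length followed by UTF-8 bytes. */
    static byte[] frame(String s) {
        byte[] payload = s.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + payload.length).putInt(payload.length).put(payload).array();
    }

    public static void main(String[] args) {
        SeekableBytes in = new SeekableBytes();
        byte[] record = frame("edit-1");
        in.append(Arrays.copyOfRange(record, 0, 6));             // only part of the record
        System.out.println(readNext(in) + " at " + in.getPos()); // incomplete, position restored
        in.append(Arrays.copyOfRange(record, 6, record.length)); // the rest arrives
        System.out.println(readNext(in) + " at " + in.getPos()); // retry reads the whole record
    }
}
```

In this sketch, without the {{seek(originalPosition)}} the retry would start in 
the middle of the record and misread it.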


> ProtobufLogReader#readNext can incorrectly loop to the same position in the 
> stream until the WAL is rolled
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-20604
>                 URL: https://issues.apache.org/jira/browse/HBASE-20604
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication, wal
>    Affects Versions: 3.0.0
>            Reporter: Esteban Gutierrez
>            Assignee: Esteban Gutierrez
>            Priority: Critical
>         Attachments: HBASE-20604.002.patch, HBASE-20604.patch
>
>
> Every time we call {{ProtobufLogReader#readNext}} we consume the input stream 
> associated with the {{FSDataInputStream}} of the WAL that we are reading. 
> Under certain conditions, e.g. when using encryption at rest 
> ({{CryptoInputStream}}), the stream can return partial data, which can cause a 
> premature EOF that causes {{inputStream.getPos()}} to return to the same 
> original position, causing {{ProtobufLogReader#readNext}} to retry the 
> reads until the WAL is rolled.
> The side effect of this issue is that {{ReplicationSource}} can get stuck 
> until the WAL is rolled, causing replication delays of up to an hour in some 
> cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
