[jira] [Commented] (HBASE-26849) NPE caused by WAL Compression and Replication

Andrew Kyle Purtell (Jira) Fri, 25 Mar 2022 09:28:05 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-26849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512465#comment-17512465
 ]


Andrew Kyle Purtell commented on HBASE-26849:
---------------------------------------------

bq. We might as well open another issue and consider how to fundamentally 
refactor Dict.

Agreed, that would be the preferred approach anyway. 

> NPE caused by WAL Compression and Replication
> ---------------------------------------------
>
>                 Key: HBASE-26849
>                 URL: https://issues.apache.org/jira/browse/HBASE-26849
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication, wal
>    Affects Versions: 1.7.1, 3.0.0-alpha-2, 2.4.11
>            Reporter: tianhang tang
>            Assignee: tianhang tang
>            Priority: Critical
>         Attachments: image-2022-03-16-14-25-49-276.png, 
> image-2022-03-16-14-30-15-247.png
>
>
> My cluster runs HBase 1.4.12 with WAL compression and replication enabled.
> I found that the replication sizeOfLogQueue was backing up and, after some 
> debugging, traced it to an NPE thrown at 
> [https://github.com/apache/hbase/blob/branch-1/hbase-common/src/main/java/org/apache/hadoop/hbase/io/util/LRUDictionary.java#L109]:
> !image-2022-03-16-14-25-49-276.png!
>  
> The root cause for this problem is:
> WALEntryStream#checkAllBytesParsed:
> !image-2022-03-16-14-30-15-247.png!
> resetReader does not create a new reader, so the original CompressionContext 
> and the dict in it are retained.
> However, the position is reset to 0 at this point, so the HLog is read again 
> from the beginning while the cache, which has not been cleared, is still in 
> use. The result is that data already in the LRUCache is added to the cache 
> a second time.
> Creating a new reader here solves the problem.
> I will open a PR later. However, there are other places in the current code 
> that call resetReader or seekOnFs. I suspect that code does not take the 
> WAL compression case into account at all...
>  
> In theory, whenever the file is read again, the LRUCache should also be 
> rolled back; otherwise the read and write paths will behave inconsistently.
> However, while the position can be rolled back to any intermediate position 
> at will, the LRUCache cannot...
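
The failure mode described in the quoted report can be illustrated with a toy model. The sketch below is not HBase code: ToyDict is a hypothetical fixed-size dictionary with round-robin eviction, standing in for the real LRUDictionary, and the "L:"/"I:" token format is invented for the example. It only shows why re-reading a dictionary-compressed stream from position 0 with a stale dictionary can decode differently from the way the stream was written. In this toy the symptom is a wrong value rather than an NPE.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DictRewindSketch {

    /** Hypothetical fixed-size dictionary with round-robin eviction (NOT LRUDictionary). */
    public static final class ToyDict {
        private final String[] slots;
        private int next = 0; // slot that the next add() will overwrite

        public ToyDict(int capacity) {
            slots = new String[capacity];
        }

        /** Returns the slot index holding value, or -1 if absent. */
        public int find(String value) {
            for (int i = 0; i < slots.length; i++) {
                if (value.equals(slots[i])) {
                    return i;
                }
            }
            return -1;
        }

        public void add(String value) {
            slots[next] = value;
            next = (next + 1) % slots.length;
        }

        public String get(int index) {
            return slots[index];
        }
    }

    /** Encodes each value as a literal ("L:value") or a dictionary index ("I:n"). */
    public static List<String> write(List<String> values, ToyDict dict) {
        List<String> tokens = new ArrayList<>();
        for (String v : values) {
            int idx = dict.find(v);
            if (idx < 0) {
                dict.add(v);
                tokens.add("L:" + v);
            } else {
                tokens.add("I:" + idx);
            }
        }
        return tokens;
    }

    /** Decodes tokens, mutating dict the same way the writer did. */
    public static List<String> read(List<String> tokens, ToyDict dict) {
        List<String> values = new ArrayList<>();
        for (String t : tokens) {
            String body = t.substring(2);
            if (t.startsWith("L:")) {
                dict.add(body);
                values.add(body);
            } else {
                values.add(dict.get(Integer.parseInt(body)));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> tokens = write(Arrays.asList("a", "b", "c", "b"), new ToyDict(2));

        ToyDict readerDict = new ToyDict(2);
        read(tokens, readerDict); // first pass leaves state behind in readerDict

        // Rewind to position 0 but keep the old dictionary (analogous to resetReader).
        System.out.println("stale dict: " + read(tokens, readerDict));     // [a, b, c, c]
        // Rewind to position 0 with a fresh dictionary (analogous to a new reader).
        System.out.println("fresh dict: " + read(tokens, new ToyDict(2))); // [a, b, c, b]
    }
}
```

The second read disagrees with the written data only because the dictionary state from the first pass was carried over. Starting from an empty dictionary, as a newly created reader would, restores agreement with the writer.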



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
