[ https://issues.apache.org/jira/browse/HBASE-26849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512465#comment-17512465 ]
Andrew Kyle Purtell commented on HBASE-26849:
---------------------------------------------

bq. We might as well open another issue and consider how to fundamentally refactor Dict.

Agreed, that would be the preferred approach anyway.

> NPE caused by WAL Compression and Replication
> ---------------------------------------------
>
>                 Key: HBASE-26849
>                 URL: https://issues.apache.org/jira/browse/HBASE-26849
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication, wal
>    Affects Versions: 1.7.1, 3.0.0-alpha-2, 2.4.11
>            Reporter: tianhang tang
>            Assignee: tianhang tang
>            Priority: Critical
>         Attachments: image-2022-03-16-14-25-49-276.png,
> image-2022-03-16-14-30-15-247.png
>
>
> My cluster runs HBase 1.4.12 with WAL compression and replication enabled.
> I noticed a replication sizeOfLogQueue backlog and, after some debugging,
> found an NPE thrown at
> [https://github.com/apache/hbase/blob/branch-1/hbase-common/src/main/java/org/apache/hadoop/hbase/io/util/LRUDictionary.java#L109:]
> !image-2022-03-16-14-25-49-276.png!
>
> The root cause of this problem is in WALEntryStream#checkAllBytesParsed:
> !image-2022-03-16-14-30-15-247.png!
> resetReader does not create a new reader, so the original CompressionContext
> and the dict in it are retained.
> However, the position is reset to 0 at this point, which means the HLog is
> read again from the beginning while the cache, which was never cleared, is
> still in use. This causes problems: data that is already in the LRUCache is
> added to the cache a second time.
> Creating a new reader here solves the problem.
> I will open a PR later. However, there are other places in the current code
> that call resetReader or seekOnFs. I guess that code doesn't take the WAL
> compression case into account at all...
>
> In theory, whenever the file is read again, the LRUCache should also be
> rolled back; otherwise the READ and WRITE paths will behave inconsistently.
> However, the position can be rolled back to any intermediate position at
> will, while the LRUCache cannot...

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
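
The failure mode described in the quoted issue (the read position goes back to 0 while the reader's dictionary keeps its old contents) can be illustrated with a small, self-contained sketch. Everything here is hypothetical: ToyDict is a toy round-robin dictionary, not HBase's LRUDictionary, and the "L:"/"R:" token format is invented for the example. It does not reproduce the actual NPE; it only shows how a reader that replays a compressed stream with a stale dictionary can resolve an index to the wrong entry, and how starting from a fresh dictionary (the equivalent of creating a new reader) avoids that.

```java
import java.util.ArrayList;
import java.util.List;

public class StaleDictDemo {

    /** Toy fixed-size dictionary whose slots are overwritten round-robin (hypothetical, not HBase code). */
    static final class ToyDict {
        private final String[] slots;
        private int next = 0;

        ToyDict(int capacity) {
            slots = new String[capacity];
        }

        /** Returns the slot index holding the value, or -1 if absent. */
        int find(String value) {
            for (int i = 0; i < slots.length; i++) {
                if (value.equals(slots[i])) {
                    return i;
                }
            }
            return -1;
        }

        /** Stores the value in the next slot, overwriting whatever was there. */
        void add(String value) {
            slots[next] = value;
            next = (next + 1) % slots.length;
        }

        String get(int index) {
            return slots[index];
        }
    }

    /** Writer side: "L:<value>" for a value not in the dictionary, "R:<index>" for one that is. */
    public static List<String> encode(List<String> values, int capacity) {
        ToyDict dict = new ToyDict(capacity);
        List<String> tokens = new ArrayList<>();
        for (String value : values) {
            int index = dict.find(value);
            if (index >= 0) {
                tokens.add("R:" + index);
            } else {
                dict.add(value);
                tokens.add("L:" + value);
            }
        }
        return tokens;
    }

    /** Reader side: decodes tokens[from, to), updating the given dictionary as it goes. */
    static List<String> decode(List<String> tokens, int from, int to, ToyDict dict) {
        List<String> values = new ArrayList<>();
        for (int i = from; i < to; i++) {
            String token = tokens.get(i);
            String payload = token.substring(2);
            if (token.startsWith("L:")) {
                dict.add(payload);
                values.add(payload);
            } else {
                values.add(dict.get(Integer.parseInt(payload)));
            }
        }
        return values;
    }

    /** Reads part of the stream, then seeks back to 0 while KEEPING the dictionary. */
    public static List<String> rereadWithStaleDict(List<String> tokens, int capacity, int readBeforeReset) {
        ToyDict dict = new ToyDict(capacity);
        decode(tokens, 0, readBeforeReset, dict);
        return decode(tokens, 0, tokens.size(), dict);
    }

    /** Reads part of the stream, then seeks back to 0 with a NEW dictionary. */
    public static List<String> rereadWithFreshDict(List<String> tokens, int capacity, int readBeforeReset) {
        ToyDict dict = new ToyDict(capacity);
        decode(tokens, 0, readBeforeReset, dict);
        dict = new ToyDict(capacity); // stands in for creating a new reader
        return decode(tokens, 0, tokens.size(), dict);
    }

    public static void main(String[] args) {
        List<String> tokens = encode(List.of("a", "b", "c", "a"), 3);
        System.out.println(tokens);                            // [L:a, L:b, L:c, R:0]
        System.out.println(rereadWithStaleDict(tokens, 3, 2)); // [a, b, c, b]  (last entry is wrong)
        System.out.println(rereadWithFreshDict(tokens, 3, 2)); // [a, b, c, a]
    }
}
```

In the stale case the replayed literals land in different slots than the writer used, so the final "R:0" reference resolves to "b" instead of "a". The sketch also hints at the open question raised at the end of the description: restarting from position 0 with a fresh dictionary is easy, but seeking to an arbitrary intermediate position would require the dictionary state as of that position, which this toy model (like the cache in the issue) does not keep.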