The old WAL compression implementation is buggy when used together with replication, that's true...
But in general I think it is fixable. The dictionary is per file, IIRC, so I think clearing the LRUCache when resetting to the head of the file could fix the problem. Maybe we need to do some refactoring...

Thanks.

唐天航 <tangtianhang...@gmail.com> wrote on Wed, Mar 16, 2022 at 16:20:

> Hi masters,
>
> I have created an issue, HBASE-26849
> <https://issues.apache.org/jira/browse/HBASE-26849>, about an NPE caused by
> WAL Compression and Replication.
>
> For this problem, I tried reopening the WAL reader whenever we reset the
> position to 0, and it looks like it works well. But it doesn't
> fundamentally solve the problem.
>
> Since the WAL Compression feature was added, Replication has introduced a
> lot of new code, and there are many places that reset the HLog position,
> such as seekOnFs to originalPosition. I guess none of this code considers
> compatibility with WAL Compression. Theoretically we can roll back the
> position to any point at any time, but then the LRUCache in the
> corresponding LRUDictionary should also be rolled back, otherwise the read
> and write paths may behave inconsistently. But LRUCache can't roll back
> at all...
>
> So my thought is to open another issue and add a note to the docs that
> WAL Compression and Replication are not compatible.
>
> What do you think?
>
> Thank you. Regards
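
To illustrate the point about the dictionary needing to go back to its initial state when the reader goes back to the head of the file, here is a minimal toy sketch. It is not HBase code: ToyDictionary and ToyWal are hypothetical names, and the dictionary uses simple round-robin eviction instead of a real LRU. It only shows that replaying a file from position 0 with a stale reader dictionary resolves references to the wrong entries, while clearing the dictionary first gives the right result. In this toy the stale read returns wrong values; it does not reproduce the NPE from HBASE-26849.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy per-file dictionary (hypothetical; NOT HBase's LRUDictionary). */
final class ToyDictionary {
    private final String[] slots;
    private final Map<String, Integer> index = new HashMap<>();
    private int next = 0;

    ToyDictionary(int capacity) {
        slots = new String[capacity];
    }

    /** Returns the slot currently holding the word, or -1 if absent. */
    int find(String word) {
        return index.getOrDefault(word, -1);
    }

    /** Stores the word in the next slot, evicting whatever was there. */
    void add(String word) {
        String evicted = slots[next];
        if (evicted != null) {
            index.remove(evicted);
        }
        slots[next] = word;
        index.put(word, next);
        next = (next + 1) % slots.length;
    }

    String get(int slot) {
        return slots[slot];
    }

    /** Resets the dictionary to the state it has at the head of a file. */
    void clear() {
        Arrays.fill(slots, null);
        index.clear();
        next = 0;
    }
}

/** Toy "WAL": "L:word" is a literal (added to the dictionary), "R:n" a reference to slot n. */
final class ToyWal {
    static List<String> write(List<String> words, int capacity) {
        ToyDictionary dict = new ToyDictionary(capacity);
        List<String> out = new ArrayList<>();
        for (String w : words) {
            int slot = dict.find(w);
            if (slot >= 0) {
                out.add("R:" + slot);
            } else {
                dict.add(w);
                out.add("L:" + w);
            }
        }
        return out;
    }

    /** Replays all entries from position 0, optionally clearing the reader dictionary first. */
    static List<String> readFromStart(List<String> entries, ToyDictionary dict, boolean clearFirst) {
        if (clearFirst) {
            dict.clear();
        }
        List<String> out = new ArrayList<>();
        for (String e : entries) {
            String payload = e.substring(2);
            if (e.startsWith("L:")) {
                dict.add(payload);
                out.add(payload);
            } else {
                out.add(dict.get(Integer.parseInt(payload)));
            }
        }
        return out;
    }
}
```

For example, writing the words a, b, a, c, b with a capacity of 2 and reading them back twice with the same reader dictionary gives the original words on the first pass but different words on the second pass, unless clearFirst is true. Resetting to an arbitrary position in the middle of the file is harder, since the dictionary state at that position would have to be rebuilt or snapshotted, which this sketch does not attempt.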