> > Maybe we need to do some refactoring... > I can not agree more... But before that, I think we'd better point out this compatibility issue explicitly in our doc.
唐天航 <tangtianhang...@gmail.com> 于2022年3月16日周三 18:08写道: > If we only reset the position to the head, yes we can fix it. > In fact, 26849 is to fix the problem in this scenario. > But unfortunately, we have some other scenarios where we roll back the > position to some intermediate position, such as > ProtobufLogReader.java#L381 > <https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java%20/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogReader.java#L381> > I think we cannot rollback the LRUCache too... > While my cluster works fine after 26849, the fix is still theoretically > incomplete. > > 张铎(Duo Zhang) <palomino...@gmail.com> 于2022年3月16日周三 17:59写道: > >> The old WAL compression implementation is buggy when used together with >> replication, that's true... >> >> But in general I think it is fixable, the dict is per file IIRC, so I >> think >> clearing the LRUCache when resetting to the head of the file can fix the >> problem? >> >> Maybe we need to do some refactoring... >> >> Thanks. >> >> 唐天航 <tangtianhang...@gmail.com> 于2022年3月16日周三 16:20写道: >> >> > Hi masters, >> > >> > I have created an issue HBASE-26849 >> > <https://issues.apache.org/jira/browse/HBASE-26849> about NPE caused by >> > WAL >> > Compression and Replication. >> > >> > For this problem, I try to reopen a WAL reader when we reset the >> position >> > to 0 and it looks like it's working well. But it didn't fundamentally >> solve >> > the problem. >> > >> > Since we have the WAL Compression feature, Replication has introduced a >> lot >> > of new code, and there are many places that reset the HLog position, >> such >> > as seekOnFs to originalPosition. I guess none of these codes consider >> > compatibility with WAL Compression. Because theoretically we can roll >> back >> > the position to any position at any time, but the LRUCache in the >> > corresponding LRUDictionary should also be rolled back, otherwise the >> read >> > and write link behavior may be inconsistent. But LRUCache can't roll >> back >> > at all... >> > >> > So my thought is, open another issue and add some description in the >> doc, >> > WAL Compression and Replication are not compatible. >> > >> > What do you think? >> > >> > Thank you. Regards >> > >> >