[ https://issues.apache.org/jira/browse/HBASE-27231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763825#comment-17763825 ]
Andrew Kyle Purtell commented on HBASE-27231:
---------------------------------------------

I think we can. I cherry-picked the master commit for this JIRA onto our internal fork of 2.5.5 and only one unit test is failing, and I think that is because the test itself is no longer valid. I will report back when the internal change is all green.

> FSHLog should retry writing WAL entries when syncs to HDFS failed.
> ------------------------------------------------------------------
>
>                 Key: HBASE-27231
>                 URL: https://issues.apache.org/jira/browse/HBASE-27231
>             Project: HBase
>          Issue Type: Improvement
>          Components: wal
>   Affects Versions: 3.0.0-alpha-4
>            Reporter: chenglei
>            Assignee: chenglei
>            Priority: Major
>             Fix For: 3.0.0-beta-1
>
>
> As HBASE-27223 describes, if a {{WAL}} write to HDFS fails, we do not know
> whether the data has been persisted or not. {{AsyncFSWAL}} handles this by
> opening a new writer and writing the WAL entries again, with added logic in
> WAL split and replay to deal with duplicate entries. {{FSHLog}} does not
> have the same logic: when {{ProtobufLogWriter.append}} or
> {{ProtobufLogWriter.sync}} fails, {{FSHLog.sync}} immediately throws the
> exception raised by that call. We should implement the same retry logic as
> {{AsyncFSWAL}}, so that {{WAL.sync}} can only throw {{TimeoutIOException}}
> and we can uniformly abort the RegionServer when {{WAL.sync}} fails.
> The basic idea is that, because both {{FSHLog.RingBufferEventHandler}} and
> {{AsyncFSWAL.consumeExecutor}} are single-threaded, we can reuse the logic
> in {{AsyncWAL}}: move most of the code in {{AsyncWAL}} up to
> {{AbstractFSWAL}} and adapt the {{SyncRunner}} in {{FSHLog}} to the
> logic in {{AsyncWriter.sync}}. Once we do that, most of the logic in
> {{AsyncWAL}} and {{FSHLog}} is unified; only how the {{writer}} is synced
> differs.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
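
Editorial sketch (not part of the original message): the retry idea in the quoted description, "open a new writer and write the WAL entries again", can be illustrated roughly as below. This is a minimal, self-contained sketch and not the HBase implementation. The `Writer` interface, `InMemoryWriter`, `writeWithRetry`, and the `maxAttempts` bound are all hypothetical names introduced for illustration. The issue describes surfacing a `TimeoutIOException` on failure; this sketch uses a simple attempt limit and a plain `IOException` instead.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class WalRetrySketch {
  /** Hypothetical minimal writer interface; NOT the HBase writer API. */
  interface Writer {
    void append(String entry) throws IOException;
    void sync() throws IOException;
  }

  /** In-memory stand-in for a WAL file; can simulate a failing sync. */
  static final class InMemoryWriter implements Writer {
    final List<String> entries = new ArrayList<>();
    private final boolean failOnSync;

    InMemoryWriter(boolean failOnSync) {
      this.failOnSync = failOnSync;
    }

    @Override
    public void append(String entry) {
      entries.add(entry);
    }

    @Override
    public void sync() throws IOException {
      if (failOnSync) {
        throw new IOException("simulated sync failure");
      }
    }
  }

  /**
   * Appends and syncs all unacknowledged entries. If append or sync fails,
   * the writer is abandoned, a fresh one is opened, and ALL unacknowledged
   * entries are written again. Entries may therefore exist in both the
   * abandoned file and the new one, which is why split/replay must
   * tolerate duplicates.
   */
  static Writer writeWithRetry(List<String> unacked, Supplier<Writer> writerFactory,
      int maxAttempts) throws IOException {
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      Writer writer = writerFactory.get(); // "roll" to a new writer per attempt
      try {
        for (String entry : unacked) {
          writer.append(entry);
        }
        writer.sync();
        return writer; // entries are durable in this writer
      } catch (IOException ioe) {
        last = ioe; // outcome of the failed write is unknown; retry on a new writer
      }
    }
    throw new IOException("gave up after " + maxAttempts + " attempts", last);
  }

  public static void main(String[] args) throws IOException {
    List<InMemoryWriter> opened = new ArrayList<>();
    // First writer fails on sync, every later writer succeeds.
    Supplier<Writer> factory = () -> {
      InMemoryWriter w = new InMemoryWriter(opened.isEmpty());
      opened.add(w);
      return w;
    };
    writeWithRetry(List.of("edit-1", "edit-2"), factory, 3);
    System.out.println("writers opened: " + opened.size());
  }
}
```

Note that in this sketch the abandoned first writer still holds the appended entries, which mirrors why the description mentions duplicate handling during WAL split and replay.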