[ https://issues.apache.org/jira/browse/HBASE-26347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiaolin Ha resolved HBASE-26347.
--------------------------------
    Resolution: Fixed

Merged to master, thanks [~zhangduo] for reviewing.

> Support detect and exclude slow DNs in fan-out of WAL
> -----------------------------------------------------
>
>                 Key: HBASE-26347
>                 URL: https://issues.apache.org/jira/browse/HBASE-26347
>             Project: HBase
>          Issue Type: New Feature
>          Components: wal
>    Affects Versions: 2.0.0, 3.0.0-alpha-2
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Major
>             Fix For: 3.0.0-alpha-3
>
>
> We all know that WAL sync performance directly affects RPC processing time.
> We use our self-designed FanOutOneBlockAsyncDFSOutput to sync WAL entries,
> and it connects directly to all the DNs where the block is located. But when
> even one of those DNs is slow, e.g. because of a disk hardware failure, WAL
> syncs become slow. What's more, the lower-layer HDFS system is not very
> sensitive in detecting such hardware failures.
> We can detect slow DNs from the ACK time of packets in
> FanOutOneBlockAsyncDFSOutput, and exclude them when adding new blocks after
> the log is rolled (log rolling can also be triggered by slow syncs). This
> info can also be shown in the UI. Entries in the excluded-DN cache can be
> invalidated after a configurable duration, so that recovered DNs are used
> again.
> I think this idea can quickly reduce the impact of slow DNs and improve
> service availability.
>
>

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
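The mechanism proposed in the issue (flag DNs whose packet ACKs are slow, exclude them from the next block allocation after a log roll, and let the exclusion expire so recovered DNs come back) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code that was merged: the class `SlowDataNodeTracker`, its methods `onPacketAck`, `isExcluded` and `excludedDataNodes`, the "N consecutive slow ACKs" rule, and all threshold and TTL values are hypothetical. Wiring it into FanOutOneBlockAsyncDFSOutput and into HDFS block allocation is not shown.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch: tracks per-DataNode packet ACK latency and keeps an
 * expiring exclude list. Not the actual HBase implementation.
 */
class SlowDataNodeTracker {
  private final long slowAckThresholdMs;   // an ACK slower than this counts as "slow" (assumed rule)
  private final int slowAcksBeforeExclude; // consecutive slow ACKs needed before excluding a DN
  private final long excludeTtlMs;         // how long an excluded DN stays excluded
  private final LongSupplier clockMs;      // injectable clock in milliseconds, for testability

  private final Map<String, Integer> consecutiveSlowAcks = new HashMap<>();
  private final Map<String, Long> excludedUntilMs = new HashMap<>();

  SlowDataNodeTracker(long slowAckThresholdMs, int slowAcksBeforeExclude,
                      long excludeTtlMs, LongSupplier clockMs) {
    this.slowAckThresholdMs = slowAckThresholdMs;
    this.slowAcksBeforeExclude = slowAcksBeforeExclude;
    this.excludeTtlMs = excludeTtlMs;
    this.clockMs = clockMs;
  }

  /** Called when a packet ACK arrives from a DN in the fan-out pipeline. */
  synchronized void onPacketAck(String dataNode, long ackLatencyMs) {
    if (ackLatencyMs <= slowAckThresholdMs) {
      consecutiveSlowAcks.remove(dataNode); // a fast ACK resets the slow streak
      return;
    }
    int count = consecutiveSlowAcks.merge(dataNode, 1, Integer::sum);
    if (count >= slowAcksBeforeExclude) {
      // Exclude the DN until now + TTL; the streak starts over afterwards.
      excludedUntilMs.put(dataNode, clockMs.getAsLong() + excludeTtlMs);
      consecutiveSlowAcks.remove(dataNode);
    }
  }

  /** True if the DN should be skipped when allocating the next WAL block. */
  synchronized boolean isExcluded(String dataNode) {
    Long until = excludedUntilMs.get(dataNode);
    if (until == null) {
      return false;
    }
    if (clockMs.getAsLong() >= until) {
      excludedUntilMs.remove(dataNode); // TTL expired: give the DN another chance
      return false;
    }
    return true;
  }

  /** Snapshot of currently excluded DNs, e.g. for block allocation or a UI page. */
  synchronized List<String> excludedDataNodes() {
    long now = clockMs.getAsLong();
    excludedUntilMs.values().removeIf(until -> now >= until); // drop expired entries
    return new ArrayList<>(excludedUntilMs.keySet());
  }

  public static void main(String[] args) {
    long[] now = {0L};
    SlowDataNodeTracker tracker = new SlowDataNodeTracker(100, 3, 60_000, () -> now[0]);
    for (int i = 0; i < 3; i++) {
      tracker.onPacketAck("dn1:9866", 500); // three slow ACKs in a row
    }
    tracker.onPacketAck("dn2:9866", 5);     // one fast ACK
    System.out.println(tracker.isExcluded("dn1:9866")); // true
    System.out.println(tracker.isExcluded("dn2:9866")); // false
    now[0] = 60_000;                        // TTL has elapsed
    System.out.println(tracker.isExcluded("dn1:9866")); // false
  }
}
```

Requiring a streak of slow ACKs, rather than reacting to a single one, is an assumed heuristic to avoid excluding a DN over one transient hiccup. The injectable clock only exists so the TTL expiry can be exercised deterministically.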