[
https://issues.apache.org/jira/browse/HBASE-28260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bryan Beaudreault resolved HBASE-28260.
---------------------------------------
Fix Version/s: 2.6.0, 3.0.0-beta-2
Resolution: Fixed
Pushed to branch-2.6+. Note that NO_LOCAL_WRITE was added back in 2016 for
hbase's specific use, but apparently never used. So this Jira finally closes
the loop on HDFS-3702. Thanks [~charlesconnell] for the contribution!
> Possible data loss in WAL after RegionServer crash
> --------------------------------------------------
>
> Key: HBASE-28260
> URL: https://issues.apache.org/jira/browse/HBASE-28260
> Project: HBase
> Issue Type: Bug
> Reporter: Bryan Beaudreault
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2
>
>
> We recently had a production incident:
> # RegionServer crashes, but local DataNode lives on
> # WAL lease recovery kicks in
> # NameNode reconstructs the block during lease recovery (which results in a
> new genstamp). It chooses the replica on the local DataNode as the primary.
> # Local DataNode reconstructs the block, so NameNode registers the new
> genstamp.
> # Local DataNode and the underlying host die before the new block can be
> replicated to other DataNodes.
> This leaves us with a missing block, because the new genstamp block has no
> replicas. The old replicas still remain, but are considered corrupt due to
> GENSTAMP_MISMATCH.
> Thankfully we were able to confirm that the length of the corrupt blocks was
> identical to that of the newly constructed and lost block. Further, the file
> in question was only 1 block. So we downloaded one of those corrupt block
> files and used {{hdfs dfs -put -f}} to force that block to replace the file
> in hdfs. So in this case we had no actual data loss, but it could have
> happened easily if the file was more than 1 block or the replicas weren't
> fully in sync prior to reconstruction.
> In order to avoid this issue, we should avoid writing WAL blocks to the
> local datanode. We can use CreateFlag.NO_LOCAL_WRITE for this. Hat tip to
> [~weichiu] for pointing this out.
> During reading of WALs we already reorder blocks so as to avoid reading from
> the local datanode, but avoiding writing there altogether would be better.
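
The failure sequence above, and the effect of keeping WAL blocks off the local DataNode, can be illustrated with a small toy model. This is only a sketch under stated assumptions, not HDFS or HBase code: the class WalGenstampSketch, its methods, and the host names are all hypothetical. It assumes that the first DataNode in the pipeline is picked as the recovery primary and that, as in the incident, only the primary holds the new genstamp when the writer's host dies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model; none of these names exist in HDFS or HBase.
class WalGenstampSketch {

    /** Toy NameNode view of one block: the expected genstamp and each host's replica genstamp. */
    static final class Block {
        long expectedGenstamp = 1;
        final Map<String, Long> replicas = new HashMap<>();

        Block(List<String> pipeline) {
            for (String host : pipeline) {
                replicas.put(host, expectedGenstamp);
            }
        }

        /** Lease recovery as described in the issue: the primary rebuilds the block under a new genstamp. */
        void recover(String primary) {
            expectedGenstamp++;
            replicas.put(primary, expectedGenstamp);
        }

        /** The host goes away together with whatever replica it held. */
        void hostDies(String host) {
            replicas.remove(host);
        }

        /** Replicas whose genstamp matches what the NameNode expects; the rest count as corrupt. */
        long validReplicas() {
            return replicas.values().stream().filter(g -> g == expectedGenstamp).count();
        }
    }

    /** Simplified placement: writer's own host first, unless noLocalWrite excludes it. */
    static List<String> choosePipeline(String writer, List<String> dataNodes,
                                       boolean noLocalWrite, int replication) {
        List<String> pipeline = new ArrayList<>();
        if (!noLocalWrite && dataNodes.contains(writer)) {
            pipeline.add(writer);
        }
        for (String dn : dataNodes) {
            if (pipeline.size() == replication) {
                break;
            }
            if (!dn.equals(writer)) {
                pipeline.add(dn);
            }
        }
        return pipeline;
    }

    /** Replays the incident and returns how many valid replicas remain afterwards. */
    static long survivingValidReplicas(boolean noLocalWrite) {
        List<String> dataNodes = Arrays.asList("host-a", "host-b", "host-c", "host-d");
        String writer = "host-a"; // the RegionServer and its local DataNode share this host
        List<String> pipeline = choosePipeline(writer, dataNodes, noLocalWrite, 3);
        Block block = new Block(pipeline);

        // Assumption: the first pipeline member is chosen as the recovery primary.
        block.recover(pipeline.get(0));
        // The writer's host dies before the new genstamp reaches any other replica.
        block.hostDies(writer);
        return block.validReplicas();
    }

    public static void main(String[] args) {
        System.out.println("local write allowed: " + survivingValidReplicas(false));
        System.out.println("local write avoided: " + survivingValidReplicas(true));
    }
}
```

In this model, allowing the local write leaves zero replicas with the expected genstamp once the host is gone, which corresponds to the missing block plus GENSTAMP_MISMATCH replicas described above. Excluding the writer's host from the pipeline means the loss of that host no longer takes the only up-to-date replica with it.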