[ https://issues.apache.org/jira/browse/HBASE-28260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bryan Beaudreault reopened HBASE-28260:
---------------------------------------
Assignee: Charles Connell
Actually, since this is a bug and it applies cleanly to branch-2.5, I'm
reopening for cherry-pick there.
> Possible data loss in WAL after RegionServer crash
> --------------------------------------------------
>
> Key: HBASE-28260
> URL: https://issues.apache.org/jira/browse/HBASE-28260
> Project: HBase
> Issue Type: Bug
> Reporter: Bryan Beaudreault
> Assignee: Charles Connell
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2
>
>
> We recently had a production incident:
> # RegionServer crashes, but the local DataNode lives on
> # WAL lease recovery kicks in
> # NameNode reconstructs the block during lease recovery (which results in a
> new genstamp). It chooses the replica on the local DataNode as the primary.
> # The local DataNode reconstructs the block, so the NameNode registers the new
> genstamp.
> # The local DataNode and its underlying host die before the new block can be
> replicated to the other replicas.
> This leaves us with a missing block, because the block with the new genstamp
> has no surviving replicas. The old replicas still remain, but are considered
> corrupt due to GENSTAMP_MISMATCH.
> Thankfully, we were able to confirm that the lengths of the corrupt replicas
> were identical to that of the newly constructed and lost block. Further, the
> file in question was only 1 block. So we downloaded one of those corrupt block
> files and used {{hdfs dfs -put -f}} to overwrite the file in HDFS with it. In
> this case we had no actual data loss, but it could easily have happened if the
> file had been more than 1 block or the replicas had not been fully in sync
> prior to reconstruction.
> To prevent this, we should stop writing WAL blocks to the local DataNode.
> We can use CreateFlag.NO_WRITE_LOCAL for this. Hat tip to [~weichiu] for
> pointing this out.
> When reading WALs, we already reorder block locations to avoid reading from
> the local DataNode, but avoiding writing there altogether would be better.
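To make the failure sequence and the proposed fix concrete, here is a toy Java model of the incident as the report describes it. This is a sketch under stated assumptions, not HDFS or HBase code: the class WalRecoveryModel, the Replica type, and the methods place and survivingValidReplicas are all hypothetical. It assumes that without NO_WRITE_LOCAL the first replica lands on the RegionServer's co-located DataNode, that recovery picks the local replica as primary whenever one exists (mirroring the incident), and that only the primary gets the new genstamp before its host dies. It does not model HDFS's actual block recovery protocol.

```java
import java.util.ArrayList;
import java.util.List;

public class WalRecoveryModel {
    // Hypothetical simplified replica: the DataNode host holding it and its generation stamp.
    static final class Replica {
        final String host;
        long genStamp;

        Replica(String host, long genStamp) {
            this.host = host;
            this.genStamp = genStamp;
        }
    }

    // Places a block's replicas. With writeLocal=true the first replica goes to the
    // co-located DataNode (localHost); with writeLocal=false (the effect NO_WRITE_LOCAL
    // asks for) only remote hosts are used.
    static List<Replica> place(String localHost, List<String> remoteHosts, int replication,
                               boolean writeLocal, long genStamp) {
        List<Replica> replicas = new ArrayList<>();
        if (writeLocal) {
            replicas.add(new Replica(localHost, genStamp));
        }
        for (String host : remoteHosts) {
            if (replicas.size() == replication) {
                break;
            }
            replicas.add(new Replica(host, genStamp));
        }
        return replicas;
    }

    // Replays the incident: pick a primary (the local replica if present, else the first),
    // bump only the primary to the new genstamp, record that genstamp as the NameNode would,
    // then kill localHost. Returns how many surviving replicas match the recorded genstamp;
    // 0 means the block is missing.
    static int survivingValidReplicas(List<Replica> replicas, String localHost) {
        Replica primary = replicas.get(0);
        for (Replica r : replicas) {
            if (r.host.equals(localHost)) {
                primary = r;
                break;
            }
        }
        long recorded = primary.genStamp + 1;
        primary.genStamp = recorded;

        int valid = 0;
        for (Replica r : replicas) {
            boolean alive = !r.host.equals(localHost);
            if (alive && r.genStamp == recorded) {
                valid++;
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        List<String> remotes = List.of("dn2", "dn3", "dn4");
        int withLocal = survivingValidReplicas(place("dn1", remotes, 3, true, 1001L), "dn1");
        int noLocal = survivingValidReplicas(place("dn1", remotes, 3, false, 1001L), "dn1");
        System.out.println("write local:    " + withLocal + " valid replica(s)");
        System.out.println("NO_WRITE_LOCAL: " + noLocal + " valid replica(s)");
    }
}
```

Running main prints 0 valid replicas for the write-local placement and 1 for the NO_WRITE_LOCAL placement: once the co-located DataNode holds no WAL replica, it can never become the sole holder of the new genstamp, so losing that host cannot orphan the block.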
--
This message was sent by Atlassian Jira
(v8.20.10#820010)