[jira] [Commented] (HBASE-28260) Possible data loss in WAL after RegionServer crash

Bryan Beaudreault (Jira) Thu, 14 Dec 2023 09:22:04 -0800


    [ 
https://issues.apache.org/jira/browse/HBASE-28260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796839#comment-17796839
 ]


Bryan Beaudreault commented on HBASE-28260:
-------------------------------------------

Re: replication, I guess it would increase network traffic a bit on the source 
cluster if there were not a local replica. That could be a consideration here.

> Possible data loss in WAL after RegionServer crash
> --------------------------------------------------
>
>                 Key: HBASE-28260
>                 URL: https://issues.apache.org/jira/browse/HBASE-28260
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Bryan Beaudreault
>            Priority: Major
>
> We recently had a production incident:
>  # RegionServer crashes, but local DataNode lives on
>  # WAL lease recovery kicks in
>  # Namenode reconstructs the block during lease recovery (which results in a 
> new genstamp). It chooses the replica on the local DataNode as the primary.
>  # Local DataNode reconstructs the block, so NameNode registers the new 
> genstamp.
>  # Local DataNode and the underlying host dies, before the new block could be 
> replicated to other replicas.
> This leaves us with a missing block, because the new genstamp block has no 
> replicas. The old replicas still remain, but are considered corrupt due to 
> GENSTAMP_MISMATCH.
> Thankfully we were able to confirm that the length of the corrupt blocks were 
> identical to the newly constructed and lost block. Further, the file in 
> question was only 1 block. So we downloaded one of those corrupt block files 
> and hdfs {{hdfs dfs -put -f}} to force that block to replace the file in 
> hdfs. So in this case we had no actual data loss, but it could have happened 
> easily if the file was more than 1 block or the replicas weren't fully in 
> sync prior to reconstruction.
> In order to avoid this issue, we should avoid writing WAL blocks too the 
> local datanode. We can use CreateFlag.NO_WRITE_LOCAL for this. Hat tip to 
> [~weichiu] for pointing this out.
> During reading of WALs we already reorder blocks so as to avoid reading from 
> the local datanode, but avoiding writing there altogether would be better.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (HBASE-28260) Possible data loss in WAL after RegionServer crash

Reply via email to