[ https://issues.apache.org/jira/browse/HBASE-28260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796839#comment-17796839 ]
Bryan Beaudreault commented on HBASE-28260:
-------------------------------------------

Re: replication, I guess it would increase network traffic a bit on the source cluster if there were no local replica. That could be a consideration here.

> Possible data loss in WAL after RegionServer crash
> --------------------------------------------------
>
>                 Key: HBASE-28260
>                 URL: https://issues.apache.org/jira/browse/HBASE-28260
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Bryan Beaudreault
>            Priority: Major
>
> We recently had a production incident:
> # RegionServer crashes, but local DataNode lives on
> # WAL lease recovery kicks in
> # NameNode reconstructs the block during lease recovery (which results in a
> new genstamp). It chooses the replica on the local DataNode as the primary.
> # Local DataNode reconstructs the block, so NameNode registers the new
> genstamp.
> # Local DataNode and the underlying host die before the new block can be
> replicated to other replicas.
> This leaves us with a missing block, because the new genstamp block has no
> replicas. The old replicas still remain, but are considered corrupt due to
> GENSTAMP_MISMATCH.
> Thankfully we were able to confirm that the length of the corrupt blocks was
> identical to that of the newly constructed and lost block. Further, the file
> in question was only 1 block. So we downloaded one of those corrupt block
> files and used {{hdfs dfs -put -f}} to force that block to replace the file
> in hdfs. So in this case we had no actual data loss, but it could have
> happened easily if the file was more than 1 block or the replicas weren't
> fully in sync prior to reconstruction.
> In order to avoid this issue, we should avoid writing WAL blocks to the
> local datanode. We can use CreateFlag.NO_LOCAL_WRITE for this. Hat tip to
> [~weichiu] for pointing this out.
> During reading of WALs we already reorder blocks so as to avoid reading from
> the local datanode, but avoiding writing there altogether would be better.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
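
To make the failure sequence in the quoted description easier to follow, here is a minimal toy model in plain Java. It is a sketch under stated assumptions, not HDFS or HBase code: the class name, host names, and the rule "prefer the local replica as recovery primary" are all hypothetical and only mirror what the report describes. The model stops at the moment the report cares about, after the primary replica has the new genstamp but before it has been propagated to the other replicas. For reference, the flag proposed in the description is spelled `NO_LOCAL_WRITE` in Hadoop's `org.apache.hadoop.fs.CreateFlag` enum, as far as I know; the model only imitates its effect on replica placement.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the incident in HBASE-28260. Illustrative only, not HDFS code. */
final class WalRecoveryModel {

    /**
     * Simulates: RegionServer crash, lease recovery on a primary replica,
     * then loss of the RegionServer's host before the new genstamp spreads.
     * Returns the number of replicas whose genstamp still matches the NameNode's.
     */
    static int readableReplicasAfterIncident(boolean writeLocalReplica) {
        String rsHost = "host-rs"; // hypothetical host running RegionServer + DataNode

        // Place three replicas; with local writes, one lands on the RegionServer's host.
        List<String> replicaHosts = new ArrayList<>();
        if (writeLocalReplica) {
            replicaHosts.add(rsHost);
            replicaHosts.add("host-a");
            replicaHosts.add("host-b");
        } else {
            replicaHosts.add("host-a");
            replicaHosts.add("host-b");
            replicaHosts.add("host-c");
        }

        // Every replica starts with the same generation stamp as the NameNode.
        long nameNodeGenstamp = 1L;
        Map<String, Long> replicaGenstamp = new HashMap<>();
        for (String host : replicaHosts) {
            replicaGenstamp.put(host, nameNodeGenstamp);
        }

        // Lease recovery: assume the local replica is chosen as primary when it
        // exists (as happened in the report), otherwise the first remote replica.
        String primary = replicaHosts.contains(rsHost) ? rsHost : replicaHosts.get(0);
        nameNodeGenstamp++;
        replicaGenstamp.put(primary, nameNodeGenstamp);

        // The RegionServer's host dies before the new genstamp reaches other replicas.
        replicaGenstamp.remove(rsHost);

        // Replicas with a stale genstamp count as corrupt (GENSTAMP_MISMATCH).
        int readable = 0;
        for (long genstamp : replicaGenstamp.values()) {
            if (genstamp == nameNodeGenstamp) {
                readable++;
            }
        }
        return readable;
    }

    public static void main(String[] args) {
        System.out.println("local replica written: "
            + readableReplicasAfterIncident(true) + " readable replica(s)");
        System.out.println("local write avoided:   "
            + readableReplicasAfterIncident(false) + " readable replica(s)");
    }
}
```

In this model, writing a local replica leaves zero readable replicas after the host is lost, while avoiding the local write leaves one, because the recovery primary no longer shares a failure domain with the crashed RegionServer. The trade-off raised in the comment still applies: without a local replica, readers on that host (replication included) always go over the network.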