[ https://issues.apache.org/jira/browse/HDFS-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196735#comment-13196735 ]

Todd Lipcon commented on HDFS-2742:
-----------------------------------

Here's an explanation of why this is still important after HDFS-2791:

The crux of the issue is something I mentioned in an earlier comment:
{quote}
If instead we try to process the queued messages as soon as we first hear about 
the block, we have the opposite problem – step 3 and step 4 are switched. This 
causes problems for Safe Mode since it isn't properly accounting the number of 
complete blocks in that case. Hence the patch currently attached to this JIRA 
breaks TestHASafeMode.
{quote}

I've added two new unit tests which exercise this. In the first test, the 
sequence is the following:
State: SBN is in safe mode
1. Active NN opens some blocks for construction, writes OP_ADD to add them
2. Active NN rolls. SBN picks up the OP_ADD from the shared edits
3. Active NN closes the file, but doesn't roll edits.
4. DNs report FINALIZED replicas to the SBN. 
Here the SBN does not increment safeBlockCount, since the block is 
UNDER_CONSTRUCTION.
5. Active NN rolls. SBN replays OP_CLOSE. This increments the total block 
count, since there is now one more complete block. But the SBN must also 
incrementally track that it has one more _safe_ block, since a finalized 
replica had already been reported before it replayed OP_CLOSE. This is one of 
the fixes provided by HDFS-2742.
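The accounting in step 5 can be sketched with a minimal model of the two safe-mode counters. This is a hypothetical illustration (the class and method names are invented, not HDFS's actual SafeModeInfo code):

```java
// Hypothetical model of the SBN's safe-mode counters, illustrating why
// replaying OP_CLOSE must bump the "safe" count when a finalized replica
// was already reported while the block was under construction.
class SafeModeModel {
    int blockTotal; // complete blocks in the namespace
    int blockSafe;  // complete blocks with enough finalized replicas reported

    // Replaying OP_CLOSE turns an under-construction block into a complete
    // one, so the total grows by one.
    void replayClose(boolean finalizedReplicaAlreadyReported) {
        blockTotal++;
        // The fix: if a DN already reported a FINALIZED replica (step 4),
        // the newly complete block is immediately safe and must be counted.
        if (finalizedReplicaAlreadyReported) {
            blockSafe++;
        }
    }

    boolean canLeaveSafeMode() {
        return blockSafe >= blockTotal;
    }
}
```

Without the incremental `blockSafe++`, the total rises but the safe count lags, and the SBN never observes enough safe blocks to leave safe mode.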

The second test is the following:
State: SBN is in safe mode
1. Active NN and SBN both see some files, and both have received block reports.
2. Active NN deletes the files, and writes OP_DELETE to the log. It does not 
yet send the deletion requests to the DNs.
3. Active NN rolls the log. SBN picks up the OP_DELETE.
4. SBN calls setBlockTotal, which sees that we now have fewer blocks in the 
namespace. This decreases the safemode "total" count. However, those blocks 
were also included in the "safe" count. So without this patch, the "safe" count 
ends up higher than the "total" and we crash with an assertion failure.
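The second failure mode can also be sketched with a small counter model. Again, these names are hypothetical, not the real setBlockTotal implementation:

```java
// Hypothetical sketch of the OP_DELETE case: shrinking the block total
// without also removing the deleted blocks from the "safe" count breaks
// the invariant safe <= total.
class SafeModeCounts {
    int total;
    int safe;

    SafeModeCounts(int total, int safe) {
        this.total = total;
        this.safe = safe;
    }

    // Without the patch: only the total is recomputed after the deletion.
    void setBlockTotalOnly(int newTotal) {
        total = newTotal; // safe can now exceed total
    }

    // With the patch: deleted blocks leave both counts before the total
    // is recomputed.
    void deleteBlocks(int deleted) {
        safe -= deleted;
        total -= deleted;
    }

    boolean invariantHolds() {
        return safe <= total; // the assertion that fails without the patch
    }
}
```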


The other benefit of this patch is that the fine-grained tracking of queued 
block messages is a lot more efficient. In particular:
- the SBN will get "stuck" in safemode less often, since when it starts up, it 
can process the majority of the initial block reports, rather than queueing 
them all.
- the SBN's memory won't "blow out" nearly as fast if it falls behind reading 
the edit logs, since only the newest modified blocks will have to get queued.
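The fine-grained queueing can be sketched as a per-block pending map instead of a single global queue. This is an invented illustration of the idea, not the actual pending-message class in the patch:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch of fine-grained queueing: only messages about blocks
// the SBN does not yet know about (its edits are behind) get queued, keyed
// by block ID, instead of queueing every incoming block message.
class QueuedBlockMessages {
    private final Map<Long, Queue<String>> pending = new HashMap<>();

    // Returns true if the message was applied immediately.
    boolean process(long blockId, String msg, Set<Long> knownBlocks) {
        if (knownBlocks.contains(blockId)) {
            return true; // block already known: no need to queue
        }
        pending.computeIfAbsent(blockId, k -> new ArrayDeque<>()).add(msg);
        return false; // queued only this block's message
    }

    // When an edit op introduces the block, drain just its queued messages.
    Queue<String> drain(long blockId) {
        Queue<String> q = pending.remove(blockId);
        return q == null ? new ArrayDeque<>() : q;
    }

    int queuedCount() {
        return pending.values().stream().mapToInt(Queue::size).sum();
    }
}
```

On startup, most block reports reference blocks already in the SBN's namespace, so they are processed immediately; only the small tail of recently modified blocks accumulates in the map, which bounds both safe-mode delay and memory growth.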

                
> HA: observed dataloss in replication stress test
> ------------------------------------------------
>
>                 Key: HDFS-2742
>                 URL: https://issues.apache.org/jira/browse/HDFS-2742
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: data-node, ha, name-node
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
>         Attachments: hdfs-2742.txt, hdfs-2742.txt, hdfs-2742.txt, 
> hdfs-2742.txt, hdfs-2742.txt, hdfs-2742.txt, log-colorized.txt
>
>
> The replication stress test case failed over the weekend since one of the 
> replicas went missing. Still diagnosing the issue, but it seems like the 
> chain of events was something like:
> - a block report was generated on one of the nodes while the block was being 
> written - thus the block report listed the block as RBW
> - when the standby replayed this queued message, it was replayed after the 
> file was marked complete. Thus it marked this replica as corrupt
> - it asked the DN holding the corrupt replica to delete it. And, I think, 
> removed it from the block map at this time.
> - That DN then did another block report before receiving the deletion. This 
> caused it to be re-added to the block map, since it was "FINALIZED" now.
> - Replication was lowered on the file, and it counted the above replica as 
> non-corrupt, and asked for the other replicas to be deleted.
> - All replicas were lost.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

