[jira] [Commented] (HDFS-9949) Testcase for catching DN UUID regeneration regression

Colin Patrick McCabe (JIRA) Fri, 18 Mar 2016 22:06:31 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-9949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15198022#comment-15198022
 ]


Colin Patrick McCabe commented on HDFS-9949:
--------------------------------------------

Thanks, [~qwertymaniac].  Great find, and great root-cause analysis.

{code}
while (!cluster.getDataNodes().get(0).isDatanodeFullyStarted()) {
  Thread.sleep(500);
}
{code}
Can we have a sleep for 50 ms here instead, just to take advantage of cases 
where the registration is quicker?
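
For illustration only, here is a minimal sketch of the shorter polling interval, written against plain JDK classes. The helper name {{waitFor}}, the timeout parameter, and the stand-in condition in {{main}} are hypothetical and are not part of the patch; in the test itself the condition would be the {{isDatanodeFullyStarted()}} check quoted above.

```java
import java.util.function.BooleanSupplier;

class PollUntil {
    /**
     * Polls the condition every intervalMs until it holds or timeoutMs elapses.
     * Returns true if the condition became true, false on timeout or interrupt.
     */
    static boolean waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() >= deadline) {
                return false; // gave up: condition never held within the timeout
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in for cluster.getDataNodes().get(0).isDatanodeFullyStarted():
        // pretend the DataNode needs about 120 ms to finish registering.
        boolean started = waitFor(() -> System.currentTimeMillis() - start >= 120, 50, 10_000);
        System.out.println("started=" + started);
    }
}
```

With a 50 ms interval the loop overshoots a fast registration by at most about 50 ms, where a 500 ms interval can overshoot by up to half a second. The timeout simply keeps a hung startup from blocking the test forever.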

+1 once that's addressed.

> Testcase for catching DN UUID regeneration regression
> -----------------------------------------------------
>
>                 Key: HDFS-9949
>                 URL: https://issues.apache.org/jira/browse/HDFS-9949
>             Project: Hadoop HDFS
>          Issue Type: Test
>    Affects Versions: 2.6.0
>            Reporter: Harsh J
>            Assignee: Harsh J
>            Priority: Minor
>         Attachments: HDFS-9949.000.branch-2.7.not-for-commit.patch, 
> HDFS-9949.000.patch
>
>
> In the following scenario, in releases without HDFS-8211, the DN may 
> regenerate its UUIDs unintentionally.
> 0. Consider a DN with two disks {{/data1/dfs/dn,/data2/dfs/dn}}
> 1. Stop DN
> 2. Unmount the second disk, {{/data2/dfs/dn}}
> 3. Create (in the scenario, this was an accident) /data2/dfs/dn on the root 
> path
> 4. Start DN
> 5. DN now considers /data2/dfs/dn empty, so it formats it, but during the 
> format it uses {{datanode.getDatanodeUuid()}}, which is null until register() 
> is called.
> 6. As a result, after the directory loading, {{datanode.checkDatanodeUuid()}} 
> finds its condition satisfied and generates a new UUID, which is written to 
> all disks: {{/data1/dfs/dn/current/VERSION}} and 
> {{/data2/dfs/dn/current/VERSION}}.
> 7. Stop DN (in the scenario, this was when the mistake of unmounted disk was 
> realised)
> 8. Mount the second disk back again {{/data2/dfs/dn}}, causing the 
> {{VERSION}} file to be the original one again on it (mounting masks the root 
> path that we last generated upon).
> 9. DN fails to start up because it finds mismatched UUIDs between the two 
> disks.
> The DN should not generate a new UUID if one of the storage disks already 
> has the older one.
> HDFS-8211 unintentionally fixes this by changing the 
> {{datanode.getDatanodeUuid()}} function to rely on the {{DataStorage}} 
> representation of the UUID rather than the {{DatanodeID}} object, which only 
> becomes available (non-null) _after_ the registration.
> It'd still be good to add a direct test case for the above scenario that 
> passes on trunk and branch-2, but fails on branch-2.7 and lower, so we can 
> catch a regression around this in the future.
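
The scenario in the quoted description can be approximated with a small toy model. This is only a sketch and is not Hadoop code: {{StorageDir}}, {{loadStorage}} and the {{useStorageUuid}} flag are hypothetical names, and the detail that formatting overwrites the in-memory UUID with whatever the getter returns is an assumption made to reproduce the described outcome.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

class UuidRegenerationSketch {
    /** Hypothetical stand-in for one storage directory and its VERSION file. */
    static final class StorageDir {
        final String path;
        String datanodeUuid; // null models an empty, not-yet-formatted directory

        StorageDir(String path, String datanodeUuid) {
            this.path = path;
            this.datanodeUuid = datanodeUuid;
        }
    }

    /**
     * Loads the directories in order and returns the UUID the node ends up with.
     * useStorageUuid=false models the getter reading a registration object that
     * is still null before register(); true models reading the storage's UUID.
     */
    static String loadStorage(List<StorageDir> dirs, boolean useStorageUuid) {
        String registrationUuid = null; // register() has not run yet
        String storageUuid = null;      // in-memory UUID held by the storage layer

        for (StorageDir dir : dirs) {
            if (dir.datanodeUuid != null) {
                storageUuid = dir.datanodeUuid; // read from existing VERSION
            } else {
                // Empty directory: format it with whatever the getter returns.
                String fromGetter = useStorageUuid ? storageUuid : registrationUuid;
                storageUuid = fromGetter;       // assumption: format overwrites it
                dir.datanodeUuid = fromGetter;
            }
        }

        if (storageUuid == null) {
            // Models the post-load check: no UUID known, so generate a new one
            // and write it to every directory.
            storageUuid = UUID.randomUUID().toString();
            for (StorageDir dir : dirs) {
                dir.datanodeUuid = storageUuid;
            }
        }
        return storageUuid;
    }

    public static void main(String[] args) {
        for (boolean useStorageUuid : new boolean[] {false, true}) {
            List<StorageDir> dirs = new ArrayList<>();
            dirs.add(new StorageDir("/data1/dfs/dn", "original-uuid"));
            dirs.add(new StorageDir("/data2/dfs/dn", null)); // empty dir on the root path
            String result = loadStorage(dirs, useStorageUuid);
            System.out.println("useStorageUuid=" + useStorageUuid
                + " keptOriginal=" + "original-uuid".equals(result));
        }
    }
}
```

In this model, reading the UUID from the not-yet-populated registration object loses the value already loaded from {{/data1/dfs/dn}}, so a fresh UUID is written everywhere. Reading it from the storage representation keeps the original.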



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
