[ https://issues.apache.org/jira/browse/HDFS-9949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Harsh J updated HDFS-9949:
--------------------------
    Target Version/s: 3.0.0, 2.8.0, 2.9.0
              Status: Patch Available  (was: Open)

> Testcase for catching DN UUID regeneration regression
> -----------------------------------------------------
>
>                 Key: HDFS-9949
>                 URL: https://issues.apache.org/jira/browse/HDFS-9949
>             Project: Hadoop HDFS
>          Issue Type: Test
>    Affects Versions: 2.6.0
>            Reporter: Harsh J
>            Assignee: Harsh J
>            Priority: Minor
>         Attachments: HDFS-9949.000.branch-2.7.not-for-commit.patch,
>                      HDFS-9949.000.patch
>
>
> In the following scenario, in releases without HDFS-8211, the DN may
> regenerate its UUIDs unintentionally.
> 0. Consider a DN with two disks {{/data1/dfs/dn,/data2/dfs/dn}}
> 1. Stop DN
> 2. Unmount the second disk, {{/data2/dfs/dn}}
> 3. Create (in the scenario, this was an accident) {{/data2/dfs/dn}} on the
> root path
> 4. Start DN
> 5. DN now considers {{/data2/dfs/dn}} empty and so formats it, but during the
> format it uses {{datanode.getDatanodeUuid()}}, which is null until
> register() is called.
> 6. As a result, after the directory loading, {{datanode.checkDatanodeUuid()}}
> is called and its condition is satisfied, so a new UUID is generated and
> written to all disks: {{/data1/dfs/dn/current/VERSION}} and
> {{/data2/dfs/dn/current/VERSION}}.
> 7. Stop DN (in the scenario, this was when the mistake of the unmounted disk
> was realised)
> 8. Mount the second disk back again at {{/data2/dfs/dn}}, causing the
> {{VERSION}} file on it to be the original one again (mounting masks the root
> path that we last generated upon).
> 9. DN fails to start up because it finds mismatched UUIDs between the two
> disks
> The DN should not generate a new UUID if one of the storage disks already
> has the older one.
> HDFS-8211 unintentionally fixes this by changing the
> {{datanode.getDatanodeUuid()}} function to rely on the {{DataStorage}}
> representation of the UUID instead of the {{DatanodeID}} object, which only
> becomes available (non-null) _after_ the registration.
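
The scenario above can be illustrated with a small, self-contained toy model. This is only a sketch and not Hadoop code: the class DnUuidModel, its fields, and the readFromStorage switch are hypothetical names invented for illustration. It also assumes that formatting an empty directory records whatever UUID getDatanodeUuid() reports as the storage-level UUID, which is how the null value reaches the later check in this model.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

/** Toy model of the scenario; none of this is Hadoop code. */
final class DnUuidModel {
    /** VERSION-file UUID per storage directory; null means the directory is empty. */
    final Map<String, String> versionFiles = new LinkedHashMap<>();
    /** Stand-in for the DatanodeID UUID, which stays null until registration. */
    private final String registrationUuid = null;
    /** Stand-in for the UUID held by the storage layer. */
    private String storageUuid;
    /** true models the lookup after HDFS-8211, false the older lookup. */
    private final boolean readFromStorage;

    DnUuidModel(boolean readFromStorage) {
        this.readFromStorage = readFromStorage;
    }

    String getDatanodeUuid() {
        return readFromStorage ? storageUuid : registrationUuid;
    }

    /** Steps 4 to 6: load every directory, then generate a UUID if none is known. */
    void start() {
        for (Map.Entry<String, String> dir : versionFiles.entrySet()) {
            if (dir.getValue() != null) {
                // Existing VERSION file: adopt its UUID.
                storageUuid = dir.getValue();
            } else {
                // Empty directory: format it with the reported UUID (assumption:
                // the format also records that value as the storage UUID).
                storageUuid = getDatanodeUuid();
                dir.setValue(storageUuid);
            }
        }
        if (storageUuid == null) {
            // No UUID known: generate a new one and write it to all directories.
            String fresh = UUID.randomUUID().toString();
            storageUuid = fresh;
            versionFiles.replaceAll((path, old) -> fresh);
        }
    }

    /** Runs steps 0 to 8 and reports whether both disks agree on the UUID. */
    static boolean restartIsConsistent(boolean readFromStorage) {
        String original = UUID.randomUUID().toString();
        DnUuidModel dn = new DnUuidModel(readFromStorage);
        dn.versionFiles.put("/data1/dfs/dn", original);
        dn.versionFiles.put("/data2/dfs/dn", null); // unmounted disk, empty dir on root
        dn.start();
        // Step 8: remounting brings back the original VERSION file on disk two.
        dn.versionFiles.put("/data2/dfs/dn", original);
        return dn.versionFiles.get("/data1/dfs/dn")
                .equals(dn.versionFiles.get("/data2/dfs/dn"));
    }

    public static void main(String[] args) {
        System.out.println("registration-backed lookup consistent: "
                + restartIsConsistent(false));
        System.out.println("storage-backed lookup consistent: "
                + restartIsConsistent(true));
    }
}
```

In this model the registration-backed lookup leaves the two disks with different UUIDs after the remount, while the storage-backed lookup keeps the original UUID on both. A real regression test would exercise the actual DataNode startup path rather than a model like this one.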
> It'd still be good to add a direct test case for the above scenario that
> passes on trunk and branch-2, but fails on branch-2.7 and lower, so we can
> catch a regression around this in future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)