[ 
https://issues.apache.org/jira/browse/HADOOP-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated HADOOP-1762:
---------------------------------

    Attachment: HADOOP-1762.patch

One version of the patch:

- StorageIDs and datanodes are not stored persistently.
- Whenever a new StorageID is required, we generate a new random ID just like
before. This patch generates a 48-bit integer instead of a 32-bit one.
- In case of a collision, the new datanode replaces the old datanode and the
new datanode gets a new storageID.
- The old datanode will re-register. So the datanode will appear to be absent
for one heartbeat period.
- The probability of a collision is negligible compared to the probability of
losing a datanode for other reasons. Even then, the datanode would be missing
only for a short while.
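
As a rough illustration of the ID-size point above (not taken from the
attached patch), the sketch below draws a 48-bit random ID and estimates the
chance that any two of n IDs collide, using the standard birthday
approximation. The class name, the method names, and the cluster size used in
the note that follows are assumptions made for this example.

```java
import java.security.SecureRandom;

/** Illustrative sketch only; not the code in HADOOP-1762.patch. */
final class StorageIdSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns a uniformly random value in [0, 2^48). */
    static long random48() {
        // Keep the low 48 bits of a random 64-bit value.
        return RANDOM.nextLong() & 0xFFFFFFFFFFFFL;
    }

    /**
     * Birthday approximation for the probability that at least two of
     * n uniformly random IDs of the given bit width are equal:
     * p ~ 1 - exp(-n(n-1) / (2 * 2^bits)).
     */
    static double collisionProbability(long n, int bits) {
        double space = Math.pow(2.0, bits);
        // -expm1(-x) computes 1 - exp(-x) accurately for very small x.
        return -Math.expm1(-(double) n * (n - 1) / (2.0 * space));
    }
}
```

Under this approximation, a hypothetical 10,000 IDs drawn from a 32-bit space
collide somewhere with probability of about 1.2%, while the same number drawn
from a 48-bit space collide with probability of about 1.8e-7.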

I will attach another version of the patch which does the following:

- Instead of the Namenode generating the ID, the datanode itself generates an
ID of the form "randInt-ip-pid-timestamp".
- This will make a collision even less probable. Since the Namenode does not
assign StorageIDs, it cannot work around a collision the way the current patch
does. The Namenode will just log the suspected collision. In this case, the two
datanodes involved in the collision will keep replacing each other.
- We could make namenode change a storageID. But this goes against the 
requirement that only datanode should assign a storageID.
- Note that this will increase the size of the StorageID string, which is
copied every time a DatanodeID is passed.
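
A minimal sketch of how a datanode might build an ID of the form
"randInt-ip-pid-timestamp" is shown below. It is not taken from any attached
patch: the class name, the method names, the "unknown" fallback, and the way
the process ID is obtained are all assumptions made for illustration.

```java
import java.lang.management.ManagementFactory;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.SecureRandom;

/** Illustrative sketch only; names and fallbacks are hypothetical. */
final class DatanodeStorageIdSketch {

    /** Joins the four components in the "randInt-ip-pid-timestamp" form. */
    static String format(int randInt, String ip, String pid, long timestampMillis) {
        return randInt + "-" + ip + "-" + pid + "-" + timestampMillis;
    }

    /** Builds an ID from values available on the local machine. */
    static String generate() {
        // Non-negative random int, so the ID never starts with a '-'.
        int randInt = new SecureRandom().nextInt(Integer.MAX_VALUE);

        String ip;
        try {
            ip = InetAddress.getLocalHost().getHostAddress();
        } catch (UnknownHostException e) {
            ip = "unknown"; // assumption: fall back rather than fail
        }

        // The runtime name is commonly "pid@hostname" on mainstream JVMs,
        // but that format is not guaranteed by the API.
        String jvmName = ManagementFactory.getRuntimeMXBean().getName();
        String pid = jvmName.split("@")[0];

        return format(randInt, ip, pid, System.currentTimeMillis());
    }
}
```

Calling DatanodeStorageIdSketch.generate() once at datanode startup would give
a string that is noticeably longer than a plain 48-bit integer, which is the
size cost noted in the last point above.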


> Namenode does not need to store storageID and datanodeID persistently
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-1762
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1762
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.14.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-1762.patch
>
>
> Currently the Namenode stores all the storage IDs it generates since the
> beginning (since the last format). It allocates a new storageID every time a
> new datanode comes online. It also stores all the known datanode IDs since
> the beginning.
> It would be better if Namenode did not have to keep track of these. I will 
> describe a proposal in the next comment. 
> This has implications regarding how the Namenode helps administrators
> identify 'dead datanodes' etc. These issues are addressed in HADOOP-1138.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
