[
https://issues.apache.org/jira/browse/HADOOP-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Raghu Angadi updated HADOOP-1762:
---------------------------------
Attachment: HADOOP-1762.patch
One version of the patch:
- StorageIDs and datanodes are not stored persistently.
- Whenever a new StorageID is required, we generate a new random ID just like
before. This patch generates a 48-bit integer instead of a 32-bit one.
- In case of a collision, the new datanode replaces the old datanode and the
new datanode gets a new StorageID.
- The old datanode will re-register, so it will appear to be absent
for one heartbeat period.
- The probability of a collision is negligible compared to the probability of
losing a datanode for other reasons. Even then, the datanode would be missing
only for a short while.
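
As a rough illustration of the scheme described above (this is not the code in
the attached patch), the sketch below draws a 48-bit random ID and handles a
collision by dropping the old entry and giving the new datanode a fresh ID.
The class name StorageIdSketch, the in-memory map, and the register() and
lookup() methods are all hypothetical, and the collision handling is one
reading of the description.

```java
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names and structure are assumptions, not the patch.
class StorageIdSketch {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final long MASK_48 = (1L << 48) - 1;

    // storageID -> datanode name, kept in memory only (nothing is persisted).
    private final Map<String, String> nodesById = new HashMap<>();

    // Returns a uniformly random value in [0, 2^48).
    static long random48() {
        return RANDOM.nextLong() & MASK_48;
    }

    // Draws random IDs until one is found that is not currently registered.
    private String freshId() {
        String id;
        do {
            id = Long.toString(random48());
        } while (nodesById.containsKey(id));
        return id;
    }

    // Registers a datanode and returns the storage ID it ends up with.
    String register(String datanode, String requestedId) {
        String id = requestedId;
        if (id == null || id.isEmpty()) {
            id = freshId();
        } else {
            String holder = nodesById.get(id);
            if (holder != null && !holder.equals(datanode)) {
                // Collision: pick a new ID for the newcomer, then drop the old
                // datanode, which is expected to re-register on its own later.
                String replacement = freshId();
                nodesById.remove(id);
                id = replacement;
            }
        }
        nodesById.put(id, datanode);
        return id;
    }

    // Returns the datanode registered under the ID, or null if there is none.
    String lookup(String id) {
        return nodesById.get(id);
    }
}
```

In this sketch the old datanode is simply absent from the map until it
registers again, which corresponds to the one heartbeat period of absence
mentioned above.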
I will attach another version of the patch which does the following:
- Instead of the Namenode generating the ID, the datanode itself generates an
ID in the form "randInt-ip-pid-timestamp".
- This will make a collision even less probable. Since the Namenode does not
assign StorageIDs, it cannot work around a collision the way the current patch
does. The Namenode will just log the suspected collision. In this case, the two
datanodes involved in the collision will keep replacing each other.
- We could make the Namenode change a StorageID, but this goes against the
requirement that only the datanode should assign a StorageID.
- Note that this will increase the size of the StorageID string, which is
copied every time a DatanodeID is passed.
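
A minimal sketch of how a datanode might build an ID of the form
"randInt-ip-pid-timestamp" is shown below. It only illustrates the format
and is not the second patch: the class DatanodeIdSketch and its methods are
hypothetical, the timestamp is assumed to be in milliseconds, and
ProcessHandle needs Java 9 or later.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.SecureRandom;

// Hypothetical sketch; names and field choices are assumptions.
class DatanodeIdSketch {
    // Joins the four components with '-' in the order named in the comment.
    static String format(int randInt, String ip, long pid, long timestampMillis) {
        return randInt + "-" + ip + "-" + pid + "-" + timestampMillis;
    }

    // Builds an ID from values available on the local machine.
    static String generate() {
        // A non-negative int, so the ID never starts with a '-' sign.
        int randInt = new SecureRandom().nextInt(Integer.MAX_VALUE);
        String ip;
        try {
            ip = InetAddress.getLocalHost().getHostAddress();
        } catch (UnknownHostException e) {
            ip = "unknown"; // assumption: fall back rather than fail
        }
        long pid = ProcessHandle.current().pid(); // requires Java 9+
        return format(randInt, ip, pid, System.currentTimeMillis());
    }

    public static void main(String[] args) {
        System.out.println(generate());
    }
}
```

The resulting string is considerably longer than a 48-bit integer printed in
decimal, which is the size cost noted in the last point above.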
> Namenode does not need to store storageID and datanodeID persistently
> ---------------------------------------------------------------------
>
> Key: HADOOP-1762
> URL: https://issues.apache.org/jira/browse/HADOOP-1762
> Project: Hadoop
> Issue Type: Improvement
> Components: dfs
> Affects Versions: 0.14.0
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
> Attachments: HADOOP-1762.patch
>
>
> Currently the Namenode stores all the storage IDs it has generated since the
> beginning (since the last format). It allocates a new storageID every time a
> new datanode comes online. It also stores all the known datanode IDs since
> the beginning.
> It would be better if Namenode did not have to keep track of these. I will
> describe a proposal in the next comment.
> This has implications regarding how the Namenode helps administrators
> identify 'dead datanodes' etc. These issues are addressed in HADOOP-1138.