[
https://issues.apache.org/jira/browse/HADOOP-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522670
]
Raghu Angadi commented on HADOOP-1762:
--------------------------------------
Proposed implementation :
- Namenode stores one integer {{lastStorageId}} persistently
- When a Namenode starts, it does not know about any storageIds except
{{lastStorageId}}
- When a datanode D1 registers: {code}
if (D1.storageID == 0 or D1.storageID > lastStorageId) {
D1.storageID = ++lastStorageId; // pre-increment so 0 is never handed out; take care of overflow etc
EditLog.write(LAST_STORAGE_ID, lastStorageId);
}
// same as current behaviour
// Check if D1.storageID is already registered etc
{code}
- Another, simpler alternative: don't keep track of lastStorageID at all, but
assign a random storage id whenever a new storage ID is required. Especially
if we use a 64-bit integer, the probability of collision is pretty much negligible.
- What about when lastStorageID is INT_MAX? We can use a 64-bit integer, and
probably we should. And even if a 32-bit integer rolls over, it's OK.
- In either case, the collision probability would still be minuscule compared to
the probability of similar damage (losing a datanode).
- If there is an actual collision, apart from the namenode losing one datanode,
there is another consequence: if two nodes Dx and Dy get the same storage id,
then each will keep replacing the other at the namenode. To avoid this,
whenever a new datanode registers with an existing storage id, just assign a
new storage id, instead of reusing the old one.
- If we use the 'lastStorageID' method, then when a datanode starts up under
Hadoop 0.15 for the first time, it should zero out its storage id. Apart from
this, no other changes are required at the datanode.
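To make the 'lastStorageID' method concrete, here is a minimal Java sketch of the registration check. It is only an illustration: the class {{CounterStorageIdAllocator}}, the {{register}} method and the in-memory {{editLog}} list are hypothetical stand-ins, not existing Namenode code. It assumes a 64-bit counter and handles overflow by failing loudly via {{Math.addExact}}.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the lastStorageId scheme; not actual Namenode code. */
class CounterStorageIdAllocator {
    /** A datanode that has never been assigned an id reports 0. */
    static final long UNASSIGNED = 0L;

    /** Stand-in for the edit log: each value of lastStorageId that was persisted. */
    final List<Long> editLog = new ArrayList<>();

    /** The only piece of state that survives a restart. */
    long lastStorageId;

    CounterStorageIdAllocator(long persistedLastStorageId) {
        this.lastStorageId = persistedLastStorageId;
    }

    /** Returns the storage id the datanode should use from now on. */
    long register(long reportedStorageId) {
        long id = reportedStorageId;
        if (id == UNASSIGNED || id > lastStorageId) {
            // Throws ArithmeticException instead of silently wrapping around.
            lastStorageId = Math.addExact(lastStorageId, 1L);
            id = lastStorageId;
            editLog.add(lastStorageId); // persist the new high-water mark
        }
        // The existing checks ("is this id already registered?") would follow here.
        return id;
    }
}
```

A restarted allocator only needs the last persisted value: ids at or below it are accepted as previously issued, and anything above it is replaced with a fresh id.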
I personally prefer the random storage id.
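A rough Java sketch of the random variant, including the rule above that a datanode registering with a storage id already held by another node gets a new id. All names here ({{RandomStorageIdAllocator}}, {{register}}, the {{registered}} map) are hypothetical, and telling datanodes apart by a name string is an assumption made for the sketch, not something decided in this issue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch of random storage id assignment; not actual Namenode code. */
class RandomStorageIdAllocator {
    /** A datanode that has never been assigned an id reports 0. */
    static final long UNASSIGNED = 0L;

    private final Random random;

    /** Storage id -> datanode name. Kept in memory only, nothing is persisted. */
    final Map<Long, String> registered = new HashMap<>();

    RandomStorageIdAllocator(Random random) {
        this.random = random;
    }

    /** Returns the storage id the datanode should use from now on. */
    long register(String datanodeName, long reportedStorageId) {
        long id = reportedStorageId;
        String owner = registered.get(id);
        boolean heldByOtherNode = owner != null && !owner.equals(datanodeName);
        if (id == UNASSIGNED || heldByOtherNode) {
            // Assign a new id instead of letting two nodes replace each other.
            id = newStorageId();
        }
        registered.put(id, datanodeName);
        return id;
    }

    /** Draws 64-bit ids until one is non-zero and not currently registered. */
    private long newStorageId() {
        long candidate;
        do {
            candidate = random.nextLong();
        } while (candidate == UNASSIGNED || registered.containsKey(candidate));
        return candidate;
    }
}
```

Passing the {{Random}} in keeps the sketch testable; a real deployment would presumably want {{java.security.SecureRandom}}. By the usual birthday approximation, n uniformly random 64-bit ids collide with probability of roughly n^2 / 2^65, which for 10,000 datanodes is on the order of 10^-12.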
> Namenode does not need to store storageID and datanodeID persistently
> ---------------------------------------------------------------------
>
> Key: HADOOP-1762
> URL: https://issues.apache.org/jira/browse/HADOOP-1762
> Project: Hadoop
> Issue Type: Improvement
> Components: dfs
> Affects Versions: 0.14.0
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
>
> Currently Namenode stores all the storage-ids it generates since the
> beginning (since last format). It allocates a new storageID every time a new
> datanode comes online. It also stores all the known datanode ids since the
> beginning.
> It would be better if Namenode did not have to keep track of these. I will
> describe a proposal in the next comment.
> This has implications regarding how Namenode helps administrators identify 'dead
> datanodes' etc. These issues are addressed in HADOOP-1138.