[ https://issues.apache.org/jira/browse/HADOOP-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
dhruba borthakur resolved HADOOP-1306.
--------------------------------------
Resolution: Duplicate
The slowness of the getAdditionalBlock RPC has been addressed by HADOOP-1269,
HADOOP-1187, HADOOP-1149, and HADOOP-1073.
> DFS Scalability: Reduce the number of getAdditionalBlock RPCs on the namenode
> -----------------------------------------------------------------------------
>
> Key: HADOOP-1306
> URL: https://issues.apache.org/jira/browse/HADOOP-1306
> Project: Hadoop
> Issue Type: Improvement
> Components: dfs
> Reporter: dhruba borthakur
> Attachments: fineGrainLocks3.patch
>
>
> One of the most-frequently-invoked RPCs in the namenode is the addBlock()
> RPC. The DFSClient uses this RPC to allocate one more block for a file that
> it is currently operating upon. The scalability of the namenode will improve
> if we can decrease the number of addBlock() RPCs. One idea that we want to
> discuss here is to make addBlock() return more than one block. This proposal
> came out of a discussion I had with Ben Reed.
> Let's say that addBlock() returns n blocks for the file. The namenode already
> tracks these blocks using the pendingCreates data structure. The client
> guarantees that these n blocks will be used in order. The client also
> guarantees that if it cannot use a block (for whatever reason), it will
> inform the namenode using the abandonBlock() RPC. These RPCs are already
> supported.
> Another possible optimization: since the namenode has to allocate n blocks
> for a file, should it use the same set of datanodes for this set of blocks?
> My proposal is that if n is a small number (e.g. 3), it is prudent to
> allocate the same set of datanodes to host all replicas for this set of
> blocks. This will reduce the CPU spent in chooseTargets().
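The batching idea in the description can be illustrated with a minimal, self-contained sketch. It is not Hadoop code: NameNodeSketch, ClientSketch, addBlocks(), the round-robin chooseTargets(), and the shape of pendingCreates are all hypothetical stand-ins, assumed here only to show how one RPC could hand out n blocks that share a single set of datanodes, with unused blocks given back through abandonBlock().

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchBlockAllocation {

    /** A block id plus the datanodes chosen to host its replicas. */
    static final class LocatedBlock {
        final long blockId;
        final List<String> targets;
        LocatedBlock(long blockId, List<String> targets) {
            this.blockId = blockId;
            this.targets = targets;
        }
    }

    /** Hypothetical stand-in for the namenode; not the real Hadoop class. */
    static final class NameNodeSketch {
        private final List<String> datanodes = List.of("dn1", "dn2", "dn3", "dn4", "dn5");
        private final int replication;
        // Assumed shape: file path -> ids of blocks allocated but not yet committed.
        final Map<String, List<Long>> pendingCreates = new HashMap<>();
        private long nextBlockId = 1;
        private int nextNode = 0;
        int rpcCount = 0;            // RPCs served (addBlocks and abandonBlock)
        int chooseTargetsCalls = 0;  // how often replica placement ran

        NameNodeSketch(int replication) { this.replication = replication; }

        /** One RPC allocates n blocks; all of them share one set of targets. */
        List<LocatedBlock> addBlocks(String path, int n) {
            rpcCount++;
            List<String> targets = chooseTargets(); // placement runs once per batch
            List<LocatedBlock> batch = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                long id = nextBlockId++;
                pendingCreates.computeIfAbsent(path, k -> new ArrayList<>()).add(id);
                batch.add(new LocatedBlock(id, targets));
            }
            return batch;
        }

        /** The client gives back a block it could not (or did not) use. */
        boolean abandonBlock(String path, long blockId) {
            rpcCount++;
            List<Long> pending = pendingCreates.get(path);
            return pending != null && pending.remove(Long.valueOf(blockId));
        }

        // Placeholder policy (round robin); real placement is far more involved.
        private List<String> chooseTargets() {
            chooseTargetsCalls++;
            List<String> targets = new ArrayList<>();
            for (int i = 0; i < replication; i++) {
                targets.add(datanodes.get((nextNode + i) % datanodes.size()));
            }
            nextNode = (nextNode + 1) % datanodes.size();
            return targets;
        }
    }

    /** Hypothetical client that consumes a batch in order before asking again. */
    static final class ClientSketch {
        private final NameNodeSketch namenode;
        private final String path;
        private final int batchSize;
        private final Deque<LocatedBlock> unused = new ArrayDeque<>();

        ClientSketch(NameNodeSketch namenode, String path, int batchSize) {
            this.namenode = namenode;
            this.path = path;
            this.batchSize = batchSize;
        }

        /** Returns the next block, contacting the namenode only when the batch is empty. */
        LocatedBlock nextBlock() {
            if (unused.isEmpty()) {
                unused.addAll(namenode.addBlocks(path, batchSize));
            }
            return unused.removeFirst();
        }

        /** On close, abandon every block that was allocated but never written. */
        void close() {
            while (!unused.isEmpty()) {
                namenode.abandonBlock(path, unused.removeFirst().blockId);
            }
        }
    }
}
```

With a batch size of 3, writing six blocks costs two allocation RPCs and two placement decisions in this sketch, instead of six of each. The trade-off is visible in close(): every block left over in a batch costs one abandonBlock() call, so a large n mainly pays off for files that are long relative to the batch.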