[ http://issues.apache.org/jira/browse/HADOOP-764?page=comments#action_12456276 ]

Raghu Angadi commented on HADOOP-764:
-------------------------------------
As a related note: since we do a blocks.clear() in DatanodeDescriptor when we update blocks from a block report, one separate copy of a block exists for each node (i.e. one for each replica), plus one copy in the NameNode's blockMap. Ideally, the blocks in DatanodeDescriptor should be references to the blocks in the global blockMap. This patch decreases the number of times blocks.clear() is invoked, but over time there will still be separate copies of blocks. The fix is not to call blocks.clear(), but to update the blocks map inline as blocks are added or removed inside processReport().

> The memory consumption of processReport() in the namenode can be reduced
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-764
>                 URL: http://issues.apache.org/jira/browse/HADOOP-764
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>         Assigned To: dhruba borthakur
>         Attachments: processBlockReport3.patch
>
>
> The FSNamesystem.processReport() method converts the blocklist for a datanode
> into an array by calling node.getBlocks(). Although this memory allocation is
> transient, it could possibly require the garbage-collector to work that much
> harder.
> The method Block.getBlocks() should be deprecated. Code that currently uses
> this method should instead iterate over the Collection.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
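To make the "references, not copies" idea concrete, here is a minimal, self-contained sketch of the approach the comment describes. It is not Hadoop's actual code: the `Block`, `blockMap`, and `DatanodeDescriptor` below are simplified stand-ins, and `processReport` shows inline add/remove against a shared map instead of a wholesale `blocks.clear()` followed by re-insertion of fresh copies.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class BlockInterning {

    // Simplified stand-in for Hadoop's Block.
    static final class Block {
        final long blockId;
        Block(long blockId) { this.blockId = blockId; }
    }

    // Stand-in for the NameNode's global blockMap: one canonical
    // Block object per block id.
    static final Map<Long, Block> blockMap = new HashMap<>();

    // Return the canonical Block for an id, creating it on first sight.
    static Block intern(long blockId) {
        return blockMap.computeIfAbsent(blockId, Block::new);
    }

    // Stand-in for DatanodeDescriptor: its set holds references into
    // blockMap rather than per-node copies.
    static final class DatanodeDescriptor {
        final Set<Block> blocks = new HashSet<>();

        // Update in place from a reported set of ids: drop blocks no
        // longer reported and add new ones, with no blocks.clear().
        void processReport(Set<Long> reportedIds) {
            blocks.removeIf(b -> !reportedIds.contains(b.blockId));
            for (long id : reportedIds) {
                blocks.add(intern(id));   // shared reference from blockMap
            }
        }
    }

    public static void main(String[] args) {
        DatanodeDescriptor node1 = new DatanodeDescriptor();
        DatanodeDescriptor node2 = new DatanodeDescriptor();
        node1.processReport(Set.of(1L, 2L));
        node2.processReport(Set.of(2L, 3L));

        // Both replicas of block 2 resolve to the same object in blockMap,
        // so the heap holds one Block per block id, not one per replica.
        Block a = node1.blocks.stream().filter(b -> b.blockId == 2L).findFirst().get();
        Block b = node2.blocks.stream().filter(b -> b.blockId == 2L).findFirst().get();
        System.out.println(a == b);          // true: one shared copy
        System.out.println(blockMap.size()); // 3 distinct blocks total
    }
}
```

With this shape, a replica of a block costs only a set entry pointing at the canonical object, and a report that repeats mostly-unchanged blocks churns no garbage for the unchanged entries.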