[ https://issues.apache.org/jira/browse/HADOOP-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Konstantin Shvachko updated HADOOP-2148:
----------------------------------------

    Description:
FSDataset.getBlockFile() first verifies that the block is valid and then returns the file name corresponding to the block.
In doing so, it performs the data-node blockMap lookup twice. Only one lookup is needed here.
This is important since the data-node blockMap is big.

Another observation is that data-nodes do not need the blockMap at all. File names can be derived from the block IDs,
so there is no need to hold the Block to File mapping in memory.

  was:
FSDataset.getBlockFile() first verifies that the block is valid and then returns the file name corresponding to the block.
In doing so, it performs the data-node blockMap lookup twice. Only one lookup is needed here.
This is important since the data-node blockMap is big.

> Inefficient FSDataset.getBlockFile()
> ------------------------------------
>
>                 Key: HADOOP-2148
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2148
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.14.0
>            Reporter: Konstantin Shvachko
>             Fix For: 0.16.0
>
>
> FSDataset.getBlockFile() first verifies that the block is valid and then
> returns the file name corresponding to the block.
> In doing so, it performs the data-node blockMap lookup twice. Only one lookup
> is needed here.
> This is important since the data-node blockMap is big.
> Another observation is that data-nodes do not need the blockMap at all. File
> names can be derived from the block IDs,
> so there is no need to hold the Block to File mapping in memory.
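
The two observations in the description can be sketched in Java roughly as follows. This is a minimal illustration, not the actual FSDataset code: the class BlockFileLookupSketch, its fields and method names, the flat "blk_<id>" naming under a single directory, and the use of IllegalArgumentException are all assumptions made for the example.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the data-node block bookkeeping; not Hadoop's FSDataset.
public class BlockFileLookupSketch {
    // Assumed shape of the blockMap: block ID -> file on local disk.
    private final Map<Long, File> blockMap = new HashMap<>();

    // Counts map accesses so the difference between the two variants is visible.
    int lookups = 0;

    void addBlock(long blockId, File file) {
        blockMap.put(blockId, file);
    }

    // Pattern described in the issue: validate first, then fetch (two lookups).
    File getBlockFileTwoLookups(long blockId) {
        lookups++;
        if (!blockMap.containsKey(blockId)) {
            throw new IllegalArgumentException("Block " + blockId + " is not valid.");
        }
        lookups++;
        return blockMap.get(blockId);
    }

    // Single-lookup variant: fetch once and treat a missing entry as invalid.
    File getBlockFileOneLookup(long blockId) {
        lookups++;
        File file = blockMap.get(blockId);
        if (file == null) {
            throw new IllegalArgumentException("Block " + blockId + " is not valid.");
        }
        return file;
    }

    // Second observation: derive the file from the block ID with no map at all.
    // Assumes a deterministic layout; a flat "blk_<id>" name is used here.
    static File deriveBlockFile(File dataDir, long blockId) {
        return new File(dataDir, "blk_" + blockId);
    }
}
```

The single-lookup variant relies on the map never storing null values, so a null result can only mean the block is unknown. The derived-name variant only works if the on-disk location is a pure function of the block ID; any layout that places blocks in subdirectories would need a deterministic rule for choosing the subdirectory as well.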