[ https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1374:
----------------------------------

    Attachment: MAPREDUCE-1374.2.patch

Added test case.

> Reduce memory footprint of FileSplit
> ------------------------------------
>
>                 Key: MAPREDUCE-1374
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.20.1, 0.21.0, 0.22.0
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.21.0, 0.22.0
>
>         Attachments: MAPREDUCE-1374.1.patch, MAPREDUCE-1374.2.patch
>
>
> We can have many FileInput objects in the memory, depending on the number of
> mappers.
> It will save tons of memory on JobTracker and JobClient if we intern those
> Strings for host names.
> {code}
> FileInputFormat.java:
>       for (NodeInfo host: hostList) {
>         // Strip out the port number from the host name
> -        retVal[index++] = host.node.getName().split(":")[0];
> +        retVal[index++] = host.node.getName().split(":")[0].intern();
>         if (index == replicationFactor) {
>           done = true;
>           break;
>         }
>       }
> {code}
> More on String.intern():
> http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html
> It will also save a lot of memory by changing the class of {{file}} from
> {{Path}} to {{String}}. {{Path}} contains a {{java.net.URI}} which internally
> contains ~10 String fields. This will also be a huge saving.
> {code}
>   private Path file;
> {code}
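
For readers unfamiliar with the effect of the one-line change in the diff, here is a minimal standalone sketch. It is not taken from the patch: the class name HostNameInterning, the helper methods, the example.com host names and the port 50010 are all made up for illustration. It only shows that split(":")[0] allocates a fresh String on every call, whereas appending .intern() makes equal host names share a single instance.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical demo class; not part of Hadoop or of the attached patch.
final class HostNameInterning {

    // Same expression as the unpatched line: split() allocates a new String per call.
    static String stripPort(String nodeName) {
        return nodeName.split(":")[0];
    }

    // Same expression as the patched line: strip the port, then intern the result.
    static String stripPortInterned(String nodeName) {
        return nodeName.split(":")[0].intern();
    }

    // Counts distinct String instances by identity (==), not by equals().
    static int distinctInstances(String[] hosts) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String h : hosts) {
            seen.put(h, Boolean.TRUE);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int splits = 1000;   // assumed number of splits
        int dataNodes = 10;  // assumed number of distinct hosts
        String[] plain = new String[splits];
        String[] interned = new String[splits];
        for (int i = 0; i < splits; i++) {
            // Made-up node names of the form host:port
            String nodeName = "datanode" + (i % dataNodes) + ".example.com:50010";
            plain[i] = stripPort(nodeName);
            interned[i] = stripPortInterned(nodeName);
        }
        System.out.println("instances without intern(): " + distinctInstances(plain));
        System.out.println("instances with intern():    " + distinctInstances(interned));
    }
}
```

With these assumed numbers, the first count should equal the number of splits and the second the number of distinct hosts, which is the kind of saving the issue description is after when many splits point at a small set of data nodes.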