[ https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568979#comment-13568979 ]
Suresh Srinivas commented on HDFS-4461:
---------------------------------------

bq. If someone is running with around 200,000 blocks (a reasonable number), and a 50 to 80 character path, this change saves between 50 and 100 MB of heap space during the DirectoryScanner run. That's what we should be focusing on here-- the efficiency improvement. After all, that is why I marked this JIRA as "improvement" rather than "bug"

I think you are missing the point I made earlier. In the description you say:

bq. This has been causing out-of-memory conditions for users who pick such long volume paths.

It is not correct to attribute the OOM to the memory inefficiency of DirectoryScanner. So please update the description to say that DirectoryScanner can be made more efficient.

bq. I saw more than 1 million ScanInfo objects

I am interested in seeing the number of blocks in this particular setup and whether we are leaking these objects. I am leaning more towards an incorrect datanode configuration in the setup where you saw the OOM. Can you provide details on the heap size of the datanode, the number of blocks on the datanode, etc.?


> DirectoryScanner: volume path prefix takes up memory for every block that is scanned
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-4461
>                 URL: https://issues.apache.org/jira/browse/HDFS-4461
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>   Affects Versions: 2.0.3-alpha
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>         Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, memory-analysis.png
>
>
> In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block.
> This object contains two File objects-- one for the metadata file, and one
> for the block file. Since those File objects contain full paths, users who
> pick a lengthy path for their volume roots will end up using an extra
> N_blocks * path_prefix bytes per block scanned.
> We also don't really need to store File objects-- storing strings and then
> creating File objects as needed would be cheaper. This has been causing
> out-of-memory conditions for users who pick such long volume paths.
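
To illustrate the idea in the description (keep the volume root once, store only the per-block suffix as a string, and build File objects on demand), here is a minimal Java sketch. It is not the attached patch: the class name `CompactScanInfo`, its methods, and the helper `estimatePrefixBytes` are hypothetical, and the estimate assumes 2 bytes per character and two paths (block and meta) per block. It is a rough lower bound that ignores per-object overhead, so it is not meant to reproduce the figures quoted in the comment.

```java
import java.io.File;

/** Hypothetical sketch of a compact per-block record; not the HDFS-4461 patch. */
final class CompactScanInfo {
    private final File volumeRoot;     // one instance shared by every block on the volume
    private final String blockSuffix;  // block file path relative to volumeRoot
    private final String metaSuffix;   // meta file path relative to volumeRoot, or null

    CompactScanInfo(File volumeRoot, File blockFile, File metaFile) {
        this.volumeRoot = volumeRoot;
        this.blockSuffix = relativize(volumeRoot, blockFile);
        this.metaSuffix = (metaFile == null) ? null : relativize(volumeRoot, metaFile);
    }

    /** Strips the volume root (and the separator after it) from an absolute path. */
    static String relativize(File root, File file) {
        String rootPath = root.getAbsolutePath();
        String prefix = rootPath.endsWith(File.separator) ? rootPath : rootPath + File.separator;
        String fullPath = file.getAbsolutePath();
        if (!fullPath.startsWith(prefix)) {
            throw new IllegalArgumentException(fullPath + " is not under " + rootPath);
        }
        return fullPath.substring(prefix.length());
    }

    /** Rebuilds the full File only when a caller actually needs it. */
    File getBlockFile() {
        return new File(volumeRoot, blockSuffix);
    }

    File getMetaFile() {
        return (metaSuffix == null) ? null : new File(volumeRoot, metaSuffix);
    }

    /**
     * Rough lower bound on bytes spent repeating the volume prefix:
     * blocks * 2 paths per block * prefix length * 2 bytes per char (assumed).
     */
    static long estimatePrefixBytes(long blocks, int prefixChars) {
        return blocks * 2L * prefixChars * 2L;
    }
}
```

With these assumptions, `estimatePrefixBytes(200_000L, 80)` evaluates to 64,000,000 bytes of duplicated prefix characters; the real saving depends on the JVM's string and object layout.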