[ https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568979#comment-13568979 ]

Suresh Srinivas commented on HDFS-4461:
---------------------------------------

bq. If someone is running with around 200,000 blocks (a reasonable number), and 
a 50 to 80 character path, this change saves between 50 and 100 MB of heap 
space during the DirectoryScanner run. That's what we should be focusing on 
here-- the efficiency improvement. After all, that is why I marked this JIRA as 
"improvement" rather than "bug" 
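As a rough sanity check on the quoted figure (hypothetical helper, not Hadoop code; assumes Java's 2 bytes per UTF-16 char and two File objects per ScanInfo, and ignores per-object header overhead):

```java
// Back-of-the-envelope estimate of the heap consumed by duplicated
// volume-path prefixes during a DirectoryScanner run.  Each ScanInfo
// holds two File objects (block file + metadata file), and each File
// carries its own full copy of the volume path prefix as UTF-16 chars.
class PrefixCost {
    static long duplicatedMB(long blocks, int prefixChars) {
        long filesPerBlock = 2;                         // block file + metadata file
        long bytes = blocks * filesPerBlock * prefixChars * 2L; // 2 bytes per char
        return bytes / (1024 * 1024);                   // integer MiB, rounded down
    }
}
```

For 200,000 blocks this gives roughly 38 MB at a 50-character prefix and 61 MB at 80 characters; String, char[], and File object overhead per instance pushes the real total toward the quoted 50-100 MB range.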

I think you are missing the point I made earlier. In the description you say:
bq. This has been causing out-of-memory conditions for users who pick such long 
volume paths.
It is not correct to attribute the OOM to the memory inefficiency of the 
DirectoryScanner. So please update the description to say that the 
DirectoryScanner can be made more efficient.

bq. I saw more than 1 million ScanInfo objects
I am interested in seeing the number of blocks in this particular setup, and 
whether we are leaking these objects.

I am leaning more towards an incorrect datanode configuration in the setup 
where you saw the OOM. Can you provide details such as the heap size of the 
datanode, the number of blocks on the datanode, etc.?
                
> DirectoryScanner: volume path prefix takes up memory for every block that is 
> scanned 
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-4461
>                 URL: https://issues.apache.org/jira/browse/HDFS-4461
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.0.3-alpha
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>         Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, 
> memory-analysis.png
>
>
> In the {{DirectoryScanner}}, we create a {{ScanInfo}} object for every block.  
> This object contains two File objects-- one for the metadata file, and one 
> for the block file.  Since those File objects contain full paths, users who 
> pick a lengthy path for their volume roots will end up using an extra 
> N_blocks * path_prefix bytes of heap during a scan.  We also don't really 
> need to store File objects-- storing strings and then creating File objects 
> as needed would be cheaper.  This has been causing out-of-memory conditions 
> for users who pick such long volume paths.
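The cheaper representation the description suggests could look something like this (a hypothetical sketch, not the attached patch; class and field names are illustrative): store the volume root once per volume, keep only the per-block suffix strings, and materialize File objects on demand.

```java
import java.io.File;

// Sketch of a suffix-based ScanInfo: the volume root String is shared by
// reference across all blocks on that volume, so its char data is stored
// once instead of being copied into every File's full path.
class ScanInfoSketch {
    private final String volumeRoot;   // shared per volume, not per block
    private final String blockSuffix;  // e.g. "/current/blk_1234"
    private final String metaSuffix;   // e.g. "/current/blk_1234_1001.meta"

    ScanInfoSketch(String volumeRoot, String blockSuffix, String metaSuffix) {
        this.volumeRoot = volumeRoot;
        this.blockSuffix = blockSuffix;
        this.metaSuffix = metaSuffix;
    }

    // Build the File only when a caller actually needs it.
    File getBlockFile() {
        return new File(volumeRoot + blockSuffix);
    }

    File getMetaFile() {
        return new File(volumeRoot + metaSuffix);
    }
}
```

With this layout the per-block cost is two short suffix strings plus an 8-byte reference to the shared root, rather than two full-path copies of the prefix.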

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
