[ https://issues.apache.org/jira/browse/HDFS-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256783#comment-13256783 ]
Matt Foley commented on HDFS-3290:
----------------------------------

Hi Colin, I think you've misunderstood the block storage. Each data sub-directory stores the next 64 blocks (by default) plus their metadata (128 files altogether), then spawns up to 64 new subdirectories and starts filling those, recursing as necessary. The result is a directory tree in which each sub-directory holds at most 192 objects, and each leaf directory holds 128 files or fewer. Please see org.apache.hadoop.hdfs.server.datanode.FSDataset.FSDir.addBlock() in the hadoop-1 branch.

> Use a better local directory layout for the datanode
> ----------------------------------------------------
>
>                 Key: HDFS-3290
>                 URL: https://issues.apache.org/jira/browse/HDFS-3290
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 0.23.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>
> When the HDFS DataNode stores chunks in a local directory, it currently puts
> all of the chunk files into one big directory. As the number of files
> increases, this does not work well at all. Local filesystems are not
> optimized for the case where there are hundreds of thousands of files in the
> same directory. It also makes inspecting directories with standard UNIX
> tools difficult.
> Similar to the git version control system, HDFS should create a few different
> top-level directories keyed off of a few bits in the chunk ID. Git uses 8
> bits. This substantially cuts down on the number of chunk files in the same
> directory and gives increased performance.
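The recursive layout Matt describes can be sketched as a simplified in-memory model. This is not the real FSDataset.FSDir code; the class name, the routing rule (here, peeling off base-64 digits of the block id), and the counters are illustrative assumptions, kept only to show how the 192-object bound falls out of "64 blocks, then up to 64 subdirectories":

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the hadoop-1 directory layout described above.
// Class and method names are illustrative, not the actual Hadoop internals.
class Dir {
    static final int MAX_BLOCKS_PER_DIR = 64;  // hadoop-1 default
    int numBlocks = 0;                         // each block contributes 2 files (data + .meta)
    final List<Dir> children = new ArrayList<>();

    // Store one block: fill this directory first, then spill into subdirs.
    void addBlock(long blockId) {              // assumes blockId >= 0
        if (numBlocks < MAX_BLOCKS_PER_DIR) {
            numBlocks++;                       // block file + metadata file land here
            return;
        }
        if (children.isEmpty()) {              // spawn up to 64 subdirectories
            for (int i = 0; i < MAX_BLOCKS_PER_DIR; i++) {
                children.add(new Dir());
            }
        }
        // Route on the low base-64 "digit" of the id and recurse with the
        // rest, so each level keys off different bits (an assumption here;
        // the real FSDir chooses children differently).
        children.get((int) (blockId % MAX_BLOCKS_PER_DIR))
                .addBlock(blockId / MAX_BLOCKS_PER_DIR);
    }

    // Objects visible in this directory: 2 files per block plus subdirs.
    int objectCount() {
        return 2 * numBlocks + children.size();
    }
}
```

In this model an interior directory holds at most 2 * 64 + 64 = 192 objects, and a leaf (no children) at most 128 files, matching the bounds in the comment.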
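For contrast, the git-style scheme the issue proposes could look something like the sketch below. The path format, directory names, and helper are hypothetical, chosen only to illustrate keying a top-level bucket off 8 bits of the block ID as git does for its objects:

```java
// Hypothetical sketch of the proposed layout: the low 8 bits of the
// block id select one of 256 top-level directories.
class ChunkLayout {
    static String chunkPath(long blockId) {
        int bucket = (int) (blockId & 0xFF);   // 8 bits -> 256 buckets, like git
        // "subdirXX" and "blk_N" are illustrative names, not HDFS's actual ones.
        return String.format("subdir%02x/blk_%d", bucket, blockId);
    }
}
```

With 256 buckets, a directory of N blocks drops to roughly N/256 files per subdirectory, which is the performance effect the description claims.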