[ 
https://issues.apache.org/jira/browse/HDFS-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3290:
---------------------------------------

    Description: 
When the HDFS DataNode stores chunks in a local directory, it currently puts 
all of the chunk files into either one big directory or a collection of 
directories.  However, there is no way to tell from a block's ID which 
directory that block will end up in.  As the number of files increases, this 
does not scale well.

Similar to the git version control system, HDFS should create a few different 
top-level directories keyed on a few bits of the chunk ID.  Git uses 8 
bits.  This substantially reduces the number of chunk files per directory 
and improves performance without compromising O(1) lookup of chunks.

  was:
When the HDFS DataNode stores chunks in a local directory, it currently puts 
all of the chunk files into one big directory.  As the number of files 
increases, this does not work well at all.  Local filesystems are not optimized 
for the case where there are hundreds of thousands of files in the same 
directory.  It also makes inspecting directories with standard UNIX tools 
difficult.

Similar to the git version control system, HDFS should create a few different 
top level directories keyed off of a few bits in the chunk ID.  Git uses 8 
bits.  This substantially cuts down on the number of chunk files in the same 
directory and gives increased performance.

    
> Use a better local directory layout for the datanode
> ----------------------------------------------------
>
>                 Key: HDFS-3290
>                 URL: https://issues.apache.org/jira/browse/HDFS-3290
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 0.23.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>
> When the HDFS DataNode stores chunks in a local directory, it currently puts 
> all of the chunk files into either one big directory or a collection of 
> directories.  However, there is no way to tell from a block's ID which 
> directory that block will end up in.  As the number of files increases, this 
> does not scale well.
> Similar to the git version control system, HDFS should create a few different 
> top-level directories keyed on a few bits of the chunk ID.  Git uses 8 
> bits.  This substantially reduces the number of chunk files per directory 
> and improves performance without compromising O(1) lookup of chunks.
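
The layout proposed above can be sketched roughly as follows.  This is only an 
illustration, not the DataNode's actual code: the function names, the choice 
of the low-order bits as the key, and the hex directory names are all 
assumptions made for the example.  The point is that the directory is a pure 
function of the ID, so lookup needs no directory scan.

```python
import os


def block_dir(root: str, block_id: int, bits: int = 8) -> str:
    """Return the subdirectory for block_id (hypothetical helper).

    Assumption: the low `bits` bits of the ID select the bucket.
    The issue only says "a few bits"; which bits is not specified.
    """
    bucket = block_id & ((1 << bits) - 1)   # also works for negative IDs
    width = (bits + 3) // 4                 # hex digits needed for `bits` bits
    return os.path.join(root, format(bucket, f"0{width}x"))


def block_path(root: str, block_id: int, bits: int = 8) -> str:
    """Full path of the block file; computed from the ID alone, so O(1)."""
    return os.path.join(block_dir(root, block_id, bits), f"blk_{block_id}")
```

With 8 bits there are at most 256 buckets, so a volume holding N blocks ends 
up with roughly N/256 files per directory, provided the chosen bits are 
evenly distributed across IDs.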


        
