[jira] [Commented] (HDFS-3290) Use a better local directory layout for the datanode

Matt Foley (Commented) (JIRA) Wed, 18 Apr 2012 14:49:06 -0700

    [ https://issues.apache.org/jira/browse/HDFS-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257001#comment-13257001 ]


Matt Foley commented on HDFS-3290:
----------------------------------

bq. So you have to search every directory to find a given block. This clearly 
won't scale as the number of directories increases.

Colin, your new assertion is also incorrect.  I'll leave it as an exercise for 
the student to figure out what the datanode really does when it needs to look 
up a block.

I'm glad you're enthusiastic about improving Hadoop.  But please consider that 
this code is quite mature and works well in production at multiple 
companies, with hundreds of millions of blocks under management.  
Evidently it DOES scale.  So please read the code and thoroughly understand how 
it really works before suggesting major changes.  There are plenty of things 
that can be improved; but changing stuff that "ain't broke" induces errors and 
unnecessary risk.  Thank you.

                
> Use a better local directory layout for the datanode
> ----------------------------------------------------
>
>                 Key: HDFS-3290
>                 URL: https://issues.apache.org/jira/browse/HDFS-3290
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 0.23.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>
> When the HDFS DataNode stores chunks in a local directory, it currently puts 
> all of the chunk files into either one big directory, or a collection of 
> directories.  However, there is no way to know which directory a given block 
> will end up in, given its ID.  As the number of files increases, this does 
> not scale well.
> Similar to the git version control system, HDFS should create a few different 
> top level directories keyed off of a few bits in the chunk ID.  Git uses 8 
> bits.  This substantially cuts down on the number of chunk files in the same 
> directory and gives increased performance, while not compromising O(1) lookup 
> of chunks.
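
For illustration only, the layout proposed in the description above could be 
sketched roughly as follows. This is not the DataNode's actual code: the class 
name BlockDirLayout, the choice of the low-order 8 bits of the ID, the 
two-hex-digit directory names, and the example root path are all assumptions 
made for this sketch.

```java
import java.io.File;

public class BlockDirLayout {
    // Hypothetical: number of ID bits used to choose a subdirectory
    // (8 bits, as the description says git uses).
    static final int BITS = 8;

    /** Maps a block ID to one of 2^BITS subdirectory names, e.g. "00" to "ff". */
    public static String subdirFor(long blockId) {
        // Mask off the low-order bits; this also works for negative IDs.
        int bucket = (int) (blockId & ((1L << BITS) - 1));
        return String.format("%02x", bucket);
    }

    /** Computes the file location directly from the ID, with no directory scan. */
    public static File blockFile(File root, long blockId) {
        return new File(new File(root, subdirFor(blockId)), "blk_" + blockId);
    }

    public static void main(String[] args) {
        // Hypothetical storage root, used only for this example.
        File root = new File("/data/dn/current");
        System.out.println(blockFile(root, 0x1234L));
    }
}
```

Because the subdirectory is a pure function of the ID, finding a file takes a 
constant number of steps, and each directory holds roughly 1/256 of the files 
if IDs are evenly distributed.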

