[ https://issues.apache.org/jira/browse/HDFS-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165389#comment-14165389 ]
Kihwal Lee edited comment on HDFS-7174 at 10/9/14 5:16 PM:
-----------------------------------------------------------
bq. As Yi Liu pointed out, the current patch has a problem. If we go back and forth between switchingThreshold (say, by repeatedly adding and removing a single element to a directory), we pay a very high cost. To prevent this, the threshold for converting a INodeHashedArrayList back to a simple INodeArrayList should be lower than the threshold for doing the opposite conversion.

There is a low watermark, which is 90% of the conversion threshold. So it won't flip back and forth like that.


was (Author: kihwal):
bq. As Yi Liu pointed out, the current patch has a problem. If we go back and forth between switchingThreshold (say, by repeatedly adding and removing a single element to a directory), we pay a very high cost. To prevent this, the threshold for converting a INodeHashedArrayList back to a simple INodeArrayList should be lower than the threshold for doing the opposite conversion.

There is low watermark, which is 90% of the conversion threshold. So it won't flip back and forth like that.


> Support for more efficient large directories
> ---------------------------------------------
>
>                 Key: HDFS-7174
>                 URL: https://issues.apache.org/jira/browse/HDFS-7174
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>         Attachments: HDFS-7174.new.patch, HDFS-7174.patch, HDFS-7174.patch
>
>
> When the number of children under a directory grows very large, insertion
> becomes very costly. E.g. creating 1M entries takes 10s of minutes. This
> is because the complexity of a single insertion is O\(n\), so the
> cumulative cost of inserting n entries grows as n^2 (the integral of a
> linear function). It also causes allocations and copies of big arrays.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
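To make the low-watermark behavior described in the comment concrete, here is a minimal, hypothetical Java sketch. It is not the actual HDFS-7174 patch code: the class name, field names, threshold value, and conversion mechanics are assumptions; only the idea of a switching threshold with a back-conversion watermark at 90% of it comes from the comment above.

{code:java}
// Illustrative sketch only -- not the HDFS-7174 patch. It models a children
// container that switches to a hashed representation above a threshold and
// switches back only below a low watermark (90% of the threshold), so that
// adding/removing a single element near the threshold cannot cause repeated
// costly conversions.
public class ChildrenListSketch {
    // Hypothetical switching threshold (the real value would be configurable).
    private static final int SWITCHING_THRESHOLD = 1000;
    // Low watermark at 90% of the threshold prevents flip-flopping when the
    // directory size oscillates right around the threshold.
    private static final int LOW_WATERMARK = (int) (SWITCHING_THRESHOLD * 0.9);

    private boolean hashed = false;   // which representation is currently in use
    private int size = 0;             // number of children

    void add() {
        size++;
        if (!hashed && size > SWITCHING_THRESHOLD) {
            hashed = true;            // simple list -> hashed list (costly, done rarely)
        }
    }

    void remove() {
        size--;
        if (hashed && size < LOW_WATERMARK) {
            hashed = false;           // convert back only well below the threshold
        }
    }

    public static void main(String[] args) {
        ChildrenListSketch c = new ChildrenListSketch();
        for (int i = 0; i <= SWITCHING_THRESHOLD; i++) {
            c.add();                  // crosses the threshold once, converts once
        }
        // Oscillating by one element around the threshold does not trigger
        // another conversion, because the back-conversion waits for the watermark.
        c.remove();
        c.add();
        System.out.println("hashed = " + c.hashed);  // stays true
    }
}
{code}

With a single shared threshold in both directions, the same add/remove oscillation would convert the representation on every pair of operations; the watermark turns that into at most one conversion per large swing in directory size.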