[ https://issues.apache.org/jira/browse/HDFS-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165389#comment-14165389 ]
Kihwal Lee edited comment on HDFS-7174 at 10/9/14 5:16 PM:
-----------------------------------------------------------
bq. As Yi Liu pointed out, the current patch has a problem. If we go back and forth between switchingThreshold (say, by repeatedly adding and removing a single element to a directory), we pay a very high cost. To prevent this, the threshold for converting a INodeHashedArrayList back to a simple INodeArrayList should be lower than the threshold for doing the opposite conversion.

There is a low watermark, which is 90% of the conversion threshold. So it won't flip back and forth like that.


was (Author: kihwal):
bq. As Yi Liu pointed out, the current patch has a problem. If we go back and forth between switchingThreshold (say, by repeatedly adding and removing a single element to a directory), we pay a very high cost. To prevent this, the threshold for converting a INodeHashedArrayList back to a simple INodeArrayList should be lower than the threshold for doing the opposite conversion.

There is low watermark, which is 90% of the conversion threshold. So it won't flip back and forth like that.


> Support for more efficient large directories
> ---------------------------------------------
>
>                 Key: HDFS-7174
>                 URL: https://issues.apache.org/jira/browse/HDFS-7174
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>         Attachments: HDFS-7174.new.patch, HDFS-7174.patch, HDFS-7174.patch
>
>
> When the number of children under a directory grows very large, insertion
> becomes very costly. E.g. creating 1M entries takes 10s of minutes. This
> is because the complexity of a single insertion is O\(n\), so the
> cumulative cost of inserting n entries grows as n^2 (the integral of a
> linear function). It also causes allocations and copies of big arrays.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
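To make the low-watermark behavior described in the comment concrete, here is a minimal, hypothetical Java sketch. It is not the actual HDFS-7174 patch code: the class name, field names, threshold value, and conversion mechanics are assumptions; only the idea of a switching threshold with a back-conversion watermark at 90% of it comes from the comment above.

{code:java}
// Illustrative sketch only -- not the HDFS-7174 patch. It models a children
// container that switches to a hashed representation above a threshold and
// switches back only below a low watermark (90% of the threshold), so that
// adding/removing a single element near the threshold cannot cause repeated
// costly conversions.
public class ChildrenListSketch {
    // Hypothetical switching threshold (the real value would be configurable).
    private static final int SWITCHING_THRESHOLD = 1000;
    // Low watermark at 90% of the threshold prevents flip-flopping when the
    // directory size oscillates right around the threshold.
    private static final int LOW_WATERMARK = (int) (SWITCHING_THRESHOLD * 0.9);

    private boolean hashed = false;   // which representation is currently in use
    private int size = 0;             // number of children

    void add() {
        size++;
        if (!hashed && size > SWITCHING_THRESHOLD) {
            hashed = true;            // simple list -> hashed list (costly, done rarely)
        }
    }

    void remove() {
        size--;
        if (hashed && size < LOW_WATERMARK) {
            hashed = false;           // convert back only well below the threshold
        }
    }

    public static void main(String[] args) {
        ChildrenListSketch c = new ChildrenListSketch();
        for (int i = 0; i <= SWITCHING_THRESHOLD; i++) {
            c.add();                  // crosses the threshold once, converts once
        }
        // Oscillating by one element around the threshold does not trigger
        // another conversion, because the back-conversion waits for the watermark.
        c.remove();
        c.add();
        System.out.println("hashed = " + c.hashed);  // stays true
    }
}
{code}

With a single shared threshold in both directions, the same add/remove oscillation would convert the representation on every pair of operations; the watermark turns that into at most one conversion per large swing in directory size.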