[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2016-05-12 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282164#comment-15282164
 ] 

Sangjin Lee commented on HDFS-9053:
---

There has been no movement on this for a while. We'll need to move it out of 
scope for 2.8.0 soon. Let me know if you disagree.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch, HDFS-9053.006.patch, 
> HDFS-9053.007.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-22 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968754#comment-14968754
 ] 

Yi Liu commented on HDFS-9053:
--

Thanks for the discussion, Nicholas.

{quote}
BTW, why do we need two arrays for elements and children. I think BTree 
implementation naturally needs only one array.
{quote}
Actually most implementations I saw use two array. 
One array stores the elements, and another array stores the reference to 
children nodes (childrenSize == elementsSize+1 for non leaf nodes). 
Having them in one array makes: 1) the array contains two different types, I 
think it's strange. 2) more complicated logic and not easy to understand.

{quote}
When the #children is small, ArrayList is just fine. Let's use BTree only when 
#children is larger than a threshold
{quote}
Yeah, I ever did this as you suggested, please see {{005}} patch, I added a 
{{SortedCollection}} which wraps a custom array list implementation and the 
btree. 
- I didn't do as your suggestion of "The field in INodeDirectory is List 
children which may refer to either an ArrayList or a BTreeList. We may replace 
the list at runtime", is because:  It's hard to use a List, since when we 
use ArrayList, we do searching before index/delete and access through index, 
but BTree is in-order and we don't access through index. 

I personally think this approach is a bit complex. Do you have further 
suggestion about it? Nicholas.

On the other hand, back to the btree, Jing, I see you are not in favor of btree 
extend Node, is it possible to make some change and then you can accept? Any 
suggestions?

Appreciate you two to spend time on the discussion, thanks a lot!

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch, HDFS-9053.006.patch, 
> HDFS-9053.007.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-22 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969206#comment-14969206
 ] 

Yi Liu commented on HDFS-9053:
--

{quote}
The advantages of having one array are 1) smaller memory footprint and 2) not 
requiring to maintain the invariant above.
{quote}
Agree, my concern is it requires one array contains two different data types 
and more complex code logic. 

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch, HDFS-9053.006.patch, 
> HDFS-9053.007.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-22 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969166#comment-14969166
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9053:
---

> Actually most implementations I saw use two array.

Could you give some examples?

Two arrays are closely related to each other as shown in the javadoc below.
{code}
+   * It must at all times maintain invariant that either:
+   * -childrenSize == 0, elementsSize unconstrained
+   * -childrenSize == elementsSize + 1
{code}
The advantages of having one array are 1) smaller memory footprint and 2) not 
requiring to maintain the invariant above.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch, HDFS-9053.006.patch, 
> HDFS-9053.007.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-22 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969202#comment-14969202
 ] 

Yi Liu commented on HDFS-9053:
--

{quote}
Could you give some examples?
{quote}
https://github.com/google/btree
The google btree implementation using Go language. 

https://code.google.com/p/cpp-btree/
The btree in cpp

Also there are some other btree implementations in github which may be not 
popular.




> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch, HDFS-9053.006.patch, 
> HDFS-9053.007.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-21 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968277#comment-14968277
 ] 

Yi Liu commented on HDFS-9053:
--

Thanks [~jingzhao]!

{quote}
My concern is if the number of directories is limited, then maybe increasing 44 
bytes per directory is fine. E.g., for a big cluster with >100M 
files/directories, if we only have 1M directories, then we only increase the 
heap size by 44 MB. Compared with the total heap size this may be just <0.1% 
increase.
{quote}
Each directory increasing 44 bytes is the old implementation.  
In the latest patch, each directory only increase *8* bytes.  So it's quite 
small for directory. 

I know some users have more than 500M files, and some directory contains quite 
lots of files, usually this kind of large directory is accessed most 
frequently, just as "barrel principle", this becomes the bottleneck of NN.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch, HDFS-9053.006.patch, 
> HDFS-9053.007.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-21 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968202#comment-14968202
 ] 

Jing Zhao commented on HDFS-9053:
-

Thanks for all the work here, [~hitliuyi]! Before we decide what optimization 
should be done, maybe we can first check the typical percentage of 
INodeDirectories (among all the INodes) in a cluster. My concern is if the 
number of directories is limited, then maybe increasing 44 bytes per directory 
is fine. E.g., for a big cluster with >100M files/directories, if we only have 
1M directories, then we only increase the heap size by 44 MB. Compared with the 
total heap size this may be just <0.1% increase.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch, HDFS-9053.006.patch, 
> HDFS-9053.007.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-21 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968424#comment-14968424
 ] 

Jing Zhao commented on HDFS-9053:
-

I understand the latest patch only increases 8 bytes for each directory. But 
I'm not in favor of the latest change letting BTree extend Node. So I think we 
may first want to understand the total number of memory increased by the 
original patch.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch, HDFS-9053.006.patch, 
> HDFS-9053.007.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-21 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968445#comment-14968445
 ] 

Yi Liu commented on HDFS-9053:
--

Got it now. Thanks a lot [~jingzhao]!
Let's recalculate again.  For original btree implementation, it increases 44 
bytes which is my estimation and ignore alignment.  Now I have looked into the 
real memory layout, it actually increases (40 + 40) - 40 arraylist = 40 bytes. 
And I can do small improvement to remove {{degree}} variable 4 bytes + 4 bytes 
alignment/padding gap = 8 bytes.
So finally if we don't let BTree extend Node, it increases *32 bytes* for a 
directly.

32 bytes memory increment for a directory is fine for me and was my original 
thought.  As in your example, if we have 1M directories, then we only increase 
heap size by 32 MB.  I also respect Nicholas' comment, if we all think it's OK, 
I am happy to do this :).  


{noformat}
org.apache.hadoop.util.BTree object internals:
 OFFSET  SIZE  TYPE DESCRIPTIONVALUE
  016   (object header)N/A
 16 4   int BTree.degree   N/A
 20 4   int BTree.size N/A
 24 4   int BTree.modCount N/A
 28 4   (alignment/padding gap)N/A
 32 8  Node BTree.root N/A
Instance size: 40 bytes (estimated, the sample instance is not available)
Space losses: 4 bytes internal + 0 bytes external = 4 bytes total
{noformat}

{noformat}
org.apache.hadoop.util.BTree.Node object internals:
 OFFSET  SIZE TYPE DESCRIPTIONVALUE
  016  (object header)N/A
 16 4  int Node.elementsSize  N/A
 20 4  int Node.childrenSize  N/A
 24 8 Object[] Node.elements  N/A
 32 8 Object[] Node.children  N/A
Instance size: 40 bytes (estimated, the sample instance is not available)
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
{noformat}

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch, HDFS-9053.006.patch, 
> HDFS-9053.007.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968454#comment-14968454
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9053:
---

When the #children is small, ArrayList is just fine.  Let's use BTree only when 
#children is larger than a threshold, say 4k or 8k.  After an ArrayList is 
converted to BTree, it may not need to be converted back to ArrayList for 
simplicity and also that the case is uncommon.

BTW, why do we need two arrays for elements and children.  I think BTree 
implementation naturally needs only one array.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch, HDFS-9053.006.patch, 
> HDFS-9053.007.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964679#comment-14964679
 ] 

Hadoop QA commented on HDFS-9053:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  23m 13s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 8 new or modified test files. |
| {color:green}+1{color} | javac |   8m 40s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 28s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 28s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m  9s | The applied patch generated  7 
new checkstyle issues (total was 0, now 7). |
| {color:green}+1{color} | whitespace |   0m 11s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   2m  5s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 39s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   5m 24s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests |   8m 44s | Tests failed in 
hadoop-common. |
| {color:red}-1{color} | hdfs tests |  52m 27s | Tests failed in hadoop-hdfs. |
| | | 115m 59s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.ha.TestZKFailoverController |
|   | hadoop.ipc.TestIPC |
|   | hadoop.fs.viewfs.TestViewFsAtHdfsRoot |
|   | hadoop.hdfs.TestLeaseRecovery2 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12767526/HDFS-9053.007.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 9cb5d35 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13075/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/13075/artifact/patchprocess/diffcheckstylehadoop-common.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13075/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13075/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13075/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13075/console |


This message was automatically generated.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch, HDFS-9053.006.patch, 
> HDFS-9053.007.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> 

[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-20 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964775#comment-14964775
 ] 

Yi Liu commented on HDFS-9053:
--

The test failures are not related.
I plan to support {{shrinkable}} in future task.

Hi [~jingzhao], could you help to review the new patch again? (The diff is 
small compared with previous patch you reviewed)
[~szetszwo], I can change the default degree to 2K if you like.

BTW, I have checked the logic of btree many times and done a lot of tests, and 
I am confident of the correctness of btree. 

Thanks a lot.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch, HDFS-9053.006.patch, 
> HDFS-9053.007.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-19 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964433#comment-14964433
 ] 

Yi Liu commented on HDFS-9053:
--

Update the patch:
# Fix some checkstyles and the whitespace.
# Update few method names in btree to be more readable.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch, HDFS-9053.006.patch, 
> HDFS-9053.007.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-15 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958688#comment-14958688
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9053:
---

{noformat}
 24 8 Object[] Node.elements  N/A
 32 8 Object[] Node.children  N/A
{noformat}
It only counts the reference but array objects are not counted.  So the BTree 
overhead is still a lot more than ArrayList.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch, HDFS-9053.006.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-15 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959972#comment-14959972
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9053:
---

> For small elements size (assume # < max degree which is 2047), ... Do I miss 
> something?

According to [your 
comment|https://issues.apache.org/jira/browse/HDFS-9053?focusedCommentId=14950498=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14950498]
 (also copied below), you were saying that B-Tree only increased 8 bytes when 
#children < 4K, i.e. when 2047 < #children < 4K.  Is it still true?  If not, 
how much memory is needed when 2047 < #children < 4K?
{quote}
I find a good approach to improve B-Tree memory overhead to make it only 
increase 8 bytes memory usage comparing with using ArrayList for small elements 
size.
So we don't need to use ArrayList when #children is small (< 4K), and we can 
always use the BTree.
{quote}

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch, HDFS-9053.006.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-15 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959988#comment-14959988
 ] 

Yi Liu commented on HDFS-9053:
--

Hi Nicholas, sorry, I may bypass some description here.  For 2047, I wanted to 
say it just an example threshold of small elements size. Currently the small 
elements size is an assumption value, actually we can set the degree of BTree 
as any value we want to.  If we want 4K as the threshold of small elements 
size, we can set the degree of B-Tree to 2K, then max degree is (4K - 1).   (I 
should make the description clear..)
Thanks.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch, HDFS-9053.006.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-15 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959955#comment-14959955
 ] 

Yi Liu commented on HDFS-9053:
--

Thanks [~szetszwo] for the comments.

{quote}
>> 24 8 Object[] Node.elements  N/A
>> 32 8 Object[] Node.children  N/A
It only counts the reference but array objects are not counted. So the BTree 
overhead is still a lot more than ArrayList.
{quote}

For small elements size (assume # < max degree which is 2047), the {{children}} 
is null reference, so there is no array object of {{children}} here,  and for 
{{elements}}, {{ArrayList}} also has it. So as described above, {{BTree}} 
increases 8 bytes compared with {{ArrayList}} for small size elements.  Do I 
miss something?

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch, HDFS-9053.006.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951674#comment-14951674
 ] 

Hadoop QA commented on HDFS-9053:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  17m 35s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 8 new or modified test files. |
| {color:green}+1{color} | javac |   7m 55s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 33s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 20s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 13s | The applied patch generated  8 
new checkstyle issues (total was 0, now 8). |
| {color:red}-1{color} | whitespace |   0m 10s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 25s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |   7m  7s | Tests passed in 
hadoop-common. |
| {color:green}+1{color} | hdfs tests | 190m  7s | Tests passed in hadoop-hdfs. 
|
| | | 241m 41s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765801/HDFS-9053.006.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / def374e |
| Release Audit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12911/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/12911/artifact/patchprocess/diffcheckstylehadoop-common.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12911/artifact/patchprocess/whitespace.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12911/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12911/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12911/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12911/console |


This message was automatically generated.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch, HDFS-9053.006.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has 

[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-09 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950082#comment-14950082
 ] 

Yi Liu commented on HDFS-9053:
--

Update the patch. The new patch uses an array list if the children size is 
small (<= 4K), otherwise use B-Tree.
The new patch includes following changes:
# add {{SortedCollection}} which uses an array list to store elements if the 
size is small, otherwise use B-Tree.  It implements a shrinkable array list and 
control expanding. The merits comparing with using java ArrayList in it are: 
(1) Fewer memory: save the object overhead/alignment of ArrayList. (2) The max 
capacity is 4K, so no need to expand to capacity larger than 4K (3) Shrinkable: 
if the elements size becomes few, the internal array will shrink.
# Add more long tests for the {{B-Tree}} and {{SortedCollection}}.

I am still running the long running tests locally, they all success so far.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-09 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950084#comment-14950084
 ] 

Yi Liu commented on HDFS-9053:
--

The increased memory usage is only 8 bytes in {{SortedCollection}} comparing 
with {{ArrayList}} if the elements size is small (<= 4K).
{noformat}
java.util.ArrayList object internals:
 OFFSET  SIZE TYPE DESCRIPTIONVALUE
  016  (object header)N/A
 16 4  int AbstractList.modCount  N/A
 20 4  (alignment/padding gap)N/A
 24 4  int ArrayList.size N/A
 28 4  (alignment/padding gap)N/A
 32 8 Object[] ArrayList.elementData  N/A
Instance size: 40 bytes (estimated, the sample instance is not available)
{noformat}

{noformat}
org.apache.hadoop.hdfs.util.SortedCollection object internals:
 OFFSET  SIZE TYPE DESCRIPTIONVALUE
  016  (object header)N/A
 16 4  int SortedCollection.initCapacity  N/A
 20 4  int SortedCollection.size  N/A
 24 4  int SortedCollection.modCount  N/A
 28 4  int SortedCollection.degreeN/A
 32 8 Object[] SortedCollection.elements  N/A
 40 8BTree SortedCollection.btree N/A
Instance size: 48 bytes (estimated, the sample instance is not available)
{noformat}

Hi [~jingzhao], [~szetszwo], could you review the new patch? Thanks.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950856#comment-14950856
 ] 

Hadoop QA commented on HDFS-9053:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  22m 41s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 8 new or modified test files. |
| {color:green}+1{color} | javac |   8m 55s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 28s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 22s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m  6s | The applied patch generated  8 
new checkstyle issues (total was 0, now 8). |
| {color:red}-1{color} | whitespace |   0m 10s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 47s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 38s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 47s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |   7m 48s | Tests passed in 
hadoop-common. |
| {color:red}-1{color} | hdfs tests | 120m 50s | Tests failed in hadoop-hdfs. |
| | | 181m 56s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestWriteRead |
|   | hadoop.hdfs.server.namenode.ha.TestDNFencing |
|   | hadoop.hdfs.TestHFlush |
|   | hadoop.security.TestPermission |
|   | hadoop.hdfs.TestParallelRead |
|   | hadoop.fs.viewfs.TestViewFsHdfs |
|   | hadoop.hdfs.TestBlockReaderLocalLegacy |
|   | hadoop.hdfs.server.namenode.TestXAttrConfigFlag |
|   | hadoop.hdfs.TestPread |
|   | hadoop.hdfs.TestMiniDFSCluster |
|   | hadoop.hdfs.TestDFSStripedOutputStream |
|   | hadoop.hdfs.TestWriteConfigurationToDFS |
|   | hadoop.hdfs.TestDatanodeRegistration |
|   | hadoop.hdfs.web.TestWebHDFSXAttr |
|   | hadoop.hdfs.TestDFSRollback |
|   | hadoop.hdfs.TestDataTransferKeepalive |
|   | hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA |
|   | hadoop.hdfs.TestDatanodeConfig |
|   | hadoop.hdfs.TestDFSFinalize |
|   | hadoop.hdfs.server.namenode.ha.TestGetGroupsWithHA |
|   | hadoop.fs.TestWebHdfsFileContextMainOperations |
|   | hadoop.fs.TestGlobPaths |
|   | hadoop.hdfs.TestDFSShell |
|   | hadoop.hdfs.tools.TestDFSHAAdminMiniCluster |
|   | hadoop.hdfs.TestSeekBug |
|   | hadoop.fs.loadGenerator.TestLoadGenerator |
|   | hadoop.hdfs.TestCrcCorruption |
|   | hadoop.fs.contract.hdfs.TestHDFSContractMkdir |
|   | hadoop.hdfs.TestAbandonBlock |
|   | hadoop.hdfs.TestGetFileChecksum |
|   | hadoop.hdfs.TestSafeModeWithStripedFile |
|   | 
hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerForContentSummary |
|   | hadoop.hdfs.TestFileCreationDelete |
|   | hadoop.hdfs.TestReadWhileWriting |
|   | hadoop.fs.viewfs.TestViewFileSystemWithAcls |
|   | hadoop.fs.contract.hdfs.TestHDFSContractConcat |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure010 |
|   | hadoop.hdfs.util.TestDiff |
|   | hadoop.hdfs.security.TestDelegationToken |
|   | hadoop.fs.TestSymlinkHdfsDisable |
|   | hadoop.fs.contract.hdfs.TestHDFSContractRootDirectory |
|   | hadoop.hdfs.TestMissingBlocksAlert |
|   | hadoop.hdfs.TestBlocksScheduledCounter |
|   | hadoop.hdfs.TestSmallBlock |
|   | hadoop.cli.TestDeleteCLI |
|   | hadoop.hdfs.TestDFSClientRetries |
|   | hadoop.fs.viewfs.TestViewFsWithXAttrs |
|   | hadoop.hdfs.TestDFSMkdirs |
|   | hadoop.hdfs.tools.TestDFSAdmin |
|   | hadoop.hdfs.server.namenode.TestFavoredNodesEndToEnd |
|   | hadoop.hdfs.server.namenode.TestNameNodeMXBean |
|   | hadoop.hdfs.web.TestWebHDFSForHA |
|   | hadoop.fs.viewfs.TestViewFsDefaultValue |
|   | hadoop.fs.contract.hdfs.TestHDFSContractOpen |
|   | hadoop.hdfs.server.namenode.TestDeleteRace |
|   | hadoop.hdfs.TestFSInputChecker |
|   | hadoop.hdfs.web.TestWebHdfsWithAuthenticationFilter |
|   | hadoop.fs.contract.hdfs.TestHDFSContractRename |
|   | hadoop.hdfs.server.namenode.ha.TestXAttrsWithHA |
|   | hadoop.hdfs.server.namenode.ha.TestHAMetrics |
|   | hadoop.cli.TestErasureCodingCLI |
|   | hadoop.hdfs.TestRollingUpgradeRollback |
|   | hadoop.hdfs.TestRemoteBlockReader |
|   | hadoop.hdfs.server.namenode.TestNameNodeResourceChecker |
|   | hadoop.hdfs.tools.TestDFSAdminWithHA |
|   | hadoop.hdfs.TestBlockStoragePolicy |
|   | hadoop.hdfs.TestLeaseRecovery |
|   | hadoop.fs.viewfs.TestViewFsAtHdfsRoot |
|   | hadoop.hdfs.TestBlockReaderLocal |
|   | hadoop.fs.contract.hdfs.TestHDFSContractGetFileStatus |
|   | hadoop.cli.TestCryptoAdminCLI |
|   | 

[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-09 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950498#comment-14950498
 ] 

Yi Liu commented on HDFS-9053:
--

I find a good approach to improve B-Tree overhead to make it only increase *8 
bytes* memory usage comparing with using {{ArrayList}} for small elements size. 
So we don't need to use ArrayList when #children is small (< 4K), and we can 
always use the {{BTree}}.
The main idea is to let {{BTree}} extend the BTree Node, then we don't need a 
separate root node, since {{BTree}} itself is the root.

{noformat}
java.util.ArrayList object internals:
 OFFSET  SIZE TYPE DESCRIPTIONVALUE
  016  (object header)N/A
 16 4  int AbstractList.modCount  N/A
 20 4  (alignment/padding gap)N/A
 24 4  int ArrayList.size N/A
 28 4  (alignment/padding gap)N/A
 32 8 Object[] ArrayList.elementData  N/A
Instance size: 40 bytes (estimated, the sample instance is not available)
{noformat}

{noformat}
org.apache.hadoop.util.btree.BTree object internals:
 OFFSET  SIZE TYPE DESCRIPTIONVALUE
  016  (object header)N/A
 16 4  int Node.elementsSize  N/A
 20 4  int Node.childrenSize  N/A
 24 8 Object[] Node.elements  N/A
 32 8 Object[] Node.children  N/A
 40 4  int BTree.size N/A
 44 4  int BTree.modCount N/A
Instance size: 48 bytes (estimated, the sample instance is not available)
{noformat}
We can see {{BTree}} only increases *8 bytes* comparing with {{ArrayList}} for 
a {{INodeDirectory}}.

[~jingzhao], [~szetszwo], please look at the new patch {{006}}.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch, HDFS-9053.006.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-08 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948629#comment-14948629
 ] 

Yi Liu commented on HDFS-9053:
--

Hi [~szetszwo], do you have further comments about it? Thanks. 

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-08 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949847#comment-14949847
 ] 

Yi Liu commented on HDFS-9053:
--

Hi [~szetszwo], thanks for your comments.

I will do as your suggestion: use an array list if the children size is small 
(<= 4K), otherwise use B-Tree. But there may be some differences for the 
implementation details.

{quote}
The field in INodeDirectory is List children which may refer to either 
an ArrayList or a BTreeList. We may replace the list at runtime
{quote}
It's hard to use a List, since when we use ArrayList, we do searching 
before index/delete and access through index, but BTree is in-order and we 
don't access through index. So I think we do it in following way:
# actually I made an initial patch of switching between array list and b-tree 
few days ago. The logic of ArrayList is not complicated, so I implement it in a 
new class and support shrinkable, in the new class, I control the array and 
expanding, also the new class keeps reference to a b-tree, if the elements size 
becomes large, it switches to use the b-tree. The new class is an in-order data 
structure, not like ArrayList which we need to search before operating.   In 
INodeDirectory, we just need to use the new class, another reason of I 
implement a new data structure class and don't do the switching in 
INodeDirectory is: we should have a {{SWITCH_THRESHOLD}} for switching from 
array list to b-tree, and need a low water mark to switch back, they should not 
be the same value, otherwise, the switching becomes frequent at some point, so 
I don't want to expose too many internal logic of switching in INodeDirectly. 
But the final memory usage of the new data structure is the same as ArrayList, 
even it has a reference to b-tree, and it supports Shrinkable  I will give 
the memory usage after posting a new patch.

{quote}
 I am also worry about the potiental bugs and the risk. If there is a bug in 
B-Tree, it is possible to lose one or more sub trees and, as a result, lose a 
lot of data. ArrayList is already well-tested. Anyway, we need more tests for 
the B-Tree, especially some long running random tests.
{quote}
Sure, I will add more tests for it, I have added many tests including some long 
running. I agree with that a bug-free data structure implementation is not 
easy, we should be careful and test the new implementations extensively :) 

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-08 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949749#comment-14949749
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9053:
---

Hi [~hitliuyi],

44 bytes overhead is not small especially when most of the directories fall in 
this case.  Just like that we did very hard but only able to save 40 bytes per 
replica in HDFS-8859.

I agree that the current BTree implementation has an advantage that it can 
shrink.  We may also implement our a shrinkable array list.  The shrinkable 
feature does not seem a big deal since this is not a common case.

> ... we may need write a class to wrap the two data structures, ...

Wrapping is not needed.  The field in INodeDirectory is List children 
which may refer to either an ArrayList or a BTreeList.  We may replace the list 
at runtime.

For the new B-Tree implementation.  I am also worry about the potiental bugs 
and the risk.  If there is a bug in B-Tree, it is possible to lose one or more 
sub trees and, as a result, lose a lot of data.  ArrayList is already 
well-tested.  Anyway, we need more tests for the B-Tree, especially some long 
running random tests.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-02 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942004#comment-14942004
 ] 

Yi Liu commented on HDFS-9053:
--

Thanks [~szetszwo], good comment, I ever considered it carefully too. I want to 
convince you to allow me only use B-Tree here:
# Use the case you said, the #children is small and < 4K. *1)* If children is < 
2K, then B-Tree only contains a root. As we counted before, the increased 
overhead is only 44 bytes which is really very small for a directory, a 
continuous block is 80 bytes memory (detail below), so we only increase about 
1/2 continuous block for a directory in NN. *2)* If the children is > 2K and < 
4K, here we use 4K as example, the B-Tree at most contains 3 branches: 1 root 
node, 3 leaf nodes. One leaf node increase about (40 bytes + 16 bytes elements 
array overhead) = 56 bytes, and 1 root node is (40 bytes + 16 bytes elements 
array overhead + 16 bytes children overhead + 3 children * 8) = 96 bytes, the 
b-tree itself is 40 bytes, and we need to subtract the ArrayList (40 bytes + 16 
bytes elements array overhead) = 56 bytes, so we at most increase 56 * 3 + 96 + 
40 - 56 = 248 bytes overhead, but ArrayList of 4K references to INode needs 
more than 4K * 8 = 32K memory, then we can get that the increased memory is 
only *0.75%*
{noformat}
org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoContiguous object 
internals:
 OFFSET  SIZE  TYPE DESCRIPTIONVALUE
  016   (object header)N/A
 16 8  long Block.blockId  N/A
 24 8  long Block.numBytes N/A
 32 8  long Block.generationStamp  N/A
 40 8  long BlockInfo.bcId N/A
 48 2 short BlockInfo.replication  N/A
 50 6   (alignment/padding gap)N/A
 56 8 LinkedElement BlockInfo.nextLinkedElementN/A
 64 8  Object[] BlockInfo.triplets N/A
 72 8 BlockUnderConstructionFeature BlockInfo.uc   N/A
Instance size: 80 bytes (estimated, the sample instance is not available)
{noformat}
# One advantage of B-Tree compared to ArrayList for small size children is: 
B-Tree can shrink. If the children of directory decreases from 4K to 2K, there 
are 2K * 8 = 16K memory wasteful if suing ArrayList. 
# On the other hand, if we do the switch between ArrayList and B-Tree, we may 
need write a class to wrap the two data structures, then  it still needs 
16bytes object overhead + 8 bytes additional reference = 24 bytes. 

How do you say? Thanks, Nicholas.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size 

[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939361#comment-14939361
 ] 

Hadoop QA commented on HDFS-9053:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 49s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 8 new or modified test files. |
| {color:green}+1{color} | javac |   8m  5s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 16s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 16s | The applied patch generated 
1 release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m  5s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m 10s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 38s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 23s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests |   7m 46s | Tests failed in 
hadoop-common. |
| {color:red}-1{color} | hdfs tests | 200m  8s | Tests failed in hadoop-hdfs. |
| | | 255m 18s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.metrics2.impl.TestGangliaMetrics |
|   | hadoop.hdfs.TestRecoverStripedFile |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764524/HDFS-9053.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 5db371f |
| Release Audit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12758/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12758/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12758/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12758/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12758/console |


This message was automatically generated.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: 

[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-01 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940673#comment-14940673
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9053:
---

For the common case that the #children is small, say < 4K, ArrayList is better 
than B-Tree since it uses less memory and has similar running time as B-Tree.  
B-Tree is better for the special case that the #children is large.  How about 
keep using ArrayList when #children < 4K and change to B-Tree when #children >= 
4K.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-30 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936627#comment-14936627
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9053:
---

{quote}
>It seems that there is at least one more reference not yet counted since 
> Node is non-static.

I mean the increased memory compared with using ArrayList, and I have stated 
this in above comments.
The reference to Node is already counted, use 64-bits system/JVM as example, 
the first '8' is the reference to the B-Tree root. The second '8' is the null 
reference for children in BTree#Node. This doesn't count the memory parts which 
ArrayList also has: 12 bytes object overhead + 4 bytes int modCount + 8 bytes 
reference to elements + 4 bytes int elementsSize + elements array memory.
{quote}
ArrayList is not an inner class but the new class Node is.  There is a "this" 
reference to the outer class.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-30 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939144#comment-14939144
 ] 

Yi Liu commented on HDFS-9053:
--

{quote}
the new class Node is. There is a "this" reference to the outer class.
{quote}
Good point, Nicholas, I have missed that. I just checked it in the memory 
layout after you pointed out.
{noformat}
org.apache.hadoop.util.BTree.Node object internals:
 OFFSET  SIZE TYPE DESCRIPTIONVALUE
  016  (object header)N/A
 16 4  int Node.elementsSize  N/A
 20 4  int Node.childrenSize  N/A
 24 8 Object[] Node.elements  N/A
 32 8 Object[] Node.children  N/A
 40 8BTree Node.this$0N/A
Instance size: 48 bytes (estimated, the sample instance is not available)
{noformat}

Let me change the B-Tree Node class to static, then we can save the reference 
to outer class. Will update the patch later.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-29 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934689#comment-14934689
 ] 

Yi Liu commented on HDFS-9053:
--

[~jingzhao], you have given a good review, and it hits the two places I ever 
considered carefully how to do it better. Thanks.

{quote}
In INodeDirectory#replaceChild, can we directly call addOrReplace instead of 
calling get first?
{quote}
I think we can use {{addOrReplace}} directly.  Since there should be an INode 
with the name existing.  To keep original behavior, I will remove the added one 
if {{addOrReplace}} returns null.

{quote}
Do you think we can avoid the following code? Maybe we can add the EK type to 
the ReadOnlyCollection/ReadOnlyList level?
{quote}
That's a good comment, I ever considered this carefully.  I also thought adding 
the EK type as one of generic type to the ReadOnlyCollection/ReadOnlyList 
level, but I felt it looked not natural for a collection/list, and not all 
implementations of ReadOnlyList need to implement iterating starting from 
specified element, also I though it was OK since it's a private interface we 
use in HDFS.  I will leave this comment in next version of patch, if you feel 
we'd better to do this, I will update it, I am OK with the both ways.

{quote}
DirectoryWithSnapshotFeature#getChildrenList#iterator(EK) forgot to increase 
pos? Maybe also add a new test for this (e.g., set a small ls limit and list a 
snapshot of a directory)?
{quote}
Great catch, let me update it, and add a new test in {{TestLargeDirectory}} to 
cover.

{quote}
In getListing, instead of continuing the iteration, can we just call size() to 
calculate the number of the remaining items?
{quote}
I ever tried to find a better way. {{size()}} will return the total number of 
elements in B-Tree, but we don't know the current index, so seems not able to 
calculate the number of the remaining items.



> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-29 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935212#comment-14935212
 ] 

Yi Liu commented on HDFS-9053:
--

The three test failures are not related.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935101#comment-14935101
 ] 

Hadoop QA commented on HDFS-9053:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  20m  1s | Pre-patch trunk has 2 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 8 new or modified test files. |
| {color:green}+1{color} | javac |   7m 56s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 20s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m  6s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m 14s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 17s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests |   7m 42s | Tests failed in 
hadoop-common. |
| {color:red}-1{color} | hdfs tests | 165m 33s | Tests failed in hadoop-hdfs. |
| | | 220m 50s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.fs.shell.TestCopyPreserveFlag |
|   | hadoop.hdfs.web.TestWebHDFSOAuth2 |
|   | hadoop.hdfs.TestCrcCorruption |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764210/HDFS-9053.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d6fa34e |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12739/artifact/patchprocess/trunkFindbugsWarningshadoop-common.html
 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12739/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12739/artifact/patchprocess/whitespace.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12739/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12739/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12739/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12739/console |


This message was automatically generated.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is 

[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-29 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934862#comment-14934862
 ] 

Yi Liu commented on HDFS-9053:
--

Update the patch to address Jing's comments:
1. Call {{addOrReplace}} directly in {{INodeDirectory#replaceChild}}
2. add EK type to the ReadOnlyCollection/ReadOnlyList level.
3. Fix the bug in 
{{DirectoryWithSnapshotFeature#getChildrenList#iterator(K)#next()}}, and add a 
new test in {{TestLargeDirectory}} for it: list the snapshot of directory, 
since the snapshot of directory is large enough, so can hit the code path.
 

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-29 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936229#comment-14936229
 ] 

Yi Liu commented on HDFS-9053:
--

Have fixed following errors we found so far:
1. the time complexity of insert/delete for ArrayList
2. In the JIRA description, change reference size to 8 bytes, since in real 
case, we use 64-bits system/JVM
3. In the JIRA comments, change reference size to 8 bytes and re-calculate 
estimation of the memory usage, and description about what system/JVM we assume.

[~szetszwo], let's fix the errors if you find more. Thanks.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-29 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936183#comment-14936183
 ] 

Yi Liu commented on HDFS-9053:
--

Thanks [~szetszwo] for the comments.

{quote}
here are quite a few errors in description and the analysis. Let's hold on the 
patch for the moment and fix all these errors first.
{quote}
Sure, let's fix the errors in the description and the analysis first, thanks 
for pointing out this.

{quote}
> So totally 12+4+4+4+4+4+4 = 32 bytes on 32-bits system/JVM, and 12+4+8+4+8+4 
> = 40 bytes on 64-bits system/JVM. (I have not counted object alignment)
It seems that there is at least one more reference not yet counted since Node 
is non-static.
{quote}
The reference to Node is already counted, use 64-bits system/JVM as example, 
the first '8' is the reference to the B-Tree root.  The second '8' is the null 
reference for children in BTree#Node, of course we are talking about the 
directory which is not large and only contains one B-Tree node.


Nicholas, I am going to fix the two places in the JIRA description and analysis 
in JIRA comments, which we find so far
# fix the time complexity of insert/delete for ArrayList
# fix the description about memory usage for B-Tree to add the system/JVM 
information and add more detailed information.

Any other errors do you find besides these? please tell me and appreciate if 
you can also help to edit them directly.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-29 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935480#comment-14935480
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9053:
---

> So totally 12+4+4+4+4+4+4 = 32 bytes on 32-bits system/JVM, and 12+4+8+4+8+4 
> = 40 bytes on 64-bits system/JVM. (I have not counted object alignment)

It seems that there is at least one more reference not yet counted since Node 
is non-static.

There are quite a few errors in description and the analysis.  Let's hold on 
the patch for the moment and fix all these errors first.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-28 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934634#comment-14934634
 ] 

Yi Liu commented on HDFS-9053:
--

Thanks for the comments, Nicholas.

{quote}
Where are the numbers, especially the 4s, from? Do we assume a 32-bit world?
{quote}

{code}
public class BTree implements Iterable {
  ...
  private final int degree;
  private Node root;
  private int size;
  private transient int modCount = 0;
 ...
}

private final class Node {
static final int DEFAULT_CAPACITY = 5;
private Object[] elements;
private int elementsSize; 
private Object[] children;
private int childrenSize;
...
}
{code}
Sorry, I should use 64-bits system/JVM, and details are:

Compared to ArrayList, we increases following things:
private final int degree;   <-   4 bytes Integer
private Node root;   <-  reference, 4 bytes on 32-bits 
system/JVM, 8 bytes on 64-bits system/JVM
private int size;<-  4 bytes Integer

{{Node}} object overhead   <--  12 bytes
private Object[] children; <-  null reference, 4 bytes on 
32-bits system/JVM, 8 bytes on 64-bits system/JVM
private int childrenSize;   <-  4 bytes Integer.

So totally  12+4+4+4+4+4+4 = 32 bytes on 32-bits system/JVM, and 12+4+8+4+8+4 = 
40 bytes on 64-bits system/JVM. (I have not counted object alignment)


> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-28 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934598#comment-14934598
 ] 

Yi Liu commented on HDFS-9053:
--

Thanks Jing for the further review. 
{quote}
Yeah, sorry for the delay.
{quote}
NP, thanks for the help :)

I also think it's a critical part of the code, thanks a lot for the review, 
[~szetszwo]! also welcome [~kihwal] and anyone who can review this.

{quote}
For insert/delete, the running time is O( n) for ArrayList.
{quote}
Yeah, if we count the re-allocations and copies of big arrays, I just mean the 
search time complexity before we do insert/delete.

{quote}
How about memory usage? One reason to use ArrayList instead of a more 
complicated data structure is to save memory. It seems to me that using B-Tree 
increases memory usage in general. It increases memory usage dramatically in 
some worst cases such as the average branching factor of the tree is small 
(this may also be a common case).
{quote}
That's a great question here.  I have given few description in the my second 
comment above: 
[here|https://issues.apache.org/jira/browse/HDFS-9053?focusedCommentId=14740814=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14740814],
 I would give a detailed explanation:
Generally say, the B-Tree in the patch increases *very few* memory usage, and 
can be ignored:
# *For normal and small size directories*, let's say it's (INode) children size 
is not large and then B-Tree only contains one node. Currently the default 
degree of B-Tree used in INodeDirectory is 1024, so the max degree is 2047, 
that means one B-Tree node can at most contain 2047 elements. And the children 
of the B-Tree root node is null, so the increased memory usage compared to 
ArrayList is: one object overhead + few variables, total increase about 
(12+4+4+4+4+4) = 32 bytes.
# *For large directories*, let's say it's (INode) children size is larger than 
the max degree and then B-Tree contains more than one nodes. For any B-Tree 
node except root, it will contains at least min degree size elements, in our 
case, it's 1023, furthermore, we expand elements of B-Tree node same as 
ArrayList, so it's the same as ArrayList at this point, typically the worst 
case for B-Tree from memory point view is all elements are in-order, in this 
case, when we split a B-Tree node, we allocate the elements size of memory and 
do expanding, but no more elements are added later in the split left B-Tree 
node, and there is a waste of 1/3 size of elements array memory, but actually 
it's almost the same as the worst cases even for ArrayList if there is no/few 
elements added, but in practice, the elements are not in order.  Now back to 
the overhead, for B-Tree, for every B-Tree node, the increased memory usage is: 
one object overhead + one element pointer in the father + 2 variables, so total 
increase about (12 + 4+4+4) = 24 bytes for every new added B-Tree node, but one 
B-Tree node contains at least (degree - 1) elements, and at most (2* degree - 
1) elements.  Another small increased memory is the object overhead of the 
children in the B-Tree inner node.

In conclusion, the increased memory used for B-Tree compared to ArrayList is 
very small, for small/normal directories, it's only 32 bytes overhead which is 
even less than memory of one block in NN, for large directories, besides 
additional increased 12 bytes memory for the variables in B-Tree , it will 
increase about 24 bytes memory for every new added B-Tree node which can 
contain 1023 ~ 2047 INodes in the directory in our case.  We can ignore these 
few memory overhead for a directory. 

And last, the benefits of B-Tree is great and obvious as described in the JIRA 
description, also in the patch, I consider the memory overhead carefully while 
implementing the B-Tree. 

Please let me know if you have further questions. Thanks a lot.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M 

[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-28 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934611#comment-14934611
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9053:
---

> Yeah, if we count the re-allocations and copies of big arrays, I just mean 
> the search time complexity before we do insert/delete.

If we count only the search time complexity before we do insert/delete, it 
should be called "search time complexity" but not insert/delete time complexity.

> ... total increase about (12+4+4+4+4+4) = 32 bytes.

Where are the numbers, especially the 4s, from?  Do we assume a 32-bit world?

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-28 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933696#comment-14933696
 ] 

Jing Zhao commented on HDFS-9053:
-

Yeah, sorry for the delay. I'm currently reviewing the remaining part. Will 
post comments later.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-28 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934414#comment-14934414
 ] 

Jing Zhao commented on HDFS-9053:
-

Also, since this is a critical part of the code, maybe we should invite more to 
review. Do you also want to take a look at the patch, [~szetszwo] and [~kihwal]?

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-28 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934412#comment-14934412
 ] 

Jing Zhao commented on HDFS-9053:
-

Some further comments:
# In {{INodeDirectory#replaceChild}}, can we directly call {{addOrReplace}} 
instead of calling {{get}} first?
{code}
final INode existing = children.get(newChild.getLocalNameBytes());
..
oldChild = existing;
..
children.addOrReplace(newChild);
{code}
# Do you think we can avoid the following code? Maybe we can add the EK type to 
the ReadOnlyCollection/ReadOnlyList level?
{code}
public  Iterator iterator(EK k) {
  final byte[] name = (byte[])k;
{code}
# {{DirectoryWithSnapshotFeature#getChildrenList#iterator(EK)}} forgot to 
increase {{pos}}? Maybe also add a new test for this (e.g., set a small ls 
limit and list a snapshot of a directory)?
{code}
public INode next() {
  if (pos >= childrenSize) {
throw new NoSuchElementException();
  }
  return children.get(pos);
}
{code}
# In {{getListing}}, instead of continuing the iteration, can we just call 
{{size()}} to calculate the number of the remaining items?
{code}
  while (i.hasNext()) {
INode cur = i.next();
if (!(locationBudget > 0 && listingCnt < fsd.getLsLimit())) {
  remaining++;
  continue;
}
{code}

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-28 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934459#comment-14934459
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9053:
---

The concept of using B-Tree seems good.

> ...  Currently we use an ArrayList for the children under a directory, and 
> the children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n) ...

For insert/delete, the running time is O( n) for ArrayList.

How about memory usage?  One reason to use ArrayList instead of a more 
complicated data structure is to save memory.  It seems to me that using B-Tree 
increases memory usage in general.  It increases memory usage dramatically in 
some worst cases such as the average branching factor of the tree is small 
(this may also be a common case).

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-27 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14910048#comment-14910048
 ] 

Yi Liu commented on HDFS-9053:
--

Hi [~jingzhao], could you help to do further reviews? Thanks a lot.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14904106#comment-14904106
 ] 

Hadoop QA commented on HDFS-9053:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 45s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 7 new or modified test files. |
| {color:green}+1{color} | javac |   8m  3s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 13s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m  2s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  8s | The patch has 2  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 39s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 25s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |  23m 56s | Tests passed in 
hadoop-common. |
| {color:red}-1{color} | hdfs tests | 195m 15s | Tests failed in hadoop-hdfs. |
| | | 266m 28s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.blockmanagement.TestBlockManager |
|   | hadoop.hdfs.TestRollingUpgrade |
|   | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761615/HDFS-9053.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / cc2b473 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12615/artifact/patchprocess/whitespace.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12615/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12615/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12615/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12615/console |


This message was automatically generated.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: 

[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-22 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903725#comment-14903725
 ] 

Yi Liu commented on HDFS-9053:
--

The Jenkins has some issue and the test failures are not related. 
The reason is: having multiple maven invocations going on at once sharing the 
same .m2 directory on the same machine, this patch includes new class file in 
Hadoop-common, when some other maven invocations run after this one on the same 
machine,  "mvn install" of test will replace the hadoop-common jar in .m2 
directory, so the test failures in Jenkins show NoClassDefFoundError for the 
new added class in hadoop-common.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-22 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902232#comment-14902232
 ] 

Yi Liu commented on HDFS-9053:
--

Update the patch to address the comments, also cleanup the 
javac/checkstyle/whitespace.
{quote}
4. Optional: in insertElement maybe we can copy elements only once if we need 
to expand the array.
{quote}
We still need to have two copy even if we handle the insertElement and expand 
together, and the steps are: 1) allocate new heap memory, 2) copy the elements 
before insertion point to the new memory, 3) set inserted element and copy the 
elements after insertion point to the new memory. It also makes logic a bit 
complex.  The current insertElement is the same behavior as insertion of 
ArrayList. So maybe we can keep current behavior, how do you think?

Thanks Jing.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902728#comment-14902728
 ] 

Hadoop QA commented on HDFS-9053:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 50s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 7 new or modified test files. |
| {color:green}+1{color} | javac |   7m 56s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  7s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m  4s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  8s | The patch has 2  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 24s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |  23m  9s | Tests passed in 
hadoop-common. |
| {color:red}-1{color} | hdfs tests | 159m 26s | Tests failed in hadoop-hdfs. |
| | | 229m 45s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.cli.TestXAttrCLI |
|   | hadoop.fs.viewfs.TestViewFsWithXAttrs |
|   | hadoop.fs.TestSWebHdfsFileContextMainOperations |
|   | hadoop.cli.TestAclCLI |
|   | hadoop.fs.TestHDFSFileContextMainOperations |
|   | hadoop.tracing.TestTracing |
|   | hadoop.security.TestRefreshUserMappings |
|   | hadoop.security.TestPermissionSymlinks |
|   | hadoop.fs.TestSymlinkHdfsFileContext |
|   | hadoop.fs.TestUrlStreamHandler |
|   | hadoop.fs.viewfs.TestViewFileSystemAtHdfsRoot |
|   | hadoop.fs.viewfs.TestViewFileSystemWithXAttrs |
|   | hadoop.fs.TestSymlinkHdfsFileSystem |
|   | hadoop.tools.TestJMXGet |
|   | hadoop.fs.viewfs.TestViewFsDefaultValue |
|   | hadoop.cli.TestHDFSCLI |
|   | hadoop.security.TestPermission |
|   | hadoop.fs.viewfs.TestViewFsHdfs |
|   | hadoop.cli.TestDeleteCLI |
|   | hadoop.fs.TestFcHdfsCreateMkdir |
|   | hadoop.fs.viewfs.TestViewFsWithAcls |
|   | hadoop.TestGenericRefresh |
|   | hadoop.fs.viewfs.TestViewFsAtHdfsRoot |
|   | hadoop.tracing.TestTracingShortCircuitLocalRead |
|   | hadoop.fs.TestFcHdfsPermission |
|   | hadoop.hdfs.web.TestWebHDFSOAuth2 |
|   | hadoop.tracing.TestTraceAdmin |
|   | hadoop.cli.TestCacheAdminCLI |
|   | hadoop.TestRefreshCallQueue |
|   | hadoop.fs.TestEnhancedByteBufferAccess |
|   | hadoop.cli.TestCryptoAdminCLI |
|   | hadoop.fs.viewfs.TestViewFileSystemHdfs |
|   | hadoop.fs.viewfs.TestViewFileSystemWithAcls |
|   | hadoop.fs.TestFcHdfsSetUMask |
|   | hadoop.fs.viewfs.TestViewFsFileStatusHdfs |
|   | hadoop.fs.shell.TestHdfsTextCommand |
| Timed out tests | org.apache.hadoop.fs.contract.hdfs.TestHDFSContractDelete |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761615/HDFS-9053.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 10ab7d5 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12589/artifact/patchprocess/whitespace.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12589/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12589/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12589/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12589/console |


This message was automatically generated.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a 

[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-21 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901761#comment-14901761
 ] 

Jing Zhao commented on HDFS-9053:
-

Thanks for the great work, Yi! So far I just reviewed the B-Tree implementation 
part and it looks good to me. Just some minor comments:
# "static" can be removed
{code}
  public static interface Element extends Comparable {
K getKey();
  }
{code}
# The parameter is never used.
{code}
Node(boolean allocateMaxElements) {
  elements = new Object[maxElements()];
}
{code}
# It may be helpful to add some more Preconditions/assert check to verify the 
parameter and internal state. For example, some verification about the index i 
in the following code.
{code}
SplitResult split(int i) {
  E e = (E)elements[i];
  Node next = new Node(true);
  
{code}
# Optional: in insertElement maybe we can copy elements only once if we need to 
expand the array.
# Rename {{put}} to {{addOrReplace}} to make its semantic more clear?
# Need to update the javadoc of {{removeElement}} and {{removeChild}}.
# {{SplitResult#element}} and {{SplitResult#node}} can be declared as final.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-21 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901808#comment-14901808
 ] 

Yi Liu commented on HDFS-9053:
--

Thanks a lot for your review and spend lots of time on this, Jing! 
I will update the B-Tree part to address your comments later.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-17 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14802729#comment-14802729
 ] 

Yi Liu commented on HDFS-9053:
--

The failed one test is not related to this patch.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791829#comment-14791829
 ] 

Hadoop QA commented on HDFS-9053:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 44s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 7 new or modified test files. |
| {color:red}-1{color} | javac |   7m 35s | The applied patch generated  28  
additional warning messages. |
| {color:green}+1{color} | javadoc |   9m 50s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 45s | The applied patch generated  
16 new checkstyle issues (total was 0, now 16). |
| {color:red}-1{color} | whitespace |   0m 11s | The patch has 7  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 31s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 15s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |  22m 29s | Tests passed in 
hadoop-common. |
| {color:red}-1{color} | hdfs tests | 160m 56s | Tests failed in hadoop-hdfs. |
| | | 228m 34s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.web.TestWebHDFSOAuth2 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12756270/HDFS-9053.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 0832b38 |
| javac | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12501/artifact/patchprocess/diffJavacWarnings.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/12501/artifact/patchprocess/diffcheckstylehadoop-common.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12501/artifact/patchprocess/whitespace.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12501/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12501/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12501/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12501/console |


This message was automatically generated.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the 

[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-16 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14769053#comment-14769053
 ] 

Yi Liu commented on HDFS-9053:
--

Upload the patch and trigger Jenkins.

The patch looks large, but it's not complex as its size looks. Many 
modifications are caused by {{INodeDirectory#getChildren}} returns 
{{ReadOnlyCollection}} instead of {{ReadOnlyList}}.   The patch mainly includes 
following parts:
# BTree implementation and related tests.
# INodeDirectory and related modification to use BTree and tests for large 
directory.
# Other modifications by INodeDirectory update, especially 
{{INodeDirectory#getChildren}} returns {{ReadOnlyCollection}} instead of 
{{ReadOnlyList}}.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14790677#comment-14790677
 ] 

Hadoop QA commented on HDFS-9053:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 37s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 7 new or modified test files. |
| {color:red}-1{color} | javac |   7m 52s | The applied patch generated  28  
additional warning messages. |
| {color:green}+1{color} | javadoc |  10m 11s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 52s | The applied patch generated  
16 new checkstyle issues (total was 0, now 16). |
| {color:red}-1{color} | whitespace |   0m 12s | The patch has 7  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 39s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 25s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |  23m 21s | Tests passed in 
hadoop-common. |
| {color:red}-1{color} | hdfs tests |  32m 46s | Tests failed in hadoop-hdfs. |
| | | 103m 12s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer |
|   | hadoop.hdfs.TestHDFSFileSystemContract |
|   | hadoop.hdfs.qjournal.client.TestQuorumJournalManager |
|   | hadoop.hdfs.TestEncryptedTransfer |
|   | hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport |
|   | hadoop.hdfs.server.namenode.TestGetBlockLocations |
|   | hadoop.hdfs.TestBlockStoragePolicy |
|   | hadoop.hdfs.TestRollingUpgrade |
|   | hadoop.hdfs.web.TestWebHDFSForHA |
|   | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
|   | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics |
|   | hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints |
|   | hadoop.fs.viewfs.TestViewFileSystemWithAcls |
|   | hadoop.hdfs.server.datanode.TestDataNodeECN |
|   | hadoop.hdfs.web.TestWebHdfsUrl |
|   | hadoop.hdfs.TestClientProtocolForPipelineRecovery |
|   | hadoop.TestGenericRefresh |
|   | hadoop.hdfs.server.namenode.TestFSEditLogLoader |
|   | hadoop.hdfs.server.namenode.TestEditLogJournalFailures |
|   | hadoop.fs.contract.hdfs.TestHDFSContractSeek |
|   | hadoop.hdfs.TestPread |
|   | hadoop.hdfs.server.namenode.snapshot.TestXAttrWithSnapshot |
|   | hadoop.hdfs.tools.TestDFSZKFailoverController |
|   | hadoop.fs.viewfs.TestViewFsFileStatusHdfs |
|   | hadoop.hdfs.TestWriteConfigurationToDFS |
|   | hadoop.hdfs.TestParallelShortCircuitLegacyRead |
|   | hadoop.hdfs.TestFileCreationClient |
|   | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
|   | hadoop.hdfs.server.namenode.ha.TestInitializeSharedEdits |
|   | hadoop.hdfs.crypto.TestHdfsCryptoStreams |
|   | hadoop.hdfs.server.datanode.TestTransferRbw |
|   | hadoop.hdfs.TestWriteBlockGetsBlockLengthHint |
|   | hadoop.hdfs.server.namenode.TestAddBlock |
|   | hadoop.fs.TestSymlinkHdfsDisable |
|   | hadoop.hdfs.TestDecommission |
|   | 
hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerForContentSummary |
|   | hadoop.hdfs.server.namenode.snapshot.TestSnapshotMetrics |
|   | hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations |
|   | hadoop.hdfs.server.namenode.TestFSImageWithAcl |
|   | hadoop.hdfs.server.mover.TestMover |
|   | hadoop.hdfs.TestDFSRemove |
|   | hadoop.cli.TestCacheAdminCLI |
|   | hadoop.hdfs.server.namenode.ha.TestLossyRetryInvocationHandler |
|   | hadoop.fs.loadGenerator.TestLoadGenerator |
|   | hadoop.hdfs.server.datanode.TestNNHandlesBlockReportPerStorage |
|   | hadoop.hdfs.server.namenode.snapshot.TestSnapshotReplication |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyWriter |
|   | hadoop.hdfs.TestFileAppend4 |
|   | hadoop.hdfs.server.namenode.TestEditLogAutoroll |
|   | hadoop.hdfs.qjournal.server.TestJournalNode |
|   | hadoop.hdfs.server.namenode.ha.TestDNFencing |
|   | hadoop.hdfs.TestRollingUpgradeDowngrade |
|   | hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength |
|   | hadoop.hdfs.server.datanode.TestNNHandlesCombinedBlockReport |
|   | hadoop.security.TestPermission |
|   | hadoop.hdfs.TestDFSShell |
|   | hadoop.hdfs.server.datanode.TestRefreshNamenodes |
|   | hadoop.hdfs.server.namenode.TestDeleteRace |
|   | hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication |
|   | hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade |
|  

[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-16 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791373#comment-14791373
 ] 

Yi Liu commented on HDFS-9053:
--

Seems there is issue for Jenkins, re-trigger...

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791529#comment-14791529
 ] 

Hadoop QA commented on HDFS-9053:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  17m 54s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 7 new or modified test files. |
| {color:red}-1{color} | javac |   7m 59s | The applied patch generated  28  
additional warning messages. |
| {color:green}+1{color} | javadoc |  10m  9s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 30s | The applied patch generated  
16 new checkstyle issues (total was 0, now 16). |
| {color:red}-1{color} | whitespace |   0m  8s | The patch has 7  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 22s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests |  22m 23s | Tests failed in 
hadoop-common. |
| {color:red}-1{color} | hdfs tests |  43m 47s | Tests failed in hadoop-hdfs. |
| | | 111m  4s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.net.TestClusterTopology |
|   | hadoop.hdfs.TestFileStatus |
|   | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes |
|   | hadoop.hdfs.server.namenode.TestINodeFile |
|   | hadoop.fs.contract.hdfs.TestHDFSContractOpen |
|   | hadoop.hdfs.server.datanode.TestFsDatasetCache |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestDatanodeRestart |
|   | hadoop.hdfs.server.namenode.ha.TestHASafeMode |
|   | hadoop.hdfs.TestEncryptionZonesWithHA |
|   | hadoop.fs.contract.hdfs.TestHDFSContractMkdir |
|   | hadoop.hdfs.server.namenode.TestNameNodeXAttr |
|   | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby |
|   | hadoop.hdfs.shortcircuit.TestShortCircuitCache |
|   | hadoop.hdfs.TestDecommission |
|   | hadoop.hdfs.server.namenode.TestFSEditLogLoader |
|   | hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA |
|   | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
|   | hadoop.hdfs.server.blockmanagement.TestNameNodePrunesMissingStorages |
|   | hadoop.hdfs.server.datanode.TestCachingStrategy |
|   | hadoop.hdfs.server.namenode.TestFileJournalManager |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | hadoop.cli.TestXAttrCLI |
|   | hadoop.hdfs.server.namenode.TestDeleteRace |
|   | hadoop.hdfs.server.namenode.TestParallelImageWrite |
|   | hadoop.hdfs.server.namenode.TestNameNodeRespectsBindHostKeys |
|   | hadoop.hdfs.server.namenode.TestNNStorageRetentionFunctional |
|   | hadoop.hdfs.server.namenode.TestSaveNamespace |
|   | hadoop.hdfs.TestDFSRename |
|   | hadoop.hdfs.util.TestDiff |
|   | hadoop.hdfs.server.datanode.TestDataNodeFSDataSetSink |
|   | hadoop.hdfs.server.namenode.snapshot.TestSnapshotManager |
|   | hadoop.hdfs.server.namenode.TestFsck |
|   | hadoop.hdfs.server.namenode.ha.TestHarFileSystemWithHA |
|   | 
hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaPlacement |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration |
|   | hadoop.hdfs.server.datanode.TestDeleteBlockPool |
|   | hadoop.hdfs.TestRemoteBlockReader2 |
|   | hadoop.hdfs.server.namenode.TestStorageRestore |
|   | hadoop.hdfs.server.namenode.TestFileLimit |
|   | hadoop.hdfs.server.blockmanagement.TestNodeCount |
|   | hadoop.fs.contract.hdfs.TestHDFSContractSetTimes |
|   | hadoop.hdfs.server.namenode.snapshot.TestCheckpointsWithSnapshots |
|   | hadoop.hdfs.server.namenode.TestFSPermissionChecker |
|   | hadoop.hdfs.server.namenode.TestSecureNameNode |
|   | hadoop.hdfs.server.namenode.TestFileContextAcl |
|   | hadoop.hdfs.server.datanode.TestDataXceiverLazyPersistHint |
|   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.TestDataTransferProtocol |
|   | hadoop.fs.viewfs.TestViewFsWithXAttrs |
|   | hadoop.hdfs.server.blockmanagement.TestDatanodeManager |
|   | hadoop.hdfs.server.datanode.TestDiskError |
|   | hadoop.hdfs.server.namenode.TestMalformedURLs |
|   | hadoop.hdfs.TestReadWhileWriting |
|   | hadoop.fs.TestSWebHdfsFileContextMainOperations |
|   | hadoop.hdfs.TestIsMethodSupported |
|   | hadoop.hdfs.TestParallelShortCircuitReadNoChecksum |
|   | hadoop.hdfs.server.blockmanagement.TestAvailableSpaceBlockPlacementPolicy 
|
|   | hadoop.hdfs.TestFileCreationClient |
| 

[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-11 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740370#comment-14740370
 ] 

Yi Liu commented on HDFS-9053:
--

Current patch only includes my implementation of B-Tree, I will post the patch 
integration with {{INodeDirectory}} later. 
Code logic of B-Tree is more complex than other normal data structure, the most 
challenged part is remove/merge, insert/split, iteration.  I refer to google 
implementation of B-Tree in Go language (https://github.com/google/btree), and 
borrow their merge, split logic. So the correctness of that part should be 
already verified in practice. I checked the B-Tree implementation carefully 
according to B-Tree definition, also did many tests to make sure B-Tree 
implementation correct.  Later, I will give some performance data of 
insertion/deletion/searching comparing between B-Tree and ArrayList.
The implementation in patch is low memory footprint, if the elements size is 
not large (less than the maximum degree of B-Tree node), the B-Tree will only 
have one root node and contains an array for the elements, and the children 
array is null. I use java array for the elements and children, and control the 
expand in the code myself to save the object overhead and can use fewer memory 
compared to use ArrayList.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree).patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-11 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740814#comment-14740814
 ] 

Yi Liu commented on HDFS-9053:
--

*1. Performance*
I choose 1024 as the B-Tree degree. For ArrayList in tests, the elements are in 
order, so for insertion/deletion/getting, we get the index using binary search 
first and do insertion/deletion/getting, this is the same behavior as in 
INodeDirectory.

>From the performance data below, B-Tree is much more better on 
>insertion/deletion, and almost same for get, for iteration, it's a bit worse 
>(since it's not to iterate continuous memory), but the iterations are fast 
>enough, so we can ignore.

-- *Insert (random)* (in milliseconds)
||Data Size||B-Tree||ArrayList||
|1K|4|4|
|64K|71|613|
|512K|494|38022|
|1M|1365|151855|
|2M|3584|688158|

-- *Delete (random)* (in milliseconds)
||Data Size||B-Tree||ArrayList||
|1K|4|4|
|64K|75|643|
|512K|658|40463|
|1M|1427|161492|
|2M|3724|718581|

-- *Get (random)* (in milliseconds)
||Data Size||B-Tree||ArrayList||
|1K|1|1|
|64K|16|17|
|512K|279|295|
|1M|759|755|
|2M|2055|2047|

-- *Iteration* (in milliseconds)
||Data Size||B-Tree||ArrayList||
|1K|n/a|n/a|
|64K|9|3|
|512K|30|16|
|1M|61|23|
|2M|143|40|

*2. Memory*
As stated in the description, B-Tree is very good because it solves #1, #3 
issues from memory aspect.
B-Tree may have few object overhead and additional array to store references to 
sub-trees. The overhead is relatively very small for large directories; for 
small directories, the overhead is small itself. Furthermore it's directory, so 
few overhead is acceptable, besides, I already try best effort to reduce the 
overhead while implementing.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree).patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-11 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740842#comment-14740842
 ] 

Yi Liu commented on HDFS-9053:
--

To integrate B-Tree with INodeDirectory, it's natural besides directory exposes 
{{getChildrenList}} and returns {{ReadOnlyList}}, B-Tree implements Iterable, 
it's not a list. Actually most other places want to iterate all elements, and 
some places want to iterate starting from certain child. So my plan is add an 
new Iterable interface which extends {{java.lang.Iterable}} and allow creating 
an iterator starting from certain child. Then in directory, getChildren returns 
{{ReadOnlyIterator}}, and {{ReadOnlyList}} implements it the ReadOnlyIterator. 
For snapshot, it makes sense to keep current behavior, still use the 
{{ReadOnlyList}}.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)