[ https://issues.apache.org/jira/browse/HDFS-15140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023077#comment-17023077 ]

Stephen O'Donnell commented on HDFS-15140:
------------------------------------------

One problem I have with this change is that I have no idea why the 
FoldedTreeSet structure degrades so badly after some period of time. It would 
be great to figure out what causes that and fix it.

The datanode originally used a ResizableGSet for the block map, which is 
unsorted. However, along with the introduction of FoldedTreeSet, we introduced 
sorted block reports and the namenode now relies upon them being sorted.

Therefore, there are (at least) two choices here:

1. Replace FoldedTreeSet with another sorted structure, e.g. TreeMap would work. 

2. Switch back to the GSet in the datanode, and then sort the blocks each time 
we create a block report. 

In benchmarks I ran, TreeMap performs marginally better than FoldedTreeSet on 
adds and deletes. For gets, FoldedTreeSet is marginally faster for sets under 
8M entries, and for larger sets TreeMap is faster. The performance gap seems 
to widen in TreeMap's favour as the sets get bigger.
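As a sketch of option (1), a TreeMap keyed by block ID would keep the replicas permanently sorted, so a block report is just an in-order iteration. The `BlockReplica` class and `sortedReport` method below are hypothetical stand-ins, not the real datanode classes:

```java
import java.util.Map;
import java.util.TreeMap;

public class TreeMapBlockMapSketch {
    // Hypothetical stand-in for the datanode's replica object
    static class BlockReplica {
        final long blockId;
        BlockReplica(long blockId) { this.blockId = blockId; }
    }

    // Build a report by iterating the map: entries come out already
    // sorted by block ID, which is what the namenode now expects
    static String sortedReport(TreeMap<Long, BlockReplica> blockMap) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Long, BlockReplica> e : blockMap.entrySet()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(e.getKey());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TreeMap<Long, BlockReplica> blockMap = new TreeMap<>();
        // Inserts arrive in arbitrary order; TreeMap keeps them sorted
        blockMap.put(3L, new BlockReplica(3));
        blockMap.put(1L, new BlockReplica(1));
        blockMap.put(2L, new BlockReplica(2));
        System.out.println(sortedReport(blockMap)); // prints "1 2 3"
    }
}
```

The per-entry cost of this approach is the TreeMap$Entry wrapper, which is where the memory overhead discussed below comes from.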

However, a major plus point for FoldedTreeSet is its reduced memory overhead. 
Loading each structure with 1M entries and capturing a heap dump, we can see 
TreeMap has an overhead of about 40 bytes per entry, as each object is wrapped 
in a TreeMap$Entry:

{code}
 num     #instances         #bytes  class name
----------------------------------------------
   1:       1000000       40000000  java.util.TreeMap$Entry
   2:       1000129       24003096  java.lang.Long
   3:       1000000       24000000  com.sodonnell.BlockMock
{code}

FoldedTreeSet however, has an overhead of about 5 bytes per entry due to 
storing up to 64 objects in each tree node:

{code}
num     #instances         #bytes  class name
----------------------------------------------
   1:       1000000       24000000  com.sodonnell.BlockMock
   2:         15941        4263344  [Ljava.lang.Object;
   3:         15625         875000  com.sodonnell.FoldedTreeSet$Node
{code}

Therefore switching to TreeMap would cost about (40 - 5) bytes * 1M blocks = 
35,000,000 bytes, or roughly 33.4MB of additional heap per 1M blocks, while 
giving comparable get, delete and add performance.

For option (2), we could capture the list of blocks for a block report, then 
drop the DN lock and sort them. Block reports are sent per volume, so sorting 
volume by volume would be doable. 

Benchmarking a sort on an array of objects, where the comparison is performed 
against a long field, gives:

{code}
Time taken to sort 500000 elements: 338
Time taken to sort 1000000 elements: 337
Time taken to sort 2000000 elements: 632
Time taken to sort 4000000 elements: 1438
Time taken to sort 8000000 elements: 3244
Time taken to sort 16000000 elements: 7260
{code}
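The timings above (presumably milliseconds) could come from a harness along the lines of the sketch below. This is a reconstruction of the likely methodology, not the exact benchmark that was run; JVM warm-up effects may explain why 500K and 1M elements took roughly the same time. The loop is capped at 4M here to stay within a default heap, whereas the figures above go up to 16M:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

public class SortBenchSketch {
    // Hypothetical block object carrying only the long ID we compare on
    static class Block {
        final long blockId;
        Block(long blockId) { this.blockId = blockId; }
    }

    static long timeSortMillis(int n, long seed) {
        Random rnd = new Random(seed);
        Block[] blocks = new Block[n];
        for (int i = 0; i < n; i++) {
            blocks[i] = new Block(rnd.nextLong());
        }
        long start = System.nanoTime();
        // The comparison is against the long block ID, as described above
        Arrays.sort(blocks, Comparator.comparingLong(b -> b.blockId));
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        for (int n = 500_000; n <= 4_000_000; n *= 2) {
            System.out.println("Time taken to sort " + n + " elements: "
                + timeSortMillis(n, 42));
        }
    }
}
```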

The extra sort would require materializing another list of references to the 
objects, but even for 20M blocks (a lot for a DN), at 4 bytes per reference 
this is only about 76MB, which would be quickly GC'ed afterwards.

> Replace FoldedTreeSet in Datanode with SortedSet or TreeMap
> -----------------------------------------------------------
>
>                 Key: HDFS-15140
>                 URL: https://issues.apache.org/jira/browse/HDFS-15140
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.3.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>
> Based on the problems discussed in HDFS-15131, I would like to explore 
> replacing the FoldedTreeSet structure in the datanode with a builtin Java 
> equivalent - either SortedSet or TreeMap.
