[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15433142#comment-15433142
 ] 

Fenghua Hu edited comment on HDFS-10690 at 8/23/16 4:43 PM:
------------------------------------------------------------

I would like to explain the solution here.

Currently, TreeMap is used to track all the ShortCircuitReplica entries in the 
order of being inserted. These entries could be removed from TreeMap in two 
cases. The first is when the entry is accessed again, it will be removed from 
TreeMap. Please note that the entry could be anyone in the TreeMap. The other 
is when the entry is evicted due to treemap size limitation, in this case, only 
the eldest entry will be removed.

Removal is a costly operation for the first case, because looking up 
ShortCircuitReplica is needed, in TreeMap, it's O(log n) operation. To improve 
it, we design a new data structure LruList, which entirely eliminates costly 
look-up operation. 
            +----------------+                    +-----------------+           
          +-----------------+
             |   Replica 1  |-----Next----> |    Replica  2  |-----Next----> |  
  Replica  3  |
             |   Replica 1  |<-----Prev---- |    Replica  2  |<-----Prev---- |  
  Replica  3  |
            +----------------+                    +-----------------+           
         +-----------------+


We introduced two references in ShortCircuitReplica objects. Reference Next 
points to the elder ShortCircuitReplica and Prev points to the younger one. All 
the replicas are doubly linked in order of insertion time. The youngest is 
always at the head of the linked list, and the eldest is always at the tail.  
Removing the entries between the head and the tail doesn't need any lookup, 
because the replica knows its position in linked list by next and prev, thus 
remove is simple: change it's precedessor's and its succssor's next and prev. 
The order of operation is always O(1). 

For insertion, the youngest entry is always be added to the head, thus the 
operation is also O(1).

Existing classes, including LinkedHashMap, LinkedMap, can't provide O(1) 
operation for insertion/lookup/removal.

Here comes a brief test result:

Run GET queries with 64 YCSB processes for 30 minutes, record the QPS for each 
process。
Total QPS:
w/o patch: 95K
w/ patch: 135K

The performance gain is (135 - 95) / 95 = 42%.

Suggestions/comments are very welcomed.


was (Author: fenghua_hu):
I would like to explain the solution here.

Currently, TreeMap is used to track all the ShortCircuitReplica entries in the 
order of being inserted. These entries could be removed from TreeMap in two 
cases. The first is when the entry is accessed again, it will be removed from 
TreeMap. Please note that the entry could be anyone in the TreeMap. The other 
is when the entry is evicted due to treemap size limitation, in this case, only 
the eldest entry will be removed.

Removal is a costly operation for the first case, because looking up 
ShortCircuitReplica is needed, in TreeMap, it's O(log n) operation. To improve 
it, we design a new data structure LruList, which entirely eliminates costly 
look-up operation. 
            +----------------+                    +-----------------+           
         +-----------------+
      ---->|   Replica 1  |-----Next----> |    Replica  2  |-----Next----> |    
Replica  3  |
     <-----|   Replica 1  |<-----Prev---- |    Replica  2  |<-----Prev---- |    
Replica  3  |
            +----------------+                    +-----------------+           
         +-----------------+


We introduced two references in ShortCircuitReplica objects. Reference Next 
points to the elder ShortCircuitReplica and Prev points to the younger one. All 
the replica is doubly linked in order of insertion time. The youngest is always 
at the head of the linked list, and the eldest is always at the tail.  Removing 
the entries between the head and the tail doesn't need any lookup, because the 
replica knows its position in linked list by next and prev, thus remove is 
simple: change it's precedessor's and its succssor's next and prev. The order 
of operation is always O(1). 

For insertion, the youngest entry is always be added to the head, thus the 
operation is also O(1).

Existing classes, including LinkedHashMap, LinkedMap, can't provide O(1) 
operation for insertion/lookup/removal.

Here comes a brief test result:

Run GET queries with 64 YCSB processes for 30 minutes, record the QPS for each 
process。
Total QPS:
w/o patch: 95K
w/ patch: 135K

The performance gain is (135 - 95) / 95 = 42%.

Suggestions/comments are very welcomed.

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---------------------------------------------------------------
>
>                 Key: HDFS-10690
>                 URL: https://issues.apache.org/jira/browse/HDFS-10690
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Fenghua Hu
>            Assignee: Fenghua Hu
>         Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Currently in ShortCircuitCache, two TreeMap objects are used to track the 
> cached replicas.
> private final TreeMap<Long, ShortCircuitReplica> evictable = new TreeMap<>();
> private final TreeMap<Long, ShortCircuitReplica> evictableMmapped = new 
> TreeMap<>();
> TreeMap employs Red-Black tree for sorting. This isn't an issue when using 
> traditional HDD. But when using high-performance SSD/PCIe Flash, the cost 
> inserting/removing an entry  becomes considerable.
> To mitigate it, we designed a new list-based for replica tracking.
> The list is a double-linked FIFO. FIFO is time-based, thus insertion is a 
> very low cost operation. On the other hand, list is not lookup-friendly. To 
> address this issue, we introduce two references into ShortCircuitReplica 
> object.
> ShortCircuitReplica next = null;
> ShortCircuitReplica prev = null;
> In this way, lookup is not needed when removing a replica from the list. We 
> only need to modify its predecessor's and successor's references in the lists.
> Our tests showed up to 15-50% performance improvement when using PCIe flash 
> as storage media.
> The original patch is against 2.6.4, now I am porting to Hadoop trunk, and 
> patch will be posted soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to