[ https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15433142#comment-15433142 ]
Fenghua Hu edited comment on HDFS-10690 at 8/23/16 4:38 PM: ------------------------------------------------------------ I would like to explain the solution here. Currently, TreeMap is used to track all the ShortCircuitReplica entries in the order of being inserted. These entries could be removed from TreeMap in two cases. The first is when the entry is accessed again, it will be removed from TreeMap. Please note that the entry could be anyone in the TreeMap. The other is when the entry is evicted due to treemap size limitation, in this case, only the eldest entry will be removed. Removal is a costly operation for the first case, because looking up ShortCircuitReplica is needed, in TreeMap, it's O(log n) operation. To improve it, we design a new data structure LruList, which entirely eliminates costly look-up operation. +----------------+ +-----------------+ +-----------------+ | | | | | | ---->| Replica 1 |-----Next----> | Replica 2 |-----Next----> | Replica 3 | | | | | | | <-----| |<-----Prev---- | |<-----Prev---- | | +----------------+ +-----------------+ +-----------------+ We introduced two references in ShortCircuitReplica objects. Reference Next points to the elder ShortCircuitReplica and Prev points to the younger one. All the replica is doubly linked in order of insertion time. The youngest is always at the head of the linked list, and the eldest is always at the tail. Removing the entries between the head and the tail doesn't need any lookup, because the replica knows its position in linked list by next and prev, thus remove is simple: change it's precedessor's and its succssor's next and prev. The order of operation is always O(1). For insertion, the youngest entry is always be added to the head, thus the operation is also O(1). Existing classes, including LinkedHashMap, LinkedMap, can't provide O(1) operation for insertion/lookup/removal. Here comes a brief test result: Run GET queries with 64 YCSB processes for 30 minutes, record the QPS for each process。 Total QPS: w/o patch: 95K w/ patch: 135K The performance gain is (135 - 95) / 95 = 42%. Suggestions/comments are very welcomed. was (Author: fenghua_hu): I would like to explain the solution here. Currently, TreeMap is used to track all the ShortCircuitReplica entries in the order of being inserted. These entries could be removed from TreeMap in two cases. The first is when the entry is accessed again, it will be removed from TreeMap. Please note that the entry could be anyone in the TreeMap. The other is when the entry is evicted due to treemap size limitation, in this case, only the eldest entry will be removed. Removal is a costly operation for the first case, because looking up ShortCircuitReplica is needed, in TreeMap, it's O(log n) operation. To improve it, we design a new data structure LruList, which entirely eliminates costly look-up operation. +----------------+ +-----------------+ |Replica 1 | |Replica 2 | +----------------+ +-----------------+ ---->| Next |-----> | Next |------->... +----------------+ +-----------------+ <-----| Prev |<-----| Prev |<------... +----------------+ +-----------------+ | | | | +----------------+ +-----------------+ We introduced two references in ShortCircuitReplica objects. Reference Next points to the elder ShortCircuitReplica and Prev points to the younger one. All the replica is doubly linked in order of insertion time. The youngest is always at the head of the linked list, and the eldest is always at the tail. Removing the entries between the head and the tail doesn't need any lookup, because the replica knows its position in linked list by next and prev, thus remove is simple: change it's precedessor's and its succssor's next and prev. The order of operation is always O(1). For insertion, the youngest entry is always be added to the head, thus the operation is also O(1). Existing classes, including LinkedHashMap, LinkedMap, can't provide O(1) operation for insertion/lookup/removal. Here comes a brief test result: Run GET queries with 64 YCSB processes for 30 minutes, record the QPS for each process。 Total QPS: w/o patch: 95K w/ patch: 135K The performance gain is (135 - 95) / 95 = 42%. Suggestions/comments are very welcomed. > Optimize insertion/removal of replica in ShortCircuitCache.java > --------------------------------------------------------------- > > Key: HDFS-10690 > URL: https://issues.apache.org/jira/browse/HDFS-10690 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client > Affects Versions: 3.0.0-alpha2 > Reporter: Fenghua Hu > Assignee: Fenghua Hu > Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, > ShortCircuitCache_LinkedMap.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > Currently in ShortCircuitCache, two TreeMap objects are used to track the > cached replicas. > private final TreeMap<Long, ShortCircuitReplica> evictable = new TreeMap<>(); > private final TreeMap<Long, ShortCircuitReplica> evictableMmapped = new > TreeMap<>(); > TreeMap employs Red-Black tree for sorting. This isn't an issue when using > traditional HDD. But when using high-performance SSD/PCIe Flash, the cost > inserting/removing an entry becomes considerable. > To mitigate it, we designed a new list-based for replica tracking. > The list is a double-linked FIFO. FIFO is time-based, thus insertion is a > very low cost operation. On the other hand, list is not lookup-friendly. To > address this issue, we introduce two references into ShortCircuitReplica > object. > ShortCircuitReplica next = null; > ShortCircuitReplica prev = null; > In this way, lookup is not needed when removing a replica from the list. We > only need to modify its predecessor's and successor's references in the lists. > Our tests showed up to 15-50% performance improvement when using PCIe flash > as storage media. > The original patch is against 2.6.4, now I am porting to Hadoop trunk, and > patch will be posted soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org