[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399796#comment-15399796
 ] 

Xiaoyu Yao edited comment on HDFS-10690 at 7/29/16 6:14 PM:
------------------------------------------------------------

[~fenghua_hu], thanks for updating the patch with the perf numbers. 

bq. LinkedHashMap needs to do two things: 1. calculate key and insert hashmap, 
2, insert into a linked list, hence in theory, it does more things than LruList 
in this patch. Yes, LinkedHashMap does make code cleaner, but its performances 
could be compromised. Please correct me if i am wrong.

I think LinkedHashMap is a better choice based on the following reasons unless 
the perf number shows significant difference. 

1. We prefer not to reinvent the wheel, which adds unnecessary test/maintenance 
cost for the project. Search though the Hadoop code base, we can see many LRU 
implementation based on well-tested LinkedHashMap. This can not only improve 
the performance but also simplify the existing code in ShortCuicuitCache. 

2. The hashmap inside LinkedHashMap does have some overhead of key calculation. 
But it offers faster lookup than a pure double link list, which is also very 
important for ShortCircuitCache. 


was (Author: xyao):
[~fenghua_hu], thanks for updating the patch with the perf numbers. 

bq. LinkedHashMap needs to do two things: 1. calculate key and insert hashmap, 
2, insert into a linked list, hence in theory, it does more things than LruList 
in this patch. Yes, LinkedHashMap does make code cleaner, but its performances 
could be compromised. Please correct me if i am wrong.

I think LinkedHashMap is a better choice based on the following reasons unless 
the perf number shows significant difference. 

1. We prefer not to reinvent the wheel, which adds unnecessary test/maintenance 
cost for the project. Search though the Hadoop code base, we can see many LRU 
implementation based on well-tested LinkedHashMap. This can not only improve 
the performance and simplify the existing code in ShortCuicuitCache. 

2. The hashmap inside LinkedHashMap does have some overhead of key calculation. 
But it offers faster lookup than a pure double link list, which is also very 
important for ShortCircuitCache. 

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---------------------------------------------------------------
>
>                 Key: HDFS-10690
>                 URL: https://issues.apache.org/jira/browse/HDFS-10690
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Fenghua Hu
>            Assignee: Fenghua Hu
>         Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Currently in ShortCircuitCache, two TreeMap objects are used to track the 
> cached replicas.
> private final TreeMap<Long, ShortCircuitReplica> evictable = new TreeMap<>();
> private final TreeMap<Long, ShortCircuitReplica> evictableMmapped = new 
> TreeMap<>();
> TreeMap employs Red-Black tree for sorting. This isn't an issue when using 
> traditional HDD. But when using high-performance SSD/PCIe Flash, the cost 
> inserting/removing an entry  becomes considerable.
> To mitigate it, we designed a new list-based for replica tracking.
> The list is a double-linked FIFO. FIFO is time-based, thus insertion is a 
> very low cost operation. On the other hand, list is not lookup-friendly. To 
> address this issue, we introduce two references into ShortCircuitReplica 
> object.
> ShortCircuitReplica next = null;
> ShortCircuitReplica prev = null;
> In this way, lookup is not needed when removing a replica from the list. We 
> only need to modify its predecessor's and successor's references in the lists.
> Our tests showed up to 15-50% performance improvement when using PCIe flash 
> as storage media.
> The original patch is against 2.6.4, now I am porting to Hadoop trunk, and 
> patch will be posted soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to