[ https://issues.apache.org/jira/browse/HBASE-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756557#comment-13756557 ]
Anoop Sam John commented on HBASE-3484:
---------------------------------------

I am trying out something similar to how a store can contain multiple HFiles: a memstore can hold more than one KeyValueSkipListSet (and therefore more than one CSLM) at a time, and each KeyValueSkipListSet slice has a configurable maximum size.

Initially the memstore has only one KeyValueSkipListSet. Once its size reaches the threshold, we create another KeyValueSkipListSet (so a new CSLM) and insert new KVs into it. The old data structure never receives KVs again, so KVs are sorted only within *one KeyValueSkipListSet*. This continues until the flush, when all of these KeyValueSkipListSets are taken into the snapshot and written to an HFile. The MemstoreScanner needs changes so that it treats the slices as a heap and emits KVs in the correct order. Once the flush is over, the memstore again has only one KeyValueSkipListSet, and the cycle repeats.

Basically, the aim is to keep a single CSLM from growing very large in number of entries. By default there is no maximum size for a slice, so a single CSLM keeps growing for as long as KVs are inserted into the memstore before a flush (the current behavior).

I have done a POC and tested it. The initial test with LoadTestTool shows that we can avoid the drop in throughput as the memstore grows. I will attach a patch with this change by this weekend.
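A minimal Java sketch of the slicing idea, to make the write path and the heap merge concrete. It is not the POC patch: `SlicedMemstore`, `maxSliceSize`, `scanner()`, `snapshot()` and `merge()` are hypothetical names, a generic `Comparable` key stands in for KeyValue, `ConcurrentSkipListSet` stands in for KeyValueSkipListSet, and the slice limit is assumed to be an entry count rather than a heap size. Writes are serialized with `synchronized` purely to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical sketch: a memstore split into sorted slices (not an HBase API).
class SlicedMemstore<K extends Comparable<K>> {
    private final int maxSliceSize; // assumed: max entries per slice, not bytes
    // Published list is never mutated; it is replaced wholesale on roll-over or snapshot.
    private volatile List<ConcurrentSkipListSet<K>> slices = List.of(new ConcurrentSkipListSet<K>());
    private int activeSize = 0; // entries in the newest (writable) slice; guarded by this

    SlicedMemstore(int maxSliceSize) {
        if (maxSliceSize <= 0) throw new IllegalArgumentException("maxSliceSize must be positive");
        this.maxSliceSize = maxSliceSize;
    }

    // Inserts into the newest slice; once it is full, freezes it and starts a new one.
    synchronized void add(K key) {
        if (activeSize >= maxSliceSize) {
            List<ConcurrentSkipListSet<K>> next = new ArrayList<>(slices);
            next.add(new ConcurrentSkipListSet<K>());
            slices = next; // older slices never receive another entry
            activeSize = 0;
        }
        List<ConcurrentSkipListSet<K>> current = slices;
        if (current.get(current.size() - 1).add(key)) {
            activeSize++;
        }
    }

    int sliceCount() {
        return slices.size();
    }

    // Detaches every slice for flushing and resets to a single empty slice.
    synchronized List<ConcurrentSkipListSet<K>> snapshot() {
        List<ConcurrentSkipListSet<K>> snap = slices;
        slices = List.of(new ConcurrentSkipListSet<K>());
        activeSize = 0;
        return snap;
    }

    // Scanner over the live slices: each slice is sorted, so a heap merge yields global order.
    Iterator<K> scanner() {
        return merge(slices);
    }

    // k-way merge: the heap holds one cursor per non-empty source, ordered by its current head.
    static <T extends Comparable<T>> Iterator<T> merge(List<? extends Iterable<T>> sources) {
        PriorityQueue<Cursor<T>> heap =
                new PriorityQueue<Cursor<T>>((a, b) -> a.head.compareTo(b.head));
        for (Iterable<T> source : sources) {
            Iterator<T> it = source.iterator();
            if (it.hasNext()) heap.add(new Cursor<>(it.next(), it));
        }
        return new Iterator<T>() {
            @Override public boolean hasNext() { return !heap.isEmpty(); }

            @Override public T next() {
                Cursor<T> top = heap.poll();
                if (top == null) throw new NoSuchElementException();
                T out = top.head;
                if (top.rest.hasNext()) {
                    top.head = top.rest.next();
                    heap.add(top); // re-insert with its new head
                }
                return out;
            }
        };
    }

    private static final class Cursor<T> {
        T head;
        final Iterator<T> rest;
        Cursor(T head, Iterator<T> rest) { this.head = head; this.rest = rest; }
    }
}
```

With `maxSliceSize = 2`, inserting 5, 1, 3, 2, 4 produces three slices ({1, 5}, {2, 3}, {4}), and `scanner()` still yields 1, 2, 3, 4, 5. The trade-off is that each skip list stays bounded on the write path, while reads pay an extra O(log s) heap operation per emitted entry for s slices; the same `merge()` can drive the flush of a `snapshot()`.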
> Replace memstore's ConcurrentSkipListMap with our own implementation
> --------------------------------------------------------------------
>
>                 Key: HBASE-3484
>                 URL: https://issues.apache.org/jira/browse/HBASE-3484
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>    Affects Versions: 0.92.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>         Attachments: hierarchical-map.txt, memstore_drag.png
>
>
> By copy-pasting ConcurrentSkipListMap into HBase we can make two improvements
> to it for our use case in MemStore:
> - add an iterator.replace() method which should allow us to do upsert much
> more cheaply
> - implement a Set directly without having to do Map<KeyValue,KeyValue> to
> save one reference per entry
> It turns out CSLM is in public domain from its development as part of JSR
> 166, so we should be OK with licenses.