[ https://issues.apache.org/jira/browse/HBASE-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15215997#comment-15215997 ]
Anastasia Braginsky commented on HBASE-14921:
---------------------------------------------

bq. Please say more on this. CellSet is a NavigableMap (not a ConcurrentNavigableMap) so I'm missing where we need the 'Concurrent' (is it in this patch?)

Indeed, CellSet itself is not concurrent: as the code shows, it implements NavigableSet<Cell>. However, CellSet has a field, “delegatee”, whose type is ConcurrentNavigableMap. We want CellSet to support different types of delegatees, but they all need to be ConcurrentNavigableMaps. Here is the relevant code:
{quote}
@InterfaceAudience.Private
public class CellSet implements NavigableSet<Cell> {
  // Implemented on top of a {@link java.util.concurrent.ConcurrentSkipListMap}
  // Differ from CSLS in one respect, where CSLS does "Adds the specified element to this set if it
  // is not already present.", this implementation "Adds the specified element to this set EVEN
  // if it is already present overwriting what was there previous".
  // Otherwise, has same attributes as ConcurrentSkipListSet
  private final ConcurrentNavigableMap<Cell, Cell> delegatee;

  CellSet(final ConcurrentNavigableMap<Cell, Cell> m) {
    this.delegatee = m;
  }
{quote}

bq. Your new names are better. I considered 'flat' Map but shied away given its meaning over in spark/scala; I think it will be ok as long as you stick why its a 'flat' map in the javadoc on CellFlatMap.

I’ll change the names and add an explanation of why it is a 'flat' map to the CellFlatMap javadoc.

bq. How do you see this working? We do not control the size of inbound Cells. They could have some regularity and they could also be erratic to the extreme (What to do when a 1G cell arrives into a column family that up to this has been taking on metrics?)

Excellent comment! Indeed, we have a problem with Cells that are bigger than Chunks. We have no choice but to introduce special variable-size, very large Chunks to hold very large Cells. We’ll improve the code after the basic benchmarking.

bq. I still do not see how the 3 * int is BYTES_IN_CELL. Not important.
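A minimal sketch of the overwrite-on-add semantics described in the CellSet excerpt above, assuming Java 16+ (for records). `OverwritingSetSketch`, `KV` and `BY_KEY` are hypothetical names, not HBase classes; `KV` stands in for a Cell and is ordered by key only, so two versions of the same key count as "equal". The set depends only on the ConcurrentNavigableMap interface, which is what lets a skip list or some other ConcurrentNavigableMap be passed in as the delegatee.

```java
import java.util.Comparator;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical sketch, not the HBase CellSet: a set view over any ConcurrentNavigableMap
// whose add() overwrites an equal element instead of ignoring it.
public class OverwritingSetSketch<E> {
  private final ConcurrentNavigableMap<E, E> delegatee;

  OverwritingSetSketch(final ConcurrentNavigableMap<E, E> m) {
    this.delegatee = m;  // skip list, or any other ConcurrentNavigableMap implementation
  }

  /** Adds e EVEN IF an equal element is present; true only if no equal element existed. */
  public boolean add(E e) {
    return delegatee.put(e, e) == null;
  }

  /** Reads go through values: put() on an existing key replaces the value, not the key. */
  public E get(E probe) {
    return delegatee.get(probe);
  }

  /** Assumes the set is non-empty (firstEntry() returns null on an empty map). */
  public E first() {
    return delegatee.firstEntry().getValue();
  }

  public int size() {
    return delegatee.size();
  }

  /** Hypothetical stand-in for a Cell: ordered (and considered "equal") by key only. */
  record KV(String key, String value) {}

  static final Comparator<KV> BY_KEY = Comparator.comparing(KV::key);

  public static void main(String[] args) {
    OverwritingSetSketch<KV> set =
        new OverwritingSetSketch<>(new ConcurrentSkipListMap<KV, KV>(BY_KEY));
    set.add(new KV("row1", "v1"));
    set.add(new KV("row1", "v2"));                 // same key: overwrites
    System.out.println(set.first().value());       // v2
    System.out.println(set.size());                // 1

    ConcurrentSkipListSet<KV> csls = new ConcurrentSkipListSet<>(BY_KEY);
    csls.add(new KV("row1", "v1"));
    csls.add(new KV("row1", "v2"));                // same key: ignored
    System.out.println(csls.first().value());      // v1
  }
}
```

Because `Map.put` on an existing key keeps the original key object and only replaces the value, the sketch reads elements back through the map's values; reading keys would still return the old "v1" element.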
I think the problem here (and in some other questions) is the name “Cell”. CellFlatMap doesn’t work with “Cell data” or with “true Cells”, as you (correctly) use the word. CellFlatMap works with a “cell representation”: from those 3 integers you can retrieve all the other Cell information, i.e. the “true Cell”. Should I rename this to BYTES_FOR_CELL_REPRESENTATION?

{quote}
It was introduced and off by default as is usual when new features. But as also happens this is our practice, the facility was 'forgotten'. It came up then when our Lars noticed it and wanted to remove it since it was not being used. It came up again recently in HBASE-15513

It would seem to make sense enabling it by default if we come up w/ a proper sizing. Having it on seems to mess w/ G1GC too. Would need to figure that.
{quote}

I took a look at HBASE-15513; it is very interesting. It appears to favor turning the ChunkPool on by default, which also seems very reasonable to me. I also took a brief look at HBASE-15180, specifically at this statement:

bq. I noticed about 5-10% improvement on GC times and CPU utilization after disabling MSLAB only if using G1GC. Tuning MSLAB helps a little but I don't see to much advantage to have it enabled when G1GC is there.

However, I do not see enough evidence in those measurements. How many workloads were tested? What were the sizes of the Cells? I need to read that Jira more carefully.

bq. We need to do up a memory management doc. Between your work on Segments, Segment pipelines, MSLAB chunks, chunk pools and bytebufferpools to host requests read from sockets, bucket cache and reference counting bucketcache bucket blocks at read time, it would be good if we had a map so we could trace a Cell on its travels.

I’ll write the document a little later on.
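A hedged sketch of a 'flat' cell index in which each entry is only three ints, so the per-entry size is 3 * Integer.BYTES = 12 bytes. The comment does not say which three integers CellFlatMap uses; this sketch assumes (chunkId, offset, length) pointing into byte-array chunks, and treats a cell as just its key bytes. It also assumes a cell larger than a chunk gets its own dedicated, exactly sized chunk, as one possible reading of the variable-size very large Chunks mentioned earlier. All names (`FlatCellIndexSketch`, `copyIntoChunk`, `pack`, `find`, `CHUNK_SIZE`) are hypothetical, and Java 9+ is assumed for the ranged `Arrays.compareUnsigned`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not HBase code: cell bytes live in byte[] chunks, and a flat int[]
// holds one (chunkId, offset, length) triple per cell, sorted by the referenced bytes.
public class FlatCellIndexSketch {
  static final int CHUNK_SIZE = 64;        // tiny chunk size, for the demo only
  static final int INTS_PER_CELL = 3;      // assumed triple: chunkId, offset, length
  static final int BYTES_FOR_CELL_REPRESENTATION = INTS_PER_CELL * Integer.BYTES; // 12

  private final List<byte[]> chunks = new ArrayList<>();
  private int curChunk = -1;               // index of the chunk currently being filled
  private int curOffset = CHUNK_SIZE;      // "full", so the first copy opens a new chunk

  /** Copies data into a chunk and returns its (chunkId, offset, length) triple. */
  int[] copyIntoChunk(byte[] data) {
    if (data.length > CHUNK_SIZE) {
      // Assumed policy for oversized cells: a dedicated chunk of exactly the cell's size.
      chunks.add(Arrays.copyOf(data, data.length));
      return new int[] {chunks.size() - 1, 0, data.length};
    }
    if (curOffset + data.length > CHUNK_SIZE) {
      // The current chunk cannot fit the cell: start a new fixed-size chunk.
      chunks.add(new byte[CHUNK_SIZE]);
      curChunk = chunks.size() - 1;
      curOffset = 0;
    }
    System.arraycopy(data, 0, chunks.get(curChunk), curOffset, data.length);
    int[] ref = {curChunk, curOffset, data.length};
    curOffset += data.length;
    return ref;
  }

  /** Packs triples (the caller must pass them already sorted by key) into one flat int[]. */
  static int[] pack(List<int[]> sortedRefs) {
    int[] flat = new int[sortedRefs.size() * INTS_PER_CELL];
    for (int i = 0; i < sortedRefs.size(); i++) {
      System.arraycopy(sortedRefs.get(i), 0, flat, i * INTS_PER_CELL, INTS_PER_CELL);
    }
    return flat;
  }

  /** Unsigned lexicographic comparison of entry i's bytes against key. */
  int compareAt(int[] flat, int i, byte[] key) {
    int base = i * INTS_PER_CELL;
    byte[] chunk = chunks.get(flat[base]);
    int off = flat[base + 1];
    int len = flat[base + 2];
    return Arrays.compareUnsigned(chunk, off, off + len, key, 0, key.length);
  }

  /** Binary search: entry index if found, else -(insertionPoint) - 1, like Arrays.binarySearch. */
  int find(int[] flat, byte[] key) {
    int lo = 0;
    int hi = flat.length / INTS_PER_CELL - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      int c = compareAt(flat, mid, key);
      if (c < 0) {
        lo = mid + 1;
      } else if (c > 0) {
        hi = mid - 1;
      } else {
        return mid;
      }
    }
    return -(lo + 1);
  }

  public static void main(String[] args) {
    FlatCellIndexSketch s = new FlatCellIndexSketch();
    List<int[]> refs = new ArrayList<>();
    for (String k : new String[] {"apple", "banana", "cherry"}) { // already in sorted order
      refs.add(s.copyIntoChunk(k.getBytes(StandardCharsets.UTF_8)));
    }
    int[] flat = pack(refs);
    System.out.println(BYTES_FOR_CELL_REPRESENTATION);                              // 12
    System.out.println(s.find(flat, "banana".getBytes(StandardCharsets.UTF_8)));    // 1
    System.out.println(s.find(flat, "blueberry".getBytes(StandardCharsets.UTF_8))); // -3
    s.copyIntoChunk(new byte[100]);  // larger than CHUNK_SIZE: gets its own chunk
    System.out.println(s.chunks.size());                                            // 2
  }
}
```

The index itself holds no Cell objects, only 12 bytes per entry in one array, which is why the constant reads more naturally as the size of a cell representation than as the size of a Cell.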
> Memory optimizations
> --------------------
>
>                 Key: HBASE-14921
>                 URL: https://issues.apache.org/jira/browse/HBASE-14921
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0
>            Reporter: Eshcar Hillel
>            Assignee: Anastasia Braginsky
>         Attachments: CellBlocksSegmentInMemStore.pdf, CellBlocksSegmentinthecontextofMemStore(1).pdf, HBASE-14921-V01.patch, HBASE-14921-V02.patch
>
>
> Memory optimizations including compressed format representation and offheap
> allocations

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)