[ https://issues.apache.org/jira/browse/HDFS-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353791#comment-14353791 ]
Colin Patrick McCabe commented on HDFS-7844:
--------------------------------------------

bq. Stack wrote: High level, any notion of difference in perf when comparing native to offheap to current implementation?

Reading off-heap memory using Unsafe#getLong is very quick. The main overhead of the off-heap implementation will be creating wrapper objects. But those are very short-lived objects that should never make it past the GC's young generation. The off-heap implementation may also be able to use less memory for some things because we control the packing, which would speed things up (since fetching memory is a big cost in the block manager). We will see some numbers soon.

bq. If we fail to pick up the configured memory manager (or the default), it's worth a WARN log. Otherwise, folks may be confounded that they are getting the native memory manager though they asked for something else:

We shouldn't need it, because the creation of the hash table will log the name of the memory manager and its type at INFO.

bq. This an arbitrary max? private final static long MAX_ADDRESS = 0x3fffffffffffffffL;

It's just nice because it allows the code to be provably correct. I realize that the address will never get there in any reasonable length of time.

bq. nit: make a method rather than dup the below...:

ok :)

bq. Is logging open at DEBUG but close at TRACE lead to confusion? Stumped debugger?

{{MemoryManager#close}} is really only a unit test thing. But you're right, let's make it DEBUG since the open was DEBUG.

bq. The close has to let out an IOE? What is the caller going to do w/ this IOE? The ByteArrayMemoryManager close error string construction is same as close on ProbingHashTable?

It's a (mis)feature of {{java.io.Closeable}}. But I use that interface anyway, since FindBugs knows to nag us if we forget the close. A user-defined interface wouldn't be known to FindBugs (although maybe there are annotations these days?)

bq. I like the compromise put upon the Iterator (that resize is allowed while Iteration...) Seems appropriate given where this is to be deployed.

Yeah, I think it will be useful.

bq. On TestMemoryManager, maybe parameterize so once through with ByteArrayMemoryManager and then a run with the offheap implementation rather than have dedicated test for each: https://github.com/junit-team/junit/wiki/Parameterized-tests

That's pretty cool. I think we should do that in a follow-on where we do more coverage stuff as well, though...

bq. Yi wrote: It's better to assert maxLoadFactor < 1 (maybe < 0.8?), incorrect value will cause hash table failed.

Good idea.

bq. \[maintainCompactness\] looks brief, but I think it's not effective. putInternal needs probing if the slot was not in the right place, so it's not effective.

{{putInternal}} does do probing, though. Maybe I'm missing something, but I think this should work. Also, I can tell from the log messages that {{maintainCompactness}} is getting some testing. I didn't like the original implementation because it duplicated a lot of code from {{putInternal}}.

bq. I think ByteArrayMemoryManager can only be used for test for its performance reason. If SUN Unsafe is not available, we should use current implementation on Hadoop trunk. We will not remove current implementation on trunk, right?

To my knowledge, all JVMs that are used in real Hadoop clusters have access to {{sun.misc.Unsafe}}. If we want to support a better on-heap memory allocator, we can always work on that later. A more efficient on-heap implementation would be to take a big byte array and basically hand out offsets into it, much the way malloc itself does. We're not going to keep the old BlockManager code around, because that would be impossible.
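
For illustration only, here is a minimal sketch of the "big byte array, hand out offsets" idea mentioned above. It is not code from the HDFS-7844 patch: the class name {{ByteArrayArena}} and all of its methods are hypothetical, and it is a simple bump allocator with no free list, so it is considerably less capable than malloc.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch: hands out offsets into one large on-heap byte array.
 * Not part of the HDFS-7844 patch; all names here are illustrative.
 */
final class ByteArrayArena {
  private static final int ALIGNMENT = 8;  // keep every allocation long-aligned

  private final ByteBuffer buf;  // view over the single backing array
  private int next = 0;          // offset of the first unallocated byte

  ByteArrayArena(int capacity) {
    this.buf = ByteBuffer.wrap(new byte[capacity]);
  }

  /** Returns the offset of a fresh region of at least {@code size} bytes. */
  int allocate(int size) {
    if (size <= 0) {
      throw new IllegalArgumentException("size must be positive: " + size);
    }
    // Round up to the alignment; widen to long so the sum cannot overflow.
    long rounded = ((long) size + ALIGNMENT - 1) & ~(long) (ALIGNMENT - 1);
    if (next + rounded > buf.capacity()) {
      throw new IllegalStateException("arena exhausted");
    }
    int offset = next;
    next += (int) rounded;
    return offset;
  }

  /** Reads the long stored at the given offset. */
  long getLong(int offset) {
    return buf.getLong(offset);
  }

  /** Writes a long at the given offset. */
  void putLong(int offset, long value) {
    buf.putLong(offset, value);
  }

  /** Number of bytes handed out so far, including alignment padding. */
  int used() {
    return next;
  }
}
```

Callers would hold the int offsets returned by {{allocate}} in place of object references, so the whole table costs the GC one array rather than many small objects. A real allocator would also need to reclaim freed regions, which this sketch does not attempt.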

> Create an off-heap hash table implementation
> --------------------------------------------
>
>                 Key: HDFS-7844
>                 URL: https://issues.apache.org/jira/browse/HDFS-7844
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: HDFS-7836
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-7844-scl.001.patch, HDFS-7844-scl.002.patch, HDFS-7844-scl.003.patch
>
>
> Create an off-heap hash table implementation.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)