[jira] [Commented] (HDFS-6709) Implement off-heap data structures for NameNode and other HDFS memory optimization
[ https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076272#comment-14076272 ] Daryn Sharp commented on HDFS-6709: --- Yes, we definitely generate a lot of garbage per call. Due to GC concerns, I've got work in progress to reduce the garbage generated, which is why I'm concerned about even more garbage per call (inodes are repeatedly looked up far more often than you think; I'm working on single resolution). We've already tuned the young generation to be in line with what other companies running large-scale services use.

bq. Maybe you think I've chosen an easy example. Hmm... the operation that I can think of that touches the most inodes is recursive delete.

Yes, deleting a large tree is a good example, but in practice it's a rare operation. However, getContentSummary is run often for monitoring. It may take many seconds on just a subtree of some clusters, and it may visit millions or tens of millions of inodes. Many block-level operations fetch the block collection, which is really the inode; sometimes to verify the block isn't abandoned, sometimes to access other related blocks. Decommissioning has always been a problem in general: it will repeatedly crawl hundreds of thousands of blocks, each requiring a BC/inode lookup. The replication monitor is likely to indirectly require the BC/inode too when it runs every 3s. Refer to {{BlocksMap.getBlockCollection}} to see how many other places it's called. Even the unobtainable best case of eliminating the 1.5s-per-6h CMS pause at the expense of increasing the frequency and/or duration of the ParNew pauses is a huge loss. I suppose the proof is in a simulation. Perhaps a rudimentary test is instantiating a garbage inode and blockinfo every time one is looked up, yet still returning the real one, so we can see how well ParNew handles the onslaught of garbage.
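The rudimentary test proposed here could be sketched along these lines (all class and method names below are hypothetical stand-ins, not the real INode or BlockInfo types):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the real INode class.
class FakeINode {
    final long id;
    final byte[] name;
    FakeINode(long id, byte[] name) { this.id = id; this.name = name; }
}

class INodeMap {
    private final Map<Long, FakeINode> real = new HashMap<>();

    void put(FakeINode inode) { real.put(inode.id, inode); }

    // On every lookup, also instantiate a throwaway copy that immediately
    // becomes garbage, simulating the extra per-call allocation the
    // off-heap design would introduce. The caller still gets the real one.
    FakeINode get(long id) {
        FakeINode actual = real.get(id);
        if (actual != null) {
            FakeINode garbage = new FakeINode(actual.id, actual.name.clone());
        }
        return actual;
    }
}
```

Running a real workload against such a wrapped lookup path while watching ParNew frequency and duration would give a first-order answer to whether the extra garbage matters.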
Implement off-heap data structures for NameNode and other HDFS memory optimization -- Key: HDFS-6709 URL: https://issues.apache.org/jira/browse/HDFS-6709 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-6709.001.patch We should investigate implementing off-heap data structures for NameNode and other HDFS memory optimization. These data structures could reduce latency by avoiding the long GC times that occur with large Java heaps. We could also avoid per-object memory overheads and control memory layout a little bit better. This also would allow us to use the JVM's compressed oops optimization even with really large namespaces, if we could get the Java heap below 32 GB for those cases. This would provide another performance and memory efficiency boost. -- This message was sent by Atlassian JIRA (v6.2#6252)
[ https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075894#comment-14075894 ] Colin Patrick McCabe commented on HDFS-6709:

bq. I'm just asking leading questions to make sure this approach is sound. Y! stands to lose a lot if this doesn't actually scale

The questions are good... hopefully the answers are too! I'm just trying to make my answers as complete as I can.

bq. To clarify the RTTI, I thought you meant more than just a per-instance reference to the class would be saved - although saving a reference is indeed great

Yeah. It will shrink objects by 4 or 8 bytes each. It's not immaterial! Savings like these are why I think it will shrink memory consumption.

bq. Regarding atomicity/CAS, it's relevant because using misalignment (over-optimization?) prevents adding concurrency to data structures that aren't but should allow concurrency. I digress

Isn't this a minor implementation detail, though? We don't currently use atomic ops on these data structures. If we go ahead with a layout that uses unaligned access, and someone later decides to make things atomic, we can always switch to an aligned layout.

bq. I know about generational collection but I'm admittedly not an expert. Which young gen GC method does not pause? ParNew+CMS definitively pauses... Here are some quickly gathered 12-day observations from a moderately loaded, multi-thousand node, non-production cluster:

I'm not a GC expert either. But from what I've read, "does not pause" is a pretty high bar to clear. I think even Azul's GC pauses on occasion for sub-millisecond intervals. For CMS and G1, everything I've read talks about tuning the young-gen collection in terms of target pause times.

bq. We have production clusters over 2.5X larger that sustained over 3X ops/sec. This non-prod cluster is generating ~625MB of garbage/sec. How do you predict dynamic instantiation of INode and BlockInfo objects will help?
They generally won't be promoted to old gen, which should reduce the infrequent CMS collection times. BUT, will it dramatically increase the frequency of young collection and/or lead to premature tenuring?

If you look at the code, we create temporary objects all over the place. For example, look at setTimes:

{code}
private void setTimesInt(String src, long mtime, long atime)
    throws IOException, UnresolvedLinkException {
  HdfsFileStatus resultingStat = null;
  FSPermissionChecker pc = getPermissionChecker();
  checkOperation(OperationCategory.WRITE);
  byte[][] pathComponents = FSDirectory.getPathComponentsForReservedPath(src);
  writeLock();
  try {
    checkOperation(OperationCategory.WRITE);
    checkNameNodeSafeMode("Cannot set times " + src);
    src = FSDirectory.resolvePath(src, pathComponents, dir);
    // Write access is required to set access and modification times
    if (isPermissionEnabled) {
      checkPathAccess(pc, src, FsAction.WRITE);
    }
    final INodesInPath iip = dir.getINodesInPath4Write(src);
    final INode inode = iip.getLastINode();
{code}

You can see we create:
* HdfsFileStatus (with at least 5 sub-objects; one of those, FsPermission, has 3 sub-objects of its own)
* FSPermissionChecker (which has at least 5 sub-objects inside it)
* pathComponents
* a new src string
* INodesInPath (with at least 2 sub-objects of its own)

That's at least 21 temporary objects just in this code snippet, and I'm sure I missed a lot of things. I'm not including any of the functions that called or were called by this function, or any of the RPC or protobuf machinations. The average path depth is maybe between 5 and 8... would having 5 to 8 extra temporary objects to represent the INodes we traversed substantially increase the GC load? I would say no. Maybe you think I've chosen an easy example. Hmm... the operation that I can think of that touches the most inodes is recursive delete. But we've known about the problems with this for a while... that's why JIRAs like HDFS-2938 addressed the problem.
Arguably, an off-heap implementation is actually better here, since we avoid creating a lot of trash in the tenured generation. Trash in the tenured generation leads to heap fragmentation (at least in CMS), and the dreaded full GC.
[ https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074189#comment-14074189 ] Kai Zheng commented on HDFS-6709: - I repeated the test in the post and sadly found it's true that DirectByteBuffer reads don't perform as well as writes. I'm communicating with Oracle about this; hopefully they can explain it and address it in a future Java version. It's interesting. Thanks.
[ https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075157#comment-14075157 ] Daryn Sharp commented on HDFS-6709: --- I sense a bit of condescension. We're all friends here. I'm just asking leading questions to make sure this approach is sound. Y! stands to lose a lot if this doesn't actually scale.

To clarify the RTTI point: I thought you meant more than just a per-instance reference to the class would be saved - although saving a reference is indeed great. Regarding atomicity/CAS, it's relevant because using misalignment (over-optimization?) prevents adding concurrency to data structures that don't allow concurrency but should. I digress. The important point for discussion is this:

bq. No, because every modern GC uses generational collection. This means that short-lived instances are quickly cleaned up, without any pauses.

I know about generational collection, but I'm admittedly not an expert. Which young-gen GC method does not pause? ParNew+CMS definitely pauses... Here are some quickly gathered 12-day observations from a moderately loaded, multi-thousand-node, non-production cluster:
* ParNew collections:
** frequency: every ~4s
** range: 66ms - 468ms
** avg: 130ms
** collects: 2.5GB
* CMS old collections:
** frequency: 4 per day
** range: 1.0s - 2.9s
** avg: 1.5s

We have production clusters over 2.5X larger that sustain over 3X the ops/sec. This non-prod cluster is generating ~625MB of garbage/sec. How do you predict dynamic instantiation of INode and BlockInfo objects will help? They generally won't be promoted to old gen, which should reduce the infrequent CMS collection times. BUT, will it dramatically increase the frequency of young collection and/or lead to premature tenuring? I'm honestly asking for design specifics of how this will help. Are you positive we won't get a little by giving up a lot? I'd love for my concerns to be unfounded.
[ https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073404#comment-14073404 ] Daryn Sharp commented on HDFS-6709: --- Questions/comments on the advantages:
* I thought RTTI is per class, not instance? If yes, the savings are immaterial?
* Using misaligned access may result in processor incompatibility, impact performance, and introduce atomicity and CAS problems; concurrent access to adjacent misaligned memory in the same cache line may be completely unsafe.
* No references, only primitives can be stored off-heap, so how do value types (non-boxed primitives, correct?) apply? Wouldn't the instance managing the slab have methods that return the correct primitive?

I think off-heap may be a win in some limited cases, but I'm struggling with how it will work in practice. Here are some thoughts for clarification on actual application of the technique:
# OO encapsulation and polymorphism are lost?
# We can't store references anymore, so we're reduced to primitives?
# Let's say we used to have a class {{Foo}} with instance fields {{field1..field4}} of various types, and {{FooManager.get(id)}} returns a {{Foo}} instance. But now the off-heap structure doesn't have any instantiated {{Foo}} entries, else there is no GC benefit other than smaller instances to compact.
# Does {{FooManager}} instantiate new {{Foo}} instances every time {{FooManager.get(id)}} is called? If yes, it generates a tremendous amount of garbage that defeats the GC benefit of going off heap.
# Does {{FooManager}} try to maintain a limited pool of mutable {{Foo}} objects for reuse (ex. via a {{Foo#reinitialize(id, f1..f4)}})? (I've tried this technique elsewhere with degraded performance, but maybe there's a good way to do it.)
# If no {{Foo}} entries are allowed:
## Does {{FooManager}} have methods for every data member that used to be encapsulated by {{Foo}}? I.e. {{FooManager.getField$N(id)}}?
We'll have to make N-many calls, probably within a critical section?
## Will APIs change from {{doSomething(Foo foo, String msg, boolean flag)}} to {{doSomething(Long fooId, int fooField1, long fooField2, boolean fooField3, long fooField4, String msg, boolean flag)}}?
## If we add another field, do we go back and update all the APIs again?
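The API-erosion concern can be made concrete with a hedged sketch of the two shapes of {{doSomething}} being contrasted (names follow the hypothetical {{Foo}} example above; the flattened variant is trimmed to two fields for brevity):

```java
// Hypothetical illustration of the two API shapes under discussion.
class Foo {
    final long id;
    final int field1;
    final long field2;
    Foo(long id, int field1, long field2) {
        this.id = id; this.field1 = field1; this.field2 = field2;
    }
}

class Apis {
    // Today: one encapsulated parameter.
    static String doSomething(Foo foo, String msg, boolean flag) {
        return msg + ":" + foo.field1 + ":" + foo.field2 + ":" + flag;
    }

    // Without Foo instances: every field travels separately, and adding
    // a new field means touching every such signature.
    static String doSomething(long fooId, int fooField1, long fooField2,
                              String msg, boolean flag) {
        return msg + ":" + fooField1 + ":" + fooField2 + ":" + flag;
    }
}
```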
[ https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073529#comment-14073529 ] Colin Patrick McCabe commented on HDFS-6709:

bq. I thought RTTI is per class, not instance? If yes, the savings are immaterial?

RTTI has to be per-instance. That is why you can pass around Object instances and cast them to whatever you want. Java has to store this information somewhere (think about it). If Java didn't store this, it would have no way to know whether the cast should succeed or not. Then you would be in the same situation as in C, where you can cast something to something else and get random garbage bits.

bq. Using misaligned access may result in processor incompatibility, impact performance, introduces atomicity and CAS problems, concurrent access to adjacent misaligned memory in the cache line may be completely unsafe.

I know about alignment restrictions. There are easy ways around that problem: instead of getLong you use two getInt calls, etc., depending on the minimum alignment you can rely on. I don't see how CAS or atomicity are relevant, since we're not discussing atomic data structures. The performance benefits of storing less data can often cancel out the performance disadvantages of doing unaligned access. It depends on the scenario.

bq. No references, only primitives can be stored off-heap, so how do value types (non-boxed primitives, correct?) apply? Wouldn't the instance managing the slab have methods that return the correct primitive?

The point is that with control over the layout, you can do better. I guess a more concrete example might help explain this.

bq. OO encapsulation and polymorphism are lost?

Take a look at {{BlockInfo#triplets}}. How much OO encapsulation do you see in an array of Object[], with a special comment above about how to interpret each set of three entries?
Most of the places we'd like to use off-heap storage are already full of hacks that abuse the Java type system to squeeze in a few extra bytes. Arrays of primitives, and arrays of objects with special conventions, are routine.

bq. Does FooManager instantiate new Foo instances every time FooManager.get(id) is called? If yes, it generates a tremendous amount of garbage that defeats the GC benefit of going off heap.

No, because every modern GC uses generational collection. This means that short-lived instances are quickly cleaned up, without any pauses. The rest of the questions seem to be variants on this one. Think about it. All the code we have in FSNamesystem follows the pattern: look up inode, do something to inode, done with inode. We can create temporary INode objects and they'll never make it to the old generation, since they don't stick around between RPC calls. Even if they somehow did (how?), with a dramatically smaller heap the full GC would no longer be scary. And we'd get other performance benefits, like the compressed-oops optimization. Anyway, the temporary inode objects would probably just be thin objects containing an off-heap memory reference and a bunch of getters/setters, to avoid doing a lot of unnecessary serde.
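The "thin object over an off-heap reference" idea can be sketched with a direct ByteBuffer standing in for the slab (the field layout and names here are invented for illustration and are not from the patch):

```java
import java.nio.ByteBuffer;

// Thin, short-lived view over one record in an off-heap slab.
// Illustrative layout: bytes 0..7 hold mtime, bytes 8..15 hold atime.
class OffheapINode {
    private static final int MTIME_OFF = 0;
    private static final int ATIME_OFF = 8;
    static final int RECORD_SIZE = 16;

    private final ByteBuffer slab; // off-heap backing store
    private final int base;        // byte offset of this record in the slab

    OffheapINode(ByteBuffer slab, int base) {
        this.slab = slab;
        this.base = base;
    }

    long getMtime()           { return slab.getLong(base + MTIME_OFF); }
    void setMtime(long mtime) { slab.putLong(base + MTIME_OFF, mtime); }
    long getAtime()           { return slab.getLong(base + ATIME_OFF); }
    void setAtime(long atime) { slab.putLong(base + ATIME_OFF, atime); }
}
```

The wrapper itself is a tiny young-gen object that dies at the end of the RPC; only the slab is long-lived, and it never churns the GC.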
[ https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073677#comment-14073677 ] Andrew Purtell commented on HDFS-6709: --

bq. No, because every modern GC uses generational collection. This means that short-lived instances are quickly cleaned up, without any pauses.

... and modern JVM versions have escape analysis enabled by default. Although there are limitations, simple objects that don't escape the local block (like Iterators) or the method can be allocated on the stack once native code is emitted by the server compiler. No heap allocation happens at all. You can use fastdebug JVM builds during development to learn explicitly what your code is doing in this regard.
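The kind of allocation escape analysis can eliminate might look like this (a generic sketch, not HDFS code; whether scalar replacement actually fires depends on the JVM version and is observable on fastdebug builds, as noted above):

```java
// A temporary object that never escapes its method: once the server
// compiler compiles the hot loop, escape analysis can scalar-replace
// the Point so no heap allocation happens in the loop at all.
class EscapeDemo {
    private static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static long sumOfCoords(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1); // does not escape this method
            sum += p.x + p.y;
        }
        return sum;
    }
}
```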
[ https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074027#comment-14074027 ] Kai Zheng commented on HDFS-6709: -

bq. Sadly, while investigating off heap performance last fall, I found this article that claims off-heap reads via a DirectByteBuffer have horrible performance

I just took a look at the post. Yes, it claimed DirectByteBuffer has the same great write performance as Unsafe, but the read performance is horrible. Why? The reason isn't clear yet. Looking at the following code from the JRE, there seems to be no big difference between read and write in DirectByteBuffer:

{code}
public byte get() {
    return ((unsafe.getByte(ix(nextGetIndex()))));
}
{code}

{code}
public ByteBuffer put(byte x) {
    unsafe.putByte(ix(nextPutIndex()), ((x)));
    return this;
}
{code}

Questions here: 1) Why would reads perform so much worse than writes, if it's true? 2) Is it true that simply adding the index check causes a big performance loss? Some tests would be needed to make sure DirectByteBuffer performs well enough to meet the needs here, and the comparison should be apples to apples, in exactly the cases at hand.
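A rough apples-to-apples harness along the lines Kai suggests might start like this (a naive sketch, not a JMH benchmark; all names are made up, and numbers from such loops deserve skepticism):

```java
import java.nio.ByteBuffer;

class BufferReadBench {
    // Sums every byte via the ByteBuffer API, so heap and direct buffers
    // are exercised through the identical bounds-checked code path.
    static long sumBytes(ByteBuffer buf) {
        long sum = 0;
        for (int i = 0; i < buf.capacity(); i++) {
            sum += buf.get(i); // absolute get: no position bookkeeping
        }
        return sum;
    }

    // Best-of-N timing to damp warm-up and scheduling noise a little.
    static long bestTimeNanos(ByteBuffer buf, int rounds) {
        long best = Long.MAX_VALUE;
        for (int r = 0; r < rounds; r++) {
            long t0 = System.nanoTime();
            sumBytes(buf);
            best = Math.min(best, System.nanoTime() - t0);
        }
        return best;
    }
}
```

Comparing {{bestTimeNanos(ByteBuffer.allocate(N), R)}} against {{bestTimeNanos(ByteBuffer.allocateDirect(N), R)}} pits heap against direct memory through the same code path, which is the comparison the article arguably did not make.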
[ https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14071964#comment-14071964 ] Daryn Sharp commented on HDFS-6709: --- If {{Unsafe}} is being removed then I don't think we should create a dependency on it. Sadly, while investigating off-heap performance last fall, I found this article that claims off-heap reads via a {{DirectByteBuffer}} have *horrible* performance: http://www.javacodegeeks.com/2013/08/which-memory-is-faster-heap-or-bytebuffer-or-direct.html

bq. With a hash table and a linked list, we could probably start off-heaping things such as the triplets array in the BlockInfo object.

How do you envision off-heaping triplets in conjunction with those collections? Linked list entries cost 48 bytes on a 64-bit JVM. A hash table entry costs 52 bytes. I know your goal is reduced GC while ours is reduced memory usage, so it'll be unacceptable if an off-heap implementation consumes even more memory - which, incidentally, will require GC and may cancel any off-heap benefit? And/or cause a performance degradation.
[ https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072197#comment-14072197 ] Colin Patrick McCabe commented on HDFS-6709: If Unsafe is removed, then we'll work around it the same way we work around the lack of symlink or hardlink support, missing error information from mkdir, etc. As you can see in this patch, we don't need Unsafe; we just use it because it's faster. I would assume that if Unsafe is removed, there will be work on improving DirectByteBuffer and JNI performance, or on putting in place other alternate APIs that allow Java to function effectively on the server. Otherwise, the future of the platform doesn't look good. Even Haskell has an Unsafe package.

bq. How do you envision off-heaping triplets in conjunction with those collections? Linked list entries cost 48 bytes on a 64-bit JVM. A hash table entry costs 52 bytes. I know your goal is reduced GC while ours is reduced memory usage, so it'll be unacceptable if an off-heap implementation consumes even more memory - which incidentally will require GC and may cancel any off-heap benefit? And/or cause a performance degradation.

With off-heap objects, the sizes can be whatever we want. I think a basic linked list entry would be 16 bytes (two 8-byte prev and next pointers), plus the size of the payload. A hash table entry has no real minimum size, since again, it's just a memory region that contains whatever we want. We will be able to do a lot better than the JVM because of a few things:
* the JVM must store runtime type information (RTTI) for each object, and we won't
* the 64-bit JVM usually aligns to 8 bytes, but we don't have to
* we don't have to implement a lock bit, or any of that
* we can use value types, and current JVMs can't (although future ones will be able to)
* the JVM doesn't know that you will create 1 million of an object; it just creates a generic object layout that must balance access speed and object size.
Since we know, we can be more clever.
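The 16-byte entry described above could be sketched over a flat region, with prev/next stored as 8-byte offsets into the same slab (a toy layout; a real slab allocator, growth, free lists, and the payload are all elided here):

```java
import java.nio.ByteBuffer;

// Toy off-heap doubly-linked-list entry: an 8-byte prev offset plus an
// 8-byte next offset, followed by the payload. Offset -1 means "null".
class OffheapList {
    static final int PREV = 0, NEXT = 8, HEADER = 16;

    private final ByteBuffer slab; // flat off-heap region holding entries

    OffheapList(int capacityBytes) {
        this.slab = ByteBuffer.allocateDirect(capacityBytes);
    }

    long prev(int entry) { return slab.getLong(entry + PREV); }
    long next(int entry) { return slab.getLong(entry + NEXT); }

    void initEntry(int entry) {
        slab.putLong(entry + PREV, -1L);
        slab.putLong(entry + NEXT, -1L);
    }

    void link(int a, int b) { // a.next = b; b.prev = a
        slab.putLong(a + NEXT, b);
        slab.putLong(b + PREV, a);
    }
}
```

Each entry's overhead is exactly the 16-byte header, versus the roughly 48 bytes a java.util.LinkedList node costs once the object header, class pointer, and alignment padding are counted.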
[ https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14070021#comment-14070021 ] Kai Zheng commented on HDFS-6709: - Hi Colin, I went through the patch. The Slab stuff looks great. A question: looking at the following code, I doubt we need the path that calls unsafe.getByte(). Whether allocated from a direct buffer or a heap buffer, the ByteBuffer interface can be used to get/set bytes from/to the buffer. Note it would be good to avoid using Unsafe if possible. Please clarify if I've misunderstood anything here, thanks.

{code}
+  byte getByte(int offset) {
+    if (base != 0) {
+      return NativeIO.getUnsafe().getByte(null, base + offset);
+    } else {
+      buf.position(offset);
+      return buf.get();
+    }
+  }
{code}
[ https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14070637#comment-14070637 ] Colin Patrick McCabe commented on HDFS-6709: The {{ByteBuffer}} interface should be slower, since it needs to update offsets and do boundary checking. The {{ByteBuffer}} stuff is just a fallback if the JVM doesn't support direct buffers (although I'm not sure how many such JVMs are left in the wild). It also might be helpful for debugging.
[ https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14071222#comment-14071222 ] Kai Zheng commented on HDFS-6709: -

bq. The {{ByteBuffer}} interface should be slower, since it needs to update offsets and do boundary checking.

Agree. But it still meets the primary goal here, i.e. putting data off heap. I'm wondering whether using Unsafe can be avoided here. There were some discussions with Oracle related to this, and we were told that JDK 9 is likely to block all access to sun.* classes. Therefore we might need to clean up such calls before then, and avoid adding new ones like this. Regarding the Unsafe situation, you might look at the following doc (provided by Max from Oracle) and give your insights. Thanks. http://cr.openjdk.java.net/~psandoz/dv14-uk-paul-sandoz-unsafe-the-situation.pdf