[jira] [Commented] (HDFS-6709) Implement off-heap data structures for NameNode and other HDFS memory optimization

2014-07-28 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076272#comment-14076272
 ] 

Daryn Sharp commented on HDFS-6709:
---

Yes, we definitely generate a lot of garbage per call.  Due to GC concerns, 
I've got work in progress to reduce the garbage generated, which is why I'm 
concerned about even more garbage per call (inodes are repeatedly looked up far 
more often than you'd think; I'm working on single resolution).  We've already 
tuned the young generation to be in line with what other companies running 
large-scale services use.

bq. Maybe you think I've chosen an easy example. Hmm... the operation that I 
can think of that touches the most inodes is recursive delete.

Yes, deleting a large tree is a good example, but in practice it's a rare 
operation.  However, getContentSummary is run often for monitoring.  It may 
take many seconds on just a subtree of some clusters, and it may visit millions 
or tens of millions of inodes.

Many block-level operations fetch the block collection, which is really the 
inode, sometimes to verify the block isn't abandoned or to access other 
related blocks.  Decommissioning has always been a problem in general: it will 
repeatedly crawl hundreds of thousands of blocks, each requiring a BC/inode 
lookup.  The replication monitor is likely to indirectly require the BC/inode 
too when it runs every 3s.  Refer to {{BlocksMap.getBlockCollection}} to see 
how many other places it's called from.

Even the unobtainable best case of eliminating the 1.5s-per-6h CMS pause at 
the expense of increasing the frequency and/or duration of the ParNew pauses 
would be a huge loss.  I suppose the proof is in a simulation.  Perhaps a 
rudimentary test is instantiating a garbage inode and blockinfo every time one 
is looked up, yet still returning the real one, so we can see how well ParNew 
handles the onslaught of garbage.
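Something like the following rough sketch could drive that test (class and 
names invented here, not actual HDFS code): allocate a throwaway object on 
every lookup while still returning the real one, then compare ParNew frequency 
and duration in the GC logs ({{-verbose:gc -XX:+PrintGCDetails}}).

{code}
// Hypothetical probe for the experiment described above.  Wrap the real
// lookup path, then compare GC log output with and without the extra
// allocations.
final class GarbagePressureProbe<T> {
  private volatile Object sink;   // volatile store keeps the allocation from
                                  // being eliminated by escape analysis

  T lookupWithGarbage(T real) {
    sink = new long[8];           // ~64-byte stand-in for a temporary INode/BlockInfo
    return real;                  // callers still operate on the canonical object
  }
}
{code}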

 Implement off-heap data structures for NameNode and other HDFS memory 
 optimization
 --

 Key: HDFS-6709
 URL: https://issues.apache.org/jira/browse/HDFS-6709
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-6709.001.patch


 We should investigate implementing off-heap data structures for NameNode and 
 other HDFS memory optimization.  These data structures could reduce latency 
 by avoiding the long GC times that occur with large Java heaps.  We could 
 also avoid per-object memory overheads and control memory layout a little bit 
 better.  This also would allow us to use the JVM's compressed oops 
 optimization even with really large namespaces, if we could get the Java heap 
 below 32 GB for those cases.  This would provide another performance and 
 memory efficiency boost.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6709) Implement off-heap data structures for NameNode and other HDFS memory optimization

2014-07-27 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075894#comment-14075894
 ] 

Colin Patrick McCabe commented on HDFS-6709:


bq. I'm just asking leading questions to make sure this approach is sound. Y! 
stands to lose a lot if this doesn't actually scale

The questions are good... hopefully the answers are too!  I'm just trying to 
make my answers as complete as I can.

bq. To clarify the RTTI, I thought you meant more than just a per-instance 
reference to the class would be saved - although saving a reference is indeed 
great

Yeah.  It will shrink objects by 4 or 8 bytes each.  That's not immaterial!  
Savings like these are why I think it will shrink memory consumption.
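For a concrete look at where those 4 or 8 bytes live, here is a hedged 
illustration using the OpenJDK JOL tool (assuming jol-core is on the 
classpath): it prints each instance's header, including the per-instance class 
pointer that compressed class pointers shrink from 8 bytes to 4.

{code}
import org.openjdk.jol.info.ClassLayout;

public class HeaderDemo {
  public static void main(String[] args) {
    // On 64-bit HotSpot: 8-byte mark word plus a 4- or 8-byte class pointer,
    // depending on whether compressed class pointers are in effect.
    System.out.println(ClassLayout.parseClass(Object.class).toPrintable());
  }
}
{code}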

bq. Regarding atomicity/CAS, it's relevant because using misalignment 
(over-optimization?) prevents adding concurrency to data structures that 
aren't concurrent but should allow concurrency. I digress...

Isn't this a minor implementation detail, though?  We don't currently use 
atomic ops on these data structures.  If we go ahead with a layout that uses 
unaligned access, and someone later decides to make things atomic, we can 
always switch to an aligned layout.

bq. I know about generational collection but I'm admittedly not an expert. 
Which young gen GC method does not pause? ParNew+CMS definitively pauses... 
Here are some quickly gathered 12-day observations from a moderately loaded, 
multi-thousand node, non-production cluster:

I'm not a GC expert either.  But from what I've read, "does not pause" is a 
pretty high bar to clear.  I think even Azul's GC pauses on occasion for 
sub-millisecond intervals.  For CMS and G1, everything I've read talks about 
tuning the young-gen collection in terms of target pause times.

bq. We have production clusters over 2.5X larger that sustained over 3X 
ops/sec. This non-prod cluster is generating ~625MB of garbage/sec. How do you 
predict dynamic instantiation of INode and BlockInfo objects will help? They 
generally won't be promoted to old gen which should reduce the infrequent CMS 
collection times. BUT, will it dramatically increase the frequency of young 
collection and/or lead to premature tenuring?

If you look at the code, we create temporary objects all over the place.

For example, look at setTimes:

{code}
  private void setTimesInt(String src, long mtime, long atime)
      throws IOException, UnresolvedLinkException {
    HdfsFileStatus resultingStat = null;
    FSPermissionChecker pc = getPermissionChecker();
    checkOperation(OperationCategory.WRITE);
    byte[][] pathComponents = FSDirectory.getPathComponentsForReservedPath(src);
    writeLock();
    try {
      checkOperation(OperationCategory.WRITE);
      checkNameNodeSafeMode("Cannot set times " + src);
      src = FSDirectory.resolvePath(src, pathComponents, dir);

      // Write access is required to set access and modification times
      if (isPermissionEnabled) {
        checkPathAccess(pc, src, FsAction.WRITE);
      }
      final INodesInPath iip = dir.getINodesInPath4Write(src);
      final INode inode = iip.getLastINode();
{code}

You can see we create:
* HdfsFileStatus (with at least 5 sub-objects; one of those, FsPermission, has 3 sub-objects of its own)
* FSPermissionChecker (which has at least 5 sub-objects inside it)
* pathComponents
* a new src string
* INodesInPath (at least 2 sub-objects of its own)

That's at least 21 temporary objects just in this code snippet, and I'm sure I 
missed a lot of things.  I'm not including any of the functions that called or 
were called by this function, or any of the RPC or protobuf machinations.  The 
average path depth is maybe between 5 and 8... would having 5 to 8 extra 
temporary objects to represent INodes we traversed substantially increase the 
GC load?  I would say no.

Maybe you think I've chosen an easy example.  Hmm... the operation that I can 
think of that touches the most inodes is recursive delete.  But we've known 
about the problems with this for a while... that's why JIRAs like HDFS-2938 
addressed the problem.  Arguably, an off-heap implementation is actually better 
here, since we avoid creating a lot of trash in the tenured generation.  And 
trash in the tenured generation leads to heap fragmentation (at least in CMS) 
and the dreaded full GC.

[jira] [Commented] (HDFS-6709) Implement off-heap data structures for NameNode and other HDFS memory optimization

2014-07-25 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074189#comment-14074189
 ] 

Kai Zheng commented on HDFS-6709:
-

I repeated the test in the post and sadly found it's true that DirectByteBuffer 
reads don't perform as well as writes.  I'm communicating with Oracle about 
this; hopefully they can explain it and address it in a future Java version.  
It's interesting.  Thanks.



[jira] [Commented] (HDFS-6709) Implement off-heap data structures for NameNode and other HDFS memory optimization

2014-07-25 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075157#comment-14075157
 ] 

Daryn Sharp commented on HDFS-6709:
---

I sense a bit of condescension.  We're all friends here.  I'm just asking 
leading questions to make sure this approach is sound.  Y! stands to lose a lot 
if this doesn't actually scale.  To clarify the RTTI, I thought you meant more 
than just a per-instance reference to the class would be saved - although 
saving a reference is indeed great.  Regarding atomicity/CAS, it's relevant 
because using misalignment (over-optimization?) prevents adding concurrency to 
data structures that aren't concurrent but should allow concurrency.  I 
digress...

The important point for discussion is this:
bq. No, because every modern GC uses generational collection. This means that 
short-lived instances are quickly cleaned up, without any pauses.

I know about generational collection but I'm admittedly not an expert.  Which 
young gen GC method does not pause?  ParNew+CMS definitively pauses...  Here 
are some quickly gathered 12-day observations from a moderately loaded, 
multi-thousand node, non-production cluster:
* ParNew collections:
** frequency: every ~4s
** range: 66ms - 468ms
** avg: 130ms
** collects: 2.5GB
* CMS old collections:
** frequency:  4 per day 
** range: 1.0s - 2.9s
** avg: 1.5s

We have production clusters over 2.5X larger that sustained over 3X ops/sec.  
This non-prod cluster is generating ~625MB of garbage/sec.  How do you predict 
dynamic instantiation of INode and BlockInfo objects will help?  They generally 
won't be promoted to old gen which should reduce the infrequent CMS collection 
times.  BUT, will it dramatically increase the frequency of young collection 
and/or lead to premature tenuring?

I'm honestly asking for design specifics of how this will help.  Are you 
positive we won't get a little by giving up a lot?  I'd love for my concerns to 
be unfounded.



[jira] [Commented] (HDFS-6709) Implement off-heap data structures for NameNode and other HDFS memory optimization

2014-07-24 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073404#comment-14073404
 ] 

Daryn Sharp commented on HDFS-6709:
---

Questions/comments on the advantages:
* I thought RTTI is per class, not instance?  If yes, the savings are 
immaterial?
* Using misaligned access may result in processor incompatibility, impact 
performance, and introduce atomicity and CAS problems; concurrent access to 
adjacent misaligned memory in the same cache line may be completely unsafe.
* Only primitives, not references, can be stored off-heap, so how do value 
types (non-boxed primitives, correct?) apply?  Wouldn't the instance managing 
the slab have methods that return the correct primitive?

I think off-heap may be a win in some limited cases, but I'm struggling with 
how it will work in practice.  Here are some thoughts for clarification on the 
actual application of the technique:
# OO encapsulation and polymorphism are lost?
# We can't store references anymore, so we're reduced to primitives?
# Let's say we used to have a class {{Foo}} with instance fields 
{{field1..field4}} of various types.  {{FooManager.get(id)}} returns a {{Foo}} 
instance.  But now an off-heap structure doesn't have any instantiated {{Foo}} 
entries, else there is no GC benefit other than smaller instances to compact.
# Does {{FooManager}} instantiate new {{Foo}} instances every time 
{{FooManager.get(id)}} is called?  If yes, it generates a tremendous amount of 
garbage that defeats the GC benefit of going off heap.
# Does {{FooManager}} try to maintain a limited pool of mutable {{Foo}} objects 
for reuse (e.g. via a {{Foo#reinitialize(id, f1..f4)}})?  (I've tried this 
technique elsewhere with degraded performance, but maybe there's a good way to 
do it.)
# If no {{Foo}} entries are allowed:
## does {{FooManager}} have methods for every data member that used to be 
encapsulated by {{Foo}}?  I.e. {{FooManager.getField$N(id)}}?  We'll have to 
make N-many calls, probably within a critical section?
## Will APIs change from {{doSomething(Foo foo, String msg, boolean flag)}} to 
{{doSomething(Long fooId, int fooField1, long fooField2, boolean fooField3, 
long fooField4, String msg, boolean flag)}}?
## If we add another field, do we go back and update all the APIs again?




[jira] [Commented] (HDFS-6709) Implement off-heap data structures for NameNode and other HDFS memory optimization

2014-07-24 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073529#comment-14073529
 ] 

Colin Patrick McCabe commented on HDFS-6709:


bq. I thought RTTI is per class, not instance? If yes, the savings are 
immaterial?

RTTI has to be per-instance.  That is why you can pass around Object instances 
and cast them to whatever you want.  Java has to store this information 
somewhere (think about it).  If Java didn't store this, it would have no way to 
know whether the cast should succeed or not.  Then you would be in the same 
situation as in C, where you can cast something to something else and get 
random garbage bits.
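A tiny illustration of the point (a hedged sketch; any JVM will do): the casts 
below can only be checked at runtime by consulting the object's own header.

{code}
public class CastDemo {
  public static void main(String[] args) {
    Object o = "hello";        // the static type gives no hint of the dynamic type
    String s = (String) o;     // succeeds: the JVM consults o's per-instance class pointer
    Object n = Integer.valueOf(42);
    String t = (String) n;     // compiles, but throws ClassCastException at runtime
  }
}
{code}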

bq. Using misaligned access may result in processor incompatibility, impact 
performance, and introduce atomicity and CAS problems; concurrent access to 
adjacent misaligned memory in the same cache line may be completely unsafe.

I know about alignment restrictions.  There are easy ways around that 
problem: instead of getLong you use two getInt calls (or four getShort calls), 
etc., depending on the minimum alignment you can rely on.  I don't see how CAS 
or atomicity are relevant, since we're not discussing atomic data structures.  
The performance benefit of storing less data can often cancel out the 
performance disadvantage of doing unaligned access.  It depends on the 
scenario.
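As a sketch of that workaround (illustrative only, not code from the patch; 
the class name is invented): compose a wide read from narrower aligned reads 
when only 4-byte alignment is guaranteed.

{code}
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class UnalignedReads {
  // Read a long at an offset that is only guaranteed 4-byte aligned, using
  // two aligned int reads.  Assumes the buffer is LITTLE_ENDIAN.
  static long getLongVia4ByteReads(ByteBuffer buf, int offset) {
    long lo = buf.getInt(offset) & 0xFFFFFFFFL;
    long hi = buf.getInt(offset + 4) & 0xFFFFFFFFL;
    return (hi << 32) | lo;
  }

  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocateDirect(16).order(ByteOrder.LITTLE_ENDIAN);
    buf.putLong(4, 0x1122334455667788L);   // deliberately 8-byte-misaligned
    System.out.println(Long.toHexString(getLongVia4ByteReads(buf, 4)));
  }
}
{code}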

bq. Only primitives, not references, can be stored off-heap, so how do value 
types (non-boxed primitives, correct?) apply? Wouldn't the instance managing 
the slab have methods that return the correct primitive?

The point is that with control over the layout, you can do better.  I guess a 
more concrete example might help explain this.

bq. OO encapsulation and polymorphism are lost?

Take a look at {{BlockInfo#triplets}}.  How much OO encapsulation do you see 
in an {{Object[]}} array with a special comment above it about how to interpret 
each set of three entries?  Most of the places where we'd like to use off-heap 
storage are already full of hacks that abuse the Java type system to squeeze in 
a few extra bytes.  Arrays of primitives and arrays of objects with special 
conventions are routine.

bq. Does FooManager instantiate new Foo instances every time FooManager.get(id) 
is called? If yes, it generates a tremendous amount of garbage that defeats the 
GC benefit of going off heap.

No, because every modern GC uses generational collection.  This means that 
short-lived instances are quickly cleaned up, without any pauses.

The rest of the questions seem to be variants on this one.  Think about it.  
All the code we have in FSNamesystem follows the pattern: look up inode, do 
something to the inode, done with the inode.  We can create temporary INode 
objects and they'll never make it to the tenured generation, since they don't 
stick around between RPC calls.  Even if they somehow did (how?), with a 
dramatically smaller heap the full GC would no longer be scary.  And we'd get 
other performance benefits, like the compressed-oops optimization.  Anyway, the 
temporary inode objects would probably just be thin objects which contain an 
off-heap memory reference and a bunch of getters/setters, to avoid doing a lot 
of unnecessary serde.
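A minimal sketch of such a thin object, with the field layout and names 
invented here for illustration (the real layout would come from the slab 
design):

{code}
import java.nio.ByteBuffer;

// Short-lived flyweight over an off-heap record: it holds only a slab
// reference and a base offset, and decodes fields on demand.  The wrapper
// itself dies young; the data lives off-heap.
final class INodeView {
  private static final int MTIME_OFF = 0, ATIME_OFF = 8;  // hypothetical layout
  private final ByteBuffer slab;
  private final int base;   // offset of this inode's record within the slab

  INodeView(ByteBuffer slab, int base) {
    this.slab = slab;
    this.base = base;
  }

  long getModificationTime()       { return slab.getLong(base + MTIME_OFF); }
  void setModificationTime(long t) { slab.putLong(base + MTIME_OFF, t); }
  long getAccessTime()             { return slab.getLong(base + ATIME_OFF); }
  void setAccessTime(long t)       { slab.putLong(base + ATIME_OFF, t); }
}
{code}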



[jira] [Commented] (HDFS-6709) Implement off-heap data structures for NameNode and other HDFS memory optimization

2014-07-24 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073677#comment-14073677
 ] 

Andrew Purtell commented on HDFS-6709:
--

bq. No, because every modern GC uses generational collection. This means that 
short-lived instances are quickly cleaned up, without any pauses.

... and modern JVM versions have escape analysis enabled by default. Although 
there are limitations, simple objects that don't escape the local block (like 
Iterators) or the method can be allocated on the stack once native code is 
emitted by the server compiler. No heap allocation happens at all. You can use 
fastdebug JVM builds during dev to learn explicitly what your code is doing in 
this regard.
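As a small, hedged illustration of the kind of allocation that qualifies 
(whether it is actually eliminated depends on the JVM, inlining, and the 
compiler in use):

{code}
final class EscapeDemo {
  static final class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
  }

  // 'p' never leaves this method, so after inlining the server compiler can
  // scalar-replace it: the fields live in registers and no Point is ever
  // allocated on the heap.
  static int distSq(int x, int y) {
    Point p = new Point(x, y);
    return p.x * p.x + p.y * p.y;
  }
}
{code}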



[jira] [Commented] (HDFS-6709) Implement off-heap data structures for NameNode and other HDFS memory optimization

2014-07-24 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074027#comment-14074027
 ] 

Kai Zheng commented on HDFS-6709:
-

bq.  Sadly, while investigating off heap performance last fall, I found this 
article that claims off-heap reads via a DirectByteBuffer have horrible 
performance
I just took a look at the post.  Yes, it claimed DirectByteBuffer has the same 
great write performance as Unsafe, but the read performance is horrible.  Why?  
The reason isn't clear yet.  Looking at the following code from the JRE, there 
seems to be no big difference between read and write in DirectByteBuffer:
{code}
public byte get() {
    return ((unsafe.getByte(ix(nextGetIndex()))));
}
{code}
{code}
public ByteBuffer put(byte x) {
    unsafe.putByte(ix(nextPutIndex()), ((x)));
    return this;
}
{code}
Questions here: 1) Why does read perform so much worse than write, if it's 
true?  2) Is it true that simply adding the index check would cause a big 
performance loss?
Some tests would be needed to make sure DirectByteBuffer is good enough to meet 
the needs here, even from a performance standpoint, and the comparison should 
be apples to apples for exactly the cases here.
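A rough harness for that apples-to-apples comparison might look like the 
following (class name invented; not JMH-quality, since there is no forking or 
statistical treatment, so treat single-pass numbers skeptically):

{code}
import java.nio.ByteBuffer;

public class DirectBufferReadWriteBench {
  static final int N = 1 << 20;

  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocateDirect(N);
    long sink = 0;
    for (int i = 0; i < 5; i++) sink += writePass(buf) + readPass(buf);  // warm-up

    long t0 = System.nanoTime();
    sink += writePass(buf);
    long t1 = System.nanoTime();
    sink += readPass(buf);
    long t2 = System.nanoTime();

    System.out.println("write: " + (t1 - t0) + " ns, read: " + (t2 - t1)
        + " ns (sink=" + sink + ")");
  }

  static long writePass(ByteBuffer buf) {
    for (int i = 0; i < N; i++) buf.put(i, (byte) i);  // absolute put: bounds check + address math
    return buf.get(0);
  }

  static long readPass(ByteBuffer buf) {
    long acc = 0;
    for (int i = 0; i < N; i++) acc += buf.get(i);     // absolute get: same checks on the read path
    return acc;
  }
}
{code}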



[jira] [Commented] (HDFS-6709) Implement off-heap data structures for NameNode and other HDFS memory optimization

2014-07-23 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071964#comment-14071964
 ] 

Daryn Sharp commented on HDFS-6709:
---

If {{Unsafe}} is being removed then I don't think we should create a dependency 
on it.  Sadly, while investigating off heap performance last fall, I found this 
article that claims off-heap reads via a {{DirectByteBuffer}} have *horrible* 
performance:

http://www.javacodegeeks.com/2013/08/which-memory-is-faster-heap-or-bytebuffer-or-direct.html

bq. With a hash table and a linked list, we could probably start off-heaping 
things such as the triplets array in the BlockInfo object.

How do you envision off-heaping triplets in conjunction with those collections? 
 Linked list entries cost 48 bytes on a 64-bit JVM.  A hash table entry costs 
52 bytes.  I know your goal is reduced GC while ours is reduced memory usage, 
so it'll be unacceptable if an off-heap implementation consumes even more 
memory, which incidentally will require GC and may cancel any off-heap 
benefit, and/or cause a performance degradation.



[jira] [Commented] (HDFS-6709) Implement off-heap data structures for NameNode and other HDFS memory optimization

2014-07-23 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072197#comment-14072197
 ] 

Colin Patrick McCabe commented on HDFS-6709:


If Unsafe is removed, then we'll work around it the same way we work around 
the lack of symlink or hardlink support, missing error information from mkdir, 
etc.  As you can see in this patch, we don't need Unsafe; we just use it 
because it's faster.  I would assume that if Unsafe is removed, there will be 
work on improving DirectByteBuffer and JNI performance, or on putting other 
alternate APIs in place that allow Java to function effectively on the server.  
Otherwise, the future of the platform doesn't look good.  Even Haskell has an 
Unsafe package.

bq. How do you envision off-heaping triplets in conjunction with those 
collections? Linked list entries cost 48 bytes on a 64-bit JVM. A hash table 
entry costs 52 bytes. I know your goal is reduced GC while ours is reduced 
memory usage, so it'll be unacceptable if an off-heap implementation consumes 
even more memory, which incidentally will require GC and may cancel any 
off-heap benefit, and/or cause a performance degradation.

With off-heap objects, the sizes can be whatever we want.  I think a basic 
linked list entry would be 16 bytes (two 8-byte prev and next pointers), plus 
the size of the payload; see the sketch after this list.  A hash table entry 
has no real minimum size since, again, it's just a memory region that contains 
whatever we want.  We will be able to do a lot better than the JVM because of 
a few things:
* the JVM must store runtime type information (RTTI) for each object, and we 
won't
* the 64-bit JVM usually aligns to 8 bytes, but we don't have to
* we don't have to implement a lock bit, or any of that
* we can use value types, and current JVMs can't (although future ones will be 
able to)
* the JVM doesn't know that you will create 1 million instances of an object; 
it just creates a generic object layout that must balance access speed and 
object size.  Since we know, we can be more clever.
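Here is a minimal sketch (not from the attached patch; names invented) of that 
16-byte entry layout, with prev/next stored as 8-byte offsets into the slab 
instead of object references:

{code}
import java.nio.ByteBuffer;

// Doubly-linked list over one off-heap slab.  Entry layout:
//   [prev offset : 8][next offset : 8][payload ...]
// Offsets address the slab itself, so entries carry no JVM object header.
final class OffHeapLinkedList {
  private static final int PREV = 0, NEXT = 8, HEADER = 16;
  private final ByteBuffer slab = ByteBuffer.allocateDirect(1 << 20);

  long prev(int entry) { return slab.getLong(entry + PREV); }
  long next(int entry) { return slab.getLong(entry + NEXT); }

  void link(int a, int b) {        // splice a <-> b
    slab.putLong(a + NEXT, b);
    slab.putLong(b + PREV, a);
  }

  int payloadOffset(int entry) {   // payload starts right after the header
    return entry + HEADER;
  }
}
{code}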



[jira] [Commented] (HDFS-6709) Implement off-heap data structures for NameNode and other HDFS memory optimization

2014-07-22 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070021#comment-14070021
 ] 

Kai Zheng commented on HDFS-6709:
-

Hi Colin,

I went through the patch.  The Slab stuff looks great.  One question: looking 
at the following code, I doubt we need the path that calls unsafe.getByte().  
Whether allocated from a direct buffer or a heap buffer, the ByteBuffer 
interface can be used to get/set bytes from/to the buffer.  Note it would be 
good to avoid using Unsafe if possible.  Please clarify if I've misunderstood 
anything here, thanks.

{code}
+  byte getByte(int offset) {
+    if (base != 0) {
+      return NativeIO.getUnsafe().getByte(null, base + offset);
+    } else {
+      buf.position(offset);
+      return buf.get();
+    }
+  }
{code}
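For reference, a sketch of the Unsafe-free alternative being suggested (class 
name invented): an absolute {{ByteBuffer.get(int)}} works for both direct and 
heap buffers and avoids mutating the buffer's position.

{code}
import java.nio.ByteBuffer;

// Sketch only: absolute access needs no Unsafe and no position() churn,
// regardless of whether 'buf' is direct or heap-backed.
final class Slab {
  private final ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20);

  byte getByte(int offset) {
    return buf.get(offset);   // absolute get; does not touch the position
  }
}
{code}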




[jira] [Commented] (HDFS-6709) Implement off-heap data structures for NameNode and other HDFS memory optimization

2014-07-22 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070637#comment-14070637
 ] 

Colin Patrick McCabe commented on HDFS-6709:


The {{ByteBuffer}} interface should be slower, since it needs to update offsets 
and do boundary checking.  The {{ByteBuffer}} stuff is just a fallback if the 
JVM doesn't support direct buffers (although I'm not sure how many such JVMs 
are left in the wild).  It also might be helpful for debugging.



[jira] [Commented] (HDFS-6709) Implement off-heap data structures for NameNode and other HDFS memory optimization

2014-07-22 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071222#comment-14071222
 ] 

Kai Zheng commented on HDFS-6709:
-

bq. The {{ByteBuffer}} interface should be slower, since it needs to update 
offsets and do boundary checking.
Agree. But it still meets the primary goal here, i.e. putting data off heap.

I'm wondering whether using Unsafe can be avoided here or not.  There were 
some discussions with Oracle related to this, and we were told that JDK 9 is 
likely to block all access to sun.* classes.  Therefore we might need to clean 
up such calls before then and avoid adding new ones like this one.  Regarding 
the Unsafe situation, you might look at the following doc (provided by Max 
from Oracle) and give your insights.  Thanks.
http://cr.openjdk.java.net/~psandoz/dv14-uk-paul-sandoz-unsafe-the-situation.pdf
