Re: Off Heap (Tungsten) Memory Usage / Management ?

2016-09-22 Thread Michael Segel
I would disagree. While you can tune the system to not oversubscribe, I would rather have it hit swap than fail, especially on long-running jobs. If we look at oversubscription on Hadoop clusters which are not running HBase… they survive. It's when you have things like HBase that don't

Re: Off Heap (Tungsten) Memory Usage / Management ?

2016-09-22 Thread Jörn Franke
This is probably the best way to manage it. On Thu, Sep 22, 2016 at 6:42 PM, Josh Rosen wrote: > Spark SQL / Tungsten's explicitly-managed off-heap memory will be capped > at spark.memory.offHeap.size bytes. This is purposely specified as an > absolute size rather than

Re: Off Heap (Tungsten) Memory Usage / Management ?

2016-09-22 Thread Sean Owen
I don't think I'd enable swap on a cluster. You'd rather processes fail than grind everything to a halt. You'd buy more memory or optimize memory before trading it for I/O. On Thu, Sep 22, 2016 at 6:29 PM, Michael Segel wrote: > Ok… gotcha… wasn’t sure that YARN just

Re: Off Heap (Tungsten) Memory Usage / Management ?

2016-09-22 Thread Jörn Franke
Well, off-heap memory will, from an OS perspective, be visible under the JVM process (you see the memory consumption of the JVM process growing when using off-heap memory). There is one exception: if there is another process, which has not been started by the JVM and "lives" outside the JVM, but

Re: Off Heap (Tungsten) Memory Usage / Management ?

2016-09-22 Thread Michael Segel
Ok… gotcha… wasn't sure that YARN just looked at the heap size allocation and ignored the off-heap. WRT overall OS memory… this would be one reason why I'd keep a decent amount of swap around. (Maybe even putting it on a fast device like an M.2 or PCIe flash drive…. > On Sep 22, 2016, at

Re: Off Heap (Tungsten) Memory Usage / Management ?

2016-09-22 Thread Josh Rosen
Spark SQL / Tungsten's explicitly-managed off-heap memory will be capped at spark.memory.offHeap.size bytes. This is purposely specified as an absolute size rather than as a percentage of the heap size in order to allow end users to tune Spark so that its overall memory consumption stays within
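The two configuration keys Josh refers to can be set at submit time. A minimal sketch (the memory values, master, and job file are illustrative, not from the thread; in Spark of this era spark.memory.offHeap.size is given in bytes):

```shell
# Illustrative spark-submit invocation enabling Tungsten's managed
# off-heap allocation with a hard 512 MB cap (536870912 bytes).
spark-submit \
  --master yarn \
  --executor-memory 2g \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=536870912 \
  my_job.py
```

Because the cap is an absolute byte count rather than a fraction of the heap, the total footprint (heap + off-heap) can be budgeted against the YARN container size directly.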

Re: Off Heap (Tungsten) Memory Usage / Management ?

2016-09-22 Thread Sean Owen
It's looking at the whole process's memory usage, and doesn't care whether the memory is used by the heap or not within the JVM. Of course, allocating memory off-heap still counts against you at the OS level. On Thu, Sep 22, 2016 at 3:54 PM, Michael Segel wrote: >

Re: Off Heap (Tungsten) Memory Usage / Management ?

2016-09-22 Thread Michael Segel
Thanks for the response Sean. But how does YARN know about the off-heap memory usage? That’s the piece that I’m missing. Thx again, -Mike > On Sep 21, 2016, at 10:09 PM, Sean Owen wrote: > > No, Xmx only controls the maximum size of on-heap allocated memory. > The JVM

Re: Off Heap (Tungsten) Memory Usage / Management ?

2016-09-21 Thread Sean Owen
No, Xmx only controls the maximum size of on-heap allocated memory. The JVM doesn't manage/limit off-heap (how could it? it doesn't know when it can be released). The answer is that YARN will kill the process because it's using more memory than it asked for. A JVM is always going to use a little

Re: Off Heap (Tungsten) Memory Usage / Management ?

2016-09-21 Thread Jörn Franke
All off-heap memory is still allocated by the JVM process. If you limit the memory of this process then you limit the memory. I think the memory of the JVM process could be limited via the Xms/Xmx parameters of the JVM. This can be configured via spark options for yarn (be aware that they are
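As a sketch of the mapping Jörn mentions: spark.executor.memory is what becomes the executor JVM's -Xmx, and extra JVM flags can be passed through spark.executor.extraJavaOptions. The values below are illustrative; note that -XX:MaxDirectMemorySize bounds NIO direct buffers only, not memory obtained via sun.misc.Unsafe, so it is at best a partial cap on off-heap usage:

```shell
# Illustrative: spark.executor.memory -> executor -Xmx (heap cap);
# extraJavaOptions forwards additional flags to the executor JVMs.
spark-submit \
  --master yarn \
  --conf spark.executor.memory=2g \
  --conf "spark.executor.extraJavaOptions=-XX:MaxDirectMemorySize=512m" \
  my_job.py
```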

Off Heap (Tungsten) Memory Usage / Management ?

2016-09-21 Thread Michael Segel
I've asked this question a couple of times of a friend who didn't know the answer… so I thought I would try here. Suppose we launch a job on a cluster (YARN) and we have set up the containers to be 3GB in size. What does that 3GB represent? I mean what happens if we end up using 2-3GB
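A rough sketch of where a 3GB container can come from in Spark-on-YARN of this era: the container request is the executor heap plus spark.yarn.executor.memoryOverhead (defaulting to max(384 MB, 10% of the heap)), and YARN rounds the request up to a multiple of yarn.scheduler.minimum-allocation-mb. The 1024 MB allocation increment below is an assumption about a typical YARN config:

```python
def yarn_container_request_mb(executor_memory_mb, overhead_mb=None,
                              min_allocation_mb=1024):
    """Approximate the YARN container size Spark asks for.

    overhead_mb defaults to max(384, 10% of executor memory), the
    spark.yarn.executor.memoryOverhead default in Spark 1.x/2.0.
    YARN rounds the total up to a multiple of
    yarn.scheduler.minimum-allocation-mb (1024 here, an assumption).
    """
    if overhead_mb is None:
        overhead_mb = max(384, executor_memory_mb // 10)
    requested = executor_memory_mb + overhead_mb
    # round up to the scheduler's allocation increment
    inc = min_allocation_mb
    return ((requested + inc - 1) // inc) * inc

# --executor-memory 2g -> 2048 + 384 overhead = 2432 MB, rounded up to a
# 3072 MB (3 GB) container that must cover heap, off-heap, and JVM overhead.
print(yarn_container_request_mb(2048))  # -> 3072
```

So the 3GB is the whole process budget as YARN sees it; exceed it (on-heap or off) and the NodeManager kills the container.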