Re: Heap Memory in Spark 2.3.0

2018-07-17 Thread Imran Rashid
Perhaps this is https://issues.apache.org/jira/browse/SPARK-24578?

That was reported as a performance issue, not OOMs, but it's in exactly the
same part of the code, and the change was to reduce the memory pressure
there significantly.
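
For what it's worth, the gist of that change as I read it: don't ask Netty
for a single NIO view of the whole composite message buffer, because for a
CompositeByteBuf that consolidates everything into one large on-heap
ByteBuffer (the allocation at the top of your stack).  It would also explain
the single huge allocation request in your G1 log: 401255000 bytes is one
~383 MB object.  Below is a paraphrased sketch of the idea, not the actual
patch; the 256 KB cap and the class name are illustrative:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

import io.netty.buffer.ByteBuf;

public class BoundedCopySketch {
  // Cap on how much of the message is exposed as one java.nio.ByteBuffer
  // per write.  256 KB is an assumption for illustration, not necessarily
  // the value used in Spark.
  private static final int NIO_BUFFER_LIMIT = 256 * 1024;

  // Writes at most NIO_BUFFER_LIMIT bytes of 'buf' to 'target' per call,
  // so a multi-hundred-megabyte CompositeByteBuf is never consolidated
  // into a single huge on-heap buffer.
  static int copyByteBuf(ByteBuf buf, WritableByteChannel target) throws IOException {
    int length = Math.min(buf.readableBytes(), NIO_BUFFER_LIMIT);
    ByteBuffer window = buf.nioBuffer(buf.readerIndex(), length);
    int written = target.write(window);
    buf.skipBytes(written);
    return written;
  }
}

The transfer then loops in small chunks instead of materializing hundreds
of megabytes on the heap at once.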

On Mon, Jul 16, 2018 at 1:43 PM, Bryan Jeffrey wrote:

> Hello.
>
> I am working to move our system from Spark 2.1.0 to Spark 2.3.0.  Our
> system is running on Spark managed via YARN.  However, on the Spark 2.3.0
> cluster with the same resource allocation I am seeing a number of executors
> die due to OOM:
>
> 18/07/16 17:23:06 ERROR YarnClusterScheduler: Lost executor 5 on wn80:
> Container killed by YARN for exceeding memory limits. 22.0 GB of 22 GB
> physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
>
> [...]

Heap Memory in Spark 2.3.0

2018-07-16 Thread Bryan Jeffrey
Hello.

I am working to move our system from Spark 2.1.0 to Spark 2.3.0.  Our
system is running on Spark managed via YARN.  During the course of the move
I mirrored the settings to our new cluster.  However, on the Spark 2.3.0
cluster with the same resource allocation I am seeing a number of executors
die due to OOM:

18/07/16 17:23:06 ERROR YarnClusterScheduler: Lost executor 5 on wn80:
Container killed by YARN for exceeding memory limits. 22.0 GB of 22 GB
physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.

I increased spark.driver.memoryOverhead and spark.executor.memoryOverhead
from the default (384 MB) to 2048 MB.  I went ahead and disabled the vmem and
pmem YARN checks on the cluster (see the configuration sketch after the
stack trace below).  With those disabled I see the following error:

Caused by: java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at io.netty.buffer.CompositeByteBuf.nioBuffer(CompositeByteBuf.java:1466)
at io.netty.buffer.AbstractByteBuf.nioBuffer(AbstractByteBuf.java:1203)
at org.apache.spark.network.protocol.MessageWithHeader.copyByteBuf(MessageWithHeader.java:140)
at org.apache.spark.network.protocol.MessageWithHeader.transferTo(MessageWithHeader.java:123)
at io.netty.channel.socket.nio.NioSocketChannel.doWriteFileRegion(NioSocketChannel.java:355)
at io.netty.channel.nio.AbstractNioByteChannel.doWrite(AbstractNioByteChannel.java:224)
at io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:382)
at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:934)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:362)
at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:901)
at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1321)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
at io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:115)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
at io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:117)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:802)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:814)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:794)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:831)
at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1041)
at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:300)
at org.apache.spark.network.server.TransportRequestHandler.respond(TransportRequestHandler.java:222)
at org.apache.spark.network.server.TransportRequestHandler.processFetchRequest(TransportRequestHandler.java:146)
at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118)
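
For reference, here is a minimal sketch of the configuration changes
described above.  Property names are per the Spark 2.3 docs; however they
are actually applied (SparkConf, spark-defaults.conf, or --conf flags), the
Java form below is just illustrative:

import org.apache.spark.SparkConf;

public class OverheadSettingsSketch {
  public static void main(String[] args) {
    // Off-heap overhead per JVM, in MB (default 384).
    SparkConf conf = new SparkConf()
        .set("spark.driver.memoryOverhead", "2048")
        .set("spark.executor.memoryOverhead", "2048");
    // The container checks were disabled via the standard yarn-site.xml
    // properties:
    //   yarn.nodemanager.pmem-check-enabled = false
    //   yarn.nodemanager.vmem-check-enabled = false
    System.out.println(conf.toDebugString());
  }
}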



Looking at GC:

   [Eden: 16.0M(8512.0M)->0.0B(8484.0M) Survivors: 4096.0K->4096.0K Heap: 8996.7M(20.0G)->8650.3M(20.0G)]
 [Times: user=0.03 sys=0.01, real=0.01 secs]
 794.949: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: allocation request failed, allocation request: 401255000 bytes]
 794.949: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 401255000 bytes, attempted expansion amount: 402653184 bytes]
 794.949: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap already fully expanded]
794.949: [Full GC (Allocation Failure) 801.766: [SoftReference, 0 refs, 0.359 secs]801.766: [WeakReference, 1604 refs, 0.0001191 secs]801.766: [FinalReference, 1180 refs, 0.882 secs]801.766: [PhantomReference, 0 refs, 12 refs, 0.117 secs]801.766: [JNI Weak Reference, 0.180 secs] 8650M->7931M(20G), 17.4838808 secs]
   [Eden: 0.0B(8484.0M)->0.0B(9588.0M) Survivors: 4096.0K->0.0B Heap: