Re: Heap Memory in Spark 2.3.0
Perhaps this is https://issues.apache.org/jira/browse/SPARK-24578? That was
reported as a performance issue, not OOMs, but it's in the exact same part of
the code, and the change was to reduce the memory pressure significantly.

On Mon, Jul 16, 2018 at 1:43 PM, Bryan Jeffrey wrote:
> Hello.
>
> I am working to move our system from Spark 2.1.0 to Spark 2.3.0. Our
> system is running on Spark managed via Yarn. During the course of the move
> I mirrored the settings to our new cluster. However, on the Spark 2.3.0
> cluster with the same resource allocation I am seeing a number of
> executors die due to OOM:
> [...]
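The change referenced above reduced memory pressure by avoiding large single-shot heap copies when serving fetch responses. As a rough illustration of the idea only (not the actual SPARK-24578 patch; the class name, method name, and 256 KiB cap here are assumptions), copying a large buffer in fixed-size chunks keeps each transient allocation small:

```java
import java.nio.ByteBuffer;

// Illustrative sketch: copy a large source buffer into a destination in
// capped chunks so no single transient allocation approaches the buffer's
// full size. Names and the 256 KiB cap are illustrative assumptions.
public class ChunkedCopy {
    static final int CHUNK_LIMIT = 256 * 1024; // cap on each transient copy

    static long copyInChunks(ByteBuffer src, ByteBuffer dst) {
        long copied = 0;
        while (src.hasRemaining()) {
            int len = Math.min(src.remaining(), CHUNK_LIMIT);
            byte[] chunk = new byte[len]; // small, short-lived allocation
            src.get(chunk);
            dst.put(chunk);
            copied += len;
        }
        return copied;
    }

    public static void main(String[] args) {
        ByteBuffer src = ByteBuffer.allocate(1_000_000);
        ByteBuffer dst = ByteBuffer.allocate(1_000_000);
        System.out.println(copyInChunks(src, dst)); // prints 1000000
    }
}
```

The contrast with the stack trace above: `CompositeByteBuf.nioBuffer` materializes one heap `ByteBuffer` for the whole requested range, so a single large fetch response can demand one very large allocation at once.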
Heap Memory in Spark 2.3.0
Hello.

I am working to move our system from Spark 2.1.0 to Spark 2.3.0. Our system
is running on Spark managed via Yarn. During the course of the move I
mirrored the settings to our new cluster. However, on the Spark 2.3.0
cluster with the same resource allocation I am seeing a number of executors
die due to OOM:

18/07/16 17:23:06 ERROR YarnClusterScheduler: Lost executor 5 on wn80:
Container killed by YARN for exceeding memory limits. 22.0 GB of 22 GB
physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.

I increased spark.driver.memoryOverhead and spark.executor.memoryOverhead
from the default (384) to 2048. I went ahead and disabled the vmem and pmem
Yarn checks on the cluster. With those disabled I see the following error:

Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at io.netty.buffer.CompositeByteBuf.nioBuffer(CompositeByteBuf.java:1466)
    at io.netty.buffer.AbstractByteBuf.nioBuffer(AbstractByteBuf.java:1203)
    at org.apache.spark.network.protocol.MessageWithHeader.copyByteBuf(MessageWithHeader.java:140)
    at org.apache.spark.network.protocol.MessageWithHeader.transferTo(MessageWithHeader.java:123)
    at io.netty.channel.socket.nio.NioSocketChannel.doWriteFileRegion(NioSocketChannel.java:355)
    at io.netty.channel.nio.AbstractNioByteChannel.doWrite(AbstractNioByteChannel.java:224)
    at io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:382)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:934)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:362)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:901)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1321)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
    at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
    at io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:115)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
    at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
    at io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:117)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:802)
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:814)
    at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:794)
    at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:831)
    at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1041)
    at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:300)
    at org.apache.spark.network.server.TransportRequestHandler.respond(TransportRequestHandler.java:222)
    at org.apache.spark.network.server.TransportRequestHandler.processFetchRequest(TransportRequestHandler.java:146)
    at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
    at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118)

Looking at GC:

[Eden: 16.0M(8512.0M)->0.0B(8484.0M) Survivors: 4096.0K->4096.0K Heap: 8996.7M(20.0G)->8650.3M(20.0G)]
 [Times: user=0.03 sys=0.01, real=0.01 secs]
794.949: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: allocation request failed, allocation request: 401255000 bytes]
794.949: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 401255000 bytes, attempted expansion amount: 402653184 bytes]
794.949: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap already fully expanded]
794.949: [Full GC (Allocation Failure) 801.766: [SoftReference, 0 refs, 0.359 secs]801.766: [WeakReference, 1604 refs, 0.0001191 secs]801.766: [FinalReference, 1180 refs, 0.882 secs]801.766: [PhantomReference, 0 refs, 12 refs, 0.117 secs]801.766: [JNI Weak Reference, 0.180 secs] 8650M->7931M(20G), 17.4838808 secs]
[Eden: 0.0B(8484.0M)->0.0B(9588.0M) Survivors: 4096.0K->0.0B Heap:
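A back-of-the-envelope check on the container sizing may be worth doing here. The GC log shows a 20.0G heap, and YARN sizes the container as executor memory plus memoryOverhead; in Spark on YARN the default overhead is the larger of 384 MiB and roughly 10% of executor memory, so for a 20 GiB heap the default is already about 2048 MiB, which would make raising the setting to 2048 a no-op. A quick arithmetic sketch (the 20 GiB figure is taken from the GC log above; the default-overhead formula is a hedged assumption about this Spark version):

```java
// Rough container-size arithmetic for a 20 GiB executor heap on YARN.
// Assumption: default memoryOverhead = max(384 MiB, 10% of executor memory).
public class ContainerSizing {
    public static void main(String[] args) {
        int executorMemoryMb = 20 * 1024;                        // 20.0G heap from the GC log
        int defaultOverheadMb = Math.max(384, executorMemoryMb / 10); // = 2048
        int containerMb = executorMemoryMb + defaultOverheadMb;  // what YARN enforces
        System.out.println(containerMb);                         // 22528 MiB, i.e. the 22 GB limit
    }
}
```

That the container limit was already 22 GB before any tuning is consistent with this: the OOM is the JVM heap itself filling up on large Netty allocations, not an undersized overhead.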