[jira] [Commented] (SPARK-4996) Memory leak?
[ https://issues.apache.org/jira/browse/SPARK-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285255#comment-14285255 ] Patrick Wendell commented on SPARK-4996: I'm de-escalating this right now because it's not clear what the actual issue is. Memory leak? Key: SPARK-4996 URL: https://issues.apache.org/jira/browse/SPARK-4996 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Reporter: uncleGen When I migrate my job from spark 1.1.1 to spark 1.2, it failed. However, everything is OK In spark 1.1.1 with the same resource setting. And, when I increase the memory settings properly (1.2x ~ 1.5x, in my situation), the job can complete successfully. The above two job are running with default spark configurations. Following is the detailed log. {code} 14-12-29 19:16:11 INFO [Reporter] YarnAllocationHandler: Container marked as failed. Exit status: 143. Diagnostics: Container is running beyond physical memory limits. Current usage: 11.3 GB of 11 GB physical memory used; 11.8 GB of 23.1 GB virtual memory used. Killing container. {code} {code} Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /disk2/mapred/tmp/usercache/testUser/appcache/application_1400565786114_343609/spark-local-20141229190526-d76b/35/shuffle_3_12_0.index (No such file or directory) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.init(FileInputStream.java:120) at org.apache.spark.shuffle.IndexShuffleBlockManager.getBlockData(IndexShuffleBlockManager.scala:109) at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:305) at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57) at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:57) at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:124) at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:97) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:91) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:44) at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) at java.lang.Thread.run(Thread.java:662) at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:156) at
[jira] [Commented] (SPARK-4996) Memory leak?
[ https://issues.apache.org/jira/browse/SPARK-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14261832#comment-14261832 ] uncleGen commented on SPARK-4996: - [~srowen], it works when I increase the spark.yarn.executor.memoryOverhead slightly. Now, I give it 1024M with 10G executor memory. I am confused about that, Spark 1.1 just need 384M and work well, but spark 1.2 need much more. What the extra memory was spent on? Memory leak? Key: SPARK-4996 URL: https://issues.apache.org/jira/browse/SPARK-4996 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Reporter: uncleGen Priority: Blocker When I migrate my job from spark 1.1.1 to spark 1.2, it failed. However, everything is OK In spark 1.1.1 with the same resource setting. And, when I increase the memory settings properly (1.2x ~ 1.5x, in my situation), the job can complete successfully. The above two job are running with default spark configurations. Following is the detailed log. {code} 14-12-29 19:16:11 INFO [Reporter] YarnAllocationHandler: Container marked as failed. Exit status: 143. Diagnostics: Container is running beyond physical memory limits. Current usage: 11.3 GB of 11 GB physical memory used; 11.8 GB of 23.1 GB virtual memory used. Killing container. {code} {code} Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /disk2/mapred/tmp/usercache/testUser/appcache/application_1400565786114_343609/spark-local-20141229190526-d76b/35/shuffle_3_12_0.index (No such file or directory) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.init(FileInputStream.java:120) at org.apache.spark.shuffle.IndexShuffleBlockManager.getBlockData(IndexShuffleBlockManager.scala:109) at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:305) at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57) at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:57) at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:124) at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:97) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:91) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:44) at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) at
[jira] [Commented] (SPARK-4996) Memory leak?
[ https://issues.apache.org/jira/browse/SPARK-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260186#comment-14260186 ] Sean Owen commented on SPARK-4996: -- What this error really says is that Spark thought it could use more memory than YARN thinks it can, which is *not* the same as saying Spark needs this much memory. This is affected by things like the YARN container overhead added to the YARN resource requirements. That changed in 1.2. You may need to just give more overhead. See http://spark.apache.org/docs/latest/running-on-yarn.html and spark.yarn.executor.memoryOverhead . Have you tried increasing this a little? You may not need more memory, just more cushion. Otherwise, have you analyzed the heap to see what the heap is filled with? That's what you'd have to do to find a memory leak, but that doesn't seem likely given the error you see. Memory leak? Key: SPARK-4996 URL: https://issues.apache.org/jira/browse/SPARK-4996 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Reporter: uncleGen Priority: Blocker When I migrate my job from spark 1.1.1 to spark 1.2, it failed. However, everything is OK In spark 1.1.1 with the same resource setting. And, when I increase the memory settings properly (1.2x ~ 1.5x, in my situation), the job can complete successfully. The above two job are running with default spark configurations. Following is the detailed log. {code} 14-12-29 19:16:11 INFO [Reporter] YarnAllocationHandler: Container marked as failed. Exit status: 143. Diagnostics: Container is running beyond physical memory limits. Current usage: 11.3 GB of 11 GB physical memory used; 11.8 GB of 23.1 GB virtual memory used. Killing container. {code} {code} Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /disk2/mapred/tmp/usercache/testUser/appcache/application_1400565786114_343609/spark-local-20141229190526-d76b/35/shuffle_3_12_0.index (No such file or directory) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.init(FileInputStream.java:120) at org.apache.spark.shuffle.IndexShuffleBlockManager.getBlockData(IndexShuffleBlockManager.scala:109) at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:305) at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57) at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:57) at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:124) at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:97) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:91) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:44) at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130) at