[jira] [Commented] (SPARK-4996) Memory leak?

2015-01-20 Thread Patrick Wendell (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285255#comment-14285255 ]

Patrick Wendell commented on SPARK-4996:


I'm de-escalating this right now because it's not clear what the actual issue 
is.

 Memory leak?
 

 Key: SPARK-4996
 URL: https://issues.apache.org/jira/browse/SPARK-4996
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: uncleGen

 When I migrated my job from Spark 1.1.1 to Spark 1.2, it failed, even though 
 everything is OK in Spark 1.1.1 with the same resource settings. When I 
 increase the memory settings enough (1.2x ~ 1.5x, in my situation), the job 
 completes successfully. Both jobs ran with default Spark configurations. 
 The detailed log follows.
 {code}
 14-12-29 19:16:11 INFO [Reporter] YarnAllocationHandler: Container marked as failed. Exit status: 143. Diagnostics: Container is running beyond physical memory limits. Current usage: 11.3 GB of 11 GB physical memory used; 11.8 GB of 23.1 GB virtual memory used. Killing container.
 {code}
 {code}
 Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /disk2/mapred/tmp/usercache/testUser/appcache/application_1400565786114_343609/spark-local-20141229190526-d76b/35/shuffle_3_12_0.index (No such file or directory)
 	at java.io.FileInputStream.open(Native Method)
 	at java.io.FileInputStream.<init>(FileInputStream.java:120)
 	at org.apache.spark.shuffle.IndexShuffleBlockManager.getBlockData(IndexShuffleBlockManager.scala:109)
 	at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:305)
 	at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57)
 	at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57)
 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
 	at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
 	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
 	at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:57)
 	at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:124)
 	at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:97)
 	at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:91)
 	at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:44)
 	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
 	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
 	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
 	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
 	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
 	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
 	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
 	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
 	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
 	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
 	at java.lang.Thread.run(Thread.java:662)
 	at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:156)
 {code}
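Two details in the quoted log are easy to gloss over. Exit status 143 is the usual 128 + signal-number encoding for SIGTERM, i.e. the YARN NodeManager killed the container for exceeding its physical-memory limit; the FileNotFoundException below it is then most likely a downstream symptom, because a killed executor's local shuffle files go away and peers fetching those shuffle blocks fail. As a quick check:

{code}
exit status 143 = 128 + 15   (signal 15 = SIGTERM, sent by the NodeManager)
{code}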

[jira] [Commented] (SPARK-4996) Memory leak?

2014-12-30 Thread uncleGen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261832#comment-14261832 ]

uncleGen commented on SPARK-4996:

[~srowen], it works when I increase spark.yarn.executor.memoryOverhead 
slightly. I now give it 1024M with 10G of executor memory. What confuses me is 
that Spark 1.1 needed only 384M and worked well, yet Spark 1.2 needs much more. 
What is the extra memory spent on?
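For reference, the physical limit YARN enforces on a container is essentially the executor memory plus the overhead setting, so the figures above size the container at roughly 11 GB (a back-of-the-envelope check, taking 1 GB = 1024 MB):

{code}
container limit = spark.executor.memory + spark.yarn.executor.memoryOverhead
                = 10240 MB + 1024 MB
                = 11264 MB  (~11 GB)
{code}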


[jira] [Commented] (SPARK-4996) Memory leak?

2014-12-29 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260186#comment-14260186 ]

Sean Owen commented on SPARK-4996:



What this error really says is that Spark thought it could use more memory than 
YARN thinks it can, which is *not* the same as saying Spark needs this much 
memory. This is affected by things like the container overhead Spark adds to 
the YARN resource request, and that changed in 1.2. You may just need to allow 
more overhead. See http://spark.apache.org/docs/latest/running-on-yarn.html and 
the spark.yarn.executor.memoryOverhead setting. Have you tried increasing it a 
little?

You may not need more memory, just more cushion.

Otherwise, have you analyzed a heap dump to see what the heap is filled with? 
That's what you'd have to do to find a memory leak, but a leak doesn't seem 
likely given the error you see.
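For anyone landing here, the overhead can be passed straight to spark-submit. A minimal sketch using the figures from this thread (the 10g / 1024 values are just the reporter's numbers, and the application jar is a placeholder, not part of the original report):

{code}
spark-submit \
  --master yarn-cluster \
  --executor-memory 10g \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  your-app.jar
{code}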
