[jira] [Commented] (SPARK-17503) Memory leak in Memory store when unable to cache the whole RDD

2016-09-12 Thread Sean Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15483389#comment-15483389
 ] 

Sean Zhong commented on SPARK-17503:


[~sowen] I have modified the title to mean "cache in memory"

> Memory leak in Memory store when unable to cache the whole RDD
> --
>
> Key: SPARK-17503
> URL: https://issues.apache.org/jira/browse/SPARK-17503
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.6.2, 2.0.0, 2.1.0
> Reporter: Sean Zhong
>
> h2.Problem description:
> The following query triggers an out-of-memory error.
> {code}
> sc.parallelize(1 to 1000, 5).map(_ => new Array[Long](1000)).cache().count
> {code}
> This is not expected; we should fall back to disk when there is not enough
> memory to cache the RDD.
> Stacktrace:
> {code}
> scala> sc.parallelize(1 to 1000, 5).map(f).cache().count
> [Stage 0:>  (0 + 5) / 5]16/09/11 17:27:20 WARN MemoryStore: Not enough space to cache rdd_1_4 in memory! (computed 631.5 MB so far)
> 16/09/11 17:27:20 WARN MemoryStore: Not enough space to cache rdd_1_0 in memory! (computed 631.5 MB so far)
> 16/09/11 17:27:20 WARN BlockManager: Putting block rdd_1_0 failed
> 16/09/11 17:27:20 WARN BlockManager: Putting block rdd_1_4 failed
> 16/09/11 17:27:21 WARN MemoryStore: Not enough space to cache rdd_1_1 in memory! (computed 947.3 MB so far)
> 16/09/11 17:27:21 WARN BlockManager: Putting block rdd_1_1 failed
> 16/09/11 17:27:22 WARN MemoryStore: Not enough space to cache rdd_1_3 in memory! (computed 1423.7 MB so far)
> 16/09/11 17:27:22 WARN BlockManager: Putting block rdd_1_3 failed
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to java_pid26528.hprof ...
> Heap dump file created [6551021666 bytes in 9.876 secs]
> 16/09/11 17:28:15 WARN NettyRpcEnv: Ignored message: HeartbeatResponse(false)
> 16/09/11 17:28:15 WARN NettyRpcEndpointRef: Error sending message [message = Heartbeat(driver,[Lscala.Tuple2;@46c9ce96,BlockManagerId(driver, 127.0.0.1, 55360))] in 1 attempts
> org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
>   at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
>   at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
>   at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
>   at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>   at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:523)
>   at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:552)
>   at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:552)
>   at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:552)
>   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1857)
>   at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:552)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]
>   at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>   at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>   at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
>   at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>   at scala.concurrent.Await$.result(package.scala:190)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:81)
>   ... 14 more
> 16/09/11 17:28:15 ERROR Executor: Exception in task 3.0 in stage 0.0 (TID 3)
> java.lang.OutOfMemoryError: Java heap space
>   at 

[jira] [Commented] (SPARK-17503) Memory leak in Memory store when unable to cache the whole RDD

2016-09-12 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15483381#comment-15483381
 ] 

Apache Spark commented on SPARK-17503:
--

User 'clockfly' has created a pull request for this issue:
https://github.com/apache/spark/pull/15056


[jira] [Commented] (SPARK-17503) Memory leak in Memory store when unable to cache the whole RDD

2016-09-12 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15483326#comment-15483326
 ] 

Sean Owen commented on SPARK-17503:
---

cache() means "cache in memory" only. There is a persist() call for other 
levels, like storing on disk only. I am not sure what you mean then?
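
To make the distinction concrete, here is a minimal sketch of the two calls. It assumes a local Spark installation; the master URL, the app name, and the object name PersistLevels are placeholders chosen for illustration and do not come from the issue. It only shows which storage level each call selects; it does not reproduce or fix the reported out-of-memory behaviour.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical object name; "local[2]" and the app name are placeholders.
object PersistLevels {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("persist-levels-sketch")
    val sc = new SparkContext(conf)
    try {
      // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY):
      // blocks that do not fit in memory are not stored at all.
      val memOnly = sc.parallelize(1 to 1000, 5)
        .map(_ => new Array[Long](1000))
        .cache()
      assert(memOnly.getStorageLevel == StorageLevel.MEMORY_ONLY)

      // persist() takes an explicit level; MEMORY_AND_DISK asks Spark to
      // write blocks that do not fit in memory to disk instead.
      val memAndDisk = sc.parallelize(1 to 1000, 5)
        .map(_ => new Array[Long](1000))
        .persist(StorageLevel.MEMORY_AND_DISK)
      assert(memAndDisk.getStorageLevel == StorageLevel.MEMORY_AND_DISK)

      // An action is needed before anything is actually computed and stored.
      println(memAndDisk.count())
    } finally {
      sc.stop()
    }
  }
}
```

StorageLevel.DISK_ONLY is the level for the "disk only" case mentioned above.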
