Hi there, I'm using Spark Streaming 1.2.1 with actorStreams. Initially, all goes well.
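For context, the streams are set up roughly like this (a simplified sketch, not my literal code; the actor class, record type, and stream names are illustrative):

    import akka.actor.{Actor, Props}
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.receiver.ActorHelper

    // Illustrative receiver: the real actor subscribes to our feed and hands
    // every record to Spark through ActorHelper.store().
    class TransactionReceiver extends Actor with ActorHelper {
      def receive = {
        case record: String => store(record)
      }
    }

    val conf = new SparkConf().setAppName("stream-aggregate-transactions")
    val ssc = new StreamingContext(conf, Seconds(30)) // 30-second batches

    // Five actor streams (one per node), unioned into a single DStream:
    val streams = (1 to 5).map(i =>
      ssc.actorStream[String](Props[TransactionReceiver], s"receiver-$i"))
    val records = ssc.union(streams)

The logs confirm a clean start: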
15/03/30 15:37:00 INFO spark.storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.2 KB, free 1589.8 MB)
15/03/30 15:37:00 INFO spark.storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on <master>:54258 (size: 1771.0 B, free: 1589.8 MB)
15/03/30 15:37:04 INFO spark.storage.BlockManagerInfo: Added input-1-1427722624400 in memory on <worker-1>:40379 (size: 997.0 B, free: 2.1 GB)
...

This works for a while (20-40 minutes), until suddenly:

[WARN] [03/30/2015 17:16:03.497] [stream-aggregate-transactions-akka.remote.default-remote-dispatcher-6] [akka.tcp://stream-aggregate-transactions@master-ip:7080/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FsparkExecutor%40worker-5%3A49144-4] Association with remote system [akka.tcp://sparkExecutor@worker-5:49144] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
[WARN] [03/30/2015 17:16:03.501] [stream-aggregate-transactions-akka.remote.default-remote-dispatcher-6] [akka.tcp://stream-aggregate-transactions@master-ip:7080/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FsparkExecutor%40worker-2%3A47280-7] Association with remote system [akka.tcp://sparkExecutor@worker-2:47280] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/03/30 17:16:04 ERROR spark.scheduler.TaskSchedulerImpl: Lost executor 1 on worker-2: remote Akka client disassociated
[WARN] [03/30/2015 17:16:09.427] [stream-aggregate-transactions-akka.remote.default-remote-dispatcher-6] [Remoting] Tried to associate with unreachable remote address [akka.tcp://sparkExecutor@worker-5:49144]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: worker-5/worker-5-ip:49144
15/03/30 17:16:09 INFO scheduler.cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@worker-5:37442/user/Executor#-2032935972] with ID 6
15/03/30 17:16:09 INFO spark.scheduler.TaskSetManager: Starting task 2.0 in stage 1983.1 (TID 118538, worker-5, NODE_LOCAL, 1393 bytes)
15/03/30 17:16:10 INFO spark.storage.BlockManagerMasterActor: Registering block manager worker-5:39945 with 2.1 GB RAM, BlockManagerId(6, worker-5, 39945)
15/03/30 17:16:14 INFO spark.scheduler.TaskSetManager: Starting task 15.0 in stage 1983.1 (TID 118541, worker-5, NODE_LOCAL, 1393 bytes)
15/03/30 17:16:14 WARN spark.scheduler.TaskSetManager: Lost task 6.0 in stage 1983.1 (TID 118539, <worker-5>): java.lang.Exception: Could not compute split, block input-2-1427728282000 not found
    at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
    at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:245)
    at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
    at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
    at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
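Could it be that the lost executor held the only copy of that input block, so the restarted task has nothing to recompute it from? If so, would pinning a replicated storage level on the streams, or the receiver write-ahead log added in 1.2, avoid this? Roughly what I have in mind (untested, reusing the illustrative names from above):

    import org.apache.spark.storage.StorageLevel

    // Replicate each received block to a second executor by passing the
    // storage level explicitly:
    val stream = ssc.actorStream[String](
      Props[TransactionReceiver], "receiver-1",
      StorageLevel.MEMORY_AND_DISK_SER_2)

    // And/or enable the receiver write-ahead log. This must be set on the
    // SparkConf before the StreamingContext is created, and it needs a
    // checkpoint directory on a reliable filesystem (path illustrative):
    conf.set("spark.streaming.receiver.writeAheadLog.enable", "true")
    ssc.checkpoint("hdfs:///checkpoints/stream-aggregate-transactions")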
From there on it only goes downhill: foreachRDD(_.saveAsParquetFile(...)) produces broken Parquet output (the target directory contains only a _temporary subdirectory). In the Streaming UI, oftentimes one of the receivers stays at 0 records, and its executor now uses 0 memory.

The batch interval is 30 seconds, with 50-70 records per receiver per batch. I have 5 actor streams (one per node) with 10 total cores assigned. The driver has 3 GB RAM, each worker 4 GB. There is certainly no memory pressure: "Memory Used" sits around 100 KB, "Input" around 10 MB.

Thanks for any pointers,
- Marius
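P.S. In case it matters, the output side looks roughly like this (simplified; Transaction, parse, and the output path are illustrative stand-ins for my real code):

    import org.apache.spark.sql.SQLContext

    // Stand-ins for my real record type and parsing:
    case class Transaction(id: String, amount: Double)
    def parse(line: String): Transaction = {
      val Array(id, amount) = line.split(",")
      Transaction(id, amount.toDouble)
    }

    val sqlContext = new SQLContext(ssc.sparkContext)
    import sqlContext.createSchemaRDD // implicit RDD[Product] => SchemaRDD in 1.2

    records.foreachRDD { (rdd, time) =>
      // Parquet is written through a _temporary directory that is moved into
      // place on commit; after the failures above only _temporary remains.
      rdd.map(parse).saveAsParquetFile(
        s"hdfs:///output/transactions-${time.milliseconds}")
    }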