Networking issues with Spark on EC2
Hi,

I am using Spark 1.2 and facing network-related issues while performing simple computations.

This is a custom cluster set up on EC2 machines using the prebuilt Spark binary from the Apache site. The problem occurs only when workers run on other machines (i.e., when networking is involved); running the master and the slave on a single node works correctly.

The error log from a slave node is attached below. The job reads a text file from the local filesystem (copied to each node) and counts it. The first 30 tasks complete within 5 seconds. Then it takes several minutes to complete another 10 tasks, and the job eventually dies. Sometimes one of the workers completes all the tasks assigned to it; different workers behave differently at different times (non-deterministic).

Is this related to something specific to EC2?

15/09/24 13:04:40 INFO Executor: Running task 117.0 in stage 0.0 (TID 117)
15/09/24 13:04:41 INFO TorrentBroadcast: Started reading broadcast variable 1
15/09/24 13:04:41 INFO SendingConnection: Initiating connection to [master_ip:56305]
15/09/24 13:04:41 INFO SendingConnection: Connected to [master_ip/master_ip_address:56305], 1 messages pending
15/09/24 13:05:41 INFO TorrentBroadcast: Started reading broadcast variable 1
15/09/24 13:05:41 ERROR Executor: Exception in task 77.0 in stage 0.0 (TID 77)
java.io.IOException: sendMessageReliably failed because ack was not received within 60 sec
        at org.apache.spark.network.nio.ConnectionManager$$anon$13$$anonfun$run$19.apply(ConnectionManager.scala:918)
        at org.apache.spark.network.nio.ConnectionManager$$anon$13$$anonfun$run$19.apply(ConnectionManager.scala:917)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.network.nio.ConnectionManager$$anon$13.run(ConnectionManager.scala:917)
        at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:581)
        at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:656)
        at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:367)
        at java.lang.Thread.run(Thread.java:745)
15/09/24 13:05:41 INFO CoarseGrainedExecutorBackend: Got assigned task 122
15/09/24 13:05:41 INFO Executor: Running task 3.1 in stage 0.0 (TID 122)
15/09/24 13:06:41 ERROR Executor: Exception in task 113.0 in stage 0.0 (TID 113)
java.io.IOException: sendMessageReliably failed because ack was not received within 60 sec
        (identical stack trace as above)
15/09/24 13:06:41 INFO TorrentBroadcast: Started reading broadcast variable 1
15/09/24 13:06:41 INFO SendingConnection: Initiating connection to [master_ip/master_ip_address:44427]
15/09/24 13:06:41 INFO SendingConnection: Connected to [master_ip/master_ip_address:44427], 1 messages pending
15/09/24 13:07:41 ERROR Executor: Exception in task 37.0 in stage 0.0 (TID 37)
java.io.IOException: sendMessageReliably failed because ack was not received within 60 sec
        (identical stack trace as above)

I checked the network speed between the master and the slave; it can scp large files at about 60 MB/s.

Any leads on how this can be fixed?

Thanks and Regards,
Suraj Sheth
Re: Networking issues with Spark on EC2
Hi Suraj,

Spark uses a lot of ports to communicate between nodes. Probably your security group is restrictive and does not allow instances to communicate on all ports. The easiest way to resolve it is to add a rule allowing all inbound TCP traffic on all ports (0-65535) from instances in the same security group, like this:

    All TCP | TCP | 0 - 65535 | your security group

Hope this helps!

Thanks,
Ankur

On Thu, Sep 24, 2015 at 7:09 AM SURAJ SHETH wrote:
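[Editor's note: for anyone doing the above from the command line, an equivalent self-referencing rule can be added with the AWS CLI. This is only a sketch; the security-group ID `sg-0123456789abcdef0` is a placeholder for your own group.]

```shell
# Allow all TCP traffic (ports 0-65535) between instances in the same
# security group, by adding an ingress rule whose source is the group itself.
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 0-65535 \
    --source-group sg-0123456789abcdef0
```

Because the rule's source is the group itself, it opens traffic only between cluster members, not to the public internet.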
Re: Networking issues with Spark on EC2
Hi Ankur,

Thanks for the reply. This is already done.

If I wait long enough (10 minutes), a few tasks succeed even on the slave nodes. Sometimes a fraction of the tasks (20%) complete on all the machines within the initial 5 seconds, and then it slows down drastically.

Thanks and Regards,
Suraj Sheth

On Fri, Sep 25, 2015 at 2:10 AM, Ankur Srivastava <ankur.srivast...@gmail.com> wrote:
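[Editor's note: the failures above are 60-second ack timeouts in the nio ConnectionManager, so a diagnostic worth trying (a suggestion, not a confirmed fix) is raising `spark.core.connection.ack.wait.timeout` at submit time. If tasks then succeed, just slowly, the underlying problem is network connectivity or throughput between nodes rather than a hard block.]

```shell
# Raise the nio ConnectionManager ack timeout from the 60 s default to 300 s.
# The master URL and application file are placeholders.
spark-submit \
    --conf spark.core.connection.ack.wait.timeout=300 \
    --master spark://master_ip:7077 \
    your_app.py
```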
Re: Networking issues with Spark on EC2
Hi,

Are you using EMR?

Natu

On Sat, Sep 26, 2015 at 6:55 AM, SURAJ SHETH wrote:
Re: Networking issues with Spark on EC2
Hi,

No. I was trying to use plain EC2 (due to a few constraints), and that is where I faced the problem. With EMR, it works flawlessly, but I would like to go back to EC2 if I can fix this issue.

Has anybody set up a Spark cluster using plain EC2 machines? What steps did you follow?

Thanks and Regards,
Suraj Sheth

On Sat, Sep 26, 2015 at 10:36 AM, Natu Lauchande wrote:
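[Editor's note: one step that helps on plain EC2 is pinning Spark 1.x's service ports, which otherwise default to random ephemeral ports on each run, so the security group can be opened only for a known range. A sketch of appending such settings to `conf/spark-defaults.conf` on every node; the specific port numbers are arbitrary example choices, not required values.]

```shell
# Create conf/ if running this sketch outside a Spark installation.
mkdir -p conf

# Pin the ports Spark 1.x services listen on (defaults are random).
# The port numbers below are arbitrary examples.
cat >> conf/spark-defaults.conf <<'EOF'
spark.driver.port        7001
spark.fileserver.port    7002
spark.broadcast.port     7003
spark.blockManager.port  7004
EOF
```

With fixed ports, the all-ports security-group rule can be narrowed to just these ports plus the standalone master/worker ports.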