Hello,

I get a lot of these exceptions on my Mesos cluster when running Spark jobs:

14/07/19 16:29:43 WARN spark.network.SendingConnection: Error finishing connection to prd-atl-mesos-slave-010/10.88.160.200:37586
java.net.ConnectException: Connection timed out
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.spark.network.SendingConnection.finishConnect(Connection.scala:318)
    at org.apache.spark.network.ConnectionManager$$anon$7.run(ConnectionManager.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
14/07/19 16:29:43 INFO spark.network.ConnectionManager: Handling connection error on connection to ConnectionManagerId(prd-atl-mesos-slave-010,37586)
14/07/19 16:29:43 INFO spark.network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(prd-atl-mesos-slave-010,37586)
14/07/19 16:29:43 INFO spark.network.ConnectionManager: Notifying org.apache.spark.network.ConnectionManager$MessageStatus@4b0472b4
14/07/19 16:29:43 INFO spark.network.ConnectionManager: Notifying org.apache.spark.network.ConnectionManager$MessageStatus@1106ade6
14/07/19 16:29:43 ERROR spark.storage.BlockFetcherIterator$BasicBlockFetcherIterator: Could not get block(s) from ConnectionManagerId(prd-atl-mesos-slave-010,37586)
14/07/19 16:29:43 ERROR spark.storage.BlockFetcherIterator$BasicBlockFetcherIterator: Could not get block(s) from ConnectionManagerId(prd-atl-mesos-slave-010,37586)
14/07/19 16:29:43 WARN spark.network.SendingConnection: Error finishing connection to prd-atl-mesos-slave-004/10.88.160.156:35446
java.net.ConnectException: Connection timed out
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.spark.network.SendingConnection.finishConnect(Connection.scala:318)
    at org.apache.spark.network.ConnectionManager$$anon$7.run(ConnectionManager.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
14/07/19 16:29:43 INFO spark.network.ConnectionManager: Handling connection error on connection to ConnectionManagerId(prd-atl-mesos-slave-004,35446)
14/07/19 16:29:43 INFO spark.network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(prd-atl-mesos-slave-004,35446)

I've tried increasing spark.akka.timeout, but it doesn't seem to have
much effect.

Has anyone else seen these? Is there a Spark configuration option that I
should tune? Or perhaps some JVM properties that I should be setting on my
executors?
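
If it helps with diagnosis, the "Error finishing connection" warnings in the log above could be tallied per target host, to see whether the timeouts cluster on a few slaves or are spread across the cluster. This is only a minimal Python sketch, assuming the driver log is available as plain text. The function name tally_failed_targets is made up for illustration and is not part of Spark or Mesos.

```python
import re
from collections import Counter

# Matches the target at the start of a fragment such as
# "connection to prd-atl-mesos-slave-010/10.88.160.200:37586".
TARGET = re.compile(
    r"connection to (?P<host>[\w.-]+)/(?P<ip>[\d.]+):(?P<port>\d+)"
)


def tally_failed_targets(log_text):
    """Count 'Error finishing connection' warnings per (host, ip).

    Hypothetical helper for illustration only.
    """
    # Mail clients and terminals may wrap long log lines, so collapse
    # every run of whitespace into a single space before searching.
    flat = " ".join(log_text.split())
    counts = Counter()
    # Each fragment after the first starts right after "Error finishing ".
    for fragment in flat.split("Error finishing ")[1:]:
        match = TARGET.match(fragment)
        if match:
            counts[(match.group("host"), match.group("ip"))] += 1
    return counts
```

It could be called as tally_failed_targets(open("driver.log").read()), where driver.log is a placeholder file name. Ports are deliberately left out of the key, on the assumption that executors pick a new port each time they start.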

TIA
