I've worked around this by setting spark.shuffle.io.connectionTimeout=3600s, uploading the Spark tarball to HDFS again, and restarting the shuffle service (I'm not 100% sure that last step is needed).

I attempted (with my newbie Scala skills) to allow ExternalShuffleClient() to accept an optional closeIdleConnections parameter (defaulting to "true") so that MesosExternalShuffleClient can set it to "false". I then passed this into the TransportContext call. However, this didn't seem to work (maybe it's using the config from HDFS rather than the local Spark, which I thought the driver used).
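
To make the idea concrete, here is a minimal, self-contained sketch of the pattern: a constructor parameter that defaults to true, overridden by a subclass and forwarded to the transport layer. The Sketch* classes are hypothetical stand-ins written for illustration. They are not the real Spark classes and this is not the actual patch.

```scala
// Hypothetical stand-in for the transport layer; it only records the flag.
final case class SketchTransportContext(closeIdleConnections: Boolean)

// Hypothetical stand-in for the generic shuffle client.
// The parameter defaults to true, so existing callers keep their behaviour.
class SketchShuffleClient(closeIdleConnections: Boolean = true) {
  // Forward the flag to the transport context when the client is built.
  val context: SketchTransportContext =
    SketchTransportContext(closeIdleConnections)
}

// Hypothetical stand-in for the Mesos variant: it opts out of closing
// idle connections by overriding the default.
class SketchMesosShuffleClient
  extends SketchShuffleClient(closeIdleConnections = false)

object SketchDemo {
  def main(args: Array[String]): Unit = {
    // Print the flag as seen by each client's transport context.
    println(new SketchShuffleClient().context.closeIdleConnections)
    println(new SketchMesosShuffleClient().context.closeIdleConnections)
  }
}
```

The default keeps every other caller unchanged. Only the Mesos client, whose connection from the driver can sit idle for a long time, asks for different behaviour.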

Anyhow I'll do more testing and then raise a JIRA.

Adrian
--
*Adrian Bridgett* | Sysadmin Engineer, OpenSignal <http://www.opensignal.com>
_____________________________________________________
Office: First Floor, Scriptor Court, 155-157 Farringdon Road, Clerkenwell, London, EC1R 3AD
Phone #: +44 777-377-8251
Skype: abridgett |@adrianbridgett <http://twitter.com/adrianbridgett>| LinkedIn link <https://uk.linkedin.com/in/abridgett>
_____________________________________________________
