I've worked around this by setting
spark.shuffle.io.connectionTimeout=3600s, re-uploading the Spark tarball to
HDFS and restarting the shuffle service (I'm not 100% sure that last
step is needed).
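
For reference, here is the timeout workaround as a spark-defaults.conf entry. This is a sketch, assuming the property is read from spark-defaults.conf both where the driver is launched and on each host running the Mesos shuffle service. A per-job `spark-submit --conf spark.shuffle.io.connectionTimeout=3600s` would presumably not reach a shuffle service that is already running.

```
# spark-defaults.conf (assumed location: $SPARK_HOME/conf on each host)
# Raise the idle timeout on shuffle connections from the default to one hour.
spark.shuffle.io.connectionTimeout  3600s
```
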
I attempted (with my newbie Scala skills) to make ExternalShuffleClient()
accept an optional closeIdleConnections parameter (defaulting to "true") so
that MesosExternalShuffleClient can set it to "false", and then passed it
into the TransportContext call. However, this didn't seem to work; maybe
it's using the config from HDFS rather than the local Spark install (which
I thought the driver used).
Anyhow, I'll do more testing and then raise a JIRA.
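
Roughly, the change I tried looks like the sketch below. It uses simplified, hypothetical stand-in classes (TransportContextStub, ShuffleClientStub, MesosShuffleClientStub) rather than Spark's real ExternalShuffleClient, MesosExternalShuffleClient and TransportContext, whose constructors take more arguments. The idea is that an overloaded constructor keeps "true" as the default, and the Mesos subclass passes "false" down to the transport context.

```java
// Hypothetical, simplified stand-ins for the Spark classes named above.
// These are NOT Spark's real signatures; they only illustrate threading an
// optional closeIdleConnections flag (default true) down to the transport layer.
public class CloseIdleSketch {

    // Stand-in for TransportContext: just records the flag it was given.
    static class TransportContextStub {
        final boolean closeIdleConnections;

        TransportContextStub(boolean closeIdleConnections) {
            this.closeIdleConnections = closeIdleConnections;
        }
    }

    // Stand-in for ExternalShuffleClient.
    static class ShuffleClientStub {
        private final boolean closeIdleConnections;
        TransportContextStub context;

        // Original-style constructor: keeps the old behaviour (close idle connections).
        ShuffleClientStub() {
            this(true);
        }

        // New overload: lets a subclass opt out of closing idle connections.
        ShuffleClientStub(boolean closeIdleConnections) {
            this.closeIdleConnections = closeIdleConnections;
        }

        void init() {
            // Pass the flag through instead of hard-coding "true".
            context = new TransportContextStub(closeIdleConnections);
        }
    }

    // Stand-in for MesosExternalShuffleClient: keeps its long-lived connection open.
    static class MesosShuffleClientStub extends ShuffleClientStub {
        MesosShuffleClientStub() {
            super(false);
        }
    }

    public static void main(String[] args) {
        ShuffleClientStub plain = new ShuffleClientStub();
        ShuffleClientStub mesos = new MesosShuffleClientStub();
        plain.init();
        mesos.init();
        System.out.println("default client closes idle connections: "
                + plain.context.closeIdleConnections);
        System.out.println("Mesos client closes idle connections: "
                + mesos.context.closeIdleConnections);
    }
}
```

Using an overload rather than changing the existing constructor means other callers keep the current behaviour without any edits.
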
Adrian
--
*Adrian Bridgett* | Sysadmin Engineer, OpenSignal
<http://www.opensignal.com>
_____________________________________________________
Office: First Floor, Scriptor Court, 155-157 Farringdon Road,
Clerkenwell, London, EC1R 3AD
Phone #: +44 777-377-8251
Skype: abridgett | @adrianbridgett <http://twitter.com/adrianbridgett>
LinkedIn link <https://uk.linkedin.com/in/abridgett>
_____________________________________________________