[ https://issues.apache.org/jira/browse/SPARK-704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated SPARK-704:
----------------------------
    Component/s: Spark Core

> ConnectionManager sometimes cannot detect loss of sending connections
> ---------------------------------------------------------------------
>
>                 Key: SPARK-704
>                 URL: https://issues.apache.org/jira/browse/SPARK-704
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Charles Reiss
>            Assignee: Henry Saputra
>
> ConnectionManager currently does not detect when SendingConnections
> disconnect except if it is trying to send through them. As a result, a node
> failure just after a connection is initiated but before any acknowledgement
> messages can be sent may result in a hang.
> ConnectionManager has code intended to detect this case by detecting the
> failure of a corresponding ReceivingConnection, but this code assumes that
> the remote host:port of the ReceivingConnection is the same as the
> ConnectionManagerId, which is almost never true. Additionally, there does not
> appear to be any reason to assume a corresponding ReceivingConnection will
> exist.
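
The host:port mismatch described in the issue can be illustrated with plain TCP sockets. The sketch below is not Spark code: it assumes that a ConnectionManagerId corresponds to a node's listening host:port, and the names node_a, node_b, and sending_by_id are hypothetical stand-ins. It shows that the remote address seen on an accepted connection carries the peer's ephemeral source port, so a lookup keyed by the peer's listening address finds nothing.

```python
import socket


def open_listener():
    """Open a loopback listener on an OS-chosen port (hypothetical node endpoint)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    s.listen(1)
    return s


# Two hypothetical nodes, each identified by its listening (host, port).
node_a = open_listener()
node_b = open_listener()
id_a = node_a.getsockname()
id_b = node_b.getsockname()

# Node A tracks its outbound connections by the peer's advertised id.
sending_by_id = {id_b: "sending connection to B"}

# Node B opens an outbound connection to A; A accepts it as an inbound one.
b_to_a = socket.create_connection(id_a)
received_on_a, remote_addr = node_a.accept()

# Looking up the sending connection by the inbound connection's remote
# address fails: the host matches, but the port is B's ephemeral source
# port rather than B's listening port.
match = sending_by_id.get(remote_addr)
print("B's advertised id:       ", id_b)
print("remote address seen by A:", remote_addr)
print("lookup result:           ", match)

for s in (received_on_a, b_to_a, node_a, node_b):
    s.close()
```

Under these assumptions the lookup returns None even though both addresses belong to the same peer, which matches the report that the failure of a ReceivingConnection cannot be mapped back to a SendingConnection by remote address alone.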