turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait URL: https://github.com/apache/spark/pull/27943#issuecomment-601209554

Just to attach the example mentioned in the description:

> For example: there are two connection requests, rc1 and rc2. Suppose `io.numConnectionsPerPeer` is 1 and the connection timeout is 2 minutes.
> 1. rc1 holds the client lock and times out after 2 minutes.
> 2. rc2 holds the client lock and times out after 2 minutes.
> 3. rc1 starts its second retry, holds the lock, and times out after 2 minutes.
> 4. rc2 starts its second retry, holds the lock, and times out after 2 minutes.
> 5. rc1 starts its third retry, holds the lock, and times out after 2 minutes.
> 6. rc2 starts its third retry, holds the lock, and times out after 2 minutes.
>
> This wastes a lot of time.

The concern is that, in some cases, these connection requests block each other. If rc1's connection times out, we fast-*break* the first retry of rc2 but do not increase rc2's retry count. rc1 then waits one IO retry interval, starts its second retry, and times out again; we again fast-break rc2 without increasing its retry count. rc1 then waits another IO retry interval, starts its third retry, times out, and throws a fetch-failed exception. I think this is better than letting the connection requests block each other.
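To make the idea concrete, here is a minimal, self-contained sketch of the fast-fail check. This is not Spark's actual `TransportClientFactory` code; the class name `FastFailClientPool` and its methods are hypothetical, and it only models the core rule: if another request to the same peer failed to connect while we were waiting for the client lock, bail out immediately instead of spending another full connect timeout.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; names and structure are illustrative, not Spark's API.
class FastFailClientPool {
    private final ReentrantLock clientLock = new ReentrantLock();
    // Timestamp (nanos) of the most recent failed connect attempt to this peer.
    private volatile long lastFailureNanos = -1L;
    private final long connectTimeoutNanos;

    FastFailClientPool(long connectTimeoutMs) {
        this.connectTimeoutNanos = TimeUnit.MILLISECONDS.toNanos(connectTimeoutMs);
    }

    /**
     * Tries to connect. {@code waitStartNanos} is when this request started
     * waiting for the client lock; {@code simulateTimeout} stands in for a
     * real connect attempt that times out.
     */
    boolean connect(long waitStartNanos, boolean simulateTimeout) throws IOException {
        clientLock.lock();
        try {
            // Fast fail: while we were waiting for the lock, another request
            // already failed to connect to the same peer. Break out right away
            // (and, per the proposal, without consuming one of our retries).
            if (lastFailureNanos > waitStartNanos) {
                throw new IOException("fast fail: last connection to peer failed while waiting");
            }
            if (simulateTimeout) {
                lastFailureNanos = System.nanoTime();
                throw new IOException("connect timed out");
            }
            return true;
        } finally {
            clientLock.unlock();
        }
    }
}
```

With this check, the six back-to-back 2-minute timeouts in the example collapse: rc1's timeout causes rc2's queued attempt to break immediately, so only rc1's retries actually pay the full connect timeout.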