turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection 
while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-601209554
 
 
   To illustrate, here is the example from the description.
   > For example: there are two request connections, rc1 and rc2.
   Assume io.numConnectionsPerPeer is 1 and the connection timeout is 2 
minutes.
   1: rc1 holds the client lock and times out after 2 minutes.
   2: rc2 holds the client lock and times out after 2 minutes.
   3: rc1 starts its second retry, holds the lock and times out after 2 minutes.
   4: rc2 starts its second retry, holds the lock and times out after 2 minutes.
   5: rc1 starts its third retry, holds the lock and times out after 2 minutes.
   6: rc2 starts its third retry, holds the lock and times out after 2 minutes.
   Because every attempt serializes on the single client lock, this wastes a lot of time.
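   The timeline above can be put into numbers with a small back-of-envelope model. This is only an illustration of the worst case, not Spark code; the class and method names here are hypothetical.

   ```java
   // Hypothetical model of the serialized retry timeline described above.
   // With io.numConnectionsPerPeer = 1, every attempt holds the single client
   // lock for a full connect timeout, so all attempts of all request
   // connections run back to back.
   public class SerializedRetryTimeline {
       static final int CONNECT_TIMEOUT_MIN = 2; // connection timeout, per the example
       static final int MAX_ATTEMPTS = 3;        // attempts per request connection
       static final int NUM_REQUESTS = 2;        // rc1 and rc2

       // Total wall-clock time until the last attempt finishes, in minutes.
       static int worstCaseWaitMinutes(int requests, int attempts, int timeoutMin) {
           return requests * attempts * timeoutMin;
       }

       public static void main(String[] args) {
           int total = worstCaseWaitMinutes(NUM_REQUESTS, MAX_ATTEMPTS, CONNECT_TIMEOUT_MIN);
           // 2 requests x 3 attempts x 2 minutes = 12 minutes
           System.out.println("Worst-case serialized wait: " + total + " minutes");
       }
   }
   ```

   So in this example the two requests together spend 12 minutes before rc2's final attempt even finishes.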
   
   The concern is that, in some cases, these request connections block each 
other.
   With this change, if rc1's connect attempt times out, we fast *break* the first retry of rc2 but 
don't increase rc2's retry count.
   rc1 then waits one IO retry interval, starts its second retry, and times out again.
   We again fast break rc2 without increasing its retry count.
   rc1 then waits another IO retry interval, starts its third retry, times out, and finally throws a fetch failed exception.
   
   
   I think this is better than letting the request connections block each other.
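   A minimal sketch of the fast-fail idea, assuming we track the most recent connect failure per peer: a waiter that finally acquires the client lock checks whether another request's connect already timed out while it was queued, and if so breaks out immediately instead of burning a full timeout itself. This is not the actual TransportClientFactory change; all names below are hypothetical.

   ```java
   import java.util.concurrent.atomic.AtomicLong;

   // Illustrative sketch of the fast-fail check, not Spark's implementation.
   public class FastFailSketch {
       // Timestamp (ms) of the most recent connect timeout against this peer.
       private final AtomicLong lastConnectFailure = new AtomicLong(-1);

       /** Called by the request that held the lock when its connect times out. */
       public void recordConnectFailure() {
           lastConnectFailure.set(System.currentTimeMillis());
       }

       /**
        * Called by a waiter right after it acquires the client lock. If another
        * request's connect timed out while this waiter was queued, fail fast
        * instead of attempting another full-timeout connect. Per the proposal,
        * the caller does NOT increase its retry count for a fast-failed attempt.
        */
       public boolean shouldFastFail(long startedWaitingAtMs) {
           return lastConnectFailure.get() >= startedWaitingAtMs;
       }
   }
   ```

   With this check, rc2 no longer pays a 2-minute timeout on every attempt that was doomed by rc1's failure, and because its retry count is untouched, rc2 still gets its full set of real retries once the peer recovers.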
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
