turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection 
while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-602625438
 
 
   Thanks for the reply @tgravescs, and sorry for the unclear description.
   
   By `all connections` above, I meant the connection requests sent to 
the same unreachable address.
   
   My mistake: I did not realize that there may be several clients 
for the same address. Maybe we need to keep a `lastConnectionFailedTime` variable 
per clientPool.
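   A minimal sketch of that idea (class and method names are illustrative, not 
Spark's actual `TransportClientFactory` API; the fast-fail window is a made-up value):

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-address client pool carrying a lastConnectionFailedTime.
public class ClientPool {
    // Timestamp (ms) of the last failed connect to this address; 0 = no failure yet.
    private final AtomicLong lastConnectionFailedTime = new AtomicLong(0);
    // Window (ms) during which later attempts fail fast; illustrative value.
    private final long fastFailWindowMs;
    private final boolean reachable;  // stands in for the real TCP connect outcome

    public ClientPool(boolean reachable, long fastFailWindowMs) {
        this.reachable = reachable;
        this.fastFailWindowMs = fastFailWindowMs;
    }

    public void connect() throws IOException {
        long lastFailed = lastConnectionFailedTime.get();
        if (lastFailed > 0 && System.currentTimeMillis() - lastFailed < fastFailWindowMs) {
            // A sibling connection to this address just failed: skip the slow
            // connection-timeout path and fail immediately.
            throw new IOException("fast fail: address failed recently");
        }
        if (!reachable) {
            // Simulates the real connect hitting its connection timeout.
            lastConnectionFailedTime.set(System.currentTimeMillis());
            throw new IOException("connect timed out");
        }
    }
}
```

With this, only the first connect to an unreachable address pays the full 
connection timeout; later attempts within the window fail immediately.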
   
   The problem is that, for a single task, there may be several connection 
requests to the same address.
   In particular, for a shuffle read task there may be only one client in the 
client pool, and it is always picked by every connection that wants to 
reach the same ESS.
   If that address is unreachable, these connections block each 
other (inside createClient).
   Since these connections all belong to the same task and target the same 
unreachable ESS, it costs connectionNum \* connectionTimeOut \* maxRetry 
before the task finally fails.
   Ideally, the task should fail after only connectionTimeOut \* maxRetry.
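   To illustrate the gap with some hypothetical numbers (none of these values 
come from Spark's defaults, they are only for the arithmetic):

```java
public class RetryCost {
    public static void main(String[] args) {
        // Hypothetical values, just to illustrate the cost difference above.
        long connectionTimeOutSec = 120;  // per-connect timeout
        int maxRetry = 3;                 // fetch retries
        int connectionNum = 4;            // connections from one task to the same ESS

        // Today: the connections serialize on the same pooled client, so each
        // retry pays the timeout once per connection.
        long currentCostSec = connectionNum * connectionTimeOutSec * maxRetry;

        // With fast fail: only the first connection per retry pays the timeout.
        long idealCostSec = connectionTimeOutSec * maxRetry;

        System.out.println(currentCostSec + " vs " + idealCostSec);  // 1440 vs 360
    }
}
```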