tgravescs commented on issue #27943: [SPARK-31179] Fast fail the connection 
while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-602733405
 
 
   ok thanks for the description. So are you only trying to fail faster when fetching blocks for the same task, or also across other tasks fetching from the same shuffle server?
   
   numConnectionsPerPeer is 1 by default, so by default you get a client pool of size 1, but it's configurable, so it needs to handle the case where the pool is bigger. Yeah, I think if you just keep the fast-fail state per clientPool it would handle that.
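
   Something like this rough sketch is what I'm picturing, i.e. each clientPool remembers when a connect to that peer last failed, so other callers hitting the same pool can fail fast. The class and field names here are just illustrative, not your actual change:

   ```java
   import java.util.concurrent.atomic.AtomicLong;

   // Hypothetical per-pool failure record (illustrative only): each client pool
   // (one per remote address) remembers when a connect last failed, so concurrent
   // callers against the same pool can fail fast inside the window.
   class ClientPoolFailureTracker {
     // 0 means "no recent connect failure recorded"
     private final AtomicLong lastConnectFailureNanos = new AtomicLong(0L);

     void recordFailure() {
       lastConnectFailureNanos.set(System.nanoTime());
     }

     void clearOnSuccess() {
       lastConnectFailureNanos.set(0L);
     }

     // True if a connect to this pool failed within the last windowMillis.
     boolean shouldFastFail(long windowMillis) {
       long last = lastConnectFailureNanos.get();
       if (last == 0L) {
         return false;
       }
       long elapsedMillis = (System.nanoTime() - last) / 1_000_000L;
       return elapsedMillis < windowMillis;
     }
   }
   ```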
   
   I also wonder if it shouldn't be something like 95% of conf.ioRetryWaitTimeMs(), to make sure there isn't a race condition in when the first one actually tries again. I would think there is enough delay in retrying to not hit that exactly, but it might be better to be safe.
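
   Roughly the margin idea in code, assuming a tracker like the sketch above; in the real change the wait would come from conf.ioRetryWaitTimeMs(), and the 0.95 factor is just an example number, not something I'm insisting on:

   ```java
   import java.io.IOException;

   // Illustrative only: fast fail only when we are safely inside the retry wait
   // window, so the first caller's own retry isn't pre-empted by timing skew.
   class FastFailCheck {
     static void checkFastFail(ClientPoolFailureTracker tracker,
                               long ioRetryWaitTimeMs,
                               String remoteAddress) throws IOException {
       long fastFailWindowMs = (long) (ioRetryWaitTimeMs * 0.95);  // leave ~5% slack
       if (tracker.shouldFastFail(fastFailWindowMs)) {
         throw new IOException("Fast failing connection to " + remoteAddress
             + ": a connect to it already failed within the last "
             + fastFailWindowMs + " ms");
       }
     }
   }
   ```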
   
   The only other question I have is about connections that don't go through the RetryingBlockFetcher; this could potentially fail them much faster, and if it's a one-time fetch, is that what we want? I would need to look a bit more at the usages there.
   
