tgravescs commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-602733405

OK, thanks for the description. So are you only trying to fail faster when fetching blocks for the same task, or also across other tasks fetching from the same shuffle server?

numConnectionsPerPeer = 1 by default, so by default you get a client pool of size 1, but it's configurable, so it needs to be able to handle the case where it's bigger. Yeah, I think if you just have something per clientPool it would handle that.

I also wonder if the threshold shouldn't be something like 95% of conf.ioRetryWaitTimeMs(), to make sure there isn't a race condition around when the first connection actually tries again. I would think there is enough delay in retrying to not hit that exactly, but it might be better to be safe.

The only other question I have is about connections that don't go through the RetryingBlockFetcher: this could potentially fail them much faster, and if it's a one-time fetch, is that what we want? I would need to look a bit more at the usages there.
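The per-clientPool fast-fail check under discussion could be sketched roughly as follows. This is only an illustration of the idea, not the patch's actual code: the class name, the `lastFailureNanos` bookkeeping, and the 95% safety factor applied to the retry wait are assumptions drawn from the comment above.

```java
// Hypothetical sketch of the fast-fail idea: if the last connection to this
// client pool failed less than ~95% of the I/O retry wait ago, fail the new
// attempt immediately instead of waiting for it to time out again.
public class FastFailCheck {

    // 95% of conf.ioRetryWaitTimeMs(), per the race-condition concern above
    // about the first connection retrying at exactly the wait boundary.
    private static final double SAFETY_FACTOR = 0.95;

    /**
     * Returns true if a new connection attempt should fail fast.
     *
     * @param lastFailureNanos  monotonic timestamp (System.nanoTime) of the
     *                          last failed connection for this client pool,
     *                          or 0 if no failure has been recorded
     * @param nowNanos          current monotonic timestamp
     * @param ioRetryWaitTimeMs the configured retry wait in milliseconds
     */
    public static boolean shouldFastFail(long lastFailureNanos,
                                         long nowNanos,
                                         long ioRetryWaitTimeMs) {
        if (lastFailureNanos <= 0) {
            return false; // no recorded failure for this client pool
        }
        long elapsedMs = (nowNanos - lastFailureNanos) / 1_000_000L;
        return elapsedMs < (long) (ioRetryWaitTimeMs * SAFETY_FACTOR);
    }
}
```

With numConnectionsPerPeer > 1, the failure timestamp would have to live on the shared clientPool (not on an individual client) for the check to cover all connections to the same shuffle server, which matches the suggestion above.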