[ https://issues.apache.org/jira/browse/SPARK-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15025711#comment-15025711 ]
Josh Rosen commented on SPARK-9328:
-----------------------------------

Actually, I spoke slightly too soon: some timeouts had to be lowered in order for the master-branch test to pass (my test was originally written against Spark 1.2.x for a backport). It looks like SPARK-7003 has addressed this for Spark 1.4.x+, so I'm going to resolve this as fixed in 1.4.0+.

> Netty IO layer should implement read timeouts
> ---------------------------------------------
>
>                 Key: SPARK-9328
>                 URL: https://issues.apache.org/jira/browse/SPARK-9328
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle, Spark Core
>    Affects Versions: 1.2.1, 1.3.1
>            Reporter: Josh Rosen
>            Priority: Blocker
>             Fix For: 1.4.0
>
>
> Spark's network layer does not implement read timeouts, which can lead to
> stalls during shuffle: if a remote shuffle server stalls while responding to
> a shuffle block fetch request but does not close the socket, then the job may
> block until an OS-level socket timeout occurs.
> I think that we can fix this using Netty's ReadTimeoutHandler
> (http://stackoverflow.com/questions/13390363/netty-connecttimeoutmillis-vs-readtimeouthandler).
> The tricky part of working on this will be figuring out the right place to
> add the handler and ensuring that we don't introduce performance issues by
> failing to re-use sockets.
> Quoting from that linked StackOverflow question:
> {quote}
> Note that the ReadTimeoutHandler is also unaware of whether you have sent a
> request - it only cares whether data has been read from the socket. If your
> connection is persistent, and you only want read timeouts to fire when a
> request has been sent, you'll need to build a request / response aware
> timeout handler.
> {quote}
> If we want to avoid tearing down connections between shuffles, then we may
> have to do something like this.
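The request/response-aware behavior described in the quoted answer can be sketched in plain Java without a Netty dependency. The class and method names below are illustrative only, not Spark's actual API: the idea is that the timeout timer is armed only while a request is outstanding, so idle persistent connections (e.g. pooled shuffle sockets) are never torn down.

```java
// Sketch of a request/response-aware read-timeout tracker, per the quoted
// StackOverflow answer. Hypothetical names; this is not Spark's implementation.
public class RequestAwareTimeoutDemo {

    static class TimeoutTracker {
        private int outstanding = 0;        // requests sent but not yet answered
        private long lastActivityNanos = System.nanoTime();

        synchronized void requestSent()      { outstanding++; touch(); }
        synchronized void responseReceived() { outstanding = Math.max(0, outstanding - 1); touch(); }
        synchronized void dataRead()         { touch(); }  // any bytes read count as progress
        private void touch()                 { lastActivityNanos = System.nanoTime(); }

        // Fires only while a request is in flight; an idle persistent
        // connection never times out, no matter how long it sits.
        synchronized boolean timedOut(long timeoutNanos) {
            return outstanding > 0
                && System.nanoTime() - lastActivityNanos > timeoutNanos;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TimeoutTracker tracker = new TimeoutTracker();
        long timeout = 50_000_000L; // 50 ms

        // Idle connection: nothing outstanding, so no timeout even after waiting.
        Thread.sleep(100);
        System.out.println("idle timed out: " + tracker.timedOut(timeout));       // false

        // Stalled request: request sent, no data arrives -> timeout fires.
        tracker.requestSent();
        Thread.sleep(100);
        System.out.println("stalled timed out: " + tracker.timedOut(timeout));    // true

        // Response arrives -> timer disarms again.
        tracker.responseReceived();
        System.out.println("answered timed out: " + tracker.timedOut(timeout));   // false
    }
}
```

In a Netty pipeline, a plain `ReadTimeoutHandler` is the socket-level analogue of the `outstanding > 0` check always being true; the extra bookkeeping above is what it would take to make timeouts request-aware, as the quoted answer suggests.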
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)