[ https://issues.apache.org/jira/browse/SPARK-14209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15214732#comment-15214732 ]
Miles Crawford edited comment on SPARK-14209 at 3/28/16 7:35 PM:
-----------------------------------------------------------------

Yes, I am certainly caching. I did not realize that caching does not use the same external store.

was (Author: milesc):
Yes, I am certainly caching.

> Application failure during preemption.
> --------------------------------------
>
>                 Key: SPARK-14209
>                 URL: https://issues.apache.org/jira/browse/SPARK-14209
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager
>    Affects Versions: 1.6.1
>         Environment: Spark on YARN
>            Reporter: Miles Crawford
>
> We have a fair-sharing cluster set up, including the external shuffle
> service. When a new job arrives, existing jobs are successfully preempted
> down to fit.
>
> A spate of these messages arrives:
>
>     ExecutorLostFailure (executor 48 exited unrelated to the running tasks)
>     Reason: Container container_1458935819920_0019_01_000143 on host:
>     ip-10-12-46-235.us-west-2.compute.internal was preempted.
>
> This seems fine - the problem is that soon thereafter, our whole application
> fails because it is unable to fetch blocks from the preempted containers:
>
>     org.apache.spark.storage.BlockFetchException: Failed to fetch block from 1
>     locations. Most recent failure cause:
>     Caused by: java.io.IOException: Failed to connect to
>     ip-10-12-46-235.us-west-2.compute.internal/10.12.46.235:55681
>     Caused by: java.net.ConnectException: Connection refused:
>     ip-10-12-46-235.us-west-2.compute.internal/10.12.46.235:55681
>
> Full stack: https://gist.github.com/milescrawford/33a1c1e61d88cc8c6daf
>
> Spark does not attempt to recreate these blocks - the tasks simply fail over
> and over until the maxTaskAttempts value is reached.
>
> It appears to me that there is some fault in the way preempted containers are
> being handled - shouldn't these blocks be recreated on demand?
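The comment's point - that the external shuffle service does not serve cached blocks - is worth making concrete. The service preserves *shuffle* files written by a lost executor, but persisted (cached) RDD blocks live in the executor process itself and vanish when YARN preempts its container. A rough configuration sketch of the relevant settings follows; the values shown are illustrative assumptions, not taken from the reporter's cluster:

```properties
# spark-defaults.conf - illustrative sketch, values are assumptions

# Preserves shuffle files across executor loss on YARN,
# but does NOT serve cached RDD blocks.
spark.shuffle.service.enabled      true
spark.dynamicAllocation.enabled    true

# Per-task retry budget (what the report calls "maxTaskAttempts");
# a larger value gives recomputation more chances to succeed
# before the whole application is failed.
spark.task.maxFailures             8
```

On the application side, one partial mitigation is a replicated storage level such as `rdd.persist(StorageLevel.MEMORY_AND_DISK_2)`, which keeps a second copy of each cached block on another executor, at the cost of extra memory and network traffic. This does not address the underlying issue the report raises - that lost cached blocks should be recomputed from lineage on demand rather than retried until failure.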
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org