[ https://issues.apache.org/jira/browse/SPARK-19293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057091#comment-16057091 ]
Damian Momot commented on SPARK-19293:
--------------------------------------

Yes. Some tasks are marked as "killed", but some become "failed". In certain cases, when the number of failures is very high, the entire Spark job fails. Disabling speculation eliminates the failures entirely. Before Spark 2.1 this worked flawlessly with speculation enabled.

> Spark 2.1.x unstable with spark.speculation=true
> ------------------------------------------------
>
>                 Key: SPARK-19293
>                 URL: https://issues.apache.org/jira/browse/SPARK-19293
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0, 2.1.1
>            Reporter: Damian Momot
>            Priority: Critical
>
> After upgrading from Spark 2.0.2 to 2.1.0, we observed that jobs often fail
> when speculative mode is enabled.
>
> In 2.0.2, speculative tasks were simply skipped if their results were not
> used (i.e. another instance finished earlier), and the UI clearly showed
> that those tasks were not counted as failures.
>
> In 2.1.0, many tasks are marked failed/killed once speculative tasks start
> to run (that is, at the end of a stage when spare executors become
> available), which also leads to entire stage/job failures.
>
> Disabling spark.speculation avoids the failures, but speculative mode is
> very useful, especially when different executors run on machines with
> varying load (for example on YARN).
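For reference, a minimal sketch of how speculation can be toggled programmatically; the related tuning settings shown (spark.speculation.interval, spark.speculation.multiplier, spark.speculation.quantile) are standard Spark configuration keys, and the application name is illustrative only:

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // Sketch: enabling speculative execution with its standard tuning knobs.
    // Setting spark.speculation to "false" is the workaround described above.
    val conf = new SparkConf()
      .setAppName("speculation-example")
      .set("spark.speculation", "true")            // enable speculative execution
      .set("spark.speculation.interval", "100ms")  // how often to check for slow tasks
      .set("spark.speculation.multiplier", "1.5")  // how much slower than the median a task must be
      .set("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish before checking

    val spark = SparkSession.builder().config(conf).getOrCreate()

The same keys can be passed on the command line via spark-submit --conf, which is often how the workaround of setting spark.speculation=false is applied without code changes.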