[ https://issues.apache.org/jira/browse/SPARK-24415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thomas Graves updated SPARK-24415:
----------------------------------
    Description:

Running with Spark 2.3 on YARN, with task failures and blacklisting occurring, the aggregated metrics by executor on the stage page are not correct. In my example the table should show 2 failed tasks, but it only shows one. Note I tested with the master branch to verify it's not fixed there. I will attach a screen shot.

To reproduce:

$SPARK_HOME/bin/spark-shell --master yarn --deploy-mode client \
  --executor-memory=2G --num-executors=1 \
  --conf "spark.blacklist.enabled=true" \
  --conf "spark.blacklist.stage.maxFailedTasksPerExecutor=1" \
  --conf "spark.blacklist.stage.maxFailedExecutorsPerNode=1" \
  --conf "spark.blacklist.application.maxFailedTasksPerExecutor=2" \
  --conf "spark.blacklist.killBlacklistedExecutors=true"

// Fail every task that lands on executors 1 through 4, so tasks fail
// and those executors get blacklisted and killed.
sc.parallelize(1 to 10000, 10).map { x =>
  if (SparkEnv.get.executorId.toInt >= 1 && SparkEnv.get.executorId.toInt <= 4)
    throw new RuntimeException("Bad executor")
  else
    (x % 3, x)
}.reduceByKey((a, b) => a + b).collect()


> Stage page aggregated executor metrics wrong when failures
> -----------------------------------------------------------
>
>                 Key: SPARK-24415
>                 URL: https://issues.apache.org/jira/browse/SPARK-24415
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 2.3.0
>            Reporter: Thomas Graves
>            Priority: Major
>         Attachments: Screen Shot 2018-05-29 at 2.15.38 PM.png
>


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org