[ https://issues.apache.org/jira/browse/SPARK-24415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated SPARK-24415:
----------------------------------
    Description: 
Running with Spark 2.3 on YARN, with task failures and blacklisting, the 
aggregated metrics by executor on the stage page are not correct.  In my 
example it should show 2 failed tasks but it only shows one.  Note I also 
tested with the master branch to verify it's not fixed there.

I will attach a screenshot.

To reproduce:

$SPARK_HOME/bin/spark-shell --master yarn --deploy-mode client \
  --executor-memory=2G --num-executors=1 \
  --conf "spark.blacklist.enabled=true" \
  --conf "spark.blacklist.stage.maxFailedTasksPerExecutor=1" \
  --conf "spark.blacklist.stage.maxFailedExecutorsPerNode=1" \
  --conf "spark.blacklist.application.maxFailedTasksPerExecutor=2" \
  --conf "spark.blacklist.killBlacklistedExecutors=true"

 

import org.apache.spark.SparkEnv

sc.parallelize(1 to 10000, 10).map { x =>
  // Fail every task that lands on executors 1-4, so the stage sees
  // repeated task failures and those executors get blacklisted.
  if (SparkEnv.get.executorId.toInt >= 1 && SparkEnv.get.executorId.toInt <= 4)
    throw new RuntimeException("Bad executor")
  else
    (x % 3, x)
}.reduceByKey((a, b) => a + b).collect()
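
To double-check the counts independently of the web page, the per-executor 
stage summary can be read from Spark's monitoring REST API. A minimal sketch, 
assuming the driver UI is on localhost:4040; the app id and stage id below are 
placeholders to copy from the UI:

import scala.io.Source

// Fetch the stage details as JSON. The executorSummary section reports
// failedTasks per executor, which should agree with the "Aggregated
// Metrics by Executor" table on the stage page (2 failed tasks here).
val appId = "<app-id>"  // placeholder; list apps at /api/v1/applications
val stageId = 0         // placeholder; the failed stage's id
val url = s"http://localhost:4040/api/v1/applications/$appId/stages/$stageId"
println(Source.fromURL(url).mkString)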

> Stage page aggregated executor metrics wrong when there are failures
> --------------------------------------------------------------------
>
>                 Key: SPARK-24415
>                 URL: https://issues.apache.org/jira/browse/SPARK-24415
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 2.3.0
>            Reporter: Thomas Graves
>            Priority: Major
>         Attachments: Screen Shot 2018-05-29 at 2.15.38 PM.png
>


