This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 23ff9e6  [SPARK-32000][2.4][CORE][TESTS] Fix the flaky test for partially launched task in barrier-mode
23ff9e6 is described below

commit 23ff9e6a13d7f2e14f2b154a9dabfc7cdca430a4
Author: yi.wu <yi...@databricks.com>
AuthorDate: Wed Jun 17 09:59:14 2020 -0700

    [SPARK-32000][2.4][CORE][TESTS] Fix the flaky test for partially launched task in barrier-mode
    
    ### What changes were proposed in this pull request?
    
    This PR changes the test to get an active executorId and set it as the preferred location, instead of using a fixed, hard-coded preferred location.
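    In essence (a minimal sketch of the approach; the actual change is in the diff below), the test now waits for the executors to come up and derives the preferred locations from a real executor id:

    ```scala
    // Sketch only, mirroring the diff below: wait until both executors have
    // registered, then build the preferred locations from a real executor id
    // instead of the hard-coded "executor_h_0".
    TestUtils.waitUntilExecutorsUp(sc, 2, 6000)
    val id = sc.getExecutorIds().head
    val rdd = new MyRDD(sc, 2, List(dep), Seq(Seq(s"executor_h_$id"), Seq(s"executor_h_$id")))
    ```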
    
    ### Why are the changes needed?
    
    The test is flaky. After checking the [log](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124086/artifact/core/), I found the root cause:
    
    Two test cases from different test suites were submitted at the same time because of concurrent execution. In this particular case, the two test cases (from DistributedSuite and BarrierTaskContextSuite) both launch under local-cluster mode. Because the two applications were submitted at the SAME time, they got the same application ID (app-20200615210132-0000). Thus, when the cluster of BarrierTaskContextSuite was launching executors, it failed to create the directory for executor 0, because [...]
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    The test cannot be reproduced locally. We can only confirm the fix once the test is no longer flaky on Jenkins.
    
    Closes #28851 from Ngone51/fix-spark-32000-24.
    
    Authored-by: yi.wu <yi...@databricks.com>
    Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
---
 .../scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala  | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
index 469cc4a..92a97d1 100644
--- a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
+++ b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
@@ -159,11 +159,13 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext {
       .setAppName("test-cluster")
       .set("spark.test.noStageRetry", "true")
     sc = new SparkContext(conf)
+    TestUtils.waitUntilExecutorsUp(sc, 2, 6000)
+    val id = sc.getExecutorIds().head
     val rdd0 = sc.parallelize(Seq(0, 1, 2, 3), 2)
     val dep = new OneToOneDependency[Int](rdd0)
-    // set up a barrier stage with 2 tasks and both tasks prefer executor 0 (only 1 core) for
+    // set up a barrier stage with 2 tasks and both tasks prefer the same executor (only 1 core) for
     // scheduling. So, one of tasks won't be scheduled in one round of resource offer.
-    val rdd = new MyRDD(sc, 2, List(dep), Seq(Seq("executor_h_0"), Seq("executor_h_0")))
+    val rdd = new MyRDD(sc, 2, List(dep), Seq(Seq(s"executor_h_$id"), Seq(s"executor_h_$id")))
     val errorMsg = intercept[SparkException] {
       rdd.barrier().mapPartitions { iter =>
         BarrierTaskContext.get().barrier()


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
