ScrapCodes commented on pull request #27735: URL: https://github.com/apache/spark/pull/27735#issuecomment-705428013
So the test is now flaky rather than failing consistently; it passed on the second run. Preserving the log of the failed run here for future reference:

```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- Test basic decommissioning *** FAILED ***
  The code passed to eventually never returned normally. Attempted 184 times over 3.01608704245 minutes.
Last failure message: "++ id -u + myuid=185 ++ id -g + mygid=0 + set +e ++ getent passwd 185 + uidentry= + set -e + '[' -z '' ']' + '[' -w /etc/passwd ']' + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false' + SPARK_CLASSPATH=':/opt/spark/jars/*' + env + grep SPARK_JAVA_OPT_ + sort -t_ -k4 -n + sed 's/[^=]*=\(.*\)/\1/g' + readarray -t SPARK_EXECUTOR_JAVA_OPTS + '[' -n '' ']' + '[' 3 == 2 ']' + '[' 3 == 3 ']' ++ python3 -V + pyv3='Python 3.7.3' + export PYTHON_VERSION=3.7.3 + PYTHON_VERSION=3.7.3 + export PYSPARK_PYTHON=python3 + PYSPARK_PYTHON=python3 + export PYSPARK_DRIVER_PYTHON=python3 + PYSPARK_DRIVER_PYTHON=python3 + '[' -n '' ']' + '[' -z ']' + '[' -z x ']' + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*' + case "$1" in + shift 1 + CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@") + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner local:///opt/spark/tests/decommissioning.py 20/10/07 11:36:40 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Starting decom test Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 20/10/07 11:36:41 INFO SparkContext: Running Spark version 3.1.0-SNAPSHOT 20/10/07 11:36:41 INFO ResourceUtils: ============================================================== 20/10/07 11:36:41 INFO ResourceUtils: No custom resources configured for spark.driver. 
20/10/07 11:36:41 INFO ResourceUtils: ============================================================== 20/10/07 11:36:41 INFO SparkContext: Submitted application: PyMemoryTest 20/10/07 11:36:41 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0) 20/10/07 11:36:41 INFO ResourceProfile: Limiting resource is cpus at 1 tasks per executor 20/10/07 11:36:41 INFO ResourceProfileManager: Added ResourceProfile id: 0 20/10/07 11:36:41 INFO SecurityManager: Changing view acls to: 185,jenkins 20/10/07 11:36:41 INFO SecurityManager: Changing modify acls to: 185,jenkins 20/10/07 11:36:41 INFO SecurityManager: Changing view acls groups to: 20/10/07 11:36:41 INFO SecurityManager: Changing modify acls groups to: 20/10/07 11:36:41 INFO SecurityManager: SecurityManager: authentication enabled; ui acls disabled; users with view permissions: Set(185, jenkins); groups with view permissions: Set(); users with modify permissions: Set(185, jenkins); groups with modify permissions: Set() 20/10/07 11:36:42 INFO Utils: Successfully started service 'sparkDriver' on port 7078. 
20/10/07 11:36:42 INFO SparkEnv: Registering MapOutputTracker 20/10/07 11:36:42 INFO SparkEnv: Registering BlockManagerMaster 20/10/07 11:36:42 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 20/10/07 11:36:42 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up 20/10/07 11:36:42 INFO SparkEnv: Registering BlockManagerMasterHeartbeat 20/10/07 11:36:42 INFO DiskBlockManager: Created local directory at /var/data/spark-d602c302-e5f2-44eb-9398-587e9ec828f3/blockmgr-754bcb40-809e-4163-a4e2-98af300183f6 20/10/07 11:36:42 INFO MemoryStore: MemoryStore started with capacity 593.9 MiB 20/10/07 11:36:42 INFO SparkEnv: Registering OutputCommitCoordinator 20/10/07 11:36:42 INFO Utils: Successfully started service 'SparkUI' on port 4040. 20/10/07 11:36:42 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:4040 20/10/07 11:36:43 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file 20/10/07 11:36:44 INFO ExecutorPodsAllocator: Going to request 3 executors from Kubernetes. 20/10/07 11:36:44 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/opt/spark/conf) : 20/10/07 11:36:44 INFO BasicExecutorFeatureStep: Adding decommission script to lifecycle 20/10/07 11:36:44 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079. 
20/10/07 11:36:44 INFO NettyBlockTransferService: Server created on spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079 20/10/07 11:36:44 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 20/10/07 11:36:44 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc, 7079, None) 20/10/07 11:36:44 INFO BlockManagerMasterEndpoint: Registering block manager spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079 with 593.9 MiB RAM, BlockManagerId(driver, spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc, 7079, None) 20/10/07 11:36:44 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc, 7079, None) 20/10/07 11:36:44 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc, 7079, None) 20/10/07 11:36:44 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/opt/spark/conf) : 20/10/07 11:36:45 INFO BasicExecutorFeatureStep: Adding decommission script to lifecycle 20/10/07 11:36:45 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/opt/spark/conf) : 20/10/07 11:36:45 INFO BasicExecutorFeatureStep: Adding decommission script to lifecycle 20/10/07 11:36:45 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/opt/spark/conf) : 20/10/07 11:36:50 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (172.17.0.5:33974) with ID 1, ResourceProfileId 0 20/10/07 11:36:50 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (172.17.0.6:53680) with ID 
2, ResourceProfileId 0 20/10/07 11:36:50 INFO BlockManagerMasterEndpoint: Registering block manager 172.17.0.5:39709 with 593.9 MiB RAM, BlockManagerId(1, 172.17.0.5, 39709, None) 20/10/07 11:36:50 INFO BlockManagerMasterEndpoint: Registering block manager 172.17.0.6:39551 with 593.9 MiB RAM, BlockManagerId(2, 172.17.0.6, 39551, None) 20/10/07 11:37:14 INFO KubernetesClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000000000(ns) 20/10/07 11:37:15 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/opt/spark/work-dir/spark-warehouse'). 20/10/07 11:37:15 INFO SharedState: Warehouse path is 'file:/opt/spark/work-dir/spark-warehouse'. 20/10/07 11:37:16 INFO SparkContext: Starting job: collect at /opt/spark/tests/decommissioning.py:44 20/10/07 11:37:16 INFO DAGScheduler: Registering RDD 2 (groupByKey at /opt/spark/tests/decommissioning.py:43) as input to shuffle 0 20/10/07 11:37:16 INFO DAGScheduler: Got job 0 (collect at /opt/spark/tests/decommissioning.py:44) with 5 output partitions 20/10/07 11:37:16 INFO DAGScheduler: Final stage: ResultStage 1 (collect at /opt/spark/tests/decommissioning.py:44) 20/10/07 11:37:16 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0) 20/10/07 11:37:16 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0) 20/10/07 11:37:16 INFO DAGScheduler: Submitting ShuffleMapStage 0 (PairwiseRDD[2] at groupByKey at /opt/spark/tests/decommissioning.py:43), which has no missing parents 20/10/07 11:37:17 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 10.6 KiB, free 593.9 MiB) 20/10/07 11:37:17 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 6.5 KiB, free 593.9 MiB) 20/10/07 11:37:17 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079 
(size: 6.5 KiB, free: 593.9 MiB) 20/10/07 11:37:17 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1348 20/10/07 11:37:17 INFO DAGScheduler: Submitting 5 missing tasks from ShuffleMapStage 0 (PairwiseRDD[2] at groupByKey at /opt/spark/tests/decommissioning.py:43) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4)) 20/10/07 11:37:17 INFO TaskSchedulerImpl: Adding task set 0.0 with 5 tasks resource profile 0 20/10/07 11:37:17 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (172.17.0.5, executor 1, partition 0, PROCESS_LOCAL, 7341 bytes) taskResourceAssignments Map() 20/10/07 11:37:17 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1) (172.17.0.6, executor 2, partition 1, PROCESS_LOCAL, 7341 bytes) taskResourceAssignments Map() 20/10/07 11:37:17 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.17.0.5:39709 (size: 6.5 KiB, free: 593.9 MiB) 20/10/07 11:37:17 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.17.0.6:39551 (size: 6.5 KiB, free: 593.9 MiB) 20/10/07 11:37:19 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2) (172.17.0.6, executor 2, partition 2, PROCESS_LOCAL, 7341 bytes) taskResourceAssignments Map() 20/10/07 11:37:19 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 2093 ms on 172.17.0.6 (executor 2) (1/5) 20/10/07 11:37:19 INFO PythonAccumulatorV2: Connected to AccumulatorServer at host: 127.0.0.1 port: 53377 20/10/07 11:37:19 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3) (172.17.0.5, executor 1, partition 3, PROCESS_LOCAL, 7341 bytes) taskResourceAssignments Map() 20/10/07 11:37:19 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2153 ms on 172.17.0.5 (executor 1) (2/5) 20/10/07 11:37:19 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4) (172.17.0.6, executor 2, partition 4, PROCESS_LOCAL, 7341 bytes) taskResourceAssignments Map() 20/10/07 11:37:19 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 
132 ms on 172.17.0.6 (executor 2) (3/5) 20/10/07 11:37:19 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 145 ms on 172.17.0.5 (executor 1) (4/5) 20/10/07 11:37:19 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 99 ms on 172.17.0.6 (executor 2) (5/5) 20/10/07 11:37:19 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 20/10/07 11:37:19 INFO DAGScheduler: ShuffleMapStage 0 (groupByKey at /opt/spark/tests/decommissioning.py:43) finished in 2.530 s 20/10/07 11:37:19 INFO DAGScheduler: looking for newly runnable stages 20/10/07 11:37:19 INFO DAGScheduler: running: Set() 20/10/07 11:37:19 INFO DAGScheduler: waiting: Set(ResultStage 1) 20/10/07 11:37:19 INFO DAGScheduler: failed: Set() 20/10/07 11:37:19 INFO DAGScheduler: Submitting ResultStage 1 (PythonRDD[5] at collect at /opt/spark/tests/decommissioning.py:44), which has no missing parents 20/10/07 11:37:19 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 9.3 KiB, free 593.9 MiB) 20/10/07 11:37:19 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 5.4 KiB, free 593.9 MiB) 20/10/07 11:37:19 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079 (size: 5.4 KiB, free: 593.9 MiB) 20/10/07 11:37:19 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1348 20/10/07 11:37:19 INFO DAGScheduler: Submitting 5 missing tasks from ResultStage 1 (PythonRDD[5] at collect at /opt/spark/tests/decommissioning.py:44) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4)) 20/10/07 11:37:19 INFO TaskSchedulerImpl: Adding task set 1.0 with 5 tasks resource profile 0 20/10/07 11:37:19 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 5) (172.17.0.5, executor 1, partition 0, NODE_LOCAL, 7162 bytes) taskResourceAssignments Map() 20/10/07 11:37:19 INFO TaskSetManager: Starting task 1.0 in stage 
1.0 (TID 6) (172.17.0.6, executor 2, partition 1, NODE_LOCAL, 7162 bytes) taskResourceAssignments Map() 20/10/07 11:37:19 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.17.0.5:39709 (size: 5.4 KiB, free: 593.9 MiB) 20/10/07 11:37:19 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.17.0.6:39551 (size: 5.4 KiB, free: 593.9 MiB) 20/10/07 11:37:20 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 172.17.0.5:33974 20/10/07 11:37:20 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 172.17.0.6:53680 20/10/07 11:37:20 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 7) (172.17.0.5, executor 1, partition 2, NODE_LOCAL, 7162 bytes) taskResourceAssignments Map() 20/10/07 11:37:20 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 5) in 667 ms on 172.17.0.5 (executor 1) (1/5) 20/10/07 11:37:20 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 8) (172.17.0.6, executor 2, partition 3, NODE_LOCAL, 7162 bytes) taskResourceAssignments Map() 20/10/07 11:37:20 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 6) in 703 ms on 172.17.0.6 (executor 2) (2/5) 20/10/07 11:37:20 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID 9) (172.17.0.5, executor 1, partition 4, NODE_LOCAL, 7162 bytes) taskResourceAssignments Map() 20/10/07 11:37:20 INFO TaskSetManager: Finished task 2.0 in stage 1.0 (TID 7) in 87 ms on 172.17.0.5 (executor 1) (3/5) 20/10/07 11:37:20 INFO TaskSetManager: Finished task 3.0 in stage 1.0 (TID 8) in 119 ms on 172.17.0.6 (executor 2) (4/5) 20/10/07 11:37:20 INFO TaskSetManager: Finished task 4.0 in stage 1.0 (TID 9) in 117 ms on 172.17.0.5 (executor 1) (5/5) 20/10/07 11:37:20 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 20/10/07 11:37:20 INFO DAGScheduler: ResultStage 1 (collect at /opt/spark/tests/decommissioning.py:44) finished in 0.898 s 20/10/07 11:37:20 INFO DAGScheduler: Job 0 is 
finished. Cancelling potential speculative or zombie tasks for this job 20/10/07 11:37:20 INFO TaskSchedulerImpl: Killing all running tasks in stage 1: Stage finished 20/10/07 11:37:20 INFO DAGScheduler: Job 0 finished: collect at /opt/spark/tests/decommissioning.py:44, took 3.730659 s 1st accumulator value is: 100 Waiting to give nodes time to finish migration, decom exec 1. ... 20/10/07 11:37:20 ERROR TaskSchedulerImpl: Lost executor 1 on 172.17.0.5: The executor with id 1 was deleted by a user or the framework. 20/10/07 11:37:20 INFO DAGScheduler: Executor lost: 1 (epoch 1) 20/10/07 11:37:20 INFO BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster. 20/10/07 11:37:20 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, 172.17.0.5, 39709, None) 20/10/07 11:37:20 INFO BlockManagerMaster: Removed 1 successfully in removeExecutor 20/10/07 11:37:20 INFO DAGScheduler: Shuffle files lost for executor: 1 (epoch 1) 20/10/07 11:37:24 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes. 
20/10/07 11:37:24 INFO BasicExecutorFeatureStep: Adding decommission script to lifecycle 20/10/07 11:37:24 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/opt/spark/conf) : 20/10/07 11:37:26 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (172.17.0.5:34196) with ID 3, ResourceProfileId 0 20/10/07 11:37:26 INFO BlockManagerMasterEndpoint: Registering block manager 172.17.0.5:38381 with 593.9 MiB RAM, BlockManagerId(3, 172.17.0.5, 38381, None) 20/10/07 11:37:50 INFO SparkContext: Starting job: count at /opt/spark/tests/decommissioning.py:49 20/10/07 11:37:50 INFO DAGScheduler: Got job 1 (count at /opt/spark/tests/decommissioning.py:49) with 5 output partitions 20/10/07 11:37:50 INFO DAGScheduler: Final stage: ResultStage 3 (count at /opt/spark/tests/decommissioning.py:49) 20/10/07 11:37:50 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 2) 20/10/07 11:37:50 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 2) 20/10/07 11:37:50 INFO DAGScheduler: Submitting ShuffleMapStage 2 (PairwiseRDD[2] at groupByKey at /opt/spark/tests/decommissioning.py:43), which has no missing parents 20/10/07 11:37:50 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 10.6 KiB, free 593.9 MiB) 20/10/07 11:37:50 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 6.5 KiB, free 593.9 MiB) 20/10/07 11:37:50 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079 (size: 6.5 KiB, free: 593.9 MiB) 20/10/07 11:37:50 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1348 20/10/07 11:37:50 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 2 (PairwiseRDD[2] at groupByKey at /opt/spark/tests/decommissioning.py:43) (first 15 tasks are for partitions Vector(0, 3)) 20/10/07 11:37:50 INFO 
TaskSchedulerImpl: Adding task set 2.0 with 2 tasks resource profile 0 20/10/07 11:37:50 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 10) (172.17.0.5, executor 3, partition 0, PROCESS_LOCAL, 7341 bytes) taskResourceAssignments Map() 20/10/07 11:37:50 INFO TaskSetManager: Starting task 1.0 in stage 2.0 (TID 11) (172.17.0.6, executor 2, partition 3, PROCESS_LOCAL, 7341 bytes) taskResourceAssignments Map() 20/10/07 11:37:50 INFO BlockManagerInfo: Removed broadcast_1_piece0 on spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079 in memory (size: 5.4 KiB, free: 593.9 MiB) 20/10/07 11:37:50 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 172.17.0.6:39551 in memory (size: 5.4 KiB, free: 593.9 MiB) 20/10/07 11:37:50 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 172.17.0.6:39551 (size: 6.5 KiB, free: 593.9 MiB) 20/10/07 11:37:50 INFO TaskSetManager: Finished task 1.0 in stage 2.0 (TID 11) in 214 ms on 172.17.0.6 (executor 2) (1/2) 20/10/07 11:37:51 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 172.17.0.5:38381 (size: 6.5 KiB, free: 593.9 MiB) 20/10/07 11:37:52 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 10) in 1936 ms on 172.17.0.5 (executor 3) (2/2) 20/10/07 11:37:52 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 20/10/07 11:37:52 INFO DAGScheduler: ShuffleMapStage 2 (groupByKey at /opt/spark/tests/decommissioning.py:43) finished in 1.985 s 20/10/07 11:37:52 INFO DAGScheduler: looking for newly runnable stages 20/10/07 11:37:52 INFO DAGScheduler: running: Set() 20/10/07 11:37:52 INFO DAGScheduler: waiting: Set(ResultStage 3) 20/10/07 11:37:52 INFO DAGScheduler: failed: Set() 20/10/07 11:37:52 INFO DAGScheduler: Submitting ResultStage 3 (PythonRDD[6] at count at /opt/spark/tests/decommissioning.py:49), which has no missing parents 20/10/07 11:37:52 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 10.6 KiB, free 
593.9 MiB) 20/10/07 11:37:52 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 5.9 KiB, free 593.9 MiB) 20/10/07 11:37:52 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079 (size: 5.9 KiB, free: 593.9 MiB) 20/10/07 11:37:52 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1348 20/10/07 11:37:52 INFO DAGScheduler: Submitting 5 missing tasks from ResultStage 3 (PythonRDD[6] at count at /opt/spark/tests/decommissioning.py:49) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4)) 20/10/07 11:37:52 INFO TaskSchedulerImpl: Adding task set 3.0 with 5 tasks resource profile 0 20/10/07 11:37:52 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 12) (172.17.0.6, executor 2, partition 0, NODE_LOCAL, 7162 bytes) taskResourceAssignments Map() 20/10/07 11:37:52 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID 13) (172.17.0.5, executor 3, partition 1, NODE_LOCAL, 7162 bytes) taskResourceAssignments Map() 20/10/07 11:37:52 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 172.17.0.6:39551 (size: 5.9 KiB, free: 593.9 MiB) 20/10/07 11:37:52 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 172.17.0.6:53680 20/10/07 11:37:52 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 172.17.0.5:38381 (size: 5.9 KiB, free: 593.9 MiB) 20/10/07 11:37:52 INFO TaskSetManager: Starting task 2.0 in stage 3.0 (TID 14) (172.17.0.6, executor 2, partition 2, NODE_LOCAL, 7162 bytes) taskResourceAssignments Map() 20/10/07 11:37:52 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 12) in 215 ms on 172.17.0.6 (executor 2) (1/5) 20/10/07 11:37:52 INFO TaskSetManager: Starting task 3.0 in stage 3.0 (TID 15) (172.17.0.6, executor 2, partition 3, NODE_LOCAL, 7162 bytes) taskResourceAssignments Map() 20/10/07 11:37:52 INFO TaskSetManager: Finished task 2.0 in stage 3.0 (TID 
14) in 94 ms on 172.17.0.6 (executor 2) (2/5) 20/10/07 11:37:53 INFO TaskSetManager: Finished task 3.0 in stage 3.0 (TID 15) in 93 ms on 172.17.0.6 (executor 2) (3/5) 20/10/07 11:37:53 INFO TaskSetManager: Starting task 4.0 in stage 3.0 (TID 16) (172.17.0.6, executor 2, partition 4, NODE_LOCAL, 7162 bytes) taskResourceAssignments Map() 20/10/07 11:37:53 INFO TaskSetManager: Finished task 4.0 in stage 3.0 (TID 16) in 101 ms on 172.17.0.6 (executor 2) (4/5) 20/10/07 11:37:53 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 172.17.0.5:34196 20/10/07 11:37:53 INFO BlockManagerInfo: Removed broadcast_2_piece0 on spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079 in memory (size: 6.5 KiB, free: 593.9 MiB) 20/10/07 11:37:53 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 172.17.0.6:39551 in memory (size: 6.5 KiB, free: 593.9 MiB) 20/10/07 11:37:53 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 172.17.0.5:38381 in memory (size: 6.5 KiB, free: 593.9 MiB) 20/10/07 11:37:53 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID 13) in 741 ms on 172.17.0.5 (executor 3) (5/5) 20/10/07 11:37:53 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool 20/10/07 11:37:53 INFO DAGScheduler: ResultStage 3 (count at /opt/spark/tests/decommissioning.py:49) finished in 0.758 s 20/10/07 11:37:53 INFO DAGScheduler: Job 1 is finished. 
Cancelling potential speculative or zombie tasks for this job 20/10/07 11:37:53 INFO TaskSchedulerImpl: Killing all running tasks in stage 3: Stage finished 20/10/07 11:37:53 INFO DAGScheduler: Job 1 finished: count at /opt/spark/tests/decommissioning.py:49, took 2.769232 s 20/10/07 11:37:53 INFO SparkContext: Starting job: collect at /opt/spark/tests/decommissioning.py:50 20/10/07 11:37:53 INFO DAGScheduler: Got job 2 (collect at /opt/spark/tests/decommissioning.py:50) with 5 output partitions 20/10/07 11:37:53 INFO DAGScheduler: Final stage: ResultStage 5 (collect at /opt/spark/tests/decommissioning.py:50) 20/10/07 11:37:53 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 4) 20/10/07 11:37:53 INFO DAGScheduler: Missing parents: List() 20/10/07 11:37:53 INFO DAGScheduler: Submitting ResultStage 5 (PythonRDD[5] at collect at /opt/spark/tests/decommissioning.py:44), which has no missing parents 20/10/07 11:37:53 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 9.3 KiB, free 593.9 MiB) 20/10/07 11:37:53 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 5.4 KiB, free 593.9 MiB) 20/10/07 11:37:53 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079 (size: 5.4 KiB, free: 593.9 MiB) 20/10/07 11:37:53 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1348 20/10/07 11:37:53 INFO DAGScheduler: Submitting 5 missing tasks from ResultStage 5 (PythonRDD[5] at collect at /opt/spark/tests/decommissioning.py:44) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4)) 20/10/07 11:37:53 INFO TaskSchedulerImpl: Adding task set 5.0 with 5 tasks resource profile 0 20/10/07 11:37:53 INFO TaskSetManager: Starting task 0.0 in stage 5.0 (TID 17) (172.17.0.6, executor 2, partition 0, NODE_LOCAL, 7162 bytes) taskResourceAssignments Map() 20/10/07 11:37:53 INFO TaskSetManager: Starting task 1.0 
in stage 5.0 (TID 18) (172.17.0.5, executor 3, partition 1, NODE_LOCAL, 7162 bytes) taskResourceAssignments Map() 20/10/07 11:37:53 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 172.17.0.6:39551 (size: 5.4 KiB, free: 593.9 MiB) 20/10/07 11:37:53 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 172.17.0.5:38381 (size: 5.4 KiB, free: 593.9 MiB) 20/10/07 11:37:53 INFO TaskSetManager: Starting task 2.0 in stage 5.0 (TID 19) (172.17.0.6, executor 2, partition 2, NODE_LOCAL, 7162 bytes) taskResourceAssignments Map() 20/10/07 11:37:53 INFO TaskSetManager: Finished task 0.0 in stage 5.0 (TID 17) in 120 ms on 172.17.0.6 (executor 2) (1/5) 20/10/07 11:37:53 INFO TaskSetManager: Starting task 3.0 in stage 5.0 (TID 20) (172.17.0.5, executor 3, partition 3, NODE_LOCAL, 7162 bytes) taskResourceAssignments Map() 20/10/07 11:37:53 INFO TaskSetManager: Finished task 1.0 in stage 5.0 (TID 18) in 144 ms on 172.17.0.5 (executor 3) (2/5) 20/10/07 11:37:53 INFO TaskSetManager: Starting task 4.0 in stage 5.0 (TID 21) (172.17.0.6, executor 2, partition 4, NODE_LOCAL, 7162 bytes) taskResourceAssignments Map() 20/10/07 11:37:53 INFO TaskSetManager: Finished task 2.0 in stage 5.0 (TID 19) in 71 ms on 172.17.0.6 (executor 2) (3/5) 20/10/07 11:37:53 INFO TaskSetManager: Finished task 3.0 in stage 5.0 (TID 20) in 83 ms on 172.17.0.5 (executor 3) (4/5) 20/10/07 11:37:53 INFO TaskSetManager: Finished task 4.0 in stage 5.0 (TID 21) in 71 ms on 172.17.0.6 (executor 2) (5/5) 20/10/07 11:37:53 INFO TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool 20/10/07 11:37:53 INFO DAGScheduler: ResultStage 5 (collect at /opt/spark/tests/decommissioning.py:50) finished in 0.276 s 20/10/07 11:37:53 INFO DAGScheduler: Job 2 is finished. 
Cancelling potential speculative or zombie tasks for this job 20/10/07 11:37:53 INFO TaskSchedulerImpl: Killing all running tasks in stage 5: Stage finished 20/10/07 11:37:53 INFO DAGScheduler: Job 2 finished: collect at /opt/spark/tests/decommissioning.py:50, took 0.286559 s Final accumulator value is: 140 Finished waiting, stopping Spark. 20/10/07 11:37:53 INFO SparkUI: Stopped Spark web UI at http://spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:4040 20/10/07 11:37:53 INFO KubernetesClusterSchedulerBackend: Shutting down all executors 20/10/07 11:37:53 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down 20/10/07 11:37:53 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.) 20/10/07 11:37:53 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 20/10/07 11:37:53 INFO MemoryStore: MemoryStore cleared 20/10/07 11:37:53 INFO BlockManager: BlockManager stopped 20/10/07 11:37:53 INFO BlockManagerMaster: BlockManagerMaster stopped 20/10/07 11:37:53 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 
20/10/07 11:37:53 INFO SparkContext: Successfully stopped SparkContext
Done, exiting Python
20/10/07 11:37:54 INFO ShutdownHookManager: Shutdown hook called
20/10/07 11:37:54 INFO ShutdownHookManager: Deleting directory /var/data/spark-d602c302-e5f2-44eb-9398-587e9ec828f3/spark-6d0323a5-e6f6-4c3d-8851-149cb0110338
20/10/07 11:37:54 INFO ShutdownHookManager: Deleting directory /var/data/spark-d602c302-e5f2-44eb-9398-587e9ec828f3/spark-6d0323a5-e6f6-4c3d-8851-149cb0110338/pyspark-264151ef-fe00-4cc8-a175-ed2892c8f74f
20/10/07 11:37:54 INFO ShutdownHookManager: Deleting directory /tmp/spark-90eaed58-3856-4e81-9460-473673eb0ff1
" did not contain "Received decommission executor message"
The application did not complete, did not find str Received decommission executor message. (KubernetesSuite.scala:387)
Run completed in 12 minutes, 39 seconds.
Total number of tests run: 18
Suites: completed 2, aborted 0
Tests: succeeded 17, failed 1, canceled 0, ignored 0, pending 0
*** 1 TEST FAILED ***
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Spark Project Parent POM 3.1.0-SNAPSHOT:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [  3.933 s]
[INFO] Spark Project Tags ................................. SUCCESS [  8.364 s]
[INFO] Spark Project Local DB ............................. SUCCESS [  3.887 s]
[INFO] Spark Project Networking ........................... SUCCESS [  5.530 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  3.191 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 10.257 s]
[INFO] Spark Project Launcher ............................. SUCCESS [  3.477 s]
[INFO] Spark Project Core ................................. SUCCESS [02:15 min]
[INFO] Spark Project Kubernetes Integration Tests ......... FAILURE [15:20 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 18:15 min
[INFO] Finished at: 2020-10-07T04:39:53-07:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:2.0.0:test (integration-test) on project spark-kubernetes-integration-tests_2.12: There are test failures -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :spark-kubernetes-integration-tests_2.12
+ retcode3=1
+ kill -9 62029
+ minikube stop
: Stopping "minikube" in kvm2 ...
- "minikube" stopped.
/tmp/hudson7622963494261370199.sh: line 66: 62029 Killed    minikube mount ${PVC_TESTS_HOST_PATH}:${PVC_TESTS_VM_PATH} --9p-version=9p2000.L --gid=0 --uid=185
+ [[ 1 = 0 ]]
+ test_status=failure
+ /home/jenkins/bin/post_github_pr_comment.py
Attempting to post to Github...
 > Post successful.
+ rm -rf /tmp/tmp.PP0E0J0Req
+ exit 1
Build step 'Execute shell' marked build as failure
Archiving artifacts
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34109/
Test FAILed.
Finished: FAILURE
```
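For context, the "eventually never returned normally" failure above comes from the suite repeatedly polling the application output for the string "Received decommission executor message" until a timeout. Below is a minimal Python sketch of that polling pattern; the actual suite uses ScalaTest's `eventually` in KubernetesSuite.scala, and the function and parameter names here are hypothetical illustrations, not the real test code.

```python
import time


def eventually(check, timeout=180.0, interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Retry `check` until it returns a truthy value or `timeout` seconds elapse.

    Hypothetical stand-in for ScalaTest's `eventually`: the K8s suite keeps
    re-reading pod logs and re-asserting until the assertion passes or the
    overall timeout (about 3 minutes in the failed run above) is exhausted.
    """
    deadline = clock() + timeout
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            # Mirrors "The code passed to eventually never returned normally."
            raise TimeoutError(
                "condition never returned normally within %.1f seconds" % timeout
            )
        sleep(interval)


def log_contains(log_text, needle="Received decommission executor message"):
    """The assertion the failed test makes against the collected driver log."""
    return needle in log_text
```

In the failed run, the driver log simply never gained the expected line before the timeout, so every one of the 184 polls returned false.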