ScrapCodes commented on pull request #27735:
URL: https://github.com/apache/spark/pull/27735#issuecomment-705428013


   So the test is flaky rather than failing consistently: it passed on the second run.
   
   Preserving the log of the failed instance here, for my own later exploration:
   ```
   KubernetesSuite:
   - Run SparkPi with no resources
   - Run SparkPi with a very long application name.
   - Use SparkLauncher.NO_RESOURCE
   - Run SparkPi with a master URL without a scheme.
   - Run SparkPi with an argument.
   - Run SparkPi with custom labels, annotations, and environment variables.
   - All pods have the same service account by default
   - Run extraJVMOptions check on driver
   - Run SparkRemoteFileTest using a remote data file
   - Run SparkPi with env and mount secrets.
   - Run PySpark on simple pi.py example
   - Run PySpark with Python3 to test a pyfiles example
   - Run PySpark with memory customization
   - Run in client mode.
   - Start pod creation from template
   - PVs with local storage
   - Launcher client dependencies
   - Test basic decommissioning *** FAILED ***
     The code passed to eventually never returned normally. Attempted 184 times 
over 3.01608704245 minutes. Last failure message: "++ id -u
     + myuid=185
     ++ id -g
     + mygid=0
     + set +e
     ++ getent passwd 185
     + uidentry=
     + set -e
     + '[' -z '' ']'
     + '[' -w /etc/passwd ']'
     + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
     + SPARK_CLASSPATH=':/opt/spark/jars/*'
     + env
     + grep SPARK_JAVA_OPT_
     + sort -t_ -k4 -n
     + sed 's/[^=]*=\(.*\)/\1/g'
     + readarray -t SPARK_EXECUTOR_JAVA_OPTS
     + '[' -n '' ']'
     + '[' 3 == 2 ']'
     + '[' 3 == 3 ']'
     ++ python3 -V
     + pyv3='Python 3.7.3'
     + export PYTHON_VERSION=3.7.3
     + PYTHON_VERSION=3.7.3
     + export PYSPARK_PYTHON=python3
     + PYSPARK_PYTHON=python3
     + export PYSPARK_DRIVER_PYTHON=python3
     + PYSPARK_DRIVER_PYTHON=python3
     + '[' -n '' ']'
     + '[' -z ']'
     + '[' -z x ']'
     + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*'
     + case "$1" in
     + shift 1
     + CMD=("$SPARK_HOME/bin/spark-submit" --conf 
"spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
     + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf 
spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file 
/opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner 
local:///opt/spark/tests/decommissioning.py
     20/10/07 11:36:40 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
     Starting decom test
     Using Spark's default log4j profile: 
org/apache/spark/log4j-defaults.properties
     20/10/07 11:36:41 INFO SparkContext: Running Spark version 3.1.0-SNAPSHOT
     20/10/07 11:36:41 INFO ResourceUtils: 
==============================================================
     20/10/07 11:36:41 INFO ResourceUtils: No custom resources configured for 
spark.driver.
     20/10/07 11:36:41 INFO ResourceUtils: 
==============================================================
     20/10/07 11:36:41 INFO SparkContext: Submitted application: PyMemoryTest
     20/10/07 11:36:41 INFO ResourceProfile: Default ResourceProfile created, 
executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , 
memory -> name: memory, amount: 1024, script: , vendor: ), task resources: 
Map(cpus -> name: cpus, amount: 1.0)
     20/10/07 11:36:41 INFO ResourceProfile: Limiting resource is cpus at 1 
tasks per executor
     20/10/07 11:36:41 INFO ResourceProfileManager: Added ResourceProfile id: 0
     20/10/07 11:36:41 INFO SecurityManager: Changing view acls to: 185,jenkins
     20/10/07 11:36:41 INFO SecurityManager: Changing modify acls to: 
185,jenkins
     20/10/07 11:36:41 INFO SecurityManager: Changing view acls groups to: 
     20/10/07 11:36:41 INFO SecurityManager: Changing modify acls groups to: 
     20/10/07 11:36:41 INFO SecurityManager: SecurityManager: authentication 
enabled; ui acls disabled; users  with view permissions: Set(185, jenkins); 
groups with view permissions: Set(); users  with modify permissions: Set(185, 
jenkins); groups with modify permissions: Set()
     20/10/07 11:36:42 INFO Utils: Successfully started service 'sparkDriver' 
on port 7078.
     20/10/07 11:36:42 INFO SparkEnv: Registering MapOutputTracker
     20/10/07 11:36:42 INFO SparkEnv: Registering BlockManagerMaster
     20/10/07 11:36:42 INFO BlockManagerMasterEndpoint: Using 
org.apache.spark.storage.DefaultTopologyMapper for getting topology information
     20/10/07 11:36:42 INFO BlockManagerMasterEndpoint: 
BlockManagerMasterEndpoint up
     20/10/07 11:36:42 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
     20/10/07 11:36:42 INFO DiskBlockManager: Created local directory at 
/var/data/spark-d602c302-e5f2-44eb-9398-587e9ec828f3/blockmgr-754bcb40-809e-4163-a4e2-98af300183f6
     20/10/07 11:36:42 INFO MemoryStore: MemoryStore started with capacity 
593.9 MiB
     20/10/07 11:36:42 INFO SparkEnv: Registering OutputCommitCoordinator
     20/10/07 11:36:42 INFO Utils: Successfully started service 'SparkUI' on 
port 4040.
     20/10/07 11:36:42 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at 
http://spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:4040
     20/10/07 11:36:43 INFO SparkKubernetesClientFactory: Auto-configuring K8S 
client using current context from users K8S config file
     20/10/07 11:36:44 INFO ExecutorPodsAllocator: Going to request 3 executors 
from Kubernetes.
     20/10/07 11:36:44 INFO KubernetesClientUtils: Spark configuration files 
loaded from Some(/opt/spark/conf) : 
     20/10/07 11:36:44 INFO BasicExecutorFeatureStep: Adding decommission 
script to lifecycle
     20/10/07 11:36:44 INFO Utils: Successfully started service 
'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
     20/10/07 11:36:44 INFO NettyBlockTransferService: Server created on 
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079
     20/10/07 11:36:44 INFO BlockManager: Using 
org.apache.spark.storage.RandomBlockReplicationPolicy for block replication 
policy
     20/10/07 11:36:44 INFO BlockManagerMaster: Registering BlockManager 
BlockManagerId(driver, 
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc,
 7079, None)
     20/10/07 11:36:44 INFO BlockManagerMasterEndpoint: Registering block 
manager 
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079
 with 593.9 MiB RAM, BlockManagerId(driver, 
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc,
 7079, None)
     20/10/07 11:36:44 INFO BlockManagerMaster: Registered BlockManager 
BlockManagerId(driver, 
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc,
 7079, None)
     20/10/07 11:36:44 INFO BlockManager: Initialized BlockManager: 
BlockManagerId(driver, 
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc,
 7079, None)
     20/10/07 11:36:44 INFO KubernetesClientUtils: Spark configuration files 
loaded from Some(/opt/spark/conf) : 
     20/10/07 11:36:45 INFO BasicExecutorFeatureStep: Adding decommission 
script to lifecycle
     20/10/07 11:36:45 INFO KubernetesClientUtils: Spark configuration files 
loaded from Some(/opt/spark/conf) : 
     20/10/07 11:36:45 INFO BasicExecutorFeatureStep: Adding decommission 
script to lifecycle
     20/10/07 11:36:45 INFO KubernetesClientUtils: Spark configuration files 
loaded from Some(/opt/spark/conf) : 
     20/10/07 11:36:50 INFO 
KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor 
NettyRpcEndpointRef(spark-client://Executor) (172.17.0.5:33974) with ID 1,  
ResourceProfileId 0
     20/10/07 11:36:50 INFO 
KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor 
NettyRpcEndpointRef(spark-client://Executor) (172.17.0.6:53680) with ID 2,  
ResourceProfileId 0
     20/10/07 11:36:50 INFO BlockManagerMasterEndpoint: Registering block 
manager 172.17.0.5:39709 with 593.9 MiB RAM, BlockManagerId(1, 172.17.0.5, 
39709, None)
     20/10/07 11:36:50 INFO BlockManagerMasterEndpoint: Registering block 
manager 172.17.0.6:39551 with 593.9 MiB RAM, BlockManagerId(2, 172.17.0.6, 
39551, None)
     20/10/07 11:37:14 INFO KubernetesClusterSchedulerBackend: SchedulerBackend 
is ready for scheduling beginning after waiting 
maxRegisteredResourcesWaitingTime: 30000000000(ns)
     20/10/07 11:37:15 INFO SharedState: Setting hive.metastore.warehouse.dir 
('null') to the value of spark.sql.warehouse.dir 
('file:/opt/spark/work-dir/spark-warehouse').
     20/10/07 11:37:15 INFO SharedState: Warehouse path is 
'file:/opt/spark/work-dir/spark-warehouse'.
     20/10/07 11:37:16 INFO SparkContext: Starting job: collect at 
/opt/spark/tests/decommissioning.py:44
     20/10/07 11:37:16 INFO DAGScheduler: Registering RDD 2 (groupByKey at 
/opt/spark/tests/decommissioning.py:43) as input to shuffle 0
     20/10/07 11:37:16 INFO DAGScheduler: Got job 0 (collect at 
/opt/spark/tests/decommissioning.py:44) with 5 output partitions
     20/10/07 11:37:16 INFO DAGScheduler: Final stage: ResultStage 1 (collect 
at /opt/spark/tests/decommissioning.py:44)
     20/10/07 11:37:16 INFO DAGScheduler: Parents of final stage: 
List(ShuffleMapStage 0)
     20/10/07 11:37:16 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 
0)
     20/10/07 11:37:16 INFO DAGScheduler: Submitting ShuffleMapStage 0 
(PairwiseRDD[2] at groupByKey at /opt/spark/tests/decommissioning.py:43), which 
has no missing parents
     20/10/07 11:37:17 INFO MemoryStore: Block broadcast_0 stored as values in 
memory (estimated size 10.6 KiB, free 593.9 MiB)
     20/10/07 11:37:17 INFO MemoryStore: Block broadcast_0_piece0 stored as 
bytes in memory (estimated size 6.5 KiB, free 593.9 MiB)
     20/10/07 11:37:17 INFO BlockManagerInfo: Added broadcast_0_piece0 in 
memory on 
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079
 (size: 6.5 KiB, free: 593.9 MiB)
     20/10/07 11:37:17 INFO SparkContext: Created broadcast 0 from broadcast at 
DAGScheduler.scala:1348
     20/10/07 11:37:17 INFO DAGScheduler: Submitting 5 missing tasks from 
ShuffleMapStage 0 (PairwiseRDD[2] at groupByKey at 
/opt/spark/tests/decommissioning.py:43) (first 15 tasks are for partitions 
Vector(0, 1, 2, 3, 4))
     20/10/07 11:37:17 INFO TaskSchedulerImpl: Adding task set 0.0 with 5 tasks 
resource profile 0
     20/10/07 11:37:17 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 
0) (172.17.0.5, executor 1, partition 0, PROCESS_LOCAL, 7341 bytes) 
taskResourceAssignments Map()
     20/10/07 11:37:17 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 
1) (172.17.0.6, executor 2, partition 1, PROCESS_LOCAL, 7341 bytes) 
taskResourceAssignments Map()
     20/10/07 11:37:17 INFO BlockManagerInfo: Added broadcast_0_piece0 in 
memory on 172.17.0.5:39709 (size: 6.5 KiB, free: 593.9 MiB)
     20/10/07 11:37:17 INFO BlockManagerInfo: Added broadcast_0_piece0 in 
memory on 172.17.0.6:39551 (size: 6.5 KiB, free: 593.9 MiB)
     20/10/07 11:37:19 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 
2) (172.17.0.6, executor 2, partition 2, PROCESS_LOCAL, 7341 bytes) 
taskResourceAssignments Map()
     20/10/07 11:37:19 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 
1) in 2093 ms on 172.17.0.6 (executor 2) (1/5)
     20/10/07 11:37:19 INFO PythonAccumulatorV2: Connected to AccumulatorServer 
at host: 127.0.0.1 port: 53377
     20/10/07 11:37:19 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 
3) (172.17.0.5, executor 1, partition 3, PROCESS_LOCAL, 7341 bytes) 
taskResourceAssignments Map()
     20/10/07 11:37:19 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 
0) in 2153 ms on 172.17.0.5 (executor 1) (2/5)
     20/10/07 11:37:19 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 
4) (172.17.0.6, executor 2, partition 4, PROCESS_LOCAL, 7341 bytes) 
taskResourceAssignments Map()
     20/10/07 11:37:19 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 
2) in 132 ms on 172.17.0.6 (executor 2) (3/5)
     20/10/07 11:37:19 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 
3) in 145 ms on 172.17.0.5 (executor 1) (4/5)
     20/10/07 11:37:19 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 
4) in 99 ms on 172.17.0.6 (executor 2) (5/5)
     20/10/07 11:37:19 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks 
have all completed, from pool 
     20/10/07 11:37:19 INFO DAGScheduler: ShuffleMapStage 0 (groupByKey at 
/opt/spark/tests/decommissioning.py:43) finished in 2.530 s
     20/10/07 11:37:19 INFO DAGScheduler: looking for newly runnable stages
     20/10/07 11:37:19 INFO DAGScheduler: running: Set()
     20/10/07 11:37:19 INFO DAGScheduler: waiting: Set(ResultStage 1)
     20/10/07 11:37:19 INFO DAGScheduler: failed: Set()
     20/10/07 11:37:19 INFO DAGScheduler: Submitting ResultStage 1 
(PythonRDD[5] at collect at /opt/spark/tests/decommissioning.py:44), which has 
no missing parents
     20/10/07 11:37:19 INFO MemoryStore: Block broadcast_1 stored as values in 
memory (estimated size 9.3 KiB, free 593.9 MiB)
     20/10/07 11:37:19 INFO MemoryStore: Block broadcast_1_piece0 stored as 
bytes in memory (estimated size 5.4 KiB, free 593.9 MiB)
     20/10/07 11:37:19 INFO BlockManagerInfo: Added broadcast_1_piece0 in 
memory on 
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079
 (size: 5.4 KiB, free: 593.9 MiB)
     20/10/07 11:37:19 INFO SparkContext: Created broadcast 1 from broadcast at 
DAGScheduler.scala:1348
     20/10/07 11:37:19 INFO DAGScheduler: Submitting 5 missing tasks from 
ResultStage 1 (PythonRDD[5] at collect at 
/opt/spark/tests/decommissioning.py:44) (first 15 tasks are for partitions 
Vector(0, 1, 2, 3, 4))
     20/10/07 11:37:19 INFO TaskSchedulerImpl: Adding task set 1.0 with 5 tasks 
resource profile 0
     20/10/07 11:37:19 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 
5) (172.17.0.5, executor 1, partition 0, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
     20/10/07 11:37:19 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 
6) (172.17.0.6, executor 2, partition 1, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
     20/10/07 11:37:19 INFO BlockManagerInfo: Added broadcast_1_piece0 in 
memory on 172.17.0.5:39709 (size: 5.4 KiB, free: 593.9 MiB)
     20/10/07 11:37:19 INFO BlockManagerInfo: Added broadcast_1_piece0 in 
memory on 172.17.0.6:39551 (size: 5.4 KiB, free: 593.9 MiB)
     20/10/07 11:37:20 INFO MapOutputTrackerMasterEndpoint: Asked to send map 
output locations for shuffle 0 to 172.17.0.5:33974
     20/10/07 11:37:20 INFO MapOutputTrackerMasterEndpoint: Asked to send map 
output locations for shuffle 0 to 172.17.0.6:53680
     20/10/07 11:37:20 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 
7) (172.17.0.5, executor 1, partition 2, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
     20/10/07 11:37:20 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 
5) in 667 ms on 172.17.0.5 (executor 1) (1/5)
     20/10/07 11:37:20 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 
8) (172.17.0.6, executor 2, partition 3, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
     20/10/07 11:37:20 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 
6) in 703 ms on 172.17.0.6 (executor 2) (2/5)
     20/10/07 11:37:20 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID 
9) (172.17.0.5, executor 1, partition 4, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
     20/10/07 11:37:20 INFO TaskSetManager: Finished task 2.0 in stage 1.0 (TID 
7) in 87 ms on 172.17.0.5 (executor 1) (3/5)
     20/10/07 11:37:20 INFO TaskSetManager: Finished task 3.0 in stage 1.0 (TID 
8) in 119 ms on 172.17.0.6 (executor 2) (4/5)
     20/10/07 11:37:20 INFO TaskSetManager: Finished task 4.0 in stage 1.0 (TID 
9) in 117 ms on 172.17.0.5 (executor 1) (5/5)
     20/10/07 11:37:20 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks 
have all completed, from pool 
     20/10/07 11:37:20 INFO DAGScheduler: ResultStage 1 (collect at 
/opt/spark/tests/decommissioning.py:44) finished in 0.898 s
     20/10/07 11:37:20 INFO DAGScheduler: Job 0 is finished. Cancelling 
potential speculative or zombie tasks for this job
     20/10/07 11:37:20 INFO TaskSchedulerImpl: Killing all running tasks in 
stage 1: Stage finished
     20/10/07 11:37:20 INFO DAGScheduler: Job 0 finished: collect at 
/opt/spark/tests/decommissioning.py:44, took 3.730659 s
     1st accumulator value is: 100
     Waiting to give nodes time to finish migration, decom exec 1.
     ...
     20/10/07 11:37:20 ERROR TaskSchedulerImpl: Lost executor 1 on 172.17.0.5: 
The executor with id 1 was deleted by a user or the framework.
     20/10/07 11:37:20 INFO DAGScheduler: Executor lost: 1 (epoch 1)
     20/10/07 11:37:20 INFO BlockManagerMasterEndpoint: Trying to remove 
executor 1 from BlockManagerMaster.
     20/10/07 11:37:20 INFO BlockManagerMasterEndpoint: Removing block manager 
BlockManagerId(1, 172.17.0.5, 39709, None)
     20/10/07 11:37:20 INFO BlockManagerMaster: Removed 1 successfully in 
removeExecutor
     20/10/07 11:37:20 INFO DAGScheduler: Shuffle files lost for executor: 1 
(epoch 1)
     20/10/07 11:37:24 INFO ExecutorPodsAllocator: Going to request 1 executors 
from Kubernetes.
     20/10/07 11:37:24 INFO BasicExecutorFeatureStep: Adding decommission 
script to lifecycle
     20/10/07 11:37:24 INFO KubernetesClientUtils: Spark configuration files 
loaded from Some(/opt/spark/conf) : 
     20/10/07 11:37:26 INFO 
KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor 
NettyRpcEndpointRef(spark-client://Executor) (172.17.0.5:34196) with ID 3,  
ResourceProfileId 0
     20/10/07 11:37:26 INFO BlockManagerMasterEndpoint: Registering block 
manager 172.17.0.5:38381 with 593.9 MiB RAM, BlockManagerId(3, 172.17.0.5, 
38381, None)
     20/10/07 11:37:50 INFO SparkContext: Starting job: count at 
/opt/spark/tests/decommissioning.py:49
     20/10/07 11:37:50 INFO DAGScheduler: Got job 1 (count at 
/opt/spark/tests/decommissioning.py:49) with 5 output partitions
     20/10/07 11:37:50 INFO DAGScheduler: Final stage: ResultStage 3 (count at 
/opt/spark/tests/decommissioning.py:49)
     20/10/07 11:37:50 INFO DAGScheduler: Parents of final stage: 
List(ShuffleMapStage 2)
     20/10/07 11:37:50 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 
2)
     20/10/07 11:37:50 INFO DAGScheduler: Submitting ShuffleMapStage 2 
(PairwiseRDD[2] at groupByKey at /opt/spark/tests/decommissioning.py:43), which 
has no missing parents
     20/10/07 11:37:50 INFO MemoryStore: Block broadcast_2 stored as values in 
memory (estimated size 10.6 KiB, free 593.9 MiB)
     20/10/07 11:37:50 INFO MemoryStore: Block broadcast_2_piece0 stored as 
bytes in memory (estimated size 6.5 KiB, free 593.9 MiB)
     20/10/07 11:37:50 INFO BlockManagerInfo: Added broadcast_2_piece0 in 
memory on 
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079
 (size: 6.5 KiB, free: 593.9 MiB)
     20/10/07 11:37:50 INFO SparkContext: Created broadcast 2 from broadcast at 
DAGScheduler.scala:1348
     20/10/07 11:37:50 INFO DAGScheduler: Submitting 2 missing tasks from 
ShuffleMapStage 2 (PairwiseRDD[2] at groupByKey at 
/opt/spark/tests/decommissioning.py:43) (first 15 tasks are for partitions 
Vector(0, 3))
     20/10/07 11:37:50 INFO TaskSchedulerImpl: Adding task set 2.0 with 2 tasks 
resource profile 0
     20/10/07 11:37:50 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 
10) (172.17.0.5, executor 3, partition 0, PROCESS_LOCAL, 7341 bytes) 
taskResourceAssignments Map()
     20/10/07 11:37:50 INFO TaskSetManager: Starting task 1.0 in stage 2.0 (TID 
11) (172.17.0.6, executor 2, partition 3, PROCESS_LOCAL, 7341 bytes) 
taskResourceAssignments Map()
     20/10/07 11:37:50 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079
 in memory (size: 5.4 KiB, free: 593.9 MiB)
     20/10/07 11:37:50 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 
172.17.0.6:39551 in memory (size: 5.4 KiB, free: 593.9 MiB)
     20/10/07 11:37:50 INFO BlockManagerInfo: Added broadcast_2_piece0 in 
memory on 172.17.0.6:39551 (size: 6.5 KiB, free: 593.9 MiB)
     20/10/07 11:37:50 INFO TaskSetManager: Finished task 1.0 in stage 2.0 (TID 
11) in 214 ms on 172.17.0.6 (executor 2) (1/2)
     20/10/07 11:37:51 INFO BlockManagerInfo: Added broadcast_2_piece0 in 
memory on 172.17.0.5:38381 (size: 6.5 KiB, free: 593.9 MiB)
     20/10/07 11:37:52 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 
10) in 1936 ms on 172.17.0.5 (executor 3) (2/2)
     20/10/07 11:37:52 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks 
have all completed, from pool 
     20/10/07 11:37:52 INFO DAGScheduler: ShuffleMapStage 2 (groupByKey at 
/opt/spark/tests/decommissioning.py:43) finished in 1.985 s
     20/10/07 11:37:52 INFO DAGScheduler: looking for newly runnable stages
     20/10/07 11:37:52 INFO DAGScheduler: running: Set()
     20/10/07 11:37:52 INFO DAGScheduler: waiting: Set(ResultStage 3)
     20/10/07 11:37:52 INFO DAGScheduler: failed: Set()
     20/10/07 11:37:52 INFO DAGScheduler: Submitting ResultStage 3 
(PythonRDD[6] at count at /opt/spark/tests/decommissioning.py:49), which has no 
missing parents
     20/10/07 11:37:52 INFO MemoryStore: Block broadcast_3 stored as values in 
memory (estimated size 10.6 KiB, free 593.9 MiB)
     20/10/07 11:37:52 INFO MemoryStore: Block broadcast_3_piece0 stored as 
bytes in memory (estimated size 5.9 KiB, free 593.9 MiB)
     20/10/07 11:37:52 INFO BlockManagerInfo: Added broadcast_3_piece0 in 
memory on 
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079
 (size: 5.9 KiB, free: 593.9 MiB)
     20/10/07 11:37:52 INFO SparkContext: Created broadcast 3 from broadcast at 
DAGScheduler.scala:1348
     20/10/07 11:37:52 INFO DAGScheduler: Submitting 5 missing tasks from 
ResultStage 3 (PythonRDD[6] at count at /opt/spark/tests/decommissioning.py:49) 
(first 15 tasks are for partitions Vector(0, 1, 2, 3, 4))
     20/10/07 11:37:52 INFO TaskSchedulerImpl: Adding task set 3.0 with 5 tasks 
resource profile 0
     20/10/07 11:37:52 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 
12) (172.17.0.6, executor 2, partition 0, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
     20/10/07 11:37:52 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID 
13) (172.17.0.5, executor 3, partition 1, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
     20/10/07 11:37:52 INFO BlockManagerInfo: Added broadcast_3_piece0 in 
memory on 172.17.0.6:39551 (size: 5.9 KiB, free: 593.9 MiB)
     20/10/07 11:37:52 INFO MapOutputTrackerMasterEndpoint: Asked to send map 
output locations for shuffle 0 to 172.17.0.6:53680
     20/10/07 11:37:52 INFO BlockManagerInfo: Added broadcast_3_piece0 in 
memory on 172.17.0.5:38381 (size: 5.9 KiB, free: 593.9 MiB)
     20/10/07 11:37:52 INFO TaskSetManager: Starting task 2.0 in stage 3.0 (TID 
14) (172.17.0.6, executor 2, partition 2, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
     20/10/07 11:37:52 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 
12) in 215 ms on 172.17.0.6 (executor 2) (1/5)
     20/10/07 11:37:52 INFO TaskSetManager: Starting task 3.0 in stage 3.0 (TID 
15) (172.17.0.6, executor 2, partition 3, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
     20/10/07 11:37:52 INFO TaskSetManager: Finished task 2.0 in stage 3.0 (TID 
14) in 94 ms on 172.17.0.6 (executor 2) (2/5)
     20/10/07 11:37:53 INFO TaskSetManager: Finished task 3.0 in stage 3.0 (TID 
15) in 93 ms on 172.17.0.6 (executor 2) (3/5)
     20/10/07 11:37:53 INFO TaskSetManager: Starting task 4.0 in stage 3.0 (TID 
16) (172.17.0.6, executor 2, partition 4, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
     20/10/07 11:37:53 INFO TaskSetManager: Finished task 4.0 in stage 3.0 (TID 
16) in 101 ms on 172.17.0.6 (executor 2) (4/5)
     20/10/07 11:37:53 INFO MapOutputTrackerMasterEndpoint: Asked to send map 
output locations for shuffle 0 to 172.17.0.5:34196
     20/10/07 11:37:53 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079
 in memory (size: 6.5 KiB, free: 593.9 MiB)
     20/10/07 11:37:53 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 
172.17.0.6:39551 in memory (size: 6.5 KiB, free: 593.9 MiB)
     20/10/07 11:37:53 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 
172.17.0.5:38381 in memory (size: 6.5 KiB, free: 593.9 MiB)
     20/10/07 11:37:53 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID 
13) in 741 ms on 172.17.0.5 (executor 3) (5/5)
     20/10/07 11:37:53 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks 
have all completed, from pool 
     20/10/07 11:37:53 INFO DAGScheduler: ResultStage 3 (count at 
/opt/spark/tests/decommissioning.py:49) finished in 0.758 s
     20/10/07 11:37:53 INFO DAGScheduler: Job 1 is finished. Cancelling 
potential speculative or zombie tasks for this job
     20/10/07 11:37:53 INFO TaskSchedulerImpl: Killing all running tasks in 
stage 3: Stage finished
     20/10/07 11:37:53 INFO DAGScheduler: Job 1 finished: count at 
/opt/spark/tests/decommissioning.py:49, took 2.769232 s
     20/10/07 11:37:53 INFO SparkContext: Starting job: collect at 
/opt/spark/tests/decommissioning.py:50
     20/10/07 11:37:53 INFO DAGScheduler: Got job 2 (collect at 
/opt/spark/tests/decommissioning.py:50) with 5 output partitions
     20/10/07 11:37:53 INFO DAGScheduler: Final stage: ResultStage 5 (collect 
at /opt/spark/tests/decommissioning.py:50)
     20/10/07 11:37:53 INFO DAGScheduler: Parents of final stage: 
List(ShuffleMapStage 4)
     20/10/07 11:37:53 INFO DAGScheduler: Missing parents: List()
     20/10/07 11:37:53 INFO DAGScheduler: Submitting ResultStage 5 
(PythonRDD[5] at collect at /opt/spark/tests/decommissioning.py:44), which has 
no missing parents
     20/10/07 11:37:53 INFO MemoryStore: Block broadcast_4 stored as values in 
memory (estimated size 9.3 KiB, free 593.9 MiB)
     20/10/07 11:37:53 INFO MemoryStore: Block broadcast_4_piece0 stored as 
bytes in memory (estimated size 5.4 KiB, free 593.9 MiB)
     20/10/07 11:37:53 INFO BlockManagerInfo: Added broadcast_4_piece0 in 
memory on 
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079
 (size: 5.4 KiB, free: 593.9 MiB)
     20/10/07 11:37:53 INFO SparkContext: Created broadcast 4 from broadcast at 
DAGScheduler.scala:1348
     20/10/07 11:37:53 INFO DAGScheduler: Submitting 5 missing tasks from 
ResultStage 5 (PythonRDD[5] at collect at 
/opt/spark/tests/decommissioning.py:44) (first 15 tasks are for partitions 
Vector(0, 1, 2, 3, 4))
     20/10/07 11:37:53 INFO TaskSchedulerImpl: Adding task set 5.0 with 5 tasks 
resource profile 0
     20/10/07 11:37:53 INFO TaskSetManager: Starting task 0.0 in stage 5.0 (TID 
17) (172.17.0.6, executor 2, partition 0, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
     20/10/07 11:37:53 INFO TaskSetManager: Starting task 1.0 in stage 5.0 (TID 
18) (172.17.0.5, executor 3, partition 1, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
     20/10/07 11:37:53 INFO BlockManagerInfo: Added broadcast_4_piece0 in 
memory on 172.17.0.6:39551 (size: 5.4 KiB, free: 593.9 MiB)
     20/10/07 11:37:53 INFO BlockManagerInfo: Added broadcast_4_piece0 in 
memory on 172.17.0.5:38381 (size: 5.4 KiB, free: 593.9 MiB)
     20/10/07 11:37:53 INFO TaskSetManager: Starting task 2.0 in stage 5.0 (TID 
19) (172.17.0.6, executor 2, partition 2, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
     20/10/07 11:37:53 INFO TaskSetManager: Finished task 0.0 in stage 5.0 (TID 
17) in 120 ms on 172.17.0.6 (executor 2) (1/5)
     20/10/07 11:37:53 INFO TaskSetManager: Starting task 3.0 in stage 5.0 (TID 
20) (172.17.0.5, executor 3, partition 3, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
     20/10/07 11:37:53 INFO TaskSetManager: Finished task 1.0 in stage 5.0 (TID 
18) in 144 ms on 172.17.0.5 (executor 3) (2/5)
     20/10/07 11:37:53 INFO TaskSetManager: Starting task 4.0 in stage 5.0 (TID 
21) (172.17.0.6, executor 2, partition 4, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
     20/10/07 11:37:53 INFO TaskSetManager: Finished task 2.0 in stage 5.0 (TID 
19) in 71 ms on 172.17.0.6 (executor 2) (3/5)
     20/10/07 11:37:53 INFO TaskSetManager: Finished task 3.0 in stage 5.0 (TID 
20) in 83 ms on 172.17.0.5 (executor 3) (4/5)
     20/10/07 11:37:53 INFO TaskSetManager: Finished task 4.0 in stage 5.0 (TID 
21) in 71 ms on 172.17.0.6 (executor 2) (5/5)
     20/10/07 11:37:53 INFO TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks 
have all completed, from pool 
     20/10/07 11:37:53 INFO DAGScheduler: ResultStage 5 (collect at 
/opt/spark/tests/decommissioning.py:50) finished in 0.276 s
     20/10/07 11:37:53 INFO DAGScheduler: Job 2 is finished. Cancelling 
potential speculative or zombie tasks for this job
     20/10/07 11:37:53 INFO TaskSchedulerImpl: Killing all running tasks in 
stage 5: Stage finished
     20/10/07 11:37:53 INFO DAGScheduler: Job 2 finished: collect at 
/opt/spark/tests/decommissioning.py:50, took 0.286559 s
     Final accumulator value is: 140
     Finished waiting, stopping Spark.
     20/10/07 11:37:53 INFO SparkUI: Stopped Spark web UI at 
http://spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:4040
     20/10/07 11:37:53 INFO KubernetesClusterSchedulerBackend: Shutting down 
all executors
     20/10/07 11:37:53 INFO 
KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each 
executor to shut down
     20/10/07 11:37:53 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client 
has been closed (this is expected if the application is shutting down.)
     20/10/07 11:37:53 INFO MapOutputTrackerMasterEndpoint: 
MapOutputTrackerMasterEndpoint stopped!
     20/10/07 11:37:53 INFO MemoryStore: MemoryStore cleared
     20/10/07 11:37:53 INFO BlockManager: BlockManager stopped
     20/10/07 11:37:53 INFO BlockManagerMaster: BlockManagerMaster stopped
     20/10/07 11:37:53 INFO 
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
OutputCommitCoordinator stopped!
     20/10/07 11:37:53 INFO SparkContext: Successfully stopped SparkContext
     Done, exiting Python
     20/10/07 11:37:54 INFO ShutdownHookManager: Shutdown hook called
     20/10/07 11:37:54 INFO ShutdownHookManager: Deleting directory 
/var/data/spark-d602c302-e5f2-44eb-9398-587e9ec828f3/spark-6d0323a5-e6f6-4c3d-8851-149cb0110338
     20/10/07 11:37:54 INFO ShutdownHookManager: Deleting directory 
/var/data/spark-d602c302-e5f2-44eb-9398-587e9ec828f3/spark-6d0323a5-e6f6-4c3d-8851-149cb0110338/pyspark-264151ef-fe00-4cc8-a175-ed2892c8f74f
     20/10/07 11:37:54 INFO ShutdownHookManager: Deleting directory 
/tmp/spark-90eaed58-3856-4e81-9460-473673eb0ff1
     " did not contain "Received decommission executor message" The application 
did not complete, did not find str Received decommission executor message. 
(KubernetesSuite.scala:387)
   Run completed in 12 minutes, 39 seconds.
   Total number of tests run: 18
   Suites: completed 2, aborted 0
   Tests: succeeded 17, failed 1, canceled 0, ignored 0, pending 0
   *** 1 TEST FAILED ***
   [INFO] 
------------------------------------------------------------------------
   [INFO] Reactor Summary for Spark Project Parent POM 3.1.0-SNAPSHOT:
   [INFO] 
   [INFO] Spark Project Parent POM ........................... SUCCESS [  3.933 
s]
   [INFO] Spark Project Tags ................................. SUCCESS [  8.364 
s]
   [INFO] Spark Project Local DB ............................. SUCCESS [  3.887 
s]
   [INFO] Spark Project Networking ........................... SUCCESS [  5.530 
s]
   [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  3.191 
s]
   [INFO] Spark Project Unsafe ............................... SUCCESS [ 10.257 
s]
   [INFO] Spark Project Launcher ............................. SUCCESS [  3.477 
s]
   [INFO] Spark Project Core ................................. SUCCESS [02:15 
min]
   [INFO] Spark Project Kubernetes Integration Tests ......... FAILURE [15:20 
min]
   [INFO] 
------------------------------------------------------------------------
   [INFO] BUILD FAILURE
   [INFO] 
------------------------------------------------------------------------
   [INFO] Total time:  18:15 min
   [INFO] Finished at: 2020-10-07T04:39:53-07:00
   [INFO] 
------------------------------------------------------------------------
   [ERROR] Failed to execute goal 
org.scalatest:scalatest-maven-plugin:2.0.0:test (integration-test) on project 
spark-kubernetes-integration-tests_2.12: There are test failures -> [Help 1]
   [ERROR] 
   [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
   [ERROR] Re-run Maven using the -X switch to enable full debug logging.
   [ERROR] 
   [ERROR] For more information about the errors and possible solutions, please 
read the following articles:
   [ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
   [ERROR] 
   [ERROR] After correcting the problems, you can resume the build with the 
command
   [ERROR]   mvn <args> -rf :spark-kubernetes-integration-tests_2.12
   + retcode3=1
   + kill -9 62029
   + minikube stop
   :   Stopping "minikube" in kvm2 ...
   -   "minikube" stopped.
   /tmp/hudson7622963494261370199.sh: line 66: 62029 Killed                  
minikube mount ${PVC_TESTS_HOST_PATH}:${PVC_TESTS_VM_PATH} 
--9p-version=9p2000.L --gid=0 --uid=185
   + [[ 1 = 0 ]]
   + test_status=failure
   + /home/jenkins/bin/post_github_pr_comment.py
   Attempting to post to Github...
    > Post successful.
   + rm -rf /tmp/tmp.PP0E0J0Req
   + exit 1
   Build step 'Execute shell' marked build as failure
   Archiving artifacts
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34109/
   Test FAILed.
   Finished: FAILURE
   ```
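
   For context, the check that timed out (KubernetesSuite.scala:387) is a ScalaTest `eventually` poll over the driver pod's log, as the failure message ("The code passed to eventually never returned normally. Attempted 184 times over 3.01608704245 minutes") indicates. A minimal sketch of that pattern, assuming ScalaTest's `Eventually` DSL; the `driverPodLog` helper is hypothetical, standing in for however the suite actually fetches the log:
   ```scala
   import org.scalatest.concurrent.Eventually._
   import org.scalatest.time.{Minutes, Seconds, Span}

   object DecomLogCheck {
     // Hypothetical helper: fetch the current driver pod log (e.g. via the K8s client).
     def driverPodLog(): String = ???

     def main(args: Array[String]): Unit = {
       // Re-evaluate the block until it passes or the timeout elapses; the failed
       // run above retried 184 times over ~3 minutes before giving up.
       eventually(timeout(Span(3, Minutes)), interval(Span(1, Seconds))) {
         assert(driverPodLog().contains("Received decommission executor message"))
       }
     }
   }
   ```
   So the failure means the expected decommission marker never appeared in the driver log within the patience window, even though the Python job itself ran to completion and exited cleanly, which is consistent with the test being flaky rather than plainly broken.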

