[jira] [Created] (SPARK-32937) DecommissionSuite in k8s integration tests is failing.

Prashant Sharma (Jira) Fri, 18 Sep 2020 04:46:08 -0700

Prashant Sharma created SPARK-32937:
---------------------------------------


             Summary: DecommissionSuite in k8s integration tests is failing.
                 Key: SPARK-32937
                 URL: https://issues.apache.org/jira/browse/SPARK-32937
             Project: Spark
          Issue Type: Bug
          Components: Kubernetes
    Affects Versions: 3.1.0
            Reporter: Prashant Sharma



Logs from the failing test, copied from Jenkins. At present the test fails on 
every run.

{code}
- Test basic decommissioning *** FAILED ***
  The code passed to eventually never returned normally. Attempted 182 times 
over 3.00377927275 minutes. Last failure message: "++ id -u
  + myuid=185
  ++ id -g
  + mygid=0
  + set +e
  ++ getent passwd 185
  + uidentry=
  + set -e
  + '[' -z '' ']'
  + '[' -w /etc/passwd ']'
  + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
  + SPARK_CLASSPATH=':/opt/spark/jars/*'
  + env
  + grep SPARK_JAVA_OPT_
  + sort -t_ -k4 -n
  + sed 's/[^=]*=\(.*\)/\1/g'
  + readarray -t SPARK_EXECUTOR_JAVA_OPTS
  + '[' -n '' ']'
  + '[' 3 == 2 ']'
  + '[' 3 == 3 ']'
  ++ python3 -V
  + pyv3='Python 3.7.3'
  + export PYTHON_VERSION=3.7.3
  + PYTHON_VERSION=3.7.3
  + export PYSPARK_PYTHON=python3
  + PYSPARK_PYTHON=python3
  + export PYSPARK_DRIVER_PYTHON=python3
  + PYSPARK_DRIVER_PYTHON=python3
  + '[' -n '' ']'
  + '[' -z ']'
  + '[' -z x ']'
  + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*'
  + case "$1" in
  + shift 1
  + CMD=("$SPARK_HOME/bin/spark-submit" --conf 
"spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
  + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf 
spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file 
/opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner 
local:///opt/spark/tests/decommissioning.py
  20/09/17 11:06:56 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
  Starting decom test
  Using Spark's default log4j profile: 
org/apache/spark/log4j-defaults.properties
  20/09/17 11:06:56 INFO SparkContext: Running Spark version 3.1.0-SNAPSHOT
  20/09/17 11:06:57 INFO ResourceUtils: 
==============================================================
  20/09/17 11:06:57 INFO ResourceUtils: No custom resources configured for 
spark.driver.
  20/09/17 11:06:57 INFO ResourceUtils: 
==============================================================
  20/09/17 11:06:57 INFO SparkContext: Submitted application: PyMemoryTest
  20/09/17 11:06:57 INFO ResourceProfile: Default ResourceProfile created, 
executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , 
memory -> name: memory, amount: 1024, script: , vendor: ), task resources: 
Map(cpus -> name: cpus, amount: 1.0)
  20/09/17 11:06:57 INFO ResourceProfile: Limiting resource is cpus at 1 tasks 
per executor
  20/09/17 11:06:57 INFO ResourceProfileManager: Added ResourceProfile id: 0
  20/09/17 11:06:57 INFO SecurityManager: Changing view acls to: 185,jenkins
  20/09/17 11:06:57 INFO SecurityManager: Changing modify acls to: 185,jenkins
  20/09/17 11:06:57 INFO SecurityManager: Changing view acls groups to: 
  20/09/17 11:06:57 INFO SecurityManager: Changing modify acls groups to: 
  20/09/17 11:06:57 INFO SecurityManager: SecurityManager: authentication 
enabled; ui acls disabled; users  with view permissions: Set(185, jenkins); 
groups with view permissions: Set(); users  with modify permissions: Set(185, 
jenkins); groups with modify permissions: Set()
  20/09/17 11:06:57 INFO Utils: Successfully started service 'sparkDriver' on 
port 7078.
  20/09/17 11:06:57 INFO SparkEnv: Registering MapOutputTracker
  20/09/17 11:06:57 INFO SparkEnv: Registering BlockManagerMaster
  20/09/17 11:06:57 INFO BlockManagerMasterEndpoint: Using 
org.apache.spark.storage.DefaultTopologyMapper for getting topology information
  20/09/17 11:06:57 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint 
up
  20/09/17 11:06:57 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
  20/09/17 11:06:57 INFO DiskBlockManager: Created local directory at 
/var/data/spark-7985c075-3b02-42ec-9111-cefba535adf0/blockmgr-3bd403d0-6689-46be-997e-5bc699ecefd3
  20/09/17 11:06:57 INFO MemoryStore: MemoryStore started with capacity 593.9 
MiB
  20/09/17 11:06:57 INFO SparkEnv: Registering OutputCommitCoordinator
  20/09/17 11:06:58 INFO Utils: Successfully started service 'SparkUI' on port 
4040.
  20/09/17 11:06:58 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at 
http://spark-test-app-08853d749bbee080-driver-svc.a0af92633bef4a91b5f7e262e919afd9.svc:4040
  20/09/17 11:06:58 INFO SparkKubernetesClientFactory: Auto-configuring K8S 
client using current context from users K8S config file
  20/09/17 11:06:59 INFO ExecutorPodsAllocator: Going to request 3 executors 
from Kubernetes.
  20/09/17 11:06:59 INFO KubernetesClientUtils: Spark configuration files 
loaded from Some(/opt/spark/conf) : 
  20/09/17 11:07:00 INFO BasicExecutorFeatureStep: Adding decommission script 
to lifecycle
  20/09/17 11:07:00 INFO Utils: Successfully started service 
'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
  20/09/17 11:07:00 INFO NettyBlockTransferService: Server created on 
spark-test-app-08853d749bbee080-driver-svc.a0af92633bef4a91b5f7e262e919afd9.svc:7079
  20/09/17 11:07:00 INFO BlockManager: Using 
org.apache.spark.storage.RandomBlockReplicationPolicy for block replication 
policy
  20/09/17 11:07:00 INFO BlockManagerMaster: Registering BlockManager 
BlockManagerId(driver, 
spark-test-app-08853d749bbee080-driver-svc.a0af92633bef4a91b5f7e262e919afd9.svc,
 7079, None)
  20/09/17 11:07:00 INFO BlockManagerMasterEndpoint: Registering block manager 
spark-test-app-08853d749bbee080-driver-svc.a0af92633bef4a91b5f7e262e919afd9.svc:7079
 with 593.9 MiB RAM, BlockManagerId(driver, 
spark-test-app-08853d749bbee080-driver-svc.a0af92633bef4a91b5f7e262e919afd9.svc,
 7079, None)
  20/09/17 11:07:00 INFO BlockManagerMaster: Registered BlockManager 
BlockManagerId(driver, 
spark-test-app-08853d749bbee080-driver-svc.a0af92633bef4a91b5f7e262e919afd9.svc,
 7079, None)
  20/09/17 11:07:00 INFO KubernetesClientUtils: Spark configuration files 
loaded from Some(/opt/spark/conf) : 
  20/09/17 11:07:00 INFO BlockManager: Initialized BlockManager: 
BlockManagerId(driver, 
spark-test-app-08853d749bbee080-driver-svc.a0af92633bef4a91b5f7e262e919afd9.svc,
 7079, None)
  20/09/17 11:07:00 INFO BasicExecutorFeatureStep: Adding decommission script 
to lifecycle
  20/09/17 11:07:00 INFO KubernetesClientUtils: Spark configuration files 
loaded from Some(/opt/spark/conf) : 
  20/09/17 11:07:00 INFO BasicExecutorFeatureStep: Adding decommission script 
to lifecycle
  20/09/17 11:07:00 INFO KubernetesClientUtils: Spark configuration files 
loaded from Some(/opt/spark/conf) : 
  20/09/17 11:07:05 INFO 
KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor 
NettyRpcEndpointRef(spark-client://Executor) (172.17.0.6:50176) with ID 2,  
ResourceProfileId 0
  20/09/17 11:07:05 INFO 
KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor 
NettyRpcEndpointRef(spark-client://Executor) (172.17.0.5:35624) with ID 1,  
ResourceProfileId 0
  20/09/17 11:07:05 INFO BlockManagerMasterEndpoint: Registering block manager 
172.17.0.6:33547 with 593.9 MiB RAM, BlockManagerId(2, 172.17.0.6, 33547, None)
  20/09/17 11:07:05 INFO BlockManagerMasterEndpoint: Registering block manager 
172.17.0.5:46327 with 593.9 MiB RAM, BlockManagerId(1, 172.17.0.5, 46327, None)
  20/09/17 11:07:29 INFO KubernetesClusterSchedulerBackend: SchedulerBackend is 
ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 
30000000000(ns)
  20/09/17 11:07:30 INFO SharedState: Setting hive.metastore.warehouse.dir 
('null') to the value of spark.sql.warehouse.dir 
('file:/opt/spark/work-dir/spark-warehouse').
  20/09/17 11:07:30 INFO SharedState: Warehouse path is 
'file:/opt/spark/work-dir/spark-warehouse'.
  20/09/17 11:07:32 INFO SparkContext: Starting job: collect at 
/opt/spark/tests/decommissioning.py:44
  20/09/17 11:07:32 INFO DAGScheduler: Registering RDD 2 (groupByKey at 
/opt/spark/tests/decommissioning.py:43) as input to shuffle 0
  20/09/17 11:07:32 INFO DAGScheduler: Got job 0 (collect at 
/opt/spark/tests/decommissioning.py:44) with 5 output partitions
  20/09/17 11:07:32 INFO DAGScheduler: Final stage: ResultStage 1 (collect at 
/opt/spark/tests/decommissioning.py:44)
  20/09/17 11:07:32 INFO DAGScheduler: Parents of final stage: 
List(ShuffleMapStage 0)
  20/09/17 11:07:32 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0)
  20/09/17 11:07:32 INFO DAGScheduler: Submitting ShuffleMapStage 0 
(PairwiseRDD[2] at groupByKey at /opt/spark/tests/decommissioning.py:43), which 
has no missing parents
  20/09/17 11:07:32 INFO MemoryStore: Block broadcast_0 stored as values in 
memory (estimated size 10.6 KiB, free 593.9 MiB)
  20/09/17 11:07:32 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes 
in memory (estimated size 6.5 KiB, free 593.9 MiB)
  20/09/17 11:07:32 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
on 
spark-test-app-08853d749bbee080-driver-svc.a0af92633bef4a91b5f7e262e919afd9.svc:7079
 (size: 6.5 KiB, free: 593.9 MiB)
  20/09/17 11:07:32 INFO SparkContext: Created broadcast 0 from broadcast at 
DAGScheduler.scala:1348
  20/09/17 11:07:32 INFO DAGScheduler: Submitting 5 missing tasks from 
ShuffleMapStage 0 (PairwiseRDD[2] at groupByKey at 
/opt/spark/tests/decommissioning.py:43) (first 15 tasks are for partitions 
Vector(0, 1, 2, 3, 4))
  20/09/17 11:07:32 INFO TaskSchedulerImpl: Adding task set 0.0 with 5 tasks 
resource profile 0
  20/09/17 11:07:32 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) 
(172.17.0.6, executor 2, partition 0, PROCESS_LOCAL, 7341 bytes) 
taskResourceAssignments Map()
  20/09/17 11:07:32 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1) 
(172.17.0.5, executor 1, partition 1, PROCESS_LOCAL, 7341 bytes) 
taskResourceAssignments Map()
  20/09/17 11:07:32 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
on 172.17.0.5:46327 (size: 6.5 KiB, free: 593.9 MiB)
  20/09/17 11:07:32 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
on 172.17.0.6:33547 (size: 6.5 KiB, free: 593.9 MiB)
  20/09/17 11:07:34 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2) 
(172.17.0.5, executor 1, partition 2, PROCESS_LOCAL, 7341 bytes) 
taskResourceAssignments Map()
  20/09/17 11:07:34 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) 
in 1825 ms on 172.17.0.5 (executor 1) (1/5)
  20/09/17 11:07:34 INFO PythonAccumulatorV2: Connected to AccumulatorServer at 
host: 127.0.0.1 port: 47109
  20/09/17 11:07:34 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3) 
(172.17.0.6, executor 2, partition 3, PROCESS_LOCAL, 7341 bytes) 
taskResourceAssignments Map()
  20/09/17 11:07:34 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) 
in 1960 ms on 172.17.0.6 (executor 2) (2/5)
  20/09/17 11:07:34 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4) 
(172.17.0.5, executor 1, partition 4, PROCESS_LOCAL, 7341 bytes) 
taskResourceAssignments Map()
  20/09/17 11:07:34 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) 
in 136 ms on 172.17.0.5 (executor 1) (3/5)
  20/09/17 11:07:34 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) 
in 119 ms on 172.17.0.6 (executor 2) (4/5)
  20/09/17 11:07:34 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) 
in 117 ms on 172.17.0.5 (executor 1) (5/5)
  20/09/17 11:07:34 INFO DAGScheduler: ShuffleMapStage 0 (groupByKey at 
/opt/spark/tests/decommissioning.py:43) finished in 2.352 s
  20/09/17 11:07:34 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks 
have all completed, from pool 
  20/09/17 11:07:34 INFO DAGScheduler: looking for newly runnable stages
  20/09/17 11:07:34 INFO DAGScheduler: running: Set()
  20/09/17 11:07:34 INFO DAGScheduler: waiting: Set(ResultStage 1)
  20/09/17 11:07:34 INFO DAGScheduler: failed: Set()
  20/09/17 11:07:34 INFO DAGScheduler: Submitting ResultStage 1 (PythonRDD[5] 
at collect at /opt/spark/tests/decommissioning.py:44), which has no missing 
parents
  20/09/17 11:07:34 INFO MemoryStore: Block broadcast_1 stored as values in 
memory (estimated size 9.3 KiB, free 593.9 MiB)
  20/09/17 11:07:34 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes 
in memory (estimated size 5.4 KiB, free 593.9 MiB)
  20/09/17 11:07:34 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory 
on 
spark-test-app-08853d749bbee080-driver-svc.a0af92633bef4a91b5f7e262e919afd9.svc:7079
 (size: 5.4 KiB, free: 593.9 MiB)
  20/09/17 11:07:34 INFO SparkContext: Created broadcast 1 from broadcast at 
DAGScheduler.scala:1348
  20/09/17 11:07:34 INFO DAGScheduler: Submitting 5 missing tasks from 
ResultStage 1 (PythonRDD[5] at collect at 
/opt/spark/tests/decommissioning.py:44) (first 15 tasks are for partitions 
Vector(0, 1, 2, 3, 4))
  20/09/17 11:07:34 INFO TaskSchedulerImpl: Adding task set 1.0 with 5 tasks 
resource profile 0
  20/09/17 11:07:34 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 5) 
(172.17.0.6, executor 2, partition 0, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
  20/09/17 11:07:34 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 6) 
(172.17.0.5, executor 1, partition 1, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
  20/09/17 11:07:34 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory 
on 172.17.0.6:33547 (size: 5.4 KiB, free: 593.9 MiB)
  20/09/17 11:07:34 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory 
on 172.17.0.5:46327 (size: 5.4 KiB, free: 593.9 MiB)
  20/09/17 11:07:34 INFO MapOutputTrackerMasterEndpoint: Asked to send map 
output locations for shuffle 0 to 172.17.0.5:35624
  20/09/17 11:07:34 INFO MapOutputTrackerMasterEndpoint: Asked to send map 
output locations for shuffle 0 to 172.17.0.6:50176
  20/09/17 11:07:35 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 7) 
(172.17.0.6, executor 2, partition 2, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
  20/09/17 11:07:35 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 5) 
in 729 ms on 172.17.0.6 (executor 2) (1/5)
  20/09/17 11:07:35 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 8) 
(172.17.0.5, executor 1, partition 3, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
  20/09/17 11:07:35 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 6) 
in 755 ms on 172.17.0.5 (executor 1) (2/5)
  20/09/17 11:07:35 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID 9) 
(172.17.0.6, executor 2, partition 4, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
  20/09/17 11:07:35 INFO TaskSetManager: Finished task 2.0 in stage 1.0 (TID 7) 
in 113 ms on 172.17.0.6 (executor 2) (3/5)
  20/09/17 11:07:35 INFO TaskSetManager: Finished task 3.0 in stage 1.0 (TID 8) 
in 104 ms on 172.17.0.5 (executor 1) (4/5)
  20/09/17 11:07:35 INFO TaskSetManager: Finished task 4.0 in stage 1.0 (TID 9) 
in 82 ms on 172.17.0.6 (executor 2) (5/5)
  20/09/17 11:07:35 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks 
have all completed, from pool 
  20/09/17 11:07:35 INFO DAGScheduler: ResultStage 1 (collect at 
/opt/spark/tests/decommissioning.py:44) finished in 0.943 s
  20/09/17 11:07:35 INFO DAGScheduler: Job 0 is finished. Cancelling potential 
speculative or zombie tasks for this job
  20/09/17 11:07:35 INFO TaskSchedulerImpl: Killing all running tasks in stage 
1: Stage finished
  20/09/17 11:07:35 INFO DAGScheduler: Job 0 finished: collect at 
/opt/spark/tests/decommissioning.py:44, took 3.420388 s
  1st accumulator value is: 100
  Waiting to give nodes time to finish migration, decom exec 1.
  ...
  20/09/17 11:07:36 WARN 
KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Received executor 1 
decommissioned message
  20/09/17 11:07:36 INFO ShuffleStatus: Updating map output for 4 to 
BlockManagerId(2, 172.17.0.6, 33547, None)
  20/09/17 11:07:36 INFO ShuffleStatus: Updating map output for 1 to 
BlockManagerId(2, 172.17.0.6, 33547, None)
  20/09/17 11:07:36 INFO ShuffleStatus: Updating map output for 2 to 
BlockManagerId(2, 172.17.0.6, 33547, None)
  20/09/17 11:07:36 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 
spark-test-app-08853d749bbee080-driver-svc.a0af92633bef4a91b5f7e262e919afd9.svc:7079
 in memory (size: 5.4 KiB, free: 593.9 MiB)
  20/09/17 11:07:36 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 
172.17.0.5:46327 in memory (size: 5.4 KiB, free: 593.9 MiB)
  20/09/17 11:07:36 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 
172.17.0.6:33547 in memory (size: 5.4 KiB, free: 593.9 MiB)
  20/09/17 11:07:37 ERROR TaskSchedulerImpl: Lost executor 1 on 172.17.0.5: 
Executor decommission.
  20/09/17 11:07:37 INFO DAGScheduler: Executor lost: 1 (epoch 1)
  20/09/17 11:07:37 INFO BlockManagerMasterEndpoint: Trying to remove executor 
1 from BlockManagerMaster.
  20/09/17 11:07:37 INFO BlockManagerMasterEndpoint: Removing block manager 
BlockManagerId(1, 172.17.0.5, 46327, None)
  20/09/17 11:07:37 INFO BlockManagerMaster: Removed 1 successfully in 
removeExecutor
  20/09/17 11:07:37 INFO DAGScheduler: Shuffle files lost for executor: 1 
(epoch 1)
  20/09/17 11:07:41 INFO ExecutorPodsAllocator: Going to request 1 executors 
from Kubernetes.
  20/09/17 11:07:41 INFO BasicExecutorFeatureStep: Adding decommission script 
to lifecycle
  20/09/17 11:07:41 INFO KubernetesClientUtils: Spark configuration files 
loaded from Some(/opt/spark/conf) : 
  20/09/17 11:07:43 INFO 
KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor 
NettyRpcEndpointRef(spark-client://Executor) (172.17.0.5:35848) with ID 3,  
ResourceProfileId 0
  20/09/17 11:07:43 INFO BlockManagerMasterEndpoint: Registering block manager 
172.17.0.5:34299 with 593.9 MiB RAM, BlockManagerId(3, 172.17.0.5, 34299, None)
  20/09/17 11:08:05 INFO SparkContext: Starting job: count at 
/opt/spark/tests/decommissioning.py:49
  20/09/17 11:08:05 INFO DAGScheduler: Got job 1 (count at 
/opt/spark/tests/decommissioning.py:49) with 5 output partitions
  20/09/17 11:08:05 INFO DAGScheduler: Final stage: ResultStage 3 (count at 
/opt/spark/tests/decommissioning.py:49)
  20/09/17 11:08:05 INFO DAGScheduler: Parents of final stage: 
List(ShuffleMapStage 2)
  20/09/17 11:08:05 INFO DAGScheduler: Missing parents: List()
  20/09/17 11:08:05 INFO DAGScheduler: Submitting ResultStage 3 (PythonRDD[6] 
at count at /opt/spark/tests/decommissioning.py:49), which has no missing 
parents
  20/09/17 11:08:05 INFO MemoryStore: Block broadcast_2 stored as values in 
memory (estimated size 10.6 KiB, free 593.9 MiB)
  20/09/17 11:08:05 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes 
in memory (estimated size 5.9 KiB, free 593.9 MiB)
  20/09/17 11:08:05 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory 
on 
spark-test-app-08853d749bbee080-driver-svc.a0af92633bef4a91b5f7e262e919afd9.svc:7079
 (size: 5.9 KiB, free: 593.9 MiB)
  20/09/17 11:08:05 INFO SparkContext: Created broadcast 2 from broadcast at 
DAGScheduler.scala:1348
  20/09/17 11:08:05 INFO DAGScheduler: Submitting 5 missing tasks from 
ResultStage 3 (PythonRDD[6] at count at /opt/spark/tests/decommissioning.py:49) 
(first 15 tasks are for partitions Vector(0, 1, 2, 3, 4))
  20/09/17 11:08:05 INFO TaskSchedulerImpl: Adding task set 3.0 with 5 tasks 
resource profile 0
  20/09/17 11:08:05 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 
10) (172.17.0.6, executor 2, partition 0, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
  20/09/17 11:08:05 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory 
on 172.17.0.6:33547 (size: 5.9 KiB, free: 593.9 MiB)
  20/09/17 11:08:05 INFO MapOutputTrackerMasterEndpoint: Asked to send map 
output locations for shuffle 0 to 172.17.0.6:50176
  20/09/17 11:08:05 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 
10) in 133 ms on 172.17.0.6 (executor 2) (1/5)
  20/09/17 11:08:05 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID 
11) (172.17.0.6, executor 2, partition 1, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
  20/09/17 11:08:05 INFO TaskSetManager: Starting task 2.0 in stage 3.0 (TID 
12) (172.17.0.6, executor 2, partition 2, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
  20/09/17 11:08:05 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID 
11) in 81 ms on 172.17.0.6 (executor 2) (2/5)
  20/09/17 11:08:05 INFO TaskSetManager: Starting task 3.0 in stage 3.0 (TID 
13) (172.17.0.6, executor 2, partition 3, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
  20/09/17 11:08:05 INFO TaskSetManager: Finished task 2.0 in stage 3.0 (TID 
12) in 85 ms on 172.17.0.6 (executor 2) (3/5)
  20/09/17 11:08:05 INFO TaskSetManager: Starting task 4.0 in stage 3.0 (TID 
14) (172.17.0.6, executor 2, partition 4, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
  20/09/17 11:08:05 INFO TaskSetManager: Finished task 3.0 in stage 3.0 (TID 
13) in 73 ms on 172.17.0.6 (executor 2) (4/5)
  20/09/17 11:08:06 INFO TaskSetManager: Finished task 4.0 in stage 3.0 (TID 
14) in 91 ms on 172.17.0.6 (executor 2) (5/5)
  20/09/17 11:08:06 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks 
have all completed, from pool 
  20/09/17 11:08:06 INFO DAGScheduler: ResultStage 3 (count at 
/opt/spark/tests/decommissioning.py:49) finished in 0.478 s
  20/09/17 11:08:06 INFO DAGScheduler: Job 1 is finished. Cancelling potential 
speculative or zombie tasks for this job
  20/09/17 11:08:06 INFO TaskSchedulerImpl: Killing all running tasks in stage 
3: Stage finished
  20/09/17 11:08:06 INFO DAGScheduler: Job 1 finished: count at 
/opt/spark/tests/decommissioning.py:49, took 0.489355 s
  20/09/17 11:08:06 INFO SparkContext: Starting job: collect at 
/opt/spark/tests/decommissioning.py:50
  20/09/17 11:08:06 INFO DAGScheduler: Got job 2 (collect at 
/opt/spark/tests/decommissioning.py:50) with 5 output partitions
  20/09/17 11:08:06 INFO DAGScheduler: Final stage: ResultStage 5 (collect at 
/opt/spark/tests/decommissioning.py:50)
  20/09/17 11:08:06 INFO DAGScheduler: Parents of final stage: 
List(ShuffleMapStage 4)
  20/09/17 11:08:06 INFO DAGScheduler: Missing parents: List()
  20/09/17 11:08:06 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 
spark-test-app-08853d749bbee080-driver-svc.a0af92633bef4a91b5f7e262e919afd9.svc:7079
 in memory (size: 5.9 KiB, free: 593.9 MiB)
  20/09/17 11:08:06 INFO DAGScheduler: Submitting ResultStage 5 (PythonRDD[5] 
at collect at /opt/spark/tests/decommissioning.py:44), which has no missing 
parents
  20/09/17 11:08:06 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 
172.17.0.6:33547 in memory (size: 5.9 KiB, free: 593.9 MiB)
  20/09/17 11:08:06 INFO MemoryStore: Block broadcast_3 stored as values in 
memory (estimated size 9.3 KiB, free 593.9 MiB)
  20/09/17 11:08:06 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes 
in memory (estimated size 5.4 KiB, free 593.9 MiB)
  20/09/17 11:08:06 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory 
on 
spark-test-app-08853d749bbee080-driver-svc.a0af92633bef4a91b5f7e262e919afd9.svc:7079
 (size: 5.4 KiB, free: 593.9 MiB)
  20/09/17 11:08:06 INFO SparkContext: Created broadcast 3 from broadcast at 
DAGScheduler.scala:1348
  20/09/17 11:08:06 INFO DAGScheduler: Submitting 5 missing tasks from 
ResultStage 5 (PythonRDD[5] at collect at 
/opt/spark/tests/decommissioning.py:44) (first 15 tasks are for partitions 
Vector(0, 1, 2, 3, 4))
  20/09/17 11:08:06 INFO TaskSchedulerImpl: Adding task set 5.0 with 5 tasks 
resource profile 0
  20/09/17 11:08:06 INFO TaskSetManager: Starting task 0.0 in stage 5.0 (TID 
15) (172.17.0.6, executor 2, partition 0, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
  20/09/17 11:08:06 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory 
on 172.17.0.6:33547 (size: 5.4 KiB, free: 593.9 MiB)
  20/09/17 11:08:06 INFO TaskSetManager: Starting task 1.0 in stage 5.0 (TID 
16) (172.17.0.6, executor 2, partition 1, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
  20/09/17 11:08:06 INFO TaskSetManager: Finished task 0.0 in stage 5.0 (TID 
15) in 105 ms on 172.17.0.6 (executor 2) (1/5)
  20/09/17 11:08:06 INFO TaskSetManager: Finished task 1.0 in stage 5.0 (TID 
16) in 84 ms on 172.17.0.6 (executor 2) (2/5)
  20/09/17 11:08:06 INFO TaskSetManager: Starting task 2.0 in stage 5.0 (TID 
17) (172.17.0.6, executor 2, partition 2, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
  20/09/17 11:08:06 INFO TaskSetManager: Starting task 3.0 in stage 5.0 (TID 
18) (172.17.0.6, executor 2, partition 3, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
  20/09/17 11:08:06 INFO TaskSetManager: Finished task 2.0 in stage 5.0 (TID 
17) in 76 ms on 172.17.0.6 (executor 2) (3/5)
  20/09/17 11:08:06 INFO TaskSetManager: Starting task 4.0 in stage 5.0 (TID 
19) (172.17.0.6, executor 2, partition 4, NODE_LOCAL, 7162 bytes) 
taskResourceAssignments Map()
  20/09/17 11:08:06 INFO TaskSetManager: Finished task 3.0 in stage 5.0 (TID 
18) in 72 ms on 172.17.0.6 (executor 2) (4/5)
  20/09/17 11:08:06 INFO TaskSetManager: Finished task 4.0 in stage 5.0 (TID 
19) in 90 ms on 172.17.0.6 (executor 2) (5/5)
  20/09/17 11:08:06 INFO TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks 
have all completed, from pool 
  20/09/17 11:08:06 INFO DAGScheduler: ResultStage 5 (collect at 
/opt/spark/tests/decommissioning.py:50) finished in 0.448 s
  20/09/17 11:08:06 INFO DAGScheduler: Job 2 is finished. Cancelling potential 
speculative or zombie tasks for this job
  20/09/17 11:08:06 INFO TaskSchedulerImpl: Killing all running tasks in stage 
5: Stage finished
  20/09/17 11:08:06 INFO DAGScheduler: Job 2 finished: collect at 
/opt/spark/tests/decommissioning.py:50, took 0.460430 s
  Final accumulator value is: 100
  Finished waiting, stopping Spark.
  20/09/17 11:08:06 INFO SparkUI: Stopped Spark web UI at 
http://spark-test-app-08853d749bbee080-driver-svc.a0af92633bef4a91b5f7e262e919afd9.svc:4040
  20/09/17 11:08:06 INFO KubernetesClusterSchedulerBackend: Shutting down all 
executors
  20/09/17 11:08:06 INFO 
KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each 
executor to shut down
  20/09/17 11:08:06 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
been closed (this is expected if the application is shutting down.)
  20/09/17 11:08:06 INFO MapOutputTrackerMasterEndpoint: 
MapOutputTrackerMasterEndpoint stopped!
  20/09/17 11:08:06 INFO MemoryStore: MemoryStore cleared
  20/09/17 11:08:06 INFO BlockManager: BlockManager stopped
  20/09/17 11:08:06 INFO BlockManagerMaster: BlockManagerMaster stopped
  20/09/17 11:08:06 INFO 
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
OutputCommitCoordinator stopped!
  20/09/17 11:08:06 INFO SparkContext: Successfully stopped SparkContext
  Done, exiting Python
  20/09/17 11:08:07 INFO ShutdownHookManager: Shutdown hook called
  20/09/17 11:08:07 INFO ShutdownHookManager: Deleting directory 
/var/data/spark-7985c075-3b02-42ec-9111-cefba535adf0/spark-d5ac2f3e-fe8b-4122-8026-807d265f3a69/pyspark-62a6caeb-b2e5-4b8f-8eb3-e7b2c5fb155c
  20/09/17 11:08:07 INFO ShutdownHookManager: Deleting directory 
/var/data/spark-7985c075-3b02-42ec-9111-cefba535adf0/spark-d5ac2f3e-fe8b-4122-8026-807d265f3a69
  20/09/17 11:08:07 INFO ShutdownHookManager: Deleting directory 
/tmp/spark-b74e6224-3fa7-40d2-abc4-6622bd524e65
  " did not contain "Received decommission executor message" The application 
did not complete, did not find str Received decommission executor message. 
(KubernetesSuite.scala:387)
Run completed in 12 minutes, 29 seconds.
Total number of tests run: 18
Suites: completed 2, aborted 0
Tests: succeeded 17, failed 1, canceled 0, ignored 0, pending 0
*** 1 TEST FAILED ***
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Spark Project Parent POM 3.1.0-SNAPSHOT:
[INFO] 
[INFO] Spark Project Parent POM ........................... SUCCESS [  4.094 s]
[INFO] Spark Project Tags ................................. SUCCESS [  8.630 s]
[INFO] Spark Project Local DB ............................. SUCCESS [  4.062 s]
[INFO] Spark Project Networking ........................... SUCCESS [  5.891 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  3.059 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 10.869 s]
[INFO] Spark Project Launcher ............................. SUCCESS [  3.432 s]
[INFO] Spark Project Core ................................. SUCCESS [02:26 min]
[INFO] Spark Project Kubernetes Integration Tests ......... FAILURE [15:13 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  18:21 min
[INFO] Finished at: 2020-09-17T04:10:08-07:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:2.0.0:test 
(integration-test) on project spark-kubernetes-integration-tests_2.12: There 
are test failures -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :spark-kubernetes-integration-tests_2.12
+ retcode3=1
+ kill -9 82255
+ minikube stop
:   Stopping "minikube" in kvm2 ...
-   "minikube" stopped.
/tmp/hudson6767824981271828433.sh: line 66: 82255 Killed                  
minikube mount ${PVC_TESTS_HOST_PATH}:${PVC_TESTS_VM_PATH} 
--9p-version=9p2000.L --gid=0 --uid=185
+ [[ 1 = 0 ]]
+ test_status=failure
+ /home/jenkins/bin/post_github_pr_comment.py
Attempting to post to Github...
 > Post successful.
+ rm -rf /tmp/tmp.epTpFHp0Dl
+ exit 1
Build step 'Execute shell' marked build as failure
{code}
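
One detail in the log above may be worth checking. The assertion looks for the string "Received decommission executor message", while the only decommission line the driver printed reads "Received executor 1 decommissioned message". The snippet below is a minimal sketch of that substring check, not the actual code in KubernetesSuite.scala; the helper name contains_expected is hypothetical, and the wrapped log line has been joined back into one line. Whether this wording difference is the real cause of the failure is an assumption that still needs to be confirmed against the test source.

```python
# Both strings are copied from the Jenkins log in this report.
# EXPECTED is the text the failing assertion says it could not find.
EXPECTED = "Received decommission executor message"

# LOGGED is the driver's decommission line, with the e-mail line wrapping
# removed (joining the wrapped pieces with single spaces is an assumption).
LOGGED = (
    "20/09/17 11:07:36 WARN "
    "KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: "
    "Received executor 1 decommissioned message"
)


def contains_expected(driver_log: str, expected: str = EXPECTED) -> bool:
    """Hypothetical stand-in for the substring check described by the failure."""
    return expected in driver_log


# The driver did log a decommission event, but not with the expected wording.
print(contains_expected(LOGGED))  # prints: False
```

If the wording is indeed the issue, either the expected string in the test or the log message in the driver would need to change so that the two agree.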


