[ 
https://issues.apache.org/jira/browse/SPARK-46701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hussein Ballout updated SPARK-46701:
------------------------------------
    Shepherd: Mohamad Haidar  (was: Garren Smith)

> Spark Cluster Crashing 
> -----------------------
>
>                 Key: SPARK-46701
>                 URL: https://issues.apache.org/jira/browse/SPARK-46701
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes, Spark Core
>    Affects Versions: 3.4.0
>         Environment: Kubernetes: 1.27
> Apache Spark 3.4.0
>  
>            Reporter: Hussein Ballout
>            Priority: Critical
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> I am getting the following errors in the Spark executors:
> 2024-01-12 03:26:17.887 WARN [task-result-getter-2]:org.apache.spark.internal.Logging - Lost task 65.0 in stage 79.0 (TID 10250) (10.1.208.60 executor 119): TaskKilled (Stage cancelled: Job aborted due to stage failure: Exception while getting task result: java.io.OptionalDataException)
> 2024-01-12 03:26:17.891 WARN [task-result-getter-3]:org.apache.spark.internal.Logging - Lost task 69.0 in stage 79.0 (TID 10263) (10.1.99.211 executor 72): TaskKilled (Stage cancelled: Job aborted due to stage failure: Exception while getting task result: java.io.OptionalDataException)
> 2024-01-12 03:26:17.893 WARN [task-result-getter-0]:org.apache.spark.internal.Logging - Lost task 115.0 in stage 79.0 (TID 10202) (10.1.236.96 executor 27): TaskKilled (Stage cancelled: Job aborted due to stage failure: Exception while getting task result: java.io.OptionalDataException)
> 2024-01-12 03:26:17.895 WARN [task-result-getter-1]:org.apache.spark.internal.Logging - Lost task 4.0 in stage 79.0 (TID 10231) (10.1.165.84 executor 80): TaskKilled (Stage cancelled: Job aborted due to stage failure: Exception while getting task result: java.io.OptionalDataException)
> 2024-01-12 03:26:17.897 WARN [task-result-getter-2]:org.apache.spark.internal.Logging - Lost task 75.0 in stage 79.0 (TID 10228) (10.1.6.211 executor 18): TaskKilled (Stage cancelled: Job aborted due to stage failure: Exception while getting task result: java.io.OptionalDataException)
> 2024-01-12 03:26:17.902 WARN [task-result-getter-3]:org.apache.spark.internal.Logging - Lost task 102.0 in stage 79.0 (TID 10285) (10.1.160.108 executor 53): TaskKilled (Stage cancelled: Job aborted due to stage failure: Exception while getting task result: java.io.OptionalDataException)
> 2024-01-12 03:27:13.092 ERROR [dispatcher-CoarseGrainedScheduler]:org.apache.spark.internal.Logging - Lost executor 117 on 10.1.197.197: 
> The executor with id 117 exited with exit code 50(Uncaught exception).
>  
>  
>  
> The API gave the following container statuses:
>  
>  
> container name: spark-kubernetes-executor
> container image: ngxp-registry.service.lab.ngxp.cci.att.com:5000/nova/midlayer/midlayer-streaming-core:1.7.1
> container state: terminated
> container started at: 2024-01-12T03:03:46Z
> container finished at: 2024-01-12T03:27:12Z
> exit code: 50
> termination reason: Error
>       
> 2024-01-12 03:27:13.095 WARN [dispatcher-CoarseGrainedScheduler]:org.apache.spark.internal.Logging - Lost task 79.0 in stage 79.0 (TID 10305) (10.1.197.197 executor 117): ExecutorLostFailure (executor 117 exited caused by one of the running tasks) Reason: 
> The executor with id 117 exited with exit code 50(Uncaught exception).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
