[ https://issues.apache.org/jira/browse/SPARK-53007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Harsha Gudladona updated SPARK-53007:
-------------------------------------
Description:

Hello! We are running Hudi Delta Streamer on Spark, sourcing data from Kafka. We have a case where the Spark UI shows negative RDD block counts and incorrect storage values after intermittent task failures and successful retries. The metrics stay correct until the first task failure and a subsequent retry. My first thought was that the status event queue on the listener bus on the driver was full, but JMX metrics show the dropped count as 0. I am not aware of any other way to troubleshoot this; any help is appreciated.

Attached screenshots: !image-2025-07-29-17-04-51-074.png! !image-2025-07-29-17-04-27-564.png!

> Spark UI: Incorrect metrics reported after Spark task failures and successful
> retries.
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-53007
>                 URL: https://issues.apache.org/jira/browse/SPARK-53007
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 3.4.1
>        Environment: Spark Version: 3.4.1
>                     Infra: Spark on EKS
>                     Operator: Kubeflow
>            Reporter: Harsha Gudladona
>            Priority: Major
>
> Hello! We are running Hudi Delta Streamer on Spark, sourcing data from Kafka. We have
> a case where the Spark UI shows negative RDD block counts and incorrect
> storage values after intermittent task failures and successful retries.
> The metrics stay correct until the first task failure and a subsequent
> retry. My first thought was that the status event queue on the listener bus on
> the driver was full, but JMX metrics show the dropped count as 0. I am not aware
> of any other way to troubleshoot this; any help is appreciated.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
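Since the report hinges on whether the driver's listener bus is dropping status events, the configuration knobs involved may be worth spelling out. Below is a sketch of the relevant driver-side settings, assuming Spark 3.4 defaults; the capacity value is illustrative, not a recommendation, and the exact JMX gauge name should be verified against your deployment:

```properties
# spark-defaults.conf (driver side) -- sketch, values illustrative

# Per-queue capacity of the listener bus; events are dropped once a
# queue (e.g. the appStatus queue that feeds the UI) fills up.
# The documented default is 10000.
spark.scheduler.listenerbus.eventqueue.capacity   30000

# Expose driver metrics over JMX, which includes the per-queue
# dropped-event gauges (e.g. LiveListenerBus.queue.appStatus.numDroppedEvents),
# presumably the counter the report observed at 0.
spark.metrics.conf.*.sink.jmx.class   org.apache.spark.metrics.sink.JmxSink
```

If the appStatus queue's dropped-event gauge genuinely stays at 0 across the failure/retry window, the events are reaching the UI's state store, which points away from queue overflow and toward an accounting issue in how block updates from failed and retried tasks are reconciled.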