[ 
https://issues.apache.org/jira/browse/SPARK-29562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-29562.
-----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 26218
[https://github.com/apache/spark/pull/26218]

> SQLAppStatusListener metrics aggregation is slow and memory hungry
> ------------------------------------------------------------------
>
>                 Key: SPARK-29562
>                 URL: https://issues.apache.org/jira/browse/SPARK-29562
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.4
>            Reporter: Marcelo Masiero Vanzin
>            Assignee: Marcelo Masiero Vanzin
>            Priority: Major
>             Fix For: 3.0.0
>
>
> While {{SQLAppStatusListener}} was added in 2.3, the aggregation code is very 
> similar to what it was previously, so I'm sure this is even older.
> Long story short, the aggregation code 
> ({{SQLAppStatusListener.aggregateMetrics}}) is very, very slow, and can take 
> a non-trivial amount of time with large queries, aside from using a ton of 
> memory.
> There are also cascading issues caused by that: since it's called from an 
> event handler, it can slow down event processing, causing events to be 
> dropped, which can cause listeners to miss important events that would tell 
> them to free up internal state (and, thus, memory).
> To given an anecdotal example, one app I looked at ran into the "events being 
> dropped" issue, which caused the listener to accumulate state for 100s of 
> live stages, even though most of them were already finished. That lead to a 
> few GB of memory being wasted due to finished stages that were still being 
> tracked.
> Here, though, I'd like to focus on {{SQLAppStatusListener.aggregateMetrics}} 
> and making it faster. We should look at the other issues (unblocking event 
> processing, cleaning up of stale data in listeners) separately.
> (I also remember someone in the past trying to fix something in this area, 
> but couldn't find a PR nor an open bug.)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to