[ https://issues.apache.org/jira/browse/YUNIKORN-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wilfred Spiegelenburg updated YUNIKORN-3232:
--------------------------------------------
     Fix Version/s:     (was: 1.9.0)
    Target Version: 1.9.0
            Labels: latency performance preemption  (was: latency performance preemption pull-request-available)

> Preemption latency regression
> -----------------------------
>
>                 Key: YUNIKORN-3232
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-3232
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>            Reporter: Victor Zhou
>            Priority: Major
>              Labels: latency, performance, preemption
>
> *Summary*
> Preemption latency regression in 1.8: oversized queue snapshots are cloned twice in calculateVictimsByNode
>  
> *Description*
> After upgrading from 1.5 to 1.8, we observed a large increase in preemption
> latency on production-like workloads.
> In our environment:
> * total queues: 400+
> * queues that are actually valid victim candidates per cycle: typically 20–50
>
> Observed preemption latency:
> * ~0.5 s on 1.5
> * 10+ s on 1.8
>
> The latency increase becomes more pronounced as the queue count grows.
>  
> *Root Cause*
> The preemption queue snapshot map grows too large because it includes
> queues that are already within their guaranteed resources.
> Those queues should not be targeted as victims and therefore should not be
> part of the working snapshot set used for victim selection.
> This oversized snapshot map is then cloned twice in calculateVictimsByNode
> (once for the first pass and once for the second), multiplying the overhead
> and causing a significant latency regression at scale.
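To make the cost model concrete, here is a minimal Go sketch. It assumes a hypothetical `QueueSnapshot` type and hypothetical helpers `cloneSnapshots` and `victimPasses`; the real YuniKorn snapshot types and the internals of calculateVictimsByNode are not shown in the issue. It only illustrates that deep-copying the whole map once per pass makes the copy work proportional to the total number of snapshots, not to the number of real victim candidates.

```go
package main

import "fmt"

// QueueSnapshot is a hypothetical stand-in for a per-queue preemption
// snapshot; the real YuniKorn type and its fields are not shown in the issue.
type QueueSnapshot struct {
	Path       string
	Allocated  map[string]int64 // resource name -> quantity
	Guaranteed map[string]int64
}

// clone deep-copies one snapshot, including its resource maps.
func (q *QueueSnapshot) clone() *QueueSnapshot {
	c := &QueueSnapshot{
		Path:       q.Path,
		Allocated:  make(map[string]int64, len(q.Allocated)),
		Guaranteed: make(map[string]int64, len(q.Guaranteed)),
	}
	for k, v := range q.Allocated {
		c.Allocated[k] = v
	}
	for k, v := range q.Guaranteed {
		c.Guaranteed[k] = v
	}
	return c
}

// cloneSnapshots deep-copies the whole snapshot map and also returns the
// number of queue snapshots that were copied.
func cloneSnapshots(src map[string]*QueueSnapshot) (map[string]*QueueSnapshot, int) {
	dst := make(map[string]*QueueSnapshot, len(src))
	for path, snap := range src {
		dst[path] = snap.clone()
	}
	return dst, len(src)
}

// victimPasses (hypothetical) mimics the two passes described in the issue:
// each pass works on its own clone of the full map, so the copy work is
// 2 x len(snapshots) no matter how few queues can actually yield victims.
func victimPasses(snapshots map[string]*QueueSnapshot) int {
	copied := 0
	_, n := cloneSnapshots(snapshots) // working copy for the first pass
	copied += n
	_, n = cloneSnapshots(snapshots) // working copy for the second pass
	copied += n
	return copied
}

// makeSnapshots builds n flat leaf snapshots with identical resources.
func makeSnapshots(n int) map[string]*QueueSnapshot {
	m := make(map[string]*QueueSnapshot, n)
	for i := 0; i < n; i++ {
		path := fmt.Sprintf("root.q%d", i)
		m[path] = &QueueSnapshot{
			Path:       path,
			Allocated:  map[string]int64{"vcore": 1000, "memory": 1 << 30},
			Guaranteed: map[string]int64{"vcore": 2000, "memory": 2 << 30},
		}
	}
	return m
}

func main() {
	fmt.Println(victimPasses(makeSnapshots(400))) // 800
	fmt.Println(victimPasses(makeSnapshots(50)))  // 100
}
```

With the queue counts from the description, cloning all 400+ snapshots twice copies 800+ entries per call, whereas cloning only the 20–50 relevant ones would copy 40–100.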
>  
> *Fix / Improvement*
> Prune the preemption snapshots so that only relevant paths remain:
> * keep the ask queue path and its ancestors
> * keep victim-contributing leaf queues and their ancestors
> * remove leaf snapshots early when the queue is within its guaranteed limits
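The pruning rules can be sketched as follows. This is a minimal Go illustration, not the actual patch: `QueueSnapshot`, `withinGuaranteed`, `addWithAncestors` and `pruneSnapshots` are hypothetical names; queue paths are assumed to be dot-separated (`root.a.b`); a leaf is treated as victim-contributing when any allocated resource exceeds its guarantee; and a resource with no guaranteed entry is assumed to have a guarantee of 0.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// QueueSnapshot is a hypothetical, simplified per-queue snapshot.
type QueueSnapshot struct {
	Path       string
	Leaf       bool
	Allocated  map[string]int64
	Guaranteed map[string]int64
}

// withinGuaranteed reports whether every allocated resource fits inside the
// guarantee. Assumption: a missing guaranteed entry counts as 0.
func withinGuaranteed(q *QueueSnapshot) bool {
	for res, used := range q.Allocated {
		if used > q.Guaranteed[res] {
			return false
		}
	}
	return true
}

// addWithAncestors marks path and each parent: "root.a.b" -> "root.a" -> "root".
func addWithAncestors(keep map[string]bool, path string) {
	for {
		keep[path] = true
		i := strings.LastIndex(path, ".")
		if i < 0 {
			return
		}
		path = path[:i]
	}
}

// pruneSnapshots keeps the ask queue path, every leaf that is over its
// guarantee (a possible victim source), and the ancestors of both. Leaves
// within their guarantee are dropped unless they lie on the ask path.
func pruneSnapshots(snaps map[string]*QueueSnapshot, askQueue string) map[string]*QueueSnapshot {
	keep := make(map[string]bool)
	addWithAncestors(keep, askQueue)
	for path, q := range snaps {
		if q.Leaf && !withinGuaranteed(q) {
			addWithAncestors(keep, path)
		}
	}
	pruned := make(map[string]*QueueSnapshot, len(keep))
	for path := range keep {
		if q, ok := snaps[path]; ok {
			pruned[path] = q
		}
	}
	return pruned
}

func main() {
	vc := func(n int64) map[string]int64 { return map[string]int64{"vcore": n} }
	snaps := map[string]*QueueSnapshot{
		"root":       {Path: "root"},
		"root.a":     {Path: "root.a"},
		"root.a.ask": {Path: "root.a.ask", Leaf: true, Allocated: vc(1), Guaranteed: vc(4)},
		"root.b":     {Path: "root.b"},
		"root.b.big": {Path: "root.b.big", Leaf: true, Allocated: vc(8), Guaranteed: vc(2)},
		"root.b.ok":  {Path: "root.b.ok", Leaf: true, Allocated: vc(1), Guaranteed: vc(2)},
		"root.c":     {Path: "root.c"},
		"root.c.ok":  {Path: "root.c.ok", Leaf: true, Allocated: vc(2), Guaranteed: vc(2)},
	}
	pruned := pruneSnapshots(snaps, "root.a.ask")
	paths := make([]string, 0, len(pruned))
	for p := range pruned {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	fmt.Println(paths) // [root root.a root.a.ask root.b root.b.big]
}
```

In this example 8 snapshots shrink to 5, so two clone passes would copy 10 snapshots instead of 16. Pruning before the clones is what ties the per-pass cost to the relevant queues rather than to the whole hierarchy.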



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
