SNagarajan2243 opened a new issue, #1701:
URL: https://github.com/apache/datafusion-ballista/issues/1701

   Hi @andygrove 
   ## Setup
   
   - Ballista Scheduler: v53.0.0  
   - Ballista Executor: v53.0.0  
   - Deployment: Kubernetes (Rancher Desktop / K3s)  
   - Autoscaling: KEDA external scaler for Ballista  
   - Metric used: `pending_jobs` (from `get_metric_spec`)
   
   ---
   
   ## Observed behavior
   
   1. The KEDA scaler exposes only:
   ```
   pending_jobs (targetSize = 1)
   ```
   
   2. Under realistic load, `pending_jobs` remains almost always `0`.
   
   3. When multiple queries are submitted:
   - Scheduler quickly assigns tasks to available executors
   - Tasks are removed from the “pending” state almost immediately
   - Executors may still be actively running tasks, but the scheduler queue appears empty
   
   4. As a result, the HPA calculation becomes:
   ```
   desiredReplicas = ceil(currentReplicas * (0 / 1)) = 0 → clamped to minReplicaCount (1)
   ```
   → No scale-up occurs, even under load
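To make the arithmetic above concrete, here is a minimal sketch of the documented Kubernetes HPA formula, `ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, followed by clamping to the replica bounds. It ignores the HPA tolerance and stabilization windows. The function name and the `max_replicas=10` bound are my own assumptions for illustration, not values from my deployment.

```python
import math


def desired_replicas(current_replicas, metric_value, target_value,
                     min_replicas=1, max_replicas=10):
    """Simplified HPA calculation (no tolerance, no stabilization window)."""
    raw = math.ceil(current_replicas * (metric_value / target_value))
    # The result is clamped to the configured replica bounds.
    return max(min_replicas, min(max_replicas, raw))


# pending_jobs == 0 with target 1: always collapses to the minimum.
print(desired_replicas(current_replicas=3, metric_value=0, target_value=1))
# A visible backlog of 4 would have asked for 12 replicas, capped at 10 here.
print(desired_replicas(current_replicas=3, metric_value=4, target_value=1))
```

With the metric pinned at `0`, the result is the minimum replica count regardless of how many executors are currently busy.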
   
   ---
   
   ## Clarification on semantics
   
   My current understanding is:
   
   - `pending_jobs` represents jobs waiting in the scheduler queue before being assigned
   - Once a job is assigned to an executor, it is no longer considered “pending”
   - “Assigned” does not necessarily mean “running” on the executor yet
   
   If this is correct, then the metric may not reflect actual system saturation under fast scheduling conditions.
   
   ---
   
   ## Concern
   
   If scheduler assignment happens very quickly (which seems to be the expected behavior), then:
   
   - The system can be fully saturated at the executor level  
   - But `pending_jobs` remains near zero  
   - Therefore, KEDA may never observe a backlog long enough to trigger scale-up  
   
   This makes autoscaling dependent on a transient scheduler backlog window that may not appear under normal workloads.
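The sampling problem can be illustrated with a small simulation. This is only a sketch under assumed numbers: each job is taken to stay "pending" for half a second before assignment, and the metric is read every 30 seconds. The function name, arrival times, and both durations are hypothetical and not measured from Ballista or KEDA.

```python
def observed_backlog(arrivals, pending_window, poll_interval, horizon):
    """Return the pending-job count seen at each poll time.

    A job counts as pending from its arrival until `pending_window`
    seconds later, when the scheduler is assumed to have assigned it.
    """
    samples = []
    polls = int(horizon // poll_interval) + 1
    for i in range(polls):
        t = i * poll_interval
        samples.append(sum(1 for a in arrivals if a <= t < a + pending_window))
    return samples


# Hypothetical workload: five jobs submitted over ~25 seconds.
arrivals = [1.0, 7.0, 13.0, 19.0, 25.0]
print(observed_backlog(arrivals, pending_window=0.5,
                       poll_interval=30.0, horizon=60.0))
```

Under these assumptions every poll reads zero pending jobs, even though five jobs were submitted and may still be running on the executors.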
   
   ---
   
   ## Additional question (deployment / image usage)
   
   I also noticed that enabling the KEDA external scaler currently requires building the Ballista images from source.
   
   Is there any supported way to enable KEDA functionality using the official prebuilt Ballista images (for example via configuration flags or feature gates), without requiring a custom build?
   
   If not, is there a plan to publish official images with KEDA support included by default?
   
   ---
   
   ## Questions
   
   1. Is the interpretation of `pending_jobs` correct in terms of scheduler vs executor state?
   2. Does “assignment to executor” mark the end of the pending state by design?
   3. Is there an intended metric for autoscaling that better reflects executor load (e.g., running tasks or combined queue depth)?
   4. Is autoscaling via the Ballista KEDA external scaler intentionally driven only by scheduler-side `pending_jobs`?
   5. Is it expected that KEDA support requires building images from source, or can it be enabled with official images?
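For question 3, the kind of signal I have in mind could look roughly like the sketch below: a combined demand figure (pending plus running tasks) divided by the task slots per executor. None of these names or inputs exist in Ballista as far as I know; `executors_needed`, `pending_tasks`, `running_tasks`, and `slots_per_executor` are hypothetical and only meant to show the idea.

```python
import math


def executors_needed(pending_tasks, running_tasks, slots_per_executor):
    """Hypothetical combined-load metric: total task demand over slot capacity."""
    demand = pending_tasks + running_tasks
    return math.ceil(demand / slots_per_executor)


# Pending-only view: nothing queued, so no demand is visible.
print(executors_needed(pending_tasks=0, running_tasks=0, slots_per_executor=4))
# Combined view: 16 running tasks at 4 slots each imply 4 busy executors.
print(executors_needed(pending_tasks=0, running_tasks=16, slots_per_executor=4))
```

A metric of this shape would stay non-zero while executors are saturated, even when the scheduler queue drains immediately.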


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

