SNagarajan2243 opened a new issue, #1701: URL: https://github.com/apache/datafusion-ballista/issues/1701
Hi @andygrove

## Setup

- Ballista Scheduler: v53.0.0
- Ballista Executor: v53.0.0
- Deployment: Kubernetes (Rancher Desktop / K3s)
- Autoscaling: KEDA external scaler for Ballista
- Metric used: `pending_jobs` (from `get_metric_spec`)

---

## Observed behavior

1. The KEDA scaler exposes only:
   ```
   pending_jobs (targetSize = 1)
   ```
2. Under realistic load, `pending_jobs` remains almost always `0`.
3. When multiple queries are submitted:
   - Scheduler quickly assigns tasks to available executors
   - Tasks are removed from the “pending” state almost immediately
   - Executors may still be actively running tasks, but the scheduler queue appears empty
4. As a result, HPA calculation becomes:
   ```
   desiredReplicas = currentReplicas * (0 / 1) → 1 (minReplicaCount)
   ```
   → No scale-up occurs even under load

---

## Clarification on semantics

My current understanding is:

- `pending_jobs` represents jobs waiting in the scheduler queue before being assigned
- Once a job is assigned to an executor, it is no longer considered “pending”
- “Assigned” does not necessarily mean “running” on the executor yet

If this is correct, then the metric may not reflect actual system saturation under fast scheduling conditions.

---

## Concern

If scheduler assignment happens very quickly (which seems expected behavior), then:

- The system can be fully saturated at the executor level
- But `pending_jobs` remains near zero
- Therefore, KEDA may never observe a backlog long enough to trigger scale-up

This makes autoscaling dependent on a transient scheduler backlog window that may not appear under normal workloads.

---

## Additional question (deployment / image usage)

I also noticed that enabling the KEDA external scaler currently requires building the Ballista images from source.

Is there any supported way to enable KEDA functionality using the official prebuilt Ballista images (for example via configuration flags or feature gates), without requiring a custom build?
If not, is there a plan to publish official images with KEDA support included by default?

---

## Questions

1. Is the interpretation of `pending_jobs` correct in terms of scheduler vs executor state?
2. Does “assignment to executor” mark the end of the pending state by design?
3. Is there an intended metric for autoscaling that better reflects executor load (e.g., running tasks or combined queue depth)?
4. Is autoscaling via the Ballista KEDA external scaler intentionally driven only by scheduler-side `pending_jobs`?
5. Is it expected that KEDA support requires building images from source, or can it be enabled with official images?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
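
For illustration, the HPA arithmetic described under "Observed behavior" can be sketched as below. This is a minimal sketch, assuming the documented Kubernetes HPA rule `ceil(currentReplicas * currentMetricValue / desiredMetricValue)` clamped to the replica bounds, and ignoring tolerance and stabilization windows. The function name, the replica bounds, the task counts, and the combined "pending plus running" signal are hypothetical and are not part of Ballista or KEDA.

```python
import math


def desired_replicas(current_replicas, metric_value, target_value,
                     min_replicas=1, max_replicas=10):
    """HPA-style rule: ceil(current * metric / target), clamped to the bounds."""
    raw = math.ceil(current_replicas * (metric_value / target_value))
    return max(min_replicas, min(max_replicas, raw))


# pending_jobs is 0 because the scheduler has already assigned every task,
# so the result collapses to min_replicas even if executors are busy.
print(desired_replicas(current_replicas=2, metric_value=0, target_value=1))  # 1

# Hypothetical alternative signal: pending jobs plus tasks running on executors,
# with an assumed target of 4.
pending_jobs, running_tasks = 0, 8
print(desired_replicas(2, pending_jobs + running_tasks, target_value=4))  # 4
```

With `pending_jobs` as the only input, the first call stays at the minimum regardless of executor load; a signal that also counts in-flight work, as in the second call, would let the same rule request more replicas.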
