[jira] [Resolved] (YUNIKORN-2507) Picking Victims should consider usage and max quota for queues at each level

2024-03-27 Thread Manikandan R (Jira)


 [ https://issues.apache.org/jira/browse/YUNIKORN-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manikandan R resolved YUNIKORN-2507.

 Fix Version/s: 1.6.0
Target Version: 1.6.0
Resolution: Fixed

Merged to master

> Picking Victims should consider usage and max quota for queues at each level
> 
>
> Key: YUNIKORN-2507
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2507
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Queue setup: root.family.parent.child[1-2]
> Max resources have been set on the parent queue, say 10GB, and its usage is
> slightly below that, say 9GB. Ask1 (say, 2GB) comes in for the child1 queue,
> whose usage is still below its guaranteed quota. Because 9GB + 2GB exceeds the
> parent's 10GB max, preemption is needed. In this case, fence/queue selection
> should not go outside the parent queue hierarchy, and queues at the same level
> as the parent (the parent queue's siblings) should not be considered at all:
> freeing resources in the siblings' hierarchy creates no headroom under the
> parent's max, so accommodating the ask that way would violate the max
> resource quota of the parent queue.
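A minimal Go sketch of the fencing rule described above, under loud assumptions: a single resource (memory in GB), a toy `Queue` type, and hypothetical `fence`, `isDescendant` and `victimQueues` helpers that are not YuniKorn's actual API. It walks up from the asking queue, stops at the first ancestor whose max the ask would exceed, and keeps only candidate victim queues below that ancestor. Real victim selection also weighs guaranteed quotas, priorities and more; only the fencing step is shown.

```go
package main

import "fmt"

// Queue is a hypothetical single-resource (memory in GB) model of a queue.
// MaxGB == 0 means no max is configured on that queue.
type Queue struct {
	Name   string
	Parent *Queue
	MaxGB  int64
	UsedGB int64
}

// Path returns the fully qualified name, e.g. "root.family.parent.child1".
func (q *Queue) Path() string {
	if q.Parent == nil {
		return q.Name
	}
	return q.Parent.Path() + "." + q.Name
}

// fence walks up from the asking leaf and returns the lowest ancestor whose
// max would be exceeded by the ask. Freeing resources outside that queue
// cannot create headroom under its max, so victims must come from inside it.
// If no ancestor max is exceeded, the root is returned.
func fence(leaf *Queue, askGB int64) *Queue {
	q := leaf
	for q.Parent != nil {
		q = q.Parent
		if q.MaxGB > 0 && q.UsedGB+askGB > q.MaxGB {
			return q
		}
	}
	return q
}

// isDescendant reports whether q sits somewhere below ancestor in the tree.
func isDescendant(q, ancestor *Queue) bool {
	for p := q.Parent; p != nil; p = p.Parent {
		if p == ancestor {
			return true
		}
	}
	return false
}

// victimQueues keeps only the candidate queues inside the fence, excluding
// the asking queue itself.
func victimQueues(leaf *Queue, askGB int64, candidates []*Queue) []*Queue {
	f := fence(leaf, askGB)
	var out []*Queue
	for _, c := range candidates {
		if c != leaf && isDescendant(c, f) {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	// Setup from the issue: root.family.parent.child[1-2], parent max 10GB, used 9GB.
	root := &Queue{Name: "root", UsedGB: 14}
	family := &Queue{Name: "family", Parent: root, UsedGB: 14}
	parent := &Queue{Name: "parent", Parent: family, MaxGB: 10, UsedGB: 9}
	child1 := &Queue{Name: "child1", Parent: parent, UsedGB: 1}
	child2 := &Queue{Name: "child2", Parent: parent, UsedGB: 8}
	parentSibling := &Queue{Name: "other", Parent: family, UsedGB: 5}

	fmt.Println("fence:", fence(child1, 2).Path())
	for _, v := range victimQueues(child1, 2, []*Queue{child2, parentSibling}) {
		fmt.Println("victim queue:", v.Path())
	}
}
```

With the issue's numbers the fence is `root.family.parent`, so `child2` stays a candidate while the parent's sibling is dropped. Picking the lowest exceeded ancestor is sufficient because its subtree is contained in every higher fence.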



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: dev-h...@yunikorn.apache.org



[jira] [Created] (YUNIKORN-2520) PVC errors in AssumePod() are not handled properly

2024-03-27 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2520:
--

 Summary: PVC errors in AssumePod() are not handled properly
 Key: YUNIKORN-2520
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2520
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: shim - kubernetes
Reporter: Peter Bacsko


When there is an error caused by a volume operation in {{AssumePod()}}, the 
allocation on the core side is not removed.

Although we check the result of UpdateAllocation, the error handling only logs 
the error:
{noformat}
if err := callback.UpdateAllocation(response); err != nil {
	rmp.handleUpdateResponseError(rmID, err)
}
...

func (rmp *RMProxy) handleUpdateResponseError(rmID string, err error) {
	log.Log(log.RMProxy).Error("failed to handle response",
		zap.String("rmID", rmID),
		zap.Error(err))
}
{noformat}
I suggest moving the volume-related code to {{Task.postTaskAllocated}}. That 
way, the task will transition to the "Failed" state and the allocationID will 
be available, so we can release both the ask and the allocation:
{noformat}
func (task *Task) releaseAllocation() {
	...
	var releaseRequest *si.AllocationRequest
	s := TaskStates()
	switch task.GetTaskState() {
	case s.New, s.Pending, s.Scheduling, s.Rejected:
		// release ask + allocation if possible
		releaseRequest = common.CreateReleaseAskRequestForTask(
			task.applicationID, task.taskID, task.application.partition)
	default:
		if task.allocationID == "" {
			// ... log error ...
			return
		}
		releaseRequest = common.CreateReleaseAllocationRequestForTask(
			task.applicationID, task.taskID, task.allocationID,
			task.application.partition, task.terminationType)
	}
	...
{noformat}
 






[jira] [Created] (YUNIKORN-2519) Remove bypass ACL check from placement rules

2024-03-27 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2519:
---

 Summary: Remove bypass ACL check from placement rules
 Key: YUNIKORN-2519
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2519
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - scheduler
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


Instead of having every rule except the recovery rule return a flag that says 
not to bypass the ACL check, special-case the recovery rule so that it 
bypasses the checks.

The recovery queue is created without ACLs or quota and is always a leaf queue. 
The only rule that can return the recovery queue is the recovery rule, which is 
the last one in the list.

Use all these facts to simplify the placement processing.
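A minimal Go sketch of the simplification, assuming a toy placement model: `app`, `rule`, `placeApp` and the `recovered` marker are hypothetical and not YuniKorn's code. Rules return only a queue path (no bypass flag), and the loop skips the ACL check solely when the result is `root.@recover@`. Moving on to the next rule when the ACL denies is also an assumption of this sketch.

```go
package main

import "fmt"

// recoveryQueue is the recovery queue named in the issues.
const recoveryQueue = "root.@recover@"

// app is a hypothetical application; recovered marks an app restored from
// existing allocations after a scheduler restart (assumption).
type app struct {
	user      string
	recovered bool
}

// rule is a hypothetical placement rule. It returns a queue path, or "" when
// it does not apply. It returns no "bypass ACL" flag.
type rule struct {
	name  string
	place func(a app) string
}

// placeApp tries the rules in order. The ACL check is skipped only for the
// recovery queue: it has no ACLs or quota, is always a leaf, and only the
// recovery rule (last in the list) can return it.
func placeApp(a app, rules []rule, aclAllows func(queue, user string) bool) (queue, ruleName string) {
	for _, r := range rules {
		q := r.place(a)
		if q == "" {
			continue
		}
		if q == recoveryQueue || aclAllows(q, a.user) {
			return q, r.name
		}
	}
	return "", ""
}

func main() {
	rules := []rule{
		{name: "user", place: func(a app) string { return "root.users." + a.user }},
		{name: "recovery", place: func(a app) string {
			if a.recovered {
				return recoveryQueue
			}
			return ""
		}},
	}
	// Hypothetical ACL: only alice may submit to a user queue.
	aclAllows := func(queue, user string) bool { return user == "alice" }

	for _, a := range []app{{user: "alice"}, {user: "bob"}, {user: "bob", recovered: true}} {
		q, r := placeApp(a, rules, aclAllows)
		fmt.Printf("%s recovered=%v -> %q (rule %q)\n", a.user, a.recovered, q, r)
	}
}
```

Because the recovery queue can only come from the last rule, a single equality check replaces a flag that every other rule would have to return.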






[jira] [Created] (YUNIKORN-2518) Allow recovery queue in REST requests

2024-03-27 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2518:
---

 Summary: Allow recovery queue in REST requests
 Key: YUNIKORN-2518
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2518
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - common
Reporter: Wilfred Spiegelenburg


The current checks for REST requests that require a queue path prevent 
looking at the {{root.@recover@}} queue.

The validator filters out the queue name, which makes it impossible to use the 
REST requests to check whether the queue has any running applications or pods 
after initialisation.
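A minimal Go sketch of a queue-path validator that admits the recovery queue, under stated assumptions: the per-level pattern `^[a-zA-Z0-9_-]{1,64}$` and the `validQueuePath` helper are hypothetical, not YuniKorn's actual validator. The point is that a pattern rejecting `@` would filter `root.@recover@`, so the sketch accepts that exact path explicitly.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// recoveryQueuePath is the fully qualified recovery queue name from the issue.
const recoveryQueuePath = "root.@recover@"

// queuePartRE is an assumed per-level name rule for this sketch; it rejects
// "@", so the recovery queue needs an explicit exception.
var queuePartRE = regexp.MustCompile(`^[a-zA-Z0-9_-]{1,64}$`)

// validQueuePath accepts the recovery queue by exact match and otherwise
// requires a path starting with "root" whose every level matches queuePartRE.
func validQueuePath(path string) bool {
	if path == recoveryQueuePath {
		return true
	}
	parts := strings.Split(path, ".")
	if parts[0] != "root" {
		return false
	}
	for _, p := range parts[1:] {
		if !queuePartRE.MatchString(p) {
			return false
		}
	}
	return true
}

func main() {
	for _, p := range []string{"root.default", "root.@recover@", "root.@other@", "default"} {
		fmt.Printf("%-16s valid=%v\n", p, validQueuePath(p))
	}
}
```

Matching the exact path, rather than allowing `@` in every name, keeps the reserved `@...@` form unavailable to user-supplied queue names.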






Community Over Code NA 2024 Travel Assistance Applications now open!

2024-03-27 Thread Gavin McDonald
Hello to all users, contributors and Committers!

[ You are receiving this email as a subscriber to one or more ASF project
dev or user mailing lists; it is not being sent to you directly. It is
important that we reach all of our users and contributors/committers so
that they may get a chance to benefit from this. We apologise in advance
if this doesn't interest you, but it is on topic for the mailing lists of
the Apache Software Foundation, and it is important that you do not mark
this as spam in your email client. Thank you! ]

The Travel Assistance Committee (TAC) are pleased to announce that
travel assistance applications for Community over Code NA 2024 are now
open!

We will be supporting Community over Code NA in Denver, Colorado, from
October 7th to 10th, 2024.

TAC exists to help those who would like to attend Community over Code
events but are unable to do so for financial reasons. For more info
on this year's applications and qualifying criteria, please visit the
TAC website at <https://tac.apache.org/>. Applications are already
open on https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting
applications from those people that are able to attend the full event.

Important: Applications close on Monday 6th May, 2024.

Applicants have until the closing date above to submit their
applications, which should contain as much supporting material as
required to process their request efficiently and accurately. This
will enable TAC to announce successful applications shortly
afterwards.

As usual, TAC expects to deal with applications from a diverse range
of backgrounds; therefore, we encourage (as always) anyone thinking
about sending in an application to do so ASAP.

If you will need a visa to enter the country, we advise you to apply
now so that you have enough time in case of interview delays. Do not
wait until you know whether you have been accepted.

We look forward to greeting many of you in Denver, Colorado, in October 2024!

Kind Regards,

Gavin

(On behalf of the Travel Assistance Committee)