[jira] [Created] (YUNIKORN-2723) Wordwrap queuename in QueuesV2 (Beta) page

2024-07-02 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2723:
--

 Summary: Wordwrap queuename in QueuesV2 (Beta) page
 Key: YUNIKORN-2723
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2723
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: webapp
Reporter: Manikandan R


Please see the attached image (captured in Chrome on an M1 Mac).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Comment Edited] (YUNIKORN-2629) Adding a node can result in a deadlock

2024-07-02 Thread Xi Chen (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862661#comment-17862661
 ] 

Xi Chen edited comment on YUNIKORN-2629 at 7/3/24 6:00 AM:
---

[~pbacsko] Hi, I don't know if this is the right place, but we've found a 
potential new deadlock. It happened twice during the past week in one of our 
production environments. We use different queues for different namespaces, 
with the binpacking policy enabled. The deadlock detection output is uploaded 
as [^yunikorn-scheduler-20240627.log]. We are running a scheduler built from 
this branch, which includes the fix for this ticket (an early version of branch-1.5): 
[https://github.com/apache/yunikorn-k8shim/tree/fb4e3f11345e6a9866dfaea97770c94b9421807b]
 

Please let me know if this should be a new Jira ticket, thanks!


was (Author: jshmchenxi):
Hi, I don't know if this is the right place, but we've found a potential new 
issue of deadlock. It happened 2 times during the past week in one of our 
production environment. We are using different queues for different namespaces, 
and binpacking policy enabled. The deadlock detection output is uploaded 
[^yunikorn-scheduler-20240627.log]. We are running the scheduler built from 
this branch with the fix for this ticket (early version of branch-1.5): 
[https://github.com/apache/yunikorn-k8shim/tree/fb4e3f11345e6a9866dfaea97770c94b9421807b]
 

 

> Adding a node can result in a deadlock
> --
>
> Key: YUNIKORN-2629
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2629
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Affects Versions: 1.5.0
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.5.2
>
> Attachments: updateNode_deadlock_trace.txt, 
> yunikorn-scheduler-20240627.log
>
>
> Adding a new node after YuniKorn state initialization can result in a 
> deadlock.
> The problem is that {{Context.addNode()}} holds a lock while we're waiting 
> for the {{NodeAccepted}} event:
> {noformat}
> dispatcher.RegisterEventHandler(handlerID, dispatcher.EventTypeNode, func(event interface{}) {
>     nodeEvent, ok := event.(CachedSchedulerNodeEvent)
>     if !ok {
>         return
>     }
>     // [...] removed for clarity
>     wg.Done()
> })
> defer dispatcher.UnregisterEventHandler(handlerID, dispatcher.EventTypeNode)
> if err := ctx.apiProvider.GetAPIs().SchedulerAPI.UpdateNode(&si.NodeRequest{
>     Nodes: nodesToRegister,
>     RmID:  schedulerconf.GetSchedulerConf().ClusterID,
> }); err != nil {
>     log.Log(log.ShimContext).Error("Failed to register nodes", zap.Error(err))
>     return nil, err
> }
> // wait for all responses to accumulate
> wg.Wait()  // <--- shim gets stuck here
> {noformat}
> If tasks are being processed, then the dispatcher will try to retrieve the 
> event handler, which is returned from Context:
> {noformat}
> go func() {
>     for {
>         select {
>         case event := <-getDispatcher().eventChan:
>             switch v := event.(type) {
>             case events.TaskEvent:
>                 getEventHandler(EventTypeTask)(v)  // <--- eventually calls Context.getTask()
>             case events.ApplicationEvent:
>                 getEventHandler(EventTypeApp)(v)
>             case events.SchedulerNodeEvent:
>                 getEventHandler(EventTypeNode)(v)
> {noformat}
> Since {{addNode()}} is holding a write lock, the event processing loop gets 
> stuck, so {{registerNodes()}} will never progress.
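
The lock ordering described above can be illustrated with a small standalone sketch. It is not the shim's code and not necessarily the fix that was applied for this ticket: the types shimContext and event, and the functions addNode, runDispatcher and registerNode, are hypothetical stand-ins. It only shows why releasing the lock before waiting lets a single-goroutine dispatcher drain a task event that needs the same lock.

```go
package main

import (
	"fmt"
	"sync"
)

// event is a hypothetical stand-in for the dispatcher's event types.
type event struct {
	kind string // "task" or "node"
}

// shimContext is a hypothetical stand-in for Context: state guarded by a lock.
type shimContext struct {
	lock  sync.RWMutex
	nodes map[string]bool
}

// getTask models an event handler that needs the read lock.
func (c *shimContext) getTask() int {
	c.lock.RLock()
	defer c.lock.RUnlock()
	return len(c.nodes)
}

// addNode records the node and then waits for the acceptance event.
// The write lock is released BEFORE waiting. If it were held across
// accepted.Wait(), getTask() in the dispatcher would block on the lock and
// the node event queued behind the task event would never be delivered.
func (c *shimContext) addNode(name string, events chan<- event, accepted *sync.WaitGroup) {
	c.lock.Lock()
	c.nodes[name] = true
	c.lock.Unlock()

	accepted.Add(1)
	events <- event{kind: "task"} // a task event is already ahead in the queue
	events <- event{kind: "node"} // the acceptance we are waiting for
	accepted.Wait()
}

// runDispatcher handles events one at a time, like a single dispatcher goroutine.
func runDispatcher(c *shimContext, events <-chan event, accepted *sync.WaitGroup) {
	for ev := range events {
		switch ev.kind {
		case "task":
			c.getTask() // needs the read lock
		case "node":
			accepted.Done()
		}
	}
}

// registerNode wires the pieces together and returns the number of known nodes.
func registerNode(name string) int {
	c := &shimContext{nodes: map[string]bool{}}
	events := make(chan event, 8)
	var accepted sync.WaitGroup
	go runDispatcher(c, events, &accepted)
	c.addNode(name, events, &accepted)
	close(events)
	return c.getTask()
}

func main() {
	fmt.Println("nodes registered:", registerNode("node-1"))
}
```

Moving the Unlock() call in addNode to after accepted.Wait() reproduces the hang pattern from the description: the dispatcher blocks in getTask() and never reaches the node event.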






[jira] [Commented] (YUNIKORN-2629) Adding a node can result in a deadlock

2024-07-02 Thread Xi Chen (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862661#comment-17862661
 ] 

Xi Chen commented on YUNIKORN-2629:
---

Hi, I don't know if this is the right place, but we've found a potential new 
deadlock. It happened twice during the past week in one of our production 
environments. We use different queues for different namespaces, with the 
binpacking policy enabled. The deadlock detection output is uploaded as 
[^yunikorn-scheduler-20240627.log]. We are running a scheduler built from this 
branch, which includes the fix for this ticket (an early version of branch-1.5): 
[https://github.com/apache/yunikorn-k8shim/tree/fb4e3f11345e6a9866dfaea97770c94b9421807b]
 

 







[jira] [Updated] (YUNIKORN-2629) Adding a node can result in a deadlock

2024-07-02 Thread Xi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Chen updated YUNIKORN-2629:
--
Attachment: yunikorn-scheduler-20240627.log







[jira] [Updated] (YUNIKORN-2667) E2E test for Gang app originator pod changes after restart

2024-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2667:
-
Labels: pull-request-available  (was: )

> E2E test for Gang app originator pod changes after restart
> --
>
> Key: YUNIKORN-2667
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2667
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: shim - kubernetes
>Reporter: Manikandan R
>Assignee: Tzu-Hua Lan
>Priority: Major
>  Labels: pull-request-available
>
> https://issues.apache.org/jira/browse/YUNIKORN-2665 covered the changes with 
> unit tests. We still need a test that covers the full cycle, before and after 
> a restart, either by writing an e2e test or by using a mock scheduler setup.






[jira] [Created] (YUNIKORN-2722) Expose the IsOriginator flag in REST

2024-07-02 Thread Yu-Lin Chen (Jira)
Yu-Lin Chen created YUNIKORN-2722:
-

 Summary: Expose the IsOriginator flag in REST
 Key: YUNIKORN-2722
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2722
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - scheduler
Reporter: Yu-Lin Chen
Assignee: Tzu-Hua Lan


The first real pod of each application is marked as the originator and is 
typically considered the driver/owner pod. This flag is propagated to the core 
and affects the preemption decision flow.

 

However, the current REST API doesn't expose the originator flag. Exposing it 
would let users check which allocation is the originator, which helps with 
monitoring and troubleshooting.
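
As a rough illustration of what exposing the flag could look like, here is a minimal sketch. The struct allocationInfo, its JSON field names, and the helpers toJSON and originators are assumptions made for this example and are not the actual yunikorn-core REST objects.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// allocationInfo is a hypothetical, trimmed-down REST object for an allocation.
// The "originator" field is the proposed addition; all names are assumptions.
type allocationInfo struct {
	AllocationKey string `json:"allocationKey"`
	ApplicationID string `json:"applicationId"`
	Originator    bool   `json:"originator"`
}

// toJSON serializes one allocation the way a REST handler might return it.
func toJSON(a allocationInfo) string {
	b, err := json.Marshal(a)
	if err != nil {
		return ""
	}
	return string(b)
}

// originators returns the allocation keys flagged as originator,
// which is the kind of check a monitoring client could run.
func originators(allocs []allocationInfo) []string {
	var keys []string
	for _, a := range allocs {
		if a.Originator {
			keys = append(keys, a.AllocationKey)
		}
	}
	return keys
}

func main() {
	allocs := []allocationInfo{
		{AllocationKey: "pod-1", ApplicationID: "app-1", Originator: true},
		{AllocationKey: "pod-2", ApplicationID: "app-1"},
	}
	fmt.Println(toJSON(allocs[0]))
	fmt.Println(originators(allocs))
}
```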






[jira] [Updated] (YUNIKORN-2712) Missing specific param error for REST API

2024-07-02 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YUNIKORN-2712:
---
Description: 
Some REST APIs return a specific "missing param" error (for example, when the 
user name is missing), but not all of them do. All mandatory parameters in the 
other REST APIs can follow the same pattern. Such an error is much clearer than 
a generic "doesn't exist" kind of error.

The suggestion given in 
[https://github.com/apache/yunikorn-core/pull/905#discussion_r1663068429] can 
be used as a reference for the implementation.

  was:Some REST API's throw "missing specific param" kind of errors, but not 
all. For example, user name is missing. Similarly, all mandatory parameters in 
other REST API's can follow the same pattern. It is very clear, rather than 
saying "doesn't exists" kind of error.


> Missing specific param error for REST API
> -
>
> Key: YUNIKORN-2712
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2712
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Manikandan R
>Priority: Major
>
> Some REST APIs return a specific "missing param" error (for example, when the 
> user name is missing), but not all of them do. All mandatory parameters in the 
> other REST APIs can follow the same pattern. Such an error is much clearer than 
> a generic "doesn't exist" kind of error.
> The suggestion given in 
> [https://github.com/apache/yunikorn-core/pull/905#discussion_r1663068429] can 
> be used as a reference for the implementation.
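
A minimal sketch of the pattern, assuming a hypothetical helper requireParam and a plain map of request parameters (the real handlers and their error messages may look different):

```go
package main

import "fmt"

// requireParam returns the value of a mandatory parameter, or an error that
// names the missing parameter. The helper and its message format are
// assumptions for illustration only.
func requireParam(params map[string]string, name string) (string, error) {
	value := params[name]
	if value == "" {
		return "", fmt.Errorf("%s is missing", name)
	}
	return value, nil
}

func main() {
	params := map[string]string{"queue": "root.default"}

	// Present parameter: the value is returned.
	if queue, err := requireParam(params, "queue"); err == nil {
		fmt.Println("queue:", queue)
	}

	// Absent parameter: the error says which one is missing instead of a
	// generic "doesn't exist" message.
	if _, err := requireParam(params, "user"); err != nil {
		fmt.Println("error:", err)
	}
}
```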






[jira] [Updated] (YUNIKORN-2695) remove core dependency pkg/common

2024-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2695:
-
Labels: pull-request-available  (was: )

> remove core dependency pkg/common
> -
>
> Key: YUNIKORN-2695
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2695
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>Reporter: HUAN-IU LIOU
>Assignee: Chenchen Lai
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (YUNIKORN-2566) Remove AllocationAsk reference from askEvents

2024-07-02 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YUNIKORN-2566:
---
Component/s: core - scheduler

> Remove AllocationAsk reference from askEvents
> -
>
> Key: YUNIKORN-2566
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2566
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Resolved] (YUNIKORN-2564) [Umbrella] Move xxxEvents types to a different package

2024-07-02 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2564.

Fix Version/s: 1.6.0
   Resolution: Fixed

> [Umbrella] Move xxxEvents types to a different package
> --
>
> Key: YUNIKORN-2564
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2564
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 1.6.0
>
>
> There are several Events that can be moved to a different package:
> * queueEvents
> * applicationEvents
> * askEvents
> * nodeEvents
> There are numerous files in {{pkg/scheduler/objects}}. This is an opportunity 
> to clean it up a bit and move these under, e.g., 
> {{pkg/scheduler/objects/events}}.






[jira] [Resolved] (YUNIKORN-2568) Move all xxxEvents types to objects/events

2024-07-02 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2568.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Move all xxxEvents types to objects/events
> --
>
> Key: YUNIKORN-2568
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2568
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







(yunikorn-core) branch master updated: [YUNIKORN-2568] Move all xxxEvents types to objects/events (#903)

2024-07-02 Thread pbacsko
This is an automated email from the ASF dual-hosted git repository.

pbacsko pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/yunikorn-core.git


The following commit(s) were added to refs/heads/master by this push:
 new bd9a060c [YUNIKORN-2568] Move all xxxEvents types to objects/events (#903)
bd9a060c is described below

commit bd9a060c3097e5400e38e0c4d23d9d75fd056481
Author: Peter Bacsko 
AuthorDate: Tue Jul 2 23:26:10 2024 +0200

[YUNIKORN-2568] Move all xxxEvents types to objects/events (#903)

Closes: #903

Signed-off-by: Peter Bacsko 
---
 pkg/scheduler/objects/allocation_ask.go            |  17 +-
 pkg/scheduler/objects/allocation_ask_test.go       |   5 +-
 pkg/scheduler/objects/application.go               |  29 ++--
 pkg/scheduler/objects/application_events.go        | 148
 pkg/scheduler/objects/application_state.go         |   2 +-
 pkg/scheduler/objects/application_state_test.go    |   7 +
 pkg/scheduler/objects/application_test.go          |   9 +-
 pkg/scheduler/objects/events/application_events.go | 150
 .../{ => events}/application_events_test.go        | 190 -
 pkg/scheduler/objects/{ => events}/ask_events.go   |  22 +--
 .../objects/{ => events}/ask_events_test.go        |  52 +++---
 pkg/scheduler/objects/{ => events}/node_events.go  |  26 +--
 .../objects/{ => events}/node_events_test.go       |  78 -
 pkg/scheduler/objects/{ => events}/queue_events.go |  22 +--
 .../objects/{ => events}/queue_events_test.go      |  74
 pkg/scheduler/objects/node.go                      |  23 +--
 pkg/scheduler/objects/node_test.go                 |   5 +-
 pkg/scheduler/objects/queue.go                     |  33 ++--
 pkg/scheduler/objects/queue_test.go                |  12 +-
 pkg/scheduler/objects/utilities_test.go            |   3 +-
 20 files changed, 442 insertions(+), 465 deletions(-)

diff --git a/pkg/scheduler/objects/allocation_ask.go b/pkg/scheduler/objects/allocation_ask.go
index 341a19e9..09be0acb 100644
--- a/pkg/scheduler/objects/allocation_ask.go
+++ b/pkg/scheduler/objects/allocation_ask.go
@@ -30,6 +30,7 @@ import (
     "github.com/apache/yunikorn-core/pkg/events"
     "github.com/apache/yunikorn-core/pkg/locking"
     "github.com/apache/yunikorn-core/pkg/log"
+    schedEvt "github.com/apache/yunikorn-core/pkg/scheduler/objects/events"
     siCommon "github.com/apache/yunikorn-scheduler-interface/lib/go/common"
     "github.com/apache/yunikorn-scheduler-interface/lib/go/si"
 )
@@ -59,7 +60,7 @@ type AllocationAsk struct {
     scaleUpTriggered    bool              // whether this ask has triggered autoscaling or not
     resKeyPerNode       map[string]string // reservation key for a given node
 
-    askEvents            *askEvents
+    askEvents            *schedEvt.AskEvents
     userQuotaCheckFailed bool
     headroomCheckFailed  bool
 
@@ -79,7 +80,7 @@ func NewAllocationAsk(allocationKey string, applicationID string, allocatedResou
         allocatedResource: allocatedResource,
         allocLog:          make(map[string]*AllocationLogEntry),
         resKeyPerNode:     make(map[string]string),
-        askEvents:         newAskEvents(events.GetEventSystem()),
+        askEvents:         schedEvt.NewAskEvents(events.GetEventSystem()),
     }
     aa.resKeyWithoutNode = reservationKeyWithoutNode(applicationID, allocationKey)
     return aa
@@ -112,7 +113,7 @@ func NewAllocationAskFromSI(ask *si.AllocationAsk) *AllocationAsk {
         originator:        ask.Originator,
         allocLog:          make(map[string]*AllocationLogEntry),
         resKeyPerNode:     make(map[string]string),
-        askEvents:         newAskEvents(events.GetEventSystem()),
+        askEvents:         schedEvt.NewAskEvents(events.GetEventSystem()),
     }
     // this is a safety check placeholder and task group name must be set as a combo
     // order is important as task group can be set without placeholder but not the other way around
@@ -273,7 +274,7 @@ func (aa *AllocationAsk) LogAllocationFailure(message string, allocate bool) {
 }
 
 func (aa *AllocationAsk) SendPredicateFailedEvent(message string) {
-    aa.askEvents.sendPredicateFailed(aa.allocationKey, aa.applicationID, message, aa.GetAllocatedResource())
+    aa.askEvents.SendPredicateFailed(aa.allocationKey, aa.applicationID, message, aa.GetAllocatedResource())
 }
 
 // GetAllocationLog returns a list of log entries corresponding to allocation preconditions not being met
@@ -357,7 +358,7 @@ func (aa *AllocationAsk) setHeadroomCheckFailed(headroom *resources.Resource, qu
     defer aa.Unlock()
     if !aa.headroomCheckFailed {
         aa.headroomCheckFailed = true
-        aa.askEvents.sendRequestExceedsQueueHeadroom(aa.allocationKey, aa.application

[jira] [Updated] (YUNIKORN-2703) Scheduler does not honor default queue setting from the ConfigMap

2024-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2703:
-
Labels: pull-request-available  (was: )

> Scheduler does not honor default queue setting from the ConfigMap
> -
>
> Key: YUNIKORN-2703
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2703
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Mit Desai
>Assignee: Mit Desai
>Priority: Major
>  Labels: pull-request-available
>
> YUNIKORN-1650 added an override for the default queue name in the ConfigMap to 
> handle the scenario where the provided placement rule is evaluated before 
> other rules.
> The scheduler also adds a default queue if the pod labels or annotations do not 
> define a queue name. Because this happens before the placement rules are 
> evaluated, we end up in the same situation: applications get placed in 
> the default queue and all other placement rules are ignored.
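
The ordering problem can be sketched as follows. This is only an illustration of the intended precedence (explicit queue, then placement rules, then the default), not the scheduler's code: the placementRule type, the simplified label keys "queue" and "namespace", and the functions resolveQueue and namespaceRule are all hypothetical.

```go
package main

import "fmt"

// placementRule maps pod metadata to a queue name. An empty result means the
// rule did not match. Hypothetical type for illustration.
type placementRule func(labels map[string]string) string

// resolveQueue applies the default queue only as the last fallback, after an
// explicit queue label and all placement rules have been tried. Filling in the
// default earlier would make every pod look as if it had an explicit queue.
func resolveQueue(labels map[string]string, rules []placementRule, defaultQueue string) string {
	if queue := labels["queue"]; queue != "" {
		return queue
	}
	for _, rule := range rules {
		if queue := rule(labels); queue != "" {
			return queue
		}
	}
	return defaultQueue
}

// namespaceRule is a toy rule that places pods in a queue named after the namespace.
func namespaceRule(labels map[string]string) string {
	if ns := labels["namespace"]; ns != "" {
		return "root." + ns
	}
	return ""
}

func main() {
	rules := []placementRule{namespaceRule}
	fmt.Println(resolveQueue(map[string]string{"namespace": "dev"}, rules, "root.default"))
	fmt.Println(resolveQueue(map[string]string{}, rules, "root.default"))
}
```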






[jira] [Updated] (YUNIKORN-2721) Improve template function's test coverage

2024-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2721:
-
Labels: pull-request-available  (was: )

> Improve template function's test coverage
> 
>
> Key: YUNIKORN-2721
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2721
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - common
>Reporter: JunHong Peng
>Assignee: JunHong Peng
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (YUNIKORN-2693) A Example doc of RayService management with Yunikorn

2024-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2693:
-
Labels: pull-request-available  (was: )

> A Example doc of RayService management with Yunikorn
> 
>
> Key: YUNIKORN-2693
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2693
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>Reporter: Chen Yu Teng
>Assignee: JunHong Peng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (YUNIKORN-2715) Handle special characters for params like queue, username & groupname

2024-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2715:
-
Labels: pull-request-available  (was: )

> Handle special characters for params like queue, username & groupname
> -
>
> Key: YUNIKORN-2715
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2715
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler, shim - kubernetes, test - e2e
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
>
> With more special characters appearing in queue names, user names, etc., we 
> need to ensure those characters are handled on both sides. Clients need 
> to send those values using an escaping method, and the receiver needs to parse 
> them using the matching unescaping method to recover the actual values. Tests 
> need to be added for this as well.
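
A minimal sketch of the escape/unescape round trip, using url.PathEscape and url.PathUnescape from the Go standard library. Whether the project uses these exact functions is an assumption, and the wrapper names escapeParam and unescapeParam are made up for the example.

```go
package main

import (
	"fmt"
	"net/url"
)

// escapeParam is what a client could apply before putting a value into a URL path segment.
func escapeParam(raw string) string {
	return url.PathEscape(raw)
}

// unescapeParam is what the receiver could apply to recover the original value.
func unescapeParam(escaped string) (string, error) {
	return url.PathUnescape(escaped)
}

func main() {
	queue := "root.team a/b@prod" // example value with a space and a slash

	escaped := escapeParam(queue)
	fmt.Println("sent on the wire:", escaped)

	recovered, err := unescapeParam(escaped)
	if err != nil {
		fmt.Println("unescape failed:", err)
		return
	}
	fmt.Println("round trip ok:", recovered == queue)
}
```

Both sides have to agree on the same pair of functions. Mixing query escaping on one side with path unescaping on the other would, for example, treat "+" differently.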






[jira] [Updated] (YUNIKORN-2269) remove the USER_LABEL_KEY from docs

2024-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2269:
-
Labels: pull-request-available  (was: )

> remove the USER_LABEL_KEY from docs
> ---
>
> Key: YUNIKORN-2269
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2269
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Chenchen Lai
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> The core no longer supports USER_LABEL_KEY now that YUNIKORN-1405 has been 
> merged, so we should remove it from the docs.
> https://yunikorn.apache.org/docs/user_guide/usergroup_resolution/#using-the-yunikornapacheorgusername-label
> {quote}
> The yunikorn.apache.org/username key can be customized by overriding the 
> default value using the USER_LABEL_KEY env variable in the K8s Deployment. 
> This is particularly useful in scenarios where the user label is already 
> being added or if the label has to be modified for some security reasons.
> {quote}






[jira] [Updated] (YUNIKORN-2704) Event publish errors out when predicates fail

2024-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2704:
-
Labels: pull-request-available  (was: )

> Event publish errors out when predicates fail
> -
>
> Key: YUNIKORN-2704
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2704
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Mit Desai
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0, 1.5.2
>
>
> I consistently see this error in the logs when events are published.
> I added some debug logs and found that it only occurs when the events for 
> untolerated taints are published.
> E0618 17:43:17.858946       1 event_broadcaster.go:270] "Server rejected 
> event (will not retry!)" err="Event \"<>.17da2a31072bb32f\" is 
> invalid: [action: Required value, reason: Required value]" 
> event="&Event\{ObjectMeta:{<>.17da2a31072bb32f  dpi-dev    0 
> 0001-01-01 00:00:00 + UTC   map[] map[] [] [] 
> []},EventTime:2024-06-18 17:43:17.857332069 + UTC 
> m=+84279.014490005,Series:nil,ReportingController:yunikorn,ReportingInstance:yunikorn-yunikorn-scheduler-59bdc88fdc-7h5bt,Action:,Reason:,Regarding:\{Pod
>  <> <> 5c90315c-a07d-4801-9ecc-baf61ee45f11 v1 
> 4323324038 },Related:nil,Note:Predicate failed for request 
> '5c90315c-a07d-4801-9ecc-baf61ee45f11' with message: 'node(s) had untolerated 
> taint \{<>: <>}',Type:Normal,DeprecatedSource:\{ 
> },DeprecatedFirstTimestamp:0001-01-01 00:00:00 + 
> UTC,DeprecatedLastTimestamp:0001-01-01 00:00:00 + UTC,DeprecatedCount:0,}"






[jira] [Updated] (YUNIKORN-2568) Move all xxxEvents types to objects/events

2024-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2568:
-
Labels: pull-request-available  (was: )

> Move all xxxEvents types to objects/events
> --
>
> Key: YUNIKORN-2568
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2568
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (YUNIKORN-2698) E2e tests for k8shim don't compile with latest core

2024-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2698:
-
Labels: pull-request-available  (was: )

> E2e tests for k8shim don't compile with latest core
> ---
>
> Key: YUNIKORN-2698
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2698
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Created] (YUNIKORN-2721) Improve template function's test coverage

2024-07-02 Thread JunHong Peng (Jira)
JunHong Peng created YUNIKORN-2721:
--

 Summary: Improve template function's test coverage
 Key: YUNIKORN-2721
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2721
 Project: Apache YuniKorn
  Issue Type: Test
  Components: core - common
Reporter: JunHong Peng









[jira] [Updated] (YUNIKORN-2720) Use createRequest() in handlers_test.go

2024-07-02 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YUNIKORN-2720:
---
Description: Use the createRequest() helper method wherever applicable in 
handlers_test.go. handlers_test.go is huge.  (was: Use createRequest() helper 
methods where ever applicable in handlers_test.go)

> Use createRequest() in handlers_test.go
> ---
>
> Key: YUNIKORN-2720
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2720
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Manikandan R
>Priority: Major
>
> Use the createRequest() helper method wherever applicable in handlers_test.go. 
> handlers_test.go is huge.
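
To show the kind of duplication such a helper removes, here is a hedged sketch built on net/http/httptest. The createRequest signature below is an assumption and may not match the helper in handlers_test.go. queueHandler, statusFor and the URL paths are hypothetical as well.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// createRequest is a hypothetical stand-in for the helper named in the ticket:
// it builds a request in one place so each test does not repeat the setup.
func createRequest(method, target string, headers map[string]string) *http.Request {
	req := httptest.NewRequest(method, target, nil)
	for key, value := range headers {
		req.Header.Set(key, value)
	}
	return req
}

// queueHandler is a toy handler used only to exercise the helper.
func queueHandler(w http.ResponseWriter, r *http.Request) {
	if r.URL.Query().Get("queue") == "" {
		http.Error(w, "queue name is missing", http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// statusFor runs the handler against a recorded response and returns the status code.
func statusFor(target string) int {
	recorder := httptest.NewRecorder()
	queueHandler(recorder, createRequest(http.MethodGet, target, nil))
	return recorder.Code
}

func main() {
	fmt.Println(statusFor("/ws/v1/queue?queue=root.default"))
	fmt.Println(statusFor("/ws/v1/queue"))
}
```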






[jira] [Created] (YUNIKORN-2720) Use createRequest() in handlers_test.go

2024-07-02 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2720:
--

 Summary: Use createRequest() in handlers_test.go
 Key: YUNIKORN-2720
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2720
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - scheduler
Reporter: Manikandan R


Use the createRequest() helper method wherever applicable in handlers_test.go






[jira] [Created] (YUNIKORN-2719) Assert invalid group name in Get Group REST API

2024-07-02 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2719:
--

 Summary: Assert invalid group name in Get Group REST API
 Key: YUNIKORN-2719
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2719
 Project: Apache YuniKorn
  Issue Type: Test
  Components: core - scheduler
Reporter: Manikandan R


Assert invalid group name in Get Group REST API






[jira] [Created] (YUNIKORN-2717) Assert invalid queue name in get queue applications handler

2024-07-02 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2717:
--

 Summary: Assert invalid queue name in get queue applications 
handler
 Key: YUNIKORN-2717
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2717
 Project: Apache YuniKorn
  Issue Type: Test
  Components: core - scheduler
Reporter: Manikandan R


Assert the invalid queue name case in the TestGetQueueApplicationsHandler test 
method using assertQueueInvalid(). Also clean up the method.


