[DISCUSSION] Proposal to fix Preemption use case without causing Preemption storm

2024-07-25 Thread Manikandan R
Hi Everyone,

As you are aware, we have been working on a Preemption Hardening exercise to
address the gaps discovered in earlier releases. It is being tracked under
this umbrella jira.

It contains a number of sub-tasks. While working on them, we came across a
use case from a member of the community that seemed valid at first sight. On
further investigation, we realized that handling it through code changes
would lead to a preemption storm and destabilize the overall functioning of
the queues, which is not desirable behaviour. Preemption storms are covered
in the usage guide doc. Along the way, we found that the same use case can
be addressed without causing a preemption storm, while also preventing other
cases from causing storms. The solution is discussed in this document.
Please read case 2A in the document in detail, as it is the only case that
would be addressed. The proposed solution increases the chances of freeing
up resources for a candidate waiting in a starving queue in that particular
situation, and it follows the same principles as the already working cases
3A and 3B. Cases 3A and 3B show how the preemption laws are followed
strictly; the proposed solution for case 2A follows the same path.

Please share your thoughts on this.

Thanks,
Mani


[jira] [Created] (YUNIKORN-2769) Preemption fails between two siblings when preemptor is UG and parent is above OG

2024-07-25 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2769:
--

 Summary: Preemption fails between two siblings when preemptor is UG and parent is above OG
 Key: YUNIKORN-2769
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2769
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: core - scheduler
Reporter: Manikandan R
Assignee: Manikandan R


Test set up:

root.parent.parent1. guaranteed - vcores:2, usage - vcores:6. Over guaranteed.

root.parent.parent1.c1. no guarantee set. usage - vcores:6

root.parent.parent1.c2. guaranteed - vcores:1, pending - vcores:1. Under guaranteed.

Expected output: an ask is pending on root.parent.parent1.c2 waiting for resources, and preemption is expected to kick in and kill 1 pod running in c1.

Actual output: preemption does not yield any results.
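
The test setup above can be sketched as a small, self-contained Go model. This is only an illustration of the numbers in this issue: the `queue` type and its methods are hypothetical and are not the yunikorn-core types, and treating a queue without a guarantee as having a guarantee of 0 is an assumption made for the sketch.

```go
package main

import "fmt"

// queue is a hypothetical, simplified model of a scheduler queue.
// It is NOT the yunikorn-core queue type; only vcores are tracked.
type queue struct {
	name       string
	guaranteed int // guaranteed vcores; 0 means no guarantee is set
	usage      int // vcores currently allocated
	pending    int // vcores requested but not yet allocated
}

// underGuaranteed reports whether the queue has a guarantee that its
// current usage has not reached yet.
func (q queue) underGuaranteed() bool {
	return q.guaranteed > 0 && q.usage < q.guaranteed
}

// aboveGuaranteed returns how many vcores the queue uses beyond its
// guarantee (assumption: a queue without a guarantee counts from 0).
func (q queue) aboveGuaranteed() int {
	if q.usage > q.guaranteed {
		return q.usage - q.guaranteed
	}
	return 0
}

func main() {
	// Values taken from the test setup in this issue.
	queues := []queue{
		{name: "root.parent.parent1", guaranteed: 2, usage: 6},
		{name: "root.parent.parent1.c1", usage: 6},
		{name: "root.parent.parent1.c2", guaranteed: 1, pending: 1},
	}
	for _, q := range queues {
		fmt.Printf("%s: under guaranteed=%t, above guaranteed by %d vcores, pending=%d\n",
			q.name, q.underGuaranteed(), q.aboveGuaranteed(), q.pending)
	}
}
```

In this model c2 is under its guarantee with one vcore pending, while its sibling c1 and the parent both run above their guarantees, which is the situation in which the issue expects a victim to be found in c1.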



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: dev-h...@yunikorn.apache.org



Re: [VOTE] Release Apache YuniKorn 1.5.2 RC1

2024-07-25 Thread Manikandan R
+1 (Binding)

- Built images from source on Mac M1, macOS Monterey (arm64), with go 1.22.4
- Verified the signatures
- Verified the licences and checksums
- Ran the scheduler with a local kind cluster (version 1.28.0)
- Ran simple sleep jobs
- Verified REST API outputs and Web UI

Thanks,
Mani

On Wed, Jul 24, 2024 at 9:22 PM 黃咸誠  wrote:

> +1 (non-binding)
>
> Testing environment: Ubuntu 24.04, go1.22.5 linux/amd64
>
> * Verified signature and checksum
> * Built release on Ubuntu 24.04 x86
> * e2e tests passed
> * All make test passed
> * Ran simple spark jobs
>
> best regards,
> Hsien-Cheng(Ryan) Huang
> 
> From: Wilfred Spiegelenburg 
> Sent: July 24, 2024 9:24 AM
> To: dev@yunikorn.apache.org 
> Subject: Re: [VOTE] Release Apache YuniKorn 1.5.2 RC1
>
> +1 (binding)
>
> - Verified signatures and checksums
> - Verified LICENSE and NOTICE files
> - Verified release tarball structure
> - Built release on Mac Sonoma (ARM64):
>   - make image with go 1.22.5 and 1.21.12
> - Ran make test, all tests passed
> - Installed locally on Kind cluster (1.29.4)
>   - ran simple jobs
>   - created and deleted queues via config
>
> - REST interface checks:
>   - verified the SHA references in the cluster detail
>   - verified the build date is set correctly
> - checked REST endpoints and UI
>
> On Tue, 23 Jul 2024 at 22:22, Peter Bacsko  wrote:
> >
> > Quick correction: the proper URL is
> > https://dist.apache.org/repos/dist/dev/yunikorn/1.5.2-RC1/
> >
> >
> > On Tue, Jul 23, 2024 at 2:15 PM Peter Bacsko  wrote:
> >
> > > Hello everyone,
> > >
> > > I would like to call a vote for releasing Apache YuniKorn 1.5.2 RC1.
> > > This is a minor release which contains only bugfixes.
> > >
> > > The release artefacts have been uploaded here:
> > >   https://dist.apache.org/repos/dist/dev/yunikorn/1.5.1-RC2/
> > >
> > > My public key is located in the KEYS file:
> > >   https://downloads.apache.org//yunikorn/KEYS
> > >
> > > JIRA issues that have been resolved in this release:
> > > >   https://issues.apache.org/jira/issues/?filter=12353487
> > >
> > > > The release (similarly to 1.5.1) solves a deadlock issue. If possible,
> > > > test YuniKorn with workloads that put YuniKorn under stress (i.e.
> > > > thousands/tens of thousands of pods).
> > >
> > > Git tags for each component are as follows:
> > > yunikorn-scheduler-interface: v1.5.2-1
> > > yunikorn-core: v1.5.2-1
> > > yunikorn-k8shim: v1.5.2-1
> > > yunikorn-web: v1.5.2-1
> > > yunikorn-release: v1.5.2-1
> > >
> > > Once the release is voted on and approved, all repos will be tagged
> > > 1.5.2 for consistency.
> > >
> > > Please review and vote. The vote will be open for at least 72 hours
> > > > and closes on Friday 26 Jul 2024 16:00:00 CEST.
> > >
> > > [ ] +1 Approve
> > > [ ] +0 No opinion
> > > [ ] -1 Disapprove (and the reason why)
> > >
> > >
> > > Thank you,
> > > Peter
> > >
>
>
>


[jira] [Resolved] (YUNIKORN-2761) Explain preemption storm in usage doc

2024-07-24 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2761.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Explain preemption storm in usage doc
> -
>
> Key: YUNIKORN-2761
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2761
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: website
>    Reporter: Manikandan R
>    Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Resolved] (YUNIKORN-2736) Preemption fails between two siblings even guaranteed set on one child and parent

2024-07-18 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2736.

Resolution: Won't Fix

> Preemption fails between two siblings even guaranteed set on one child and 
> parent
> -
>
> Key: YUNIKORN-2736
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2736
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Elad Dolev
>    Assignee: Manikandan R
>Priority: Major
>
> Test set up:
> root.parent.parent1 guaranteed - vcores:10, usage - vcores:2
> root.parent.parent1.c1 usage - vcores:2
> root.parent.parent1.c2 guaranteed - vcores:1
> root.parent.parent2.c3
> Expected output: an ask is pending on root.parent.parent1.c2 waiting for
> resources, and preemption is expected to kick in and kill 1 pod running in c1.
> Please refer to
> [https://github.com/apache/yunikorn-core/pull/911/files#diff-7b65cc904d1c0a0395b409e51db43bfe65238432eb96b66831c950060feac911R860]
> for more details.






[jira] [Created] (YUNIKORN-2761) Explain preemption storm in usage doc

2024-07-17 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2761:
--

 Summary: Explain preemption storm in usage doc
 Key: YUNIKORN-2761
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2761
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: website
Reporter: Manikandan R
Assignee: Manikandan R









[jira] [Resolved] (YUNIKORN-2715) Handle special characters for params like queue, username & groupname

2024-07-15 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2715.

Resolution: Fixed

> Handle special characters for params like queue, username & groupname
> -
>
> Key: YUNIKORN-2715
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2715
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler, shim - kubernetes, test - e2e
>    Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> With more special characters being allowed in queue names, user names, etc.,
> these characters need to be handled on both sides. Clients need to send these
> values escaped. The receiver needs to unescape them to recover the actual
> values. Tests for this also need to be added.






[jira] [Resolved] (YUNIKORN-2699) Preemption e2e tests fail in latest master

2024-07-06 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2699.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Preemption e2e tests fail in latest master
> --
>
> Key: YUNIKORN-2699
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2699
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Craig Condit
>    Assignee: Manikandan R
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Output:
>  
> {noformat}
> Preemption Verify_basic_preemption
> /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:139
>   STEP: Creating development namespace: dev-anvkm @ 06/25/24 18:08:14.291
>   STEP: A queue uses resource more than the guaranteed value even after 
> removing one of the pods. The cluster doesn't have enough resource to deploy 
> a pod in another queue which uses resource less than the guaranteed value. @ 
> 06/25/24 18:08:15.301
>   STEP: Update root.sandbox1 and root.sandbox2 with guaranteed memory 4677M @ 
> 06/25/24 18:08:15.301
>   STEP: Port-forward the scheduler pod @ 06/25/24 18:08:15.302
> port-forward is already running  STEP: Enabling new scheduling config @ 
> 06/25/24 18:08:15.302
>   STEP: Deploy the sleep pod sleepjob1 to the development namespace @ 
> 06/25/24 18:08:18.313
>   STEP: Deploy the sleep pod sleepjob2 to the development namespace @ 
> 06/25/24 18:08:22.518
>   STEP: Deploy the sleep pod sleepjob3 to the development namespace @ 
> 06/25/24 18:08:26.517
>   STEP: Deploy the sleep pod sleepjob4 to the development namespace @ 
> 06/25/24 18:08:30.518
>   STEP: One of the pods in root.sanbox1 is preempted @ 06/25/24 18:08:38.517
>   [FAILED] in [It] - 
> /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:198
>  @ 06/25/24 18:08:38.718
>   Logging yk fullstatedump, spec: Verify_basic_preemption
>   Created log file: 
> /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/build/e2e/preemption/Verify_basic_preemption_ykFullStateDump.json
>   Logging k8s cluster info, spec: Verify_basic_preemption
>   Created log file: 
> /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/build/e2e/preemption/Verify_basic_preemption_k8sClusterInfo.txt
>   Logging yk container logs, spec: Verify_basic_preemption
>   Created log file: 
> /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/build/e2e/preemption/Verify_basic_preemption_ykContainerLog.txt
>   STEP: Tear down namespace: dev-anvkm @ 06/25/24 18:08:39.235
>   STEP: Restoring YuniKorn configuration @ 06/25/24 18:08:40.118
>   STEP: Restoring the old config maps @ 06/25/24 18:08:40.119
> • [FAILED] [27.837 seconds]
> Preemption [It] Verify_basic_preemption
> /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:139
>   [FAILED] One of the pods in root.sandbox1 should be preempted
>   Expected
>       : 1
>   to equal
>       : 2
>   In [It] at: 
> /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:198
>  @ 06/25/24 18:08:38.718-- Preemption 
> Verify_preemption_on_priority_queue
> /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:333
>   STEP: Creating development namespace: dev-u0kt7 @ 06/25/24 18:10:24.975
>   STEP: A task can only preempt a task with lower or equal priority @ 
> 06/25/24 18:10:25.982
>   STEP: Update root.sandbox1, root.low-priority, root.high-priority with 
> guaranteed memory 4677M @ 06/25/24 18:10:25.982
>   STEP: Port-forward the scheduler pod @ 06/25/24 18:10:25.983
> port-forward is already running  STEP: Enabling new scheduling config @ 
> 06/25/24 18:10:25.983
>   STEP: Deploy the sleep pod sleepjob1 to the development namespace @ 
> 06/25/24 18:10:28.99
>   STEP: Deploy the sleep pod sleepjob2 to the development namespace @ 
> 06/25/24 18:10:32.791
>   STEP: Deploy the sleep pod sleepjob3 to the development namespace @ 
> 06/25/24 18:10:35.792
>   STEP: Deploy the sleep pod sleepjob4 to the development namespace @ 
> 06/25/24 18:10:38.792
>   STEP: Deploy the sleep pod sleepjob5 to the development namespace @ 
> 06/25/24 18:10:38.995
>   STEP: The sleep pod sleepjob4 can't be scheduled @ 06/25/24 18:10:39.194
>   STEP: The sleep pod sleepjob5 can be scheduled @ 06/25/24 18:10:41.392
>   STEP: One of the pods in root.sanbox1 is preempted @ 06/25/24 18:10:46.392
>   [FAILED] in [It] - 
> /h

[jira] [Resolved] (YUNIKORN-2716) Doc changes to escape query params in REST API

2024-07-05 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2716.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Doc changes to escape query params in REST API
> --
>
> Key: YUNIKORN-2716
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2716
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>    Reporter: Manikandan R
>    Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Need to make changes in REST API doc to escape the query params like queue 
> name, user name and group name.






[jira] [Reopened] (YUNIKORN-2715) Handle special characters for params like queue, username & groupname

2024-07-04 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R reopened YUNIKORN-2715:


> Handle special characters for params like queue, username & groupname
> -
>
> Key: YUNIKORN-2715
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2715
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler, shim - kubernetes, test - e2e
>    Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> With more special characters being allowed in queue names, user names, etc.,
> these characters need to be handled on both sides. Clients need to send these
> values escaped. The receiver needs to unescape them to recover the actual
> values. Tests for this also need to be added.






[jira] [Resolved] (YUNIKORN-2715) Handle special characters for params like queue, username & groupname

2024-07-04 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2715.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Handle special characters for params like queue, username & groupname
> -
>
> Key: YUNIKORN-2715
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2715
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler, shim - kubernetes, test - e2e
>    Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> With more special characters being allowed in queue names, user names, etc.,
> these characters need to be handled on both sides. Clients need to send these
> values escaped. The receiver needs to unescape them to recover the actual
> values. Tests for this also need to be added.






[jira] [Created] (YUNIKORN-2723) Wordwrap queuename in QueuesV2 (Beta) page

2024-07-03 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2723:
--

 Summary: Wordwrap queuename in QueuesV2 (Beta) page
 Key: YUNIKORN-2723
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2723
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: webapp
Reporter: Manikandan R


Please see attached image (captured from Mac M1 chrome)






[jira] [Created] (YUNIKORN-2720) Use createRequest() in handlers_test.go

2024-07-02 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2720:
--

 Summary: Use createRequest() in handlers_test.go
 Key: YUNIKORN-2720
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2720
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - scheduler
Reporter: Manikandan R


Use the createRequest() helper method wherever applicable in handlers_test.go






[jira] [Created] (YUNIKORN-2719) Assert invalid group name in Get Group REST API

2024-07-02 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2719:
--

 Summary: Assert invalid group name in Get Group REST API
 Key: YUNIKORN-2719
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2719
 Project: Apache YuniKorn
  Issue Type: Test
  Components: core - scheduler
Reporter: Manikandan R


Assert invalid group name in Get Group REST API






[jira] [Created] (YUNIKORN-2717) Assert invalid queue name in get queue applications handler

2024-07-02 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2717:
--

 Summary: Assert invalid queue name in get queue applications handler
 Key: YUNIKORN-2717
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2717
 Project: Apache YuniKorn
  Issue Type: Test
  Components: core - scheduler
Reporter: Manikandan R


Assert invalid queue name in the TestGetQueueApplicationsHandler test method using assertQueueInvalid(). Also clean up the method.






[jira] [Created] (YUNIKORN-2716) Doc changes to escape query params in REST API

2024-07-01 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2716:
--

 Summary: Doc changes to escape query params in REST API
 Key: YUNIKORN-2716
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2716
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: core - scheduler
Reporter: Manikandan R
Assignee: Manikandan R


Need to make changes in REST API doc to escape the query params like queue 
name, user name and group name.






[jira] [Created] (YUNIKORN-2715) Handle special characters for params like queue, username & groupname

2024-07-01 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2715:
--

 Summary: Handle special characters for params like queue, username & groupname
 Key: YUNIKORN-2715
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2715
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: core - scheduler, shim - kubernetes, test - e2e
Reporter: Manikandan R
Assignee: Manikandan R


With more special characters being allowed in queue names, user names, etc.,
these characters need to be handled on both sides. Clients need to send these
values escaped. The receiver needs to unescape them to recover the actual
values. Tests for this also need to be added.
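
The escape/unescape round trip described above can be sketched with Go's standard `net/url` package. This is a minimal illustration, not the change made for this issue: the helper names `escapeParam` and `unescapeParam`, the example user name, and the URL path are all hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// escapeParam is a hypothetical client-side helper: it escapes a value so
// it can be placed safely in a URL query string.
func escapeParam(value string) string {
	return url.QueryEscape(value)
}

// unescapeParam is a hypothetical receiver-side helper: it reverses
// escapeParam and returns an error for malformed escape sequences.
func unescapeParam(value string) (string, error) {
	return url.QueryUnescape(value)
}

func main() {
	user := "dev user+1@example.com" // example value, not taken from the issue
	escaped := escapeParam(user)
	// The path and parameter name below are placeholders, not a real endpoint.
	fmt.Println("/example/path?user=" + escaped)

	original, err := unescapeParam(escaped)
	if err != nil {
		fmt.Println("invalid escape sequence:", err)
		return
	}
	fmt.Println(original == user) // prints "true"
}
```

Values embedded in a URL path segment rather than a query string would use `url.PathEscape` and `url.PathUnescape` instead, since the two forms treat characters such as space and `+` differently.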






[jira] [Created] (YUNIKORN-2714) e2e test to ensure queue name with all allowed characters

2024-07-01 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2714:
--

 Summary: e2e test to ensure queue name with all allowed characters
 Key: YUNIKORN-2714
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2714
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes, test - e2e
Reporter: Manikandan R


Create an e2e test to ensure that a queue name with all allowed special
characters goes through successfully. This is mainly required to confirm that
special characters do not break the REST API URL.






[jira] [Created] (YUNIKORN-2713) Use queue specific REST API directly

2024-07-01 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2713:
--

 Summary: Use queue specific REST API directly
 Key: YUNIKORN-2713
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2713
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes, test - e2e
Reporter: Manikandan R


Some e2e tests still use the old approach of fetching all queues for the given
partition and then fetching queue-specific info in a second call. Instead,
queue info can be fetched directly in a single call.






[jira] [Created] (YUNIKORN-2712) Missing specific param error for REST API

2024-07-01 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2712:
--

 Summary: Missing specific param error for REST API
 Key: YUNIKORN-2712
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2712
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - scheduler
Reporter: Manikandan R


Some REST APIs return a "missing specific param" kind of error, but not all.
For example, "user name is missing". All mandatory parameters in the other
REST APIs can follow the same pattern. It is much clearer than a
"doesn't exist" kind of error.






[jira] [Resolved] (YUNIKORN-2500) Use GetPreemptableResource, GetRemainingGuaranteedResource in Preemption flow

2024-06-25 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2500.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Use GetPreemptableResource, GetRemainingGuaranteedResource in Preemption flow
> -
>
> Key: YUNIKORN-2500
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2500
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>    Reporter: Manikandan R
>    Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Use GetPreemptableResource, GetRemainingGuaranteedResource instead of 
> IsAtorAbove, WithIn, GetRemaining Guaranteed in current preemption code.






[jira] [Resolved] (YUNIKORN-2660) Validate "value" param in fixed placement rule

2024-06-25 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2660.

Resolution: Won't Fix

https://issues.apache.org/jira/browse/YUNIKORN-2657 has taken care of this 
change.

> Validate "value" param in fixed placement rule
> --
>
> Key: YUNIKORN-2660
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2660
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>
> Placement rule type "fixed" expects a queue name in the "value" param of the
> configuration. The "value" param needs to be validated as part of the config
> validation process to ensure a valid queue name has been passed, so that any
> error is caught right at the beginning.






[jira] [Resolved] (YUNIKORN-2657) Validate queue generated as part of the placement rules

2024-06-25 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2657.

 Fix Version/s: 1.6.0
Target Version: 1.6.0
Resolution: Fixed

> Validate queue generated as part of the placement rules
> ---
>
> Key: YUNIKORN-2657
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2657
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - common
>    Reporter: Manikandan R
>    Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Currently, there is no validation or restriction on the characters used in 
> queue name being generated as part of the placement rules. However, queues 
> specified in configuration are going through validation process. Need to do 
> similar validation checks.






[jira] [Resolved] (YUNIKORN-2686) Validate user and group specified in filter config

2024-06-24 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2686.

 Fix Version/s: 1.6.0
Target Version: 1.6.0
Resolution: Fixed

> Validate user and group specified in filter config
> --
>
> Key: YUNIKORN-2686
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2686
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>    Reporter: Manikandan R
>    Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Rule filter may have user or group to be allowed or denied. These users and 
> groups are being validated. Since user validation has been changed, need to 
> enhance the test to verify the Rule filter behaviour based on the new 
> validation characters.






[jira] [Created] (YUNIKORN-2686) Validate user and group specified in filter config

2024-06-20 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2686:
--

 Summary: Validate user and group specified in filter config
 Key: YUNIKORN-2686
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2686
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: core - scheduler
Reporter: Manikandan R
Assignee: Manikandan R


A rule filter may have users or groups to be allowed or denied. These users
and groups need to be validated.






[jira] [Created] (YUNIKORN-2684) Unused Code cleanup

2024-06-20 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2684:
--

 Summary: Unused Code cleanup
 Key: YUNIKORN-2684
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2684
 Project: Apache YuniKorn
  Issue Type: Test
  Components: core - scheduler
Reporter: Manikandan R


There are a few places in preemption.go where it is doubtful that the code is
ever reached. These places need to be assessed by reviewing and/or writing
appropriate test cases. The link below is one such place that came up during
reviews:

[https://github.com/apache/yunikorn-core/pull/830#discussion_r1636755074]

If the code turns out to be dead, it needs to be cleaned up.






[jira] [Resolved] (YUNIKORN-2656) Validate user name

2024-06-12 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2656.

Fix Version/s: 1.6.0
   Resolution: Fixed

>  Validate user name
> ---
>
> Key: YUNIKORN-2656
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2656
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - common
>    Reporter: Manikandan R
>    Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
>  Currently, there is no validation or restriction on the characters used in 
> user name specified as part of app submission. However, users specified in 
> limit settings are going through validation process. Need to do similar 
> validation checks.






[jira] [Created] (YUNIKORN-2667) E2E test for Gang app originator pod changes after restart

2024-06-07 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2667:
--

 Summary: E2E test for Gang app originator pod changes after restart
 Key: YUNIKORN-2667
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2667
 Project: Apache YuniKorn
  Issue Type: Test
  Components: shim - kubernetes
Reporter: Manikandan R


https://issues.apache.org/jira/browse/YUNIKORN-2665 covered the changes with 
unit tests. A test is needed that covers the full cycle, before and after a 
restart, either by writing an e2e test or by using a mock scheduler style setup.






[jira] [Created] (YUNIKORN-2665) Gang app originator pod changes after restart

2024-06-05 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2665:
--

 Summary: Gang app originator pod changes after restart
 Key: YUNIKORN-2665
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2665
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: shim - kubernetes
Affects Versions: 1.5.0, 1.4.0, 1.3.0, 1.5.1, 1.5.2
Reporter: Manikandan R
Assignee: Manikandan R


A gang app chooses the first pod (the one that created the app) as the 
originator pod, which later becomes the real driver pod. If a restart happens 
while a gang app is being processed, specifically after placeholder creation 
and during replacement, it can lead to the incorrect behaviour described below:

During restore, there is no guarantee on the ordering of pods from the K8s 
lister, especially when all the pods were created with the same timestamp. K8s 
uses a seconds-based timestamp, which means all pods created within the same 
second have the same timestamp. In this situation, YK designates whichever pod 
comes first from the lister as the originator pod. So any placeholder could 
become the originator pod, and the actual originator pod is lost. This change 
could cause ripple effects and needs to be fixed.
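
To illustrate the problem, here is a minimal Go sketch of a selection that
does not depend on lister order. The Pod type, its fields and pickOriginator
are hypothetical stand-ins, and this is not necessarily how the issue was
fixed in the shim: it only shows that skipping placeholders and breaking
timestamp ties deterministically gives the same answer for any ordering.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod is a hypothetical, stripped-down stand-in for a Kubernetes pod.
type Pod struct {
	Name        string
	CreatedUnix int64 // creation time in whole seconds, like the K8s timestamp
	Placeholder bool
}

// pickOriginator returns the oldest non-placeholder pod. Ties on the
// second-granularity timestamp are broken by name, so the result does not
// depend on the order in which the pods are listed.
func pickOriginator(pods []Pod) (Pod, bool) {
	candidates := make([]Pod, 0, len(pods))
	for _, p := range pods {
		if !p.Placeholder {
			candidates = append(candidates, p)
		}
	}
	if len(candidates) == 0 {
		return Pod{}, false
	}
	sort.Slice(candidates, func(i, j int) bool {
		if candidates[i].CreatedUnix != candidates[j].CreatedUnix {
			return candidates[i].CreatedUnix < candidates[j].CreatedUnix
		}
		return candidates[i].Name < candidates[j].Name
	})
	return candidates[0], true
}

func main() {
	// All pods share the same timestamp and a placeholder is listed first.
	pods := []Pod{
		{Name: "tg-placeholder-1", CreatedUnix: 100, Placeholder: true},
		{Name: "driver", CreatedUnix: 100},
		{Name: "tg-placeholder-2", CreatedUnix: 100, Placeholder: true},
	}
	if p, ok := pickOriginator(pods); ok {
		fmt.Println("originator:", p.Name)
	}
}
```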






[jira] [Created] (YUNIKORN-2660) Validate "value" param in fixed placement rule

2024-06-03 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2660:
--

 Summary: Validate "value" param in fixed placement rule
 Key: YUNIKORN-2660
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2660
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: core - scheduler
Reporter: Manikandan R


The "fixed" placement rule expects a queue name in the "value" param of the 
configuration. The "value" param needs to be validated as part of the config 
validation process to ensure a valid queue name has been passed, so that any 
error is caught right at the beginning.
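
A minimal Go sketch of such a check at config validation time follows. The
per-segment pattern and the name validateFixedRuleValue are assumptions for
illustration, not the validation YuniKorn implements.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// queueSegmentRegExp is an assumed pattern for one queue name segment;
// the real rule in YuniKorn may differ.
var queueSegmentRegExp = regexp.MustCompile(`^[a-zA-Z0-9_-]{1,64}$`)

// validateFixedRuleValue checks the "value" param of a fixed rule: it must be
// non-empty and every dot-separated segment must be a valid queue name.
func validateFixedRuleValue(value string) error {
	if value == "" {
		return errors.New("fixed rule: value must not be empty")
	}
	for _, segment := range strings.Split(value, ".") {
		if !queueSegmentRegExp.MatchString(segment) {
			return fmt.Errorf("fixed rule: invalid queue name segment %q in %q", segment, value)
		}
	}
	return nil
}

func main() {
	for _, v := range []string{"root.sandbox", "sandbox", "root..sandbox", "root.bad name"} {
		fmt.Printf("%q -> %v\n", v, validateFixedRuleValue(v))
	}
}
```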






[jira] [Created] (YUNIKORN-2657) Validate queue generated as part of the placement rules

2024-05-31 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2657:
--

 Summary: Validate queue generated as part of the placement rules
 Key: YUNIKORN-2657
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2657
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: core - common
Reporter: Manikandan R









[jira] [Created] (YUNIKORN-2656) Validate user name

2024-05-31 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2656:
--

 Summary:  Validate user name
 Key: YUNIKORN-2656
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2656
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: core - common
Reporter: Manikandan R









Re: [VOTE] Release Apache YuniKorn 1.5.1 RC1

2024-05-15 Thread Manikandan R
+1 (Binding)

- Built images from source on Mac M1 MacOS Monterey (arm64) with go 1.21.8
- Verified the signatures
- Verified the licences and checksums
- Ran the scheduler with a local kind cluster (version 1.29.0)
- Ran simple sleep jobs
- Verified REST APIs outputs, Web UI

Thanks,
Mani

On Tue, May 14, 2024 at 9:41 PM Desai, Mit  wrote:

> +1 (non-binding)
>
>
>   *   Built release on MacOS Sonoma (arm64)
>   *   Installed locally on Kind Cluster (1.28)
>   *   Successfully ran make test
>   *   Ran sample sleep jobs
>
> Thank you, Peter, for your efforts in driving the release.
>
> - Mit Desai
>
> From: Peter Bacsko 
> Date: Friday, May 10, 2024 at 1:41 AM
> To: dev@yunikorn.apache.org 
> Subject: [VOTE] Release Apache YuniKorn 1.5.1 RC1
> Hello everyone,
>
> I would like to call a vote for releasing Apache YuniKorn 1.5.1 RC1.
> This is a minor release which contains only bugfixes.
>
> The release artefacts have been uploaded here:
>
> https://dist.apache.org/repos/dist/dev/yunikorn/1.5.1-RC1/
> 
>
> My public key is located in the KEYS file:
>
> https://downloads.apache.org//yunikorn/KEYS
> 
>
> JIRA issues that have been resolved in this release:
>
> https://issues.apache.org/jira/issues/?filter=12353383
> 
>
> The release solves a deadlock issue. If possible, test Yunikorn with
> workloads that put Yunikorn under stress (ie. thousands/tens of thousands
> of pods).
>
> Git tags for each component are as follows:
> yunikorn-scheduler-interface: v1.5.1-1
> yunikorn-core: v1.5.1-1
> yunikorn-k8shim: v1.5.1-1
> yunikorn-web: v1.5.1-1
> yunikorn-release: v1.5.1-1
>
> Once the release is voted on and approved, all repos will be tagged
> 1.5.1 for consistency.
>
> Please review and vote. The vote will be open for at least 96 hours
> and closes on Tuesday 14 May 2024, 20:00:00 CEST.
>
> [ ] +1 Approve
> [ ] +0 No opinion
> [ ] -1 Disapprove (and the reason why)
>
>
> Thank you,
> Peter
>


[DISCUSSION] Preemption Hardening

2024-05-13 Thread Manikandan R
Hi Everyone,

Since the preemption feature was released, we have been facing some issues with
its behaviour, especially with extra resource types, unnecessary victims being
killed, etc.

Hence, the Preemption Hardening umbrella jira
https://issues.apache.org/jira/browse/YUNIKORN-2493 has been filed. I've
been working on this for a while. I've written a doc
https://docs.google.com/document/d/1nYtputEluP4Akf3CAu7DdGCfKW_WHJUD0rKjLVMJGj8/edit?usp=sharing
to explain the problem background, the approach taken to solve these problems,
etc. Some of the sub tasks have been completed and the whole Hardening
exercise is nearing completion. There is an important PR
https://github.com/apache/yunikorn-core/pull/830, containing the core
changes discussed in the doc, that is pending review.

Please read the doc, review the PR and share your feedback. Please also go
through the other sub tasks. If you have come across any issues or weird
behaviour, your extra attention on this whole exercise would be appreciated
(running your failed cases against the new code, sharing your problem
statement, revisiting the related open bugs you filed, etc.) so that we
can iterate further and make the feature more stable.

Thank you.

Mani


[jira] [Resolved] (YUNIKORN-2570) Add test cases to break the current preemption flow

2024-05-13 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2570.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Add test cases to break the current preemption flow
> ---
>
> Key: YUNIKORN-2570
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2570
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>    Reporter: Manikandan R
>    Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Add various test cases to break the current preemption flow. These tests will 
> fail for now. The follow-up jira 
> [https://issues.apache.org/jira/browse/YUNIKORN-2500] should fix the problems 
> in the current preemption flow so that these test cases pass.






[jira] [Created] (YUNIKORN-2570) Add test cases to break the current preemption flow

2024-04-18 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2570:
--

 Summary: Add test cases to break the current preemption flow
 Key: YUNIKORN-2570
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2570
 Project: Apache YuniKorn
  Issue Type: Test
  Components: core - scheduler
Reporter: Manikandan R
Assignee: Manikandan R
 Fix For: 1.6.0


Add various test cases to break the current preemption flow. These tests will 
fail for now. Follow-up jiras should fix the problems in the current preemption 
flow so that these test cases pass.






[jira] [Created] (YUNIKORN-2569) Helm upgrade behaviour

2024-04-18 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2569:
--

 Summary: Helm upgrade behaviour
 Key: YUNIKORN-2569
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2569
 Project: Apache YuniKorn
  Issue Type: Test
Reporter: Manikandan R


The Yunikorn upgrade behaviour through Helm needs to be tested.

For example:

1. Create a cluster using kind create cluster.
2. Deploy an old version of Yunikorn (say, 1.2, 1.3 or 1.4) using helm install.
3. Run sanity checks to ensure the deployed version is working as expected.
4. Upgrade the YK version to the latest master (1.6) using helm upgrade.
5. Document the behaviour, especially when there are any issues.

Repeat for each old version (1.2, 1.3 and 1.4).






[jira] [Resolved] (YUNIKORN-2507) Picking Victims should consider usage and max quota for queues at each level

2024-03-27 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2507.

 Fix Version/s: 1.6.0
Target Version: 1.6.0
Resolution: Fixed

Merged to master

> Picking Victims should consider usage and max quota for queues at each level
> 
>
> Key: YUNIKORN-2507
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2507
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>    Reporter: Manikandan R
>    Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Queue setup: root.family.parent.child[1-2]
> Max res has been set on the parent queue. Say, 10GB. Usage is slightly lower. 
> Say, 9GB. Ask1 (say, 2 GB) has come in for the Child1 queue, whose usage is 
> already lower than its guaranteed quota. So there is a need for preemption. 
> In this case, fence selection/queue selection should not go outside the 
> parent queue hierarchy, and queues at the same level as parent (the parent 
> queue's siblings) should not be considered at all, as accommodating the ask 
> somewhere in the parent siblings hierarchy would violate the max resource 
> quota of the parent queue.






[jira] [Created] (YUNIKORN-2507) Picking Victims should consider usage and max quota for queues at each level

2024-03-20 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2507:
--

 Summary: Picking Victims should consider usage and max quota for 
queues at each level
 Key: YUNIKORN-2507
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2507
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: core - scheduler
Reporter: Manikandan R
Assignee: Manikandan R


Queue setup: root.family.parent.child[1-2]

Max res has been set on the parent queue. Say, 10GB. Usage is slightly lower. 
Say, 9GB. Ask1 (say, 2 GB) has come in for the Child1 queue, whose usage is 
already lower than its guaranteed quota (the rest is borrowed by some other 
child queues). So there is a need for preemption. In this case, fence 
selection/queue selection should not go outside the parent queue hierarchy, 
and queues at the same level as parent (the parent queue's siblings) should not 
be considered at all, as accommodating the ask somewhere in the parent siblings 
hierarchy would violate the max resource quota of the parent queue.
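
The arithmetic behind this can be sketched in Go as below. The Queue type and
preemptionFence are hypothetical and heavily simplified (one resource type,
sizes in GB); this is not the YuniKorn implementation, only the reasoning:
with 9GB used and a 10GB max, a 2GB ask fits only if something inside the
parent queue is freed.

```go
package main

import "fmt"

// Queue is a hypothetical, simplified queue: one resource type, sizes in GB.
type Queue struct {
	Name   string
	Parent *Queue
	MaxGB  int64 // 0 means no max quota is set
	UsedGB int64
}

// preemptionFence walks from the asking queue up to the root and returns the
// first queue whose max quota would be exceeded by adding the ask to its
// current usage. Victims outside that queue cannot help, because freeing them
// does not lower the usage counted against that max. nil means no fence.
func preemptionFence(q *Queue, askGB int64) *Queue {
	for cur := q; cur != nil; cur = cur.Parent {
		if cur.MaxGB > 0 && cur.UsedGB+askGB > cur.MaxGB {
			return cur
		}
	}
	return nil
}

func main() {
	root := &Queue{Name: "root"}
	family := &Queue{Name: "root.family", Parent: root}
	parent := &Queue{Name: "root.family.parent", Parent: family, MaxGB: 10, UsedGB: 9}
	child1 := &Queue{Name: "root.family.parent.child1", Parent: parent, UsedGB: 1}

	if fence := preemptionFence(child1, 2); fence != nil {
		fmt.Println("victims must come from within", fence.Name)
	}
}
```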








[jira] [Created] (YUNIKORN-2500) Use GetPreemptableResource, GetRemainingGuaranteedResource in Preemption flow

2024-03-19 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2500:
--

 Summary: Use GetPreemptableResource, 
GetRemainingGuaranteedResource in Preemption flow
 Key: YUNIKORN-2500
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2500
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: core - scheduler
Reporter: Manikandan R
Assignee: Manikandan R


Use GetPreemptableResource, GetRemainingGuaranteedResource instead of 
IsAtorAbove, WithIn, GetRemaining Guaranteed in current preemption code.






[jira] [Created] (YUNIKORN-2495) Remove App "Starting" state

2024-03-15 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2495:
--

 Summary: Remove App "Starting" state
 Key: YUNIKORN-2495
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2495
 Project: Apache YuniKorn
  Issue Type: Sub-task
Reporter: Manikandan R
Assignee: Craig Condit


The App Starting state has been in use for a while. Though it was introduced 
mainly as part of state aware app scheduling, all related code could be 
assessed and removed if it is no longer needed anywhere.






[jira] [Resolved] (YUNIKORN-2484) Shim: Remove stateaware logic

2024-03-15 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2484.

Resolution: Fixed

Merged to master

> Shim: Remove stateaware logic
> -
>
> Key: YUNIKORN-2484
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2484
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: shim - kubernetes
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (YUNIKORN-2331) Add Preemption Queue tests

2024-03-15 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2331.

Resolution: Fixed

As mentioned earlier, there is a whole lot of behavioural change, including in 
the way we use the new methods. Filed YUNIKORN-2494 to do those changes. The 
tests covered in this PR have also been covered as part of the same jira. 
Hence, closing this.

> Add Preemption Queue tests
> --
>
> Key: YUNIKORN-2331
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2331
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>    Reporter: Manikandan R
>    Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
>
> Added new Preemption Queue test suites to assert various Preemption 
> behaviours, especially focussed on important methods like IsAtorAbove, 
> IsWithIn & GetRemaining. These tests required some changes in those methods 
> as well. For example, IsAtorAbove checks only the child queue's guaranteed 
> resources and does not take parent queues into consideration. Checks should 
> happen all the way up to the root, similar to any other calculation.






[jira] [Created] (YUNIKORN-2494) Revisit IsAtorAbove, WithIn, GetRemaining Guaranteed resources calculation

2024-03-15 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2494:
--

 Summary: Revisit IsAtorAbove, WithIn, GetRemaining Guaranteed 
resources calculation
 Key: YUNIKORN-2494
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2494
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: core - common
Reporter: Manikandan R
Assignee: Manikandan R


These 3 methods don't expose the actual guaranteed values; they return a 
boolean value based on the calculation. There are cases where these boolean 
values are not correct, and there is also a need to know the actual guaranteed 
values. For example: how much is remaining in Guaranteed? How much can be 
preempted?
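
The difference between a boolean answer and an actual value can be sketched
in Go as follows. The Resource type and both function names are hypothetical
simplifications and do not mirror the YuniKorn API; the values used are only
illustrative.

```go
package main

import "fmt"

// Resource is a hypothetical, simplified resource: type name to quantity.
type Resource map[string]int64

// remainingGuaranteed returns guaranteed minus used for every resource type
// that has a guarantee. A negative value means usage is above the guarantee.
func remainingGuaranteed(guaranteed, used Resource) Resource {
	out := make(Resource, len(guaranteed))
	for name, g := range guaranteed {
		out[name] = g - used[name]
	}
	return out
}

// preemptable returns, for every resource type that has a guarantee, how much
// is used above that guarantee. It never goes below zero.
func preemptable(guaranteed, used Resource) Resource {
	out := make(Resource, len(guaranteed))
	for name, g := range guaranteed {
		if over := used[name] - g; over > 0 {
			out[name] = over
		} else {
			out[name] = 0
		}
	}
	return out
}

func main() {
	guaranteed := Resource{"vcore": 2}
	used := Resource{"vcore": 6}
	fmt.Println("remaining guaranteed:", remainingGuaranteed(guaranteed, used))
	fmt.Println("preemptable:", preemptable(guaranteed, used))
}
```

A caller that only gets "is at or above guaranteed: true" cannot tell whether
4 vcores or 1 vcore may be taken away; returning the amounts answers both
questions raised above.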






[jira] [Created] (YUNIKORN-2493) Preemption Hardening

2024-03-15 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2493:
--

 Summary: Preemption Hardening
 Key: YUNIKORN-2493
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2493
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - scheduler
Reporter: Manikandan R
Assignee: Manikandan R









[jira] [Resolved] (YUNIKORN-2488) SI: Remove stateaware constants

2024-03-13 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2488.

Resolution: Fixed

> SI: Remove stateaware constants
> ---
>
> Key: YUNIKORN-2488
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2488
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: scheduler-interface
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (YUNIKORN-2483) Core: Remove stateaware scheduling logic

2024-03-13 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2483.

Resolution: Fixed

Merged to master

> Core: Remove stateaware scheduling logic
> 
>
> Key: YUNIKORN-2483
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2483
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
>







Re: [VOTE]Release Apache YuniKorn 1.5.0 RC2

2024-03-11 Thread Manikandan R
+1 (Binding)

- Built images from source on Mac M1 MacOS Monterey (arm64) with go 1.21.8
- Verified the signatures
- Verified the licences and checksums
- Ran the scheduler with a local kind cluster (version 1.28.0)
- Ran simple sleep jobs
- Verified REST APIs outputs, Web UI

Thanks,
Mani

On Mon, Mar 11, 2024 at 9:18 PM Peter Bacsko  wrote:

> +1
>
> Environment: Ubuntu 22.04 amd64
>
> - Checked signatures and checksums
> - Built from source
> - Installed on Minikube
> - Checked some API endpoints (batch API)
> - Ran sleep jobs
> - Checked the web UI
>
> I just found a minor thing regarding Minikube which I'll report soon.
>
> On Mon, Mar 11, 2024 at 8:22 AM brandboat  wrote:
>
> > +1 (non-binding)
> >
> > my environment:  ThinkPad Lenovo i9-13980HX 64GB, opensuse tumbleweed
> > - e2e tests passed in kind 1.29.1
> > - unit tests all passed
> > - test all web apis and looks fine
> >
> > Best regards,
> > Kuan Po Tseng (brandboat)
> >
> > On 2024/03/07 14:31:27 TingYao wrote:
> > > Hello everyone,
> > >
> > > I would like to call a vote for releasing Apache YuniKorn 1.5.0 RC2.
> > >
> > > The release artefacts have been uploaded here:
> > >   https://dist.apache.org/repos/dist/dev/yunikorn/1.5.0-RC2/
> > >
> > > My public key is located in the KEYS file:
> > >   https://downloads.apache.org//yunikorn/KEYS
> > >
> > > JIRA issues that have been resolved in this release:
> > >   https://issues.apache.org/jira/issues/?filter=12352958
> > >
> > > > The release artifacts are built with go 1.21.8 to fix some CVEs.
> > > > Compared to RC1, RC2 addresses several CVEs and memory leak issues.
> > > > It also removes reproducible build artifacts from the draft release
> > > > notes. Please read the draft release notes attached to this vote for
> > > > further details.
> > >   https://github.com/apache/yunikorn-site/pull/405
> > >
> > > Git tags for each component are as follows:
> > > yunikorn-scheduler-interface: v1.5.0-1
> > > yunikorn-core: v1.5.0-3
> > > yunikorn-k8shim: v1.5.0-3
> > > yunikorn-web: v1.5.0-1
> > > yunikorn-release: v1.5.0-3
> > >
> > > Once the release is voted on and approved, all repos will be tagged
> > > 1.5.0 for consistency.
> > >
> > > Please review and vote. The vote will be open for at least 72 hours
> > > and closes on Sunday 10 March 2024, 15:00:00 UTC
> > >
> > > [ ] +1 Approve
> > > [ ] +0 No opinion
> > > [ ] -1 Disapprove (and the reason why)
> > >
> > > Thank you,
> > > Tingyao
> > >
> >
> >
> >
> >
>


[jira] [Resolved] (YUNIKORN-2427) Use r-lock instead of rw-lock in user_tracker.go#getGroupForApp

2024-02-21 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2427.

Fix Version/s: 1.5.0
   Resolution: Fixed

Merged to master

> Use r-lock instead of rw-lock in user_tracker.go#getGroupForApp 
> 
>
> Key: YUNIKORN-2427
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2427
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Yu-Lin Chen
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>
> see 
> https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/ugm/user_tracker.go#L103
> The function mutates nothing, so it is safe to use an r-lock.
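
For readers unfamiliar with the distinction, a minimal Go sketch using
sync.RWMutex follows. The tracker type and its fields are hypothetical and
much simpler than the code in user_tracker.go; only the locking pattern is
the point.

```go
package main

import (
	"fmt"
	"sync"
)

// tracker is a hypothetical stand-in for a structure guarded by an RWMutex.
type tracker struct {
	sync.RWMutex
	appGroups map[string]string
}

// getGroupForApp only reads, so a read lock is enough. Multiple readers can
// hold the read lock at the same time.
func (t *tracker) getGroupForApp(appID string) string {
	t.RLock()
	defer t.RUnlock()
	return t.appGroups[appID]
}

// setGroupForApp mutates the map, so it needs the exclusive write lock.
func (t *tracker) setGroupForApp(appID, group string) {
	t.Lock()
	defer t.Unlock()
	t.appGroups[appID] = group
}

func main() {
	tr := &tracker{appGroups: make(map[string]string)}
	tr.setGroupForApp("app-1", "developers")
	fmt.Println(tr.getGroupForApp("app-1"))
}
```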






[jira] [Resolved] (YUNIKORN-1544) [Umbrella] User and group quota enforcement - Phase 2

2024-02-21 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-1544.

Resolution: Fixed

> [Umbrella] User and group quota enforcement - Phase 2
> -
>
> Key: YUNIKORN-1544
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1544
> Project: Apache YuniKorn
>  Issue Type: Improvement
>    Reporter: Manikandan R
>    Assignee: Manikandan R
>Priority: Major
>







[jira] [Created] (YUNIKORN-2428) [Umbrella] User and group quota enforcement - Phase 3

2024-02-21 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2428:
--

 Summary: [Umbrella] User and group quota enforcement - Phase 3
 Key: YUNIKORN-2428
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2428
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - scheduler
Reporter: Manikandan R
Assignee: Manikandan R









[jira] [Resolved] (YUNIKORN-2284) ERROR message when stopping Service context

2024-02-07 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2284.

 Fix Version/s: 1.5.0
Target Version: 1.5.0
Resolution: Fixed

> ERROR message when stopping Service context
> ---
>
> Key: YUNIKORN-2284
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2284
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: PoAn Yang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>
> After YUNIKORN-2233, the scheduler core can be stopped. This causes an issue 
> inside the MockScheduler:
> {noformat}
> 2023-12-21T17:58:49.203+0100  INFOcore.scheduler.ugm  
> ugm/manager.go:136  Removing user from manager  {"user": "testuser"}
> ...
> 2023-12-21T17:58:59.209+0100  INFOcore.entrypoint 
> entrypoint/service_context.go:40ServiceContext stop all services
> ...
> 2023-12-21T17:58:59.211+0100  INFOcore.scheduler.partition
> scheduler/partition_manager.go:144  marking all queues for removal  
> {"partitionName": "[rm:123]default"}
> 2023-12-21T17:58:59.211+0100  INFOcore.scheduler.queue
> objects/queue.go:952marking managed queue for deletion  {"queue": 
> "root"}
> 2023-12-21T17:58:59.212+0100  INFOcore.scheduler.fsm  
> objects/object_state.go:81  object transition   {"object": "root", 
> "source": "Active", "destination": "Draining", "event": "Remove"}
> 2023-12-21T17:58:59.212+0100  INFOcore.scheduler.queue
> objects/queue.go:952marking managed queue for deletion  {"queue": 
> "root.singleleaf"}
> 2023-12-21T17:58:59.212+0100  INFOcore.scheduler.fsm  
> objects/object_state.go:81  object transition   {"object": 
> "root.singleleaf", "source": "Active", "destination": "Draining", "event": 
> "Remove"}
> 2023-12-21T17:58:59.212+0100  INFOcore.scheduler.partition
> scheduler/partition_manager.go:150  removing all applications from 
> partition{"numOfApps": 1, "partitionName": "[rm:123]default"}
> 2023-12-21T17:58:59.212+0100  INFOcore.scheduler.application  
> objects/application.go:608  ask removed successfully from application 
>   {"appID": "app-1", "ask": "", "pendingDelta": "map[memory:0 vcore:0]"}
> 2023-12-21T17:58:59.212+0100  INFOcore.scheduler.queue
> objects/queue.go:837Application completed and removed from queue
> {"queueName": "root.singleleaf", "applicationID": "app-1"}
> 2023-12-21T17:59:32.848+0100  ERROR   core.scheduler.ugm  
> ugm/manager.go:118  user tracker must be available in userTrackers map
>   {"user": "testuser"}
> github.com/apache/yunikorn-core/pkg/scheduler/ugm.(*Manager).DecreaseTrackedResource
>   /home/bacskop/repos/yunikorn-core/pkg/scheduler/ugm/manager.go:118
> github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).decUserResourceUsage
>   
> /home/bacskop/repos/yunikorn-core/pkg/scheduler/objects/application.go:1654
> github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).RemoveAllAllocations
>   
> /home/bacskop/repos/yunikorn-core/pkg/scheduler/objects/application.go:1843
> github.com/apache/yunikorn-core/pkg/scheduler.(*PartitionContext).removeApplication
>   /home/bacskop/repos/yunikorn-core/pkg/scheduler/partition.go:388
> github.com/apache/yunikorn-core/pkg/scheduler.(*partitionManager).remove
>   /home/bacskop/repos/yunikorn-core/pkg/scheduler/partition_manager.go:156
> github.com/apache/yunikorn-core/pkg/scheduler.(*partitionManager).Stop
>   /home/bacskop/repos/yunikorn-core/pkg/scheduler/partition_manager.go:97
> github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).Stop
>   /home/bacskop/repos/yunikorn-core/pkg/scheduler/context.go:991
> github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).Stop
>   /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:217
> github.com/apache/yunikorn-core/pkg/entrypoint.(*ServiceContext).StopAll
>   /home/bacskop/repos/yunikorn-core/pkg/entrypoint/service_context.go:50
> github.com/apac

[jira] [Resolved] (YUNIKORN-2375) Move mocks to a central place

2024-02-01 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2375.

Resolution: Fixed

Merged to master

> Move mocks to a central place
> -
>
> Key: YUNIKORN-2375
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2375
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - common
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>  Labels: pull-request-available
>
> The ResourceManagerCallback API implementations for tests are spread over a 
> large number of places. The base implementation that is used often resides in 
> {{scheduler/tests}}, which requires packages outside the scheduler to import 
> that one.
> There is no need for that and it could even cause circular references on 
> imports. Moving the base implementation and all mocks that we need into a top 
> level package would simplify the code.






[jira] [Resolved] (YUNIKORN-2333) Redundant min and max computation in resources.go

2024-01-22 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2333.

Resolution: Information Provided

The benchmark results are not good, so we don't want to proceed further with 
the PR changes.

> Redundant min and max computation in resources.go
> -
>
> Key: YUNIKORN-2333
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2333
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>    Reporter: Manikandan R
>    Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
>
> Methods like `ComponentWiseMin` & `ComponentWiseMax` can avoid redundant 
> min and max calculations for resource types that have already been computed. 
> The performance gain is uncertain, but the change is worth making.






[jira] [Created] (YUNIKORN-2333) Redundant min and max computation in resources.go

2024-01-17 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2333:
--

 Summary: Redundant min and max computation in resources.go
 Key: YUNIKORN-2333
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2333
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - scheduler
Reporter: Manikandan R
Assignee: Manikandan R


Methods like `ComponentWiseMin` & `ComponentWiseMax` can avoid redundant 
min and max calculations for resource types that have already been computed.
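
One possible reading of the redundancy, sketched in Go. This is an illustration only, not the code from resources.go: the Resource type, the componentWiseMin and min64 names, and the rule that a missing resource type counts as zero are all assumptions made for the example. The point is that types already handled while walking the left operand are skipped while walking the right operand.

```go
package main

// Resource is a hypothetical stand-in for a scheduler resource:
// resource type name -> quantity.
type Resource map[string]int64

// componentWiseMin returns the per-type minimum of left and right.
// Assumption for this sketch: a type missing on one side counts as zero.
func componentWiseMin(left, right Resource) Resource {
	out := make(Resource, len(left)+len(right))
	// First pass: every type present in left is compared exactly once.
	for k, lv := range left {
		out[k] = min64(lv, right[k]) // right[k] is zero when absent
	}
	// Second pass: only types that left did not contain still need a value.
	for k, rv := range right {
		if _, done := out[k]; done {
			continue // already computed in the first pass
		}
		out[k] = min64(0, rv)
	}
	return out
}

// min64 returns the smaller of two int64 values.
func min64(a, b int64) int64 {
	if a < b {
		return a
	}
	return b
}
```

Whether skipping the second comparison pays off depends on the cost of the extra map lookup, which is the kind of question a benchmark has to answer.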






[jira] [Resolved] (YUNIKORN-2209) Remove limit checks in QueueTracker

2024-01-17 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2209.

 Fix Version/s: 1.5.0
Target Version: 1.5.0
Resolution: Fixed

> Remove limit checks in QueueTracker
> ---
>
> Key: YUNIKORN-2209
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2209
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - common
>Reporter: Peter Bacsko
>    Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>
> {{QueueTracker.increaseTrackedResource()}} contains code that is no longer 
> relevant and is a good candidate for removal.
> It verifies whether the increased resource is over certain limits. However, 
> this is not the responsibility of the tracker, at least not anymore. The 
> method returns a boolean which is no longer used by the application. 
> Worse, we ignore the increment calculation but perform the decrement part. 
> This results in a corrupted state. Even if we detect that limits are 
> violated, there's no reason to mess things up even further.
> It also has a performance impact. A lot of intermediate Resource objects are 
> created, e.g. "finalResourceUsage", and {{resources.NewResource()}} is called 
> multiple times. All of these result in heap allocations that 
> become garbage as soon as the method returns. After the changes in 
> YUNIKORN-2201, {{Manager.IncreaseTrackedResource()}} is a 1.5-2% contributor 
> to the overall heap and CPU usage. Not a massive saving, but if it's easy to 
> gain a quick improvement, let's go for it.






[jira] [Resolved] (YUNIKORN-2273) checkLimitMaxApplications fails if there is missing limit in a middle queue

2024-01-17 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2273.

Fix Version/s: 1.5.0
   Resolution: Fixed

> checkLimitMaxApplications fails if there is missing limit in a middle queue
> ---
>
> Key: YUNIKORN-2273
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2273
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - common
>Reporter: PoAn Yang
>Assignee: PoAn Yang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>
> The checkLimitMaxApplications function rejects a config in which a child limit is 
> larger than the parent limit. However, it doesn't cover all cases, such as a child 
> limit that is larger than the grandparent limit. The following is my test case:
> {noformat}
> func TestCheckLimitMaxApplicationsGrandLevel(t *testing.T) {
>     err := checkLimitMaxApplications(
>         QueueConfig{
>             Name: "root",
>             Limits: []Limit{
>                 {
>                     Limit:           "user1",
>                     Users:           []string{"user1"},
>                     MaxApplications: 100,
>                 },
>                 {
>                     Limit:           "user2",
>                     Users:           []string{"user2"},
>                     MaxApplications: 100,
>                 },
>             },
>             Queues: []QueueConfig{
>                 {
>                     Name: "parent",
>                     Limits: []Limit{
>                         {
>                             Limit:           "user1",
>                             Users:           []string{"user1"},
>                             MaxApplications: 50,
>                         },
>                     },
>                     Queues: []QueueConfig{
>                         {
>                             Name: "child",
>                             Limits: []Limit{
>                                 {
>                                     Limit:           "user1",
>                                     Users:           []string{"user1"},
>                                     MaxApplications: 10,
>                                 },
>                                 {
>                                     Limit:           "user2",
>                                     Users:           []string{"user2"},
>                                     MaxApplications: 150,
>                                 },
>                             },
>                         },
>                     },
>                 },
>             },
>         },
>         make(map[string]map[string]uint64),
>         make(map[string]map[string]uint64),
>         common.Empty,
>     )
>     assert.Equal(t, err != nil, true, "user2 cpu maxapplications in root.parent.child should not be larger than root") // this will fail
> }{noformat}
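
A minimal sketch of how such a check could cover the grandparent case, assuming a simplified model. The queue type and the checkMaxApps function below are hypothetical and are not the real QueueConfig or checkLimitMaxApplications. The idea is to carry the nearest limit seen so far for each user down the hierarchy, so a middle queue without a limit for that user does not hide the limit of an ancestor.

```go
package main

import "fmt"

// queue is a hypothetical, simplified queue config: per-user max
// applications plus child queues.
type queue struct {
	name     string
	maxApps  map[string]uint64
	children []queue
}

// checkMaxApps walks the hierarchy and returns an error when a user's limit
// in a queue exceeds the nearest limit set for that user in any ancestor.
func checkMaxApps(q queue, inherited map[string]uint64, path string) error {
	if path == "" {
		path = q.name
	} else {
		path = path + "." + q.name
	}
	// Start from the limits inherited from all ancestors.
	effective := make(map[string]uint64, len(inherited)+len(q.maxApps))
	for user, limit := range inherited {
		effective[user] = limit
	}
	for user, limit := range q.maxApps {
		if upper, ok := inherited[user]; ok && limit > upper {
			return fmt.Errorf("user %s max applications %d in %s exceeds ancestor limit %d",
				user, limit, path, upper)
		}
		// This queue's own limit becomes the nearest limit for its children.
		effective[user] = limit
	}
	for _, child := range q.children {
		if err := checkMaxApps(child, effective, path); err != nil {
			return err
		}
	}
	return nil
}
```

With the hierarchy from the test case above (root: user1=100, user2=100; parent: user1=50; child: user1=10, user2=150), the user2 limit in the child is compared against the root limit even though the parent defines nothing for user2.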






[jira] [Created] (YUNIKORN-2331) Add Preemption Queue tests

2024-01-17 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2331:
--

 Summary: Add Preemption Queue tests
 Key: YUNIKORN-2331
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2331
 Project: Apache YuniKorn
  Issue Type: Test
  Components: core - scheduler
Reporter: Manikandan R
Assignee: Manikandan R


Add new Preemption Queue test suites to assert various preemption behaviours, 
focusing especially on important methods like IsAtorAbove, IsWithIn & 
GetRemaining. These tests require some changes in those methods as well.






[jira] [Created] (YUNIKORN-2323) Soft Gang scheduling style is not working as expected

2024-01-10 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2323:
--

 Summary: Soft Gang scheduling style is not working as expected
 Key: YUNIKORN-2323
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2323
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: core - scheduler
Affects Versions: 1.4.0
Reporter: Manikandan R
Assignee: Manikandan R


For a gang app, ResumeApplicationEvent is set as part of the 
timeoutPlaceholderProcessing process if needed. This event moves the app to 
running only when the source state (src) is either new or accepted. However, a 
gang app moves to the starting state once all placeholder asks have been added 
to the application, so a situation in which the resume event triggers and does 
the expected thing never arises. In addition, the app might also have 
transitioned into the running state on app start timer expiry (default is 
5 mins). The timer moves the state to running without being aware of the 
current situation, which is not the right thing to do.

Ideally, in the worst case, a gang app should continue to run as a normal app, 
but given the above scenarios, that doesn't happen.
 






[jira] [Created] (YUNIKORN-2321) App Event use allocKey as object ID instead of app ID

2024-01-09 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2321:
--

 Summary: App Event use allocKey as object ID instead of app ID
 Key: YUNIKORN-2321
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2321
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: core - scheduler
Affects Versions: 1.4.0
Reporter: Manikandan R
Assignee: Manikandan R


The app event _sendPlaceholderLargerEvent_ uses allocKey as the object ID instead 
of the app ID, and vice versa. When the shim parses this app event before 
publishing it, it tries to fetch the app details using the object ID and fails 
with "task event is not published because task is not found".

_sendAppDoesNotFitEvent_ does the same thing.






[jira] [Resolved] (YUNIKORN-2318) Flaky test TestTimeoutPlaceholderAllocReleased

2024-01-08 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2318.

 Fix Version/s: 1.5.0
Target Version: 1.5.0
Resolution: Fixed

> Flaky test TestTimeoutPlaceholderAllocReleased
> --
>
> Key: YUNIKORN-2318
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2318
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler, test - unit
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>
> Test case TestTimeoutPlaceholderAllocReleased can fail with the following:
> {noformat}
> 2024-01-08T05:36:30.6545422Z --- FAIL: TestTimeoutPlaceholderAllocReleased 
> (0.08s)
> 2024-01-08T05:36:30.6547474Z application_test.go:1502: assertion failed: 
> app.getPlaceholderTimer() is nil: Placeholder timer should be initiated after 
> the first placeholder allocation
> {noformat}
> The placeholder timeout is too low (5 msec). We need to use a higher value.






[jira] [Created] (YUNIKORN-2307) Fix codecoverage warnings in UGM

2024-01-04 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2307:
--

 Summary: Fix codecoverage warnings in UGM
 Key: YUNIKORN-2307
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2307
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: core - scheduler
Reporter: Manikandan R


The code coverage warnings in UGM need to be fixed so that UGM is completely 
free of them and the repetitive warnings no longer show up.






[jira] [Created] (YUNIKORN-2306) Use asserts instead of If based checks in UGM

2024-01-04 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2306:
--

 Summary: Use asserts instead of If based checks in UGM
 Key: YUNIKORN-2306
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2306
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: core - scheduler
Reporter: Manikandan R


The UGM module tests have a lot of if-based checks. We need to move those to 
assert-based checks. Also, Increase calls can no longer be used to validate the 
negative case (user is not allowed to run) because of recent changes. 
Instead, the headroom or canrunapp methods should be used as appropriate.

Please refer to the discussion:

https://github.com/apache/yunikorn-core/pull/758#discussion_r1441385897






[jira] [Resolved] (YUNIKORN-2270) GPU Preemption is not triggered as expected when all available GPUs are used

2023-12-20 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2270.

Fix Version/s: 1.5.0
   Resolution: Fixed

Merged to master

> GPU Preemption is not triggered as expected when all available GPUs are used
> 
>
> Key: YUNIKORN-2270
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2270
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>
> I am testing an important preemption scenario for GPUs. The scenario is 
> designed as follows.
> The queue structure is pretty simple:
> {code}
> root.a (min=100, max=300)
> root.b (min=0, max=300)
> {code}
> The cluster has a total of 300 GPUs available, with no autoscaling. Steps to 
> reproduce:
> 1. Create 600 pods in the root.b queue, each needing 1 GPU. This consumes all 
> 300 GPUs available in the cluster and leaves 300 pods pending.
> 2. Create 100 pods in the root.a queue, each needing 1 GPU. The expectation is 
> that queue a will preempt 100 GPUs from queue b to reach its guarantee. 
> Observation: only a small number of pods preempted resources from queue b and 
> started on queue a, and the result is not stable. Queue a could not reach its 
> guaranteed resources. 






[jira] [Created] (YUNIKORN-2279) requiredNodePreemptor may preempt ask acquired resources through normal preemption

2023-12-20 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2279:
--

 Summary: requiredNodePreemptor may preempt ask acquired resources 
through normal preemption
 Key: YUNIKORN-2279
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2279
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - scheduler
Reporter: Manikandan R
Assignee: Manikandan R


An ask (preemptor) that acquired resources through the normal preemption 
process may be preempted by requiredNodePreemptor to free up space for daemon 
sets. This may happen once the ask (preemptor) starts running or is reserved on 
the node. As a result, the whole preemption effort to make space for this ask 
(preemptor) is not as beneficial as expected.

Currently requiredNodePreemptor skips only daemon sets while choosing victims 
and can cancel any reservation made on the node to acquire resources. So, we 
need to differentiate "normal" reservations from "preempted" reservations 
to help the requiredNodePreemptor choose its victims. In addition, a node 
with "preempted" reservations can be skipped by the normal preemption process 
because the node already has something important to run.






[jira] [Resolved] (YUNIKORN-2172) Add test cases for different applications each with different group linkage

2023-12-18 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2172.

Fix Version/s: 1.5.0
   Resolution: Fixed

> Add test cases for different applications each with different group linkage
> ---
>
> Key: YUNIKORN-2172
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2172
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: test - unit
>Reporter: Wilfred Spiegelenburg
>Assignee: Yu-Lin Chen
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 1.5.0
>
>
> Add tests to cover the todo comment in the user_tracker_test.go file:
> {code:java}
> //nolint: todo test cases for different applications each with different 
> group linkage
> {code}






[jira] [Resolved] (YUNIKORN-2119) Add check for parent queue user/group limit lower than child queue

2023-12-17 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2119.

Fix Version/s: 1.50
   Resolution: Fixed

Merged to master

> Add check for parent queue user/group limit lower than child queue
> --
>
> Key: YUNIKORN-2119
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2119
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: PoAn Yang
>Assignee: PoAn Yang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.50
>
>
> After [calculating headroom for wildcard 
> cases|https://github.com/apache/yunikorn-core/pull/642] is merged, we need to 
> check an edge case for the wildcard limit:
> If a non-wildcard limit is not defined in the parent queue but is defined in 
> a child queue, we need to check whether it is larger than the wildcard limit 
> in the parent queue.
>  
> For example:
> root (user wildcard max memory 100MB) -> parent (user1 max memory 50MB) (✅)
> root (user wildcard max memory 100MB) -> parent (user1 max memory 150MB) (❌)
> root (user wildcard max memory 100MB) -> parent (user1 max memory 50MB) -> 
> child (user2 max memory 150MB) (❌)






[jira] [Resolved] (YUNIKORN-2176) Add test for user & group max resource changes

2023-12-13 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2176.

Fix Version/s: 1.5.0
   Resolution: Fixed

> Add test for user & group max resource changes
> --
>
> Key: YUNIKORN-2176
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2176
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>    Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>







[jira] [Resolved] (YUNIKORN-2211) Replace Allocation uuid with allocationID

2023-12-11 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2211.

Fix Version/s: 1.5.0
   Resolution: Fixed

> Replace Allocation uuid with allocationID
> -
>
> Key: YUNIKORN-2211
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2211
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler, scheduler-interface, shim - kubernetes
>    Reporter: Manikandan R
>    Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>
> As a follow-up to https://issues.apache.org/jira/browse/YUNIKORN-2204,
> replace the uuid of Allocation with allocationID in all places, including SI, 
> core & shim.






[jira] [Created] (YUNIKORN-2240) Replace Allocation uuid with allocationID

2023-12-05 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2240:
--

 Summary: Replace Allocation uuid with allocationID
 Key: YUNIKORN-2240
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2240
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: webapp
Reporter: Manikandan R


As a follow-up to [Replace Allocation uuid with 
allocationID|https://issues.apache.org/jira/browse/YUNIKORN-2211], YuniKorn Web 
also needs to move away from `uuid` to `AllocationID`.






[jira] [Created] (YUNIKORN-2232) Fix state aware scheduling e2e test

2023-12-04 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2232:
--

 Summary: Fix state aware scheduling e2e test
 Key: YUNIKORN-2232
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2232
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes
Reporter: Manikandan R
Assignee: Manikandan R


The state aware scheduling e2e test has been failing since 
https://issues.apache.org/jira/browse/YUNIKORN-809. The queue policy config 
needs to be set as described in that jira.






Re: [ANNOUNCE] YuniKorn community sync planning (English)

2023-12-03 Thread Manikandan R
Is there any possibility to move the schedule slightly? 7.00 PM PT or so?

On Mon, Dec 4, 2023 at 8:06 AM Wilfred Spiegelenburg 
wrote:

> Hi,
>
> The last community sync for this calendar year 2023 is planned for 13
> December (AMER) or 14 December (APAC) time. Please check the community
> sync doc [1] for exact timing etc.
>
> The meeting between Christmas and New Years will be cancelled as a lot
> of companies slow down over the holiday period.
>
> The first meeting in 2024 will be on 10 January (AMER) or 11 January
> (APAC) time. That will give us a number of syncs before Chinese New
> Year and a good run up to YuniKorn 1.5.0.
>
> Wilfred
>
> [1]
> https://docs.google.com/document/d/165gzC7uhcKc5XDWiMYSRKBiPQBy2tDtXADUPuhGlUa0/edit#heading=h.6wgz1xgh0qde
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
> For additional commands, e-mail: dev-h...@yunikorn.apache.org
>
>


[jira] [Resolved] (YUNIKORN-1956) Add wildcard user/group limit e2e tests

2023-12-03 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-1956.

Fix Version/s: 1.5.0
   Resolution: Fixed

> Add wildcard user/group limit e2e tests
> ---
>
> Key: YUNIKORN-1956
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1956
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: test - e2e
>Reporter: PoAn Yang
>Assignee: PoAn Yang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>
> Specific user/group limits + wild card user/group limits to ensure order of 
> precedence:
> 5. set user limit & wild card user limit only and ensure it has been honoured.
> 5a) When the user limit is specified, it should be considered for that 
> specific user
> 5b) When the user limit is not specified, the wild card user limit should be 
> considered for that specific user
> 6. set group limit & wild card group limit only and ensure it has been 
> honoured.
> 6a) When the group limit is specified, it should be considered for that 
> specific group
> 6b) When the group limit is not specified, the wild card group limit should 
> be considered for that specific group
> 7. Repeat the above for max resources and max applications individually and 
> together as well
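
The order of precedence in 5a/5b and 6a/6b can be sketched as a small lookup. This is only an illustration: the effectiveLimit function, the flat map of limits, and the use of "*" as the wildcard key are assumptions for the example, not the scheduler's actual data structures.

```go
package main

// wildcard is the key assumed here for the wildcard user or group entry.
const wildcard = "*"

// effectiveLimit returns the limit that applies to name: the specific entry
// wins when it is configured, otherwise the wildcard entry is used.
// The boolean is false when neither is configured.
func effectiveLimit(limits map[string]uint64, name string) (uint64, bool) {
	if limit, ok := limits[name]; ok {
		return limit, true // 5a / 6a: specific limit is configured
	}
	if limit, ok := limits[wildcard]; ok {
		return limit, true // 5b / 6b: fall back to the wildcard limit
	}
	return 0, false
}
```

The same lookup applies to max resources and max applications, each with its own set of limits.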






[jira] [Resolved] (YUNIKORN-2204) Use ask unique id for allocation

2023-11-30 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2204.

Resolution: Fixed

> Use ask unique id for allocation
> 
>
> Key: YUNIKORN-2204
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2204
> Project: Apache YuniKorn
>  Issue Type: Improvement
>    Reporter: Manikandan R
>    Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available, release-notes
>
> As of now, every allocation has its own unique id generated using uuid(). An 
> ask has its own unique id (reused from the pod uuid propagated through the task 
> object). Because of this, it is difficult to map these ids while 
> troubleshooting issues through logs. The proposal is to reuse the ask unique id 
> for the Allocation as well. Since there could be ask repeats, a unique number 
> is suffixed to the ask unique id to avoid duplicates, giving every ask repeat a 
> unique Allocation id. This suffix starts at 0 and increases by 1 for every 
> repeat. For example, if the ask key is "alloc-1", then the allocation unique 
> ids would be "alloc-1-0", "alloc-1-1", and so on.






[jira] [Created] (YUNIKORN-2213) Handle Allocation allocation id during recovery properly

2023-11-30 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2213:
--

 Summary: Handle Allocation allocation id during recovery properly
 Key: YUNIKORN-2213
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2213
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - scheduler
Reporter: Manikandan R
Assignee: Manikandan R
 Fix For: 1.5.0


During recovery, the Allocation allocationID changes to the same allocationID 
with a different suffix number (incremented by 1), for example, alloc-0 becomes 
alloc-1. We are seeing this behaviour since YUNIKORN-2204. This needs to be 
handled properly based on the discussion at 
https://github.com/apache/yunikorn-core/pull/740#discussion_r1408666888






[jira] [Created] (YUNIKORN-2211) Replace Allocation uuid with allocationID

2023-11-30 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2211:
--

 Summary: Replace Allocation uuid with allocationID
 Key: YUNIKORN-2211
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2211
 Project: Apache YuniKorn
  Issue Type: Improvement
Reporter: Manikandan R
Assignee: Manikandan R
 Fix For: 1.5.0


As a follow-up to https://issues.apache.org/jira/browse/YUNIKORN-2204, 
replace the uuid of Allocation with allocationID in all places, including SI, 
core & shim.






[jira] [Created] (YUNIKORN-2204) Use ask unique id for allocation

2023-11-28 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2204:
--

 Summary: Use ask unique id for allocation
 Key: YUNIKORN-2204
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2204
 Project: Apache YuniKorn
  Issue Type: Improvement
Reporter: Manikandan R
Assignee: Manikandan R
 Fix For: 1.5.0


As of now, every allocation has its own unique id generated using uuid(). An 
ask has its own unique id (reused from the pod uuid propagated through the task 
object). Because of this, it is difficult to map these ids while 
troubleshooting issues through logs. The proposal is to reuse the ask unique id 
for the Allocation as well. Since there could be ask repeats, a unique number 
is suffixed to the ask unique id to avoid duplicates, giving every ask repeat a 
unique Allocation id. This suffix starts at 0 and increases by 1 for every 
repeat. For example, if the ask key is "alloc-1", then the allocation unique 
ids would be "alloc-1-0", "alloc-1-1", and so on.
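
The naming scheme can be sketched as follows. The idGenerator type and its methods are hypothetical names made up for this illustration and are not part of the proposal or of the core code; only the "<ask key>-<repeat number>" format comes from the description above.

```go
package main

import "strconv"

// idGenerator is a hypothetical helper that hands out allocation ids of the
// form "<ask key>-<repeat number>", with the repeat number starting at 0.
type idGenerator struct {
	next map[string]int // next suffix to use, per ask key
}

// newIDGenerator returns a generator with no ids handed out yet.
func newIDGenerator() *idGenerator {
	return &idGenerator{next: make(map[string]int)}
}

// nextID returns the allocation id for the next repeat of the given ask.
func (g *idGenerator) nextID(askKey string) string {
	n := g.next[askKey] // zero for an ask key seen for the first time
	g.next[askKey] = n + 1
	return askKey + "-" + strconv.Itoa(n)
}
```

Because the ask key is a prefix of every allocation id, a log search for the ask key also finds all of its allocations.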






[jira] [Resolved] (YUNIKORN-2169) Fix queue resource update through configmaps

2023-11-22 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2169.

 Fix Version/s: 1.5.0
Target Version: 1.5.0
Resolution: Fixed

> Fix queue resource update through configmaps
> 
>
> Key: YUNIKORN-2169
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2169
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Affects Versions: 1.4.0
>    Reporter: Manikandan R
>    Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>
> Updating queue resources through config maps does not apply the changes as 
> expected.
> For example,
>  # {memory:5} to {memory:10}
>  # {vcores:5, memory:10} to {memory:10}
>  # {memory:5} to nil
> None of these changes take effect.






[jira] [Resolved] (YUNIKORN-2173) change log level to INFO for adding allocation in queue

2023-11-22 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2173.

Fix Version/s: 1.5.0
   Resolution: Fixed

Merged to master

> change log level to INFO for adding allocation in queue
> ---
>
> Key: YUNIKORN-2173
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2173
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>
> The queue has debug logging only to show that an allocation is made on the 
> queue for an application. For troubleshooting we should change that to an 
> INFO level.
> Without that we only see the allocation being processed in the partition 
> which does not have the same detail as in the queue.






[jira] [Created] (YUNIKORN-2177) Add e2e tests for any config updates.

2023-11-21 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2177:
--

 Summary: Add e2e tests for any config updates.
 Key: YUNIKORN-2177
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2177
 Project: Apache YuniKorn
  Issue Type: Test
  Components: shim - kubernetes
Reporter: Manikandan R


Add e2e tests for all important config changes and ensure the new values have 
been updated properly in the corresponding objects.

For example,

Update queue max resources with different resource types, nil, etc., and ensure 
the corresponding queue object has been updated properly (probably using the 
REST APIs).






[jira] [Created] (YUNIKORN-2176) Add test for user & group max resource changes

2023-11-21 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2176:
--

 Summary: Add test for user & group max resource changes
 Key: YUNIKORN-2176
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2176
 Project: Apache YuniKorn
  Issue Type: Test
  Components: core - scheduler
Reporter: Manikandan R
Assignee: Manikandan R
 Fix For: 1.5.0









[jira] [Created] (YUNIKORN-2169) Fix queue resource update through configmaps

2023-11-21 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2169:
--

 Summary: Fix queue resource update through configmaps
 Key: YUNIKORN-2169
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2169
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: core - scheduler
Affects Versions: 1.4.0
Reporter: Manikandan R
Assignee: Manikandan R


Updating queue resources through config maps is not making the changes as 
expected.

For example,
 # {memory:5} to {memory:10}
 # {vcores:5, memory:10} to {memory:10}
 # {memory:5} to nil

These changes are not really happening.
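
The three transitions above can be written down as expected outcomes. The following Go sketch assumes that a config update is meant to replace the previous resource map wholesale; `updateMax` and the plain `map[string]int64` representation are hypothetical and are not the YuniKorn implementation.

```go
package main

import "fmt"

// updateMax is a hypothetical helper: it returns the resource map a queue
// should hold after a config update. The previous value is deliberately
// ignored, so resource types missing from the update are dropped and a nil
// update clears the setting.
func updateMax(_ map[string]int64, updated map[string]int64) map[string]int64 {
	if updated == nil {
		return nil
	}
	out := make(map[string]int64, len(updated))
	for name, value := range updated {
		out[name] = value
	}
	return out
}

func main() {
	// The three cases listed in the issue.
	fmt.Println(updateMax(map[string]int64{"memory": 5}, map[string]int64{"memory": 10}))
	fmt.Println(updateMax(map[string]int64{"vcores": 5, "memory": 10}, map[string]int64{"memory": 10}))
	fmt.Println(updateMax(map[string]int64{"memory": 5}, nil) == nil)
}
```

An e2e or unit test for the fix could assert exactly these three outcomes against the real queue object.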






[jira] [Resolved] (YUNIKORN-2164) Use ParseUint instead of ParseInt in getEvents()

2023-11-20 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2164.

 Fix Version/s: 1.5.0
Target Version: 1.5.0
Resolution: Fixed

Merged to master

> Use ParseUint instead of ParseInt in getEvents()
> 
>
> Key: YUNIKORN-2164
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2164
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - common
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>
> The "count" and "start" query parameters are parsed with strconv.ParseInt(). 
> It's more appropriate to use ParseUint() since we cast it to this type anyway.
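
A minimal sketch of the difference, assuming the parameter value arrives as a string; `parseCount` is a hypothetical helper and not the actual getEvents() code. Parsing directly with `strconv.ParseUint` rejects negative input in the parser, so no signed-to-unsigned cast is needed afterwards.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseCount is a hypothetical helper. It parses a query parameter value
// directly as an unsigned 64-bit integer, so a negative value is reported
// as a parse error instead of being cast later.
func parseCount(value string) (uint64, error) {
	return strconv.ParseUint(value, 10, 64)
}

func main() {
	for _, v := range []string{"100", "-1", "abc"} {
		n, err := parseCount(v)
		fmt.Println(v, n, err != nil)
	}
}
```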






Re: [VOTE] Release Apache YuniKorn 1.4.0 RC1

2023-11-15 Thread Manikandan R
+1

- Built images from source on Mac M1 (arm64) with go 1.21
- Verified the image names
- Verified signatures and checksums
- Ran the scheduler on a local kind cluster (version 1.27.1); ran a few
examples, which worked fine
- Verified REST API outputs and the Web UI

Thanks,
Mani

On Thu, Nov 16, 2023 at 8:31 AM Hsuan Zong Wu  wrote:

> +1
>
> - Verified signatures and checksums
> - Verified LICENSE and NOTICE files
> - Built release on Ubuntu 20.04 LTS & Mac Ventura (arm64)
> - Installed locally on Kubeadm cluster (v1.27.8) & Kind cluster (v1.27.1)
> - Executed unit tests
> - Used KWOK to create 150 nodes and deployed 100 applications with 50 tasks
> each.
>
> On Thu, Nov 16, 2023 at 8:59 AM Weiwei Yang wrote:
>
> > +1
> >
> > - Built images from source with Go 1.21 on arm64
> > - Verified the image names are correct
> > - Ran the scheduler on a local K8s cluster; ran a few examples, which
> > worked fine
> > - Verified the REST APIs outputs
> > - Reviewed README, NOTICE and LICENSE files
> >
> >
> > On Wed, Nov 15, 2023 at 1:42 AM Wilfred Spiegelenburg <wilfr...@apache.org>
> > wrote:
> >
> > > Hello everyone,
> > >
> > > I would like to call a vote for releasing Apache YuniKorn 1.4.0 RC1.
> > > It is a large release with 250+ jiras included.
> > > Please note that K8s v1.23 and earlier are no longer supported.
> > >
> > > The release artefacts have been uploaded here:
> > >   https://dist.apache.org/repos/dist/dev/yunikorn/1.4.0-RC1/
> > >
> > > My public key is located in the KEYS file:
> > >   https://downloads.apache.org//yunikorn/KEYS
> > >
> > > JIRA issues that have been resolved in this release:
> > >   https://issues.apache.org/jira/issues/?filter=12352769
> > >
> > > The release contains a number of incompatible changes that could
> > > impact the release verification. Please read the draft release notes
> > > attached to this vote for further details.
> > >
> > > Git tags for each component are as follows:
> > > yunikorn-scheduler-interface: v1.4.0-1
> > > yunikorn-core: v1.4.0-1
> > > yunikorn-k8shim: v1.4.0-2
> > > yunikorn-web: v1.4.0-2
> > > yunikorn-release: v1.4.0-1
> > >
> > > Once the release is voted on and approved, all repos will be tagged
> > > 1.4.0 for consistency.
> > >
> > > Please review and vote. The vote will be open for at least 72 hours
> > > and closes on Saturday 18 November 2023, 10:00:00 UTC
> > >
> > > [ ] +1 Approve
> > > [ ] +0 No opinion
> > > [ ] -1 Disapprove (and the reason why)
> > >
> > >
> > > Thank you,
> > > Wilfred
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
> > > For additional commands, e-mail: dev-h...@yunikorn.apache.org
> > >
> > >
> >
>
>
> --
> 吳炫宗
>


Re: [ANNOUNCE] New PMC member: Rainie Li

2023-11-15 Thread Manikandan R
Congratulations Rainie Li.

On Thu, Nov 16, 2023 at 7:51 AM Wilfred Spiegelenburg wrote:

> The Project Management Committee (PMC) for Apache YuniKorn has invited
> Rainie Li to become a PMC member and we are pleased to announce
> that she has accepted.
>
> On behalf of the Apache YuniKorn PMC
> Wilfred
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
> For additional commands, e-mail: dev-h...@yunikorn.apache.org
>
>


[jira] [Resolved] (YUNIKORN-2136) limit max resource should be greater than zero

2023-11-15 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2136.

Resolution: Fixed

> limit max resource should be greater than zero
> --
>
> Key: YUNIKORN-2136
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2136
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>    Reporter: Manikandan R
>Assignee: PoAn Yang
>Priority: Major
>  Labels: pull-request-available
>
> The limit's max resource should be validated with the 
> resources.StrictlyGreaterThanZero() check so that resources with zero 
> values are not allowed.






[jira] [Resolved] (YUNIKORN-2130) checkLimitResource fails if there is missing limit in a middle queue

2023-11-15 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2130.

Resolution: Fixed

> checkLimitResource fails if there is missing limit in a middle queue
> 
>
> Key: YUNIKORN-2130
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2130
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: PoAn Yang
>Assignee: PoAn Yang
>Priority: Major
>  Labels: pull-request-available
>
> The checkLimitResource function rejects a config in which a child limit is 
> larger than the parent limit. However, it doesn't cover all cases, such as a 
> child limit that is larger than the grandparent limit. The following is my test case:
> {noformat}
> func TestCheckLimitResourceGrandLevel(t *testing.T) {
>     err := checkLimitResource(
>         QueueConfig{
>             Name: "root",
>             Limits: []Limit{
>                 {
>                     Limit: "user1",
>                     Users: []string{"user1"},
>                     MaxResources: map[string]string{
>                         "cpu": "100",
>                     },
>                 },
>                 {
>                     Limit: "user2",
>                     Users: []string{"user2"},
>                     MaxResources: map[string]string{
>                         "cpu": "100",
>                     },
>                 },
>             },
>             Queues: []QueueConfig{
>                 {
>                     Name: "parent",
>                     Limits: []Limit{
>                         {
>                             Limit: "user1",
>                             Users: []string{"user1"},
>                             MaxResources: map[string]string{
>                                 "cpu": "50",
>                             },
>                         },
>                     },
>                     Queues: []QueueConfig{
>                         {
>                             Name: "child",
>                             Limits: []Limit{
>                                 {
>                                     Limit: "user1",
>                                     Users: []string{"user1"},
>                                     MaxResources: map[string]string{
>                                         "cpu": "10",
>                                     },
>                                 },
>                                 {
>                                     Limit: "user2",
>                                     Users: []string{"user2"},
>                                     MaxResources: map[string]string{
>                                         "cpu": "150",
>                                     },
>                                 },
>                             },
>                         },
>                     },
>                 },
>             },
>         },
>         make(map[string]map[string]*resources.Resource),
>         make(map[string]map[string]*resources.Resource),
>         common.Empty,
>     )
>     assert.Equal(t, err != nil, true, "user2 cpu maxresources in root.parent.child should not be larger than root") // this will fail
> }
> {noformat}
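
One way to cover the grandparent case is to carry the closest ancestor limit per user down the hierarchy, so a queue without its own limit for a user does not hide the limit set further up. The Go sketch below is a simplified illustration under that assumption; the `queue` type, the single numeric cpu limit per user, and `checkLimits` are hypothetical and are not the actual checkLimitResource code.

```go
package main

import "fmt"

// queue is a simplified, hypothetical stand-in for QueueConfig: one numeric
// cpu limit per user instead of full Limit objects.
type queue struct {
	name     string
	limits   map[string]int64 // user -> max cpu; a missing key means no limit set here
	children []queue
}

// checkLimits walks the hierarchy and carries the closest ancestor limit per
// user down to the children, including through queues that set no limit for
// that user.
func checkLimits(q queue, parentPath string, inherited map[string]int64) error {
	fullPath := q.name
	if parentPath != "" {
		fullPath = parentPath + "." + q.name
	}
	effective := make(map[string]int64, len(inherited)+len(q.limits))
	for user, limit := range inherited {
		effective[user] = limit
	}
	for user, limit := range q.limits {
		if ancestorMax, ok := inherited[user]; ok && limit > ancestorMax {
			return fmt.Errorf("user %s cpu limit %d in %s is larger than ancestor limit %d",
				user, limit, fullPath, ancestorMax)
		}
		effective[user] = limit
	}
	for _, child := range q.children {
		if err := checkLimits(child, fullPath, effective); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Same shape as the test case above: "parent" has no limit for user2.
	root := queue{
		name:   "root",
		limits: map[string]int64{"user1": 100, "user2": 100},
		children: []queue{{
			name:   "parent",
			limits: map[string]int64{"user1": 50},
			children: []queue{{
				name:   "child",
				limits: map[string]int64{"user1": 10, "user2": 150},
			}},
		}},
	}
	fmt.Println(checkLimits(root, "", nil))
}
```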






[jira] [Created] (YUNIKORN-2139) Move queuepath split from user and group trackers to manager.go

2023-11-09 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2139:
--

 Summary: Move queuepath split from user and group trackers to 
manager.go
 Key: YUNIKORN-2139
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2139
 Project: Apache YuniKorn
  Issue Type: Sub-task
Reporter: Manikandan R
Assignee: Manikandan R
 Fix For: 1.4.0


Instead of calling strings.Split on the queue path all over the place, keep it 
in one place (manager.go) for ease of maintenance.
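
A minimal sketch of the idea, assuming queue paths are dot-separated as in "root.parent.child"; `splitQueuePath` is a hypothetical name. Callers such as the user and group trackers would receive the already split hierarchy instead of splitting the path themselves.

```go
package main

import (
	"fmt"
	"strings"
)

// splitQueuePath is a hypothetical single entry point for turning a
// dot-separated queue path into its hierarchy.
func splitQueuePath(queuePath string) []string {
	return strings.Split(queuePath, ".")
}

func main() {
	hierarchy := splitQueuePath("root.parent.child")
	fmt.Println(len(hierarchy), hierarchy)
}
```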






[jira] [Resolved] (YUNIKORN-2134) Use nil resource instead of NewResource()

2023-11-09 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2134.

Resolution: Fixed

> Use nil resource instead of NewResource()
> -
>
> Key: YUNIKORN-2134
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2134
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>    Reporter: Manikandan R
>    Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.4.0
>
>
> The UGM module uses resources.NewResource() to initialize the resourceUsage 
> and maxResources variables wherever necessary. All follow-up code depending 
> on these variables needs to compare against resources.NewResource() for the 
> empty check. That is expensive and can be avoided by initializing them to nil.






[jira] [Created] (YUNIKORN-2136) limit max resource should be greater than zero

2023-11-09 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2136:
--

 Summary: limit max resource should be greater than zero
 Key: YUNIKORN-2136
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2136
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - scheduler
Reporter: Manikandan R
 Fix For: 1.4.0


The limit's max resource should be validated with the 
resources.StrictlyGreaterThanZero() check so that resources with zero 
values are not allowed.
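
As an illustration only, the validation could look like the Go sketch below. `allGreaterThanZero` is a hypothetical stand-in written for this example; its exact semantics (every listed quantity must be above zero, and an empty map is rejected) are an assumption and may differ from the real check in the resources package.

```go
package main

import "fmt"

// allGreaterThanZero is a hypothetical stand-in for the check named in the
// issue. Assumption: a limit is valid only if it lists at least one resource
// and every listed quantity is above zero.
func allGreaterThanZero(maxResources map[string]int64) bool {
	if len(maxResources) == 0 {
		return false
	}
	for _, quantity := range maxResources {
		if quantity <= 0 {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(allGreaterThanZero(map[string]int64{"cpu": 100}))
	fmt.Println(allGreaterThanZero(map[string]int64{"cpu": 10, "memory": 0}))
}
```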






[jira] [Created] (YUNIKORN-2134) Use nil resource instead of NewResource()

2023-11-08 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2134:
--

 Summary: Use nil resource instead of NewResource()
 Key: YUNIKORN-2134
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2134
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: core - scheduler
Reporter: Manikandan R
Assignee: Manikandan R
 Fix For: 1.4.0


The UGM module uses resources.NewResource() to initialize the resourceUsage 
and maxResources variables wherever necessary. All follow-up code depending 
on these variables needs to compare against resources.NewResource() for the 
empty check. That is expensive and can be avoided by initializing them to nil.
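
The pattern can be sketched as follows; the `resource` and `tracker` types are simplified, hypothetical stand-ins and not the UGM types. With a nil pointer as the "nothing set" state, the empty check is a pointer comparison and the map is allocated only on first use.

```go
package main

import "fmt"

// resource is a simplified, hypothetical stand-in for a resource object.
type resource struct {
	quantities map[string]int64
}

// tracker keeps usage as a pointer; nil means nothing has been tracked yet.
type tracker struct {
	usage *resource
}

// hasUsage is a plain nil check, with no comparison against an empty resource.
func (t *tracker) hasUsage() bool {
	return t.usage != nil
}

// add allocates the resource lazily on first use.
func (t *tracker) add(name string, quantity int64) {
	if t.usage == nil {
		t.usage = &resource{quantities: make(map[string]int64)}
	}
	t.usage.quantities[name] += quantity
}

func main() {
	var t tracker
	fmt.Println(t.hasUsage())
	t.add("vcore", 5)
	fmt.Println(t.hasUsage(), t.usage.quantities["vcore"])
}
```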






[jira] [Created] (YUNIKORN-2133) Incorrect app used resource in state dump

2023-11-08 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-2133:
--

 Summary: Incorrect app used resource in state dump
 Key: YUNIKORN-2133
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2133
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: core - scheduler
Affects Versions: 1.4.0
Reporter: Manikandan R


While troubleshooting issues using the state dump, I observed the following:
 # App used resource doesn't take placeholder resource usage into account. 
Either we need to add new fields to cover the placeholders' resource usage or 
add it to the current used resource. In the end, the sum of all app used 
resources should match the queue used resource.
 # The placeholder data section is empty, yet one placeholder is visible in the 
allocations. We need to understand why this section is not getting initialized.






[jira] [Resolved] (YUNIKORN-2125) Remove literals from handlers_test.go

2023-11-06 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2125.

 Fix Version/s: 1.4.0
Target Version: 1.4.0
Resolution: Fixed

> Remove literals from handlers_test.go  
> ---
>
> Key: YUNIKORN-2125
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2125
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - common
>Reporter: Michael Akinyemi
>Assignee: Michael Akinyemi
>Priority: Minor
>  Labels: newbie, pull-request-available
> Fix For: 1.4.0
>
>
> [https://github.com/apache/yunikorn-core/blob/master/pkg/webservice/handlers_test.go]
> While going through handlers_test.go I noticed that some of the functions use 
> constants for the returned error messages (defined within the 
> webservice package), while others use literal strings. We should replace the 
> literals that have matching constants.






[jira] [Resolved] (YUNIKORN-2107) Allow preemption to be disabled globally

2023-11-06 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2107.

Fix Version/s: 1.4.0
   Resolution: Fixed

> Allow preemption to be disabled globally
> 
>
> Key: YUNIKORN-2107
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2107
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.4.0
>
>
> Currently, there is no way to disable preemption on a global basis. As 
> preemption still has some known issues, it would be useful to allow a global 
> flag to disable it.
> We can bring back the old preemption configuration but change the default to 
> be enabled for backwards compatibility.






[jira] [Resolved] (YUNIKORN-2122) remove QueueDAOInfo and QueueCapacity

2023-11-06 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2122.

Fix Version/s: 1.4.0
   Resolution: Fixed

Thank you [~vinayakhegde] for your contributions.

> remove QueueDAOInfo and QueueCapacity
> -
>
> Key: YUNIKORN-2122
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2122
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - common
>Reporter: Ted Lin
>Assignee: Vinayak Hegde
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 1.4.0
>
>
> Code cleanup: dead code removed
> remove QueueDAOInfo and QueueCapacity in queue_info.go
>  






[jira] [Resolved] (YUNIKORN-1944) Wildcard user limit settings are not honoured by applications

2023-11-03 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-1944.

Fix Version/s: 1.5.0
   Resolution: Fixed

> Wildcard user limit settings are not honoured by applications
> -
>
> Key: YUNIKORN-1944
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1944
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Rajesh Kanhaiya Lal
>Assignee: PoAn Yang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.0
>
> Attachments: test_pod.yaml, test_pod2.yaml, yunikorn-configs.yaml
>
>
> Hi Team,
> The wildcard setting for users is not applied to applications, and quota 
> enforcement checks are failing.
> Example: please find attached the YuniKorn config and pod spec
> [^yunikorn-configs.yaml] [^test_pod.yaml]
> I scheduled two pods with dummy users that are not mentioned in the 
> limit configuration, relying on the wildcard for quota enforcement.
> The limit restrictions do not take effect: with max applications set to 1 
> for the wildcard user, I was able to schedule more than one application 
> without any issue.
> Please let me know if anything else is required.
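
The expected behaviour can be sketched as a lookup that falls back to the wildcard entry when a user has no limit of their own. The names `maxApplications` and `canSubmit`, and the flat map of limits, are hypothetical and only illustrate the expectation described in the report.

```go
package main

import "fmt"

// wildcardUser is the catch-all entry, assumed to be "*" as in the report.
const wildcardUser = "*"

// maxApplications returns the user's own limit if present, otherwise the
// wildcard limit. The boolean reports whether any limit applies.
func maxApplications(user string, limits map[string]uint64) (uint64, bool) {
	if limit, ok := limits[user]; ok {
		return limit, true
	}
	limit, ok := limits[wildcardUser]
	return limit, ok
}

// canSubmit reports whether the user may start another application given the
// number already running.
func canSubmit(user string, running uint64, limits map[string]uint64) bool {
	limit, limited := maxApplications(user, limits)
	return !limited || running < limit
}

func main() {
	limits := map[string]uint64{wildcardUser: 1}
	fmt.Println(canSubmit("dummy-user", 0, limits)) // first application
	fmt.Println(canSubmit("dummy-user", 1, limits)) // second application
}
```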






[jira] [Resolved] (YUNIKORN-2024) Clear limits configuration only upon actual config (Group) modification

2023-11-02 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2024.

Resolution: Fixed

Thanks [~douenergy] for your contribution.

> Clear limits configuration only upon actual config (Group) modification
> ---
>
> Key: YUNIKORN-2024
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2024
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Dong-Lin Hsieh
>Assignee: Dong-Lin Hsieh
>Priority: Major
>  Labels: pull-request-available
>
> YUNIKORN-1858 handles group resource usage properly during config changes, 
> but `internalProcessConfig` clears the earlier limits every time. 
> We should clear the earlier limits only when the queue config has actually changed.
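
A minimal sketch of "clear only on actual change", assuming the previously applied limit config is kept around for comparison. The `limitConfig` and `manager` types and the use of `reflect.DeepEqual` are illustrative choices and not the actual fix.

```go
package main

import (
	"fmt"
	"reflect"
)

// limitConfig is a simplified, hypothetical limit definition.
type limitConfig struct {
	Groups          []string
	MaxApplications uint64
}

// manager remembers the last applied config per queue path.
type manager struct {
	current map[string]limitConfig
	clears  int // counts how often earlier limits were cleared
}

// applyConfig clears and replaces the earlier limits only if the incoming
// config differs from the one already applied. It reports whether it did.
func (m *manager) applyConfig(next map[string]limitConfig) bool {
	if reflect.DeepEqual(m.current, next) {
		return false
	}
	m.clears++
	m.current = next
	return true
}

func main() {
	m := &manager{}
	conf := map[string]limitConfig{"root.parent": {Groups: []string{"dev"}, MaxApplications: 2}}
	same := map[string]limitConfig{"root.parent": {Groups: []string{"dev"}, MaxApplications: 2}}
	fmt.Println(m.applyConfig(conf), m.applyConfig(same), m.clears)
}
```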






[jira] [Resolved] (YUNIKORN-2052) Log additional information on preemption

2023-10-20 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2052.

Fix Version/s: 1.4.0
   Resolution: Fixed

> Log additional information on preemption
> 
>
> Key: YUNIKORN-2052
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2052
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Craig Condit
>Assignee: Rajesh Kanhaiya Lal
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 1.4.0
>
>
> Currently, when we preempt tasks, we log some information about the victim:
> {code:java}
> // SOURCE: pkg/scheduler/objects/preemption.go
> log.Log(log.SchedPreemption).Info("Preempting task",
>     zap.String("applicationID", victim.GetApplicationID()),
>     zap.String("allocationKey", victim.GetAllocationKey()),
>     zap.String("nodeID", victim.GetNodeID()),
>     zap.Stringer("resources", victim.GetAllocatedResource()))
> {code}
> We should log some additional information to make it easier to correlate the 
> preemption with the ask that triggered it:
>  - applicationID of ask => "askApplicationID"
>  - allocationKey of ask => "askAllocationKey"
>  - queue of ask => "askQueue"
>  - queue of victim => "victimQueue"
> For clarity we should also rename a few of the existing output items:
>  - "applicationID" => "victimApplicationID"
>  - "allocationKey" => "victimAllocationKey"






[jira] [Resolved] (YUNIKORN-2014) Add `Extra` field example to `/ws/v1/config` result

2023-10-13 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2014.

Resolution: Fixed

> Add `Extra` field example to `/ws/v1/config` result
> ---
>
> Key: YUNIKORN-2014
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2014
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: documentation
>Reporter: PoAn Yang
>Assignee: PoAn Yang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.4.0
>
>
> We exported the extra config in YUNIKORN-1984. We also need to update the documentation.






[jira] [Resolved] (YUNIKORN-2018) Update full state dump REST API doc

2023-10-12 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YUNIKORN-2018.

Fix Version/s: 1.4.0
   Resolution: Fixed

> Update full state dump REST API doc
> ---
>
> Key: YUNIKORN-2018
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2018
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Kuan-Po Tseng
>Assignee: Kuan-Po Tseng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.4.0
>
>
> YUNIKORN-1983 added configs to the full state dump; the config part needs to 
> be mentioned in the doc:
> https://yunikorn.apache.org/docs/next/api/scheduler/#retrieve-full-state-dump





