[jira] [Commented] (YUNIKORN-820) Update SI dependency in the core repo

2021-08-23 Thread Ting Yao,Huang (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403522#comment-17403522
 ] 

Ting Yao,Huang commented on YUNIKORN-820:
-

Since this issue is urgent and we can't fix this UT problem now, we created 
[YUNIKORN-821|https://issues.apache.org/jira/browse/YUNIKORN-821] to track it.

> Update SI dependency in the core repo
> -
>
> Key: YUNIKORN-820
> URL: https://issues.apache.org/jira/browse/YUNIKORN-820
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - common
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> We need to update the core dependency to the latest SI.
> The recent gRPC and protobuf version changes introduce minor changes. A UT, 
> {{TestSIFromAlloc}}, is also failing after them; we need to fix that in a 
> follow-up issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-821) UT:TestSIFromAlloc() is failing after gRPC and protobuf version changes

2021-08-23 Thread Ting Yao,Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ting Yao,Huang updated YUNIKORN-821:

Description: 
After the gRPC and protobuf version changes, the UT `TestSIFromAlloc()` is failing.

Since YUNIKORN-820 is urgent and we can't fix this UT problem in that issue, we 
created this issue to track it. cc [~wwei]

  was:
After the gRPC and protobuf version changes, the UT `TestSIFromAlloc()` is failing.

Since YUNIKORN-820 is urgent and we can't fix this UT problem in that issue, we 
created this issue to fix it. cc [~wwei]


> UT:TestSIFromAlloc() is failing after gRPC and protobuf version changes
> ---
>
> Key: YUNIKORN-821
> URL: https://issues.apache.org/jira/browse/YUNIKORN-821
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - common
>Reporter: Ting Yao,Huang
>Priority: Critical
> Fix For: 1.0.0
>
>
> After the gRPC and protobuf version changes, the UT `TestSIFromAlloc()` is failing.
> Since YUNIKORN-820 is urgent and we can't fix this UT problem in that issue, 
> we created this issue to track it. cc [~wwei]






[jira] [Updated] (YUNIKORN-821) UT:TestSIFromAlloc() is failing after gRPC and protobuf version changes

2021-08-23 Thread Weiwei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YUNIKORN-821:
-
Fix Version/s: 1.0.0

> UT:TestSIFromAlloc() is failing after gRPC and protobuf version changes
> ---
>
> Key: YUNIKORN-821
> URL: https://issues.apache.org/jira/browse/YUNIKORN-821
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - common
>Reporter: Ting Yao,Huang
>Priority: Critical
> Fix For: 1.0.0
>
>
> After the gRPC and protobuf version changes, the UT `TestSIFromAlloc()` is failing.
> Since YUNIKORN-820 is urgent and we can't fix this UT problem in that issue, 
> we created this issue to fix it. cc [~wwei]






[jira] [Updated] (YUNIKORN-821) UT:TestSIFromAlloc() is failing after gRPC and protobuf version changes

2021-08-23 Thread Weiwei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YUNIKORN-821:
-
Priority: Critical  (was: Major)

> UT:TestSIFromAlloc() is failing after gRPC and protobuf version changes
> ---
>
> Key: YUNIKORN-821
> URL: https://issues.apache.org/jira/browse/YUNIKORN-821
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - common
>Reporter: Ting Yao,Huang
>Priority: Critical
>
> After the gRPC and protobuf version changes, the UT `TestSIFromAlloc()` is failing.
> Since YUNIKORN-820 is urgent and we can't fix this UT problem in that issue, 
> we created this issue to fix it. cc [~wwei]






[jira] [Resolved] (YUNIKORN-820) Update SI dependency in the core repo

2021-08-23 Thread Ting Yao,Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ting Yao,Huang resolved YUNIKORN-820.
-
Target Version: 1.0.0
Resolution: Fixed

> Update SI dependency in the core repo
> -
>
> Key: YUNIKORN-820
> URL: https://issues.apache.org/jira/browse/YUNIKORN-820
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - common
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> We need to update the core dependency to the latest SI.
> The recent gRPC and protobuf version changes introduce minor changes. A UT, 
> {{TestSIFromAlloc}}, is also failing after them; we need to fix that in a 
> follow-up issue.






[jira] [Created] (YUNIKORN-821) UT:TestSIFromAlloc() is failing after gRPC and protobuf version changes

2021-08-23 Thread Ting Yao,Huang (Jira)
Ting Yao,Huang created YUNIKORN-821:
---

 Summary: UT:TestSIFromAlloc() is failing after gRPC and protobuf 
version changes
 Key: YUNIKORN-821
 URL: https://issues.apache.org/jira/browse/YUNIKORN-821
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: core - common
Reporter: Ting Yao,Huang


After the gRPC and protobuf version changes, the UT `TestSIFromAlloc()` is failing.

Since YUNIKORN-820 is urgent and we can't fix this UT problem in that issue, we 
created this issue to fix it. cc [~wwei]






[jira] [Commented] (YUNIKORN-819) Update go version dependency to 1.15

2021-08-23 Thread Weiwei Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403405#comment-17403405
 ] 

Weiwei Yang commented on YUNIKORN-819:
--

Hi [~chia7712], the main reason is that our current tests run against 1.15, 
for example 
https://github.com/apache/incubator-yunikorn-k8shim/blob/25a71ba3fc646cbd96dff5b2882388eceb27bf9c/.github/workflows/main.yml#L15.
Upgrading to 1.17 could potentially be a bigger change.

> Update go version dependency to 1.15
> 
>
> Key: YUNIKORN-819
> URL: https://issues.apache.org/jira/browse/YUNIKORN-819
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - cache, scheduler-interface, shim - kubernetes
>Reporter: Weiwei Yang
>Priority: Major
>  Labels: newbie
> Fix For: 1.0.0
>
>
> We need to update all Go version dependencies to 1.15.
> This is defined in the top-level go.mod file.






[jira] [Commented] (YUNIKORN-819) Update go version dependency to 1.15

2021-08-23 Thread Chia-Ping Tsai (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403350#comment-17403350
 ] 

Chia-Ping Tsai commented on YUNIKORN-819:
-

Just curious: why not update to Go 1.17 (the latest version)?

 

> Update go version dependency to 1.15
> 
>
> Key: YUNIKORN-819
> URL: https://issues.apache.org/jira/browse/YUNIKORN-819
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - cache, scheduler-interface, shim - kubernetes
>Reporter: Weiwei Yang
>Priority: Major
>  Labels: newbie
> Fix For: 1.0.0
>
>
> We need to update all Go version dependencies to 1.15.
> This is defined in the top-level go.mod file.






[jira] [Updated] (YUNIKORN-813) The capacity of undefined resource should NOT be considered zero

2021-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-813:

Labels: pull-request-available  (was: )

> The capacity of undefined resource should NOT be considered zero
> 
>
> Key: YUNIKORN-813
> URL: https://issues.apache.org/jira/browse/YUNIKORN-813
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> resources:
>   max:
>     memory: 1
> {code}
> If the above configuration is added to a leaf queue, the queue can't run any 
> application because "vcore" is assumed to be zero. That prevents us from 
> limiting only a subset of the resources.






[jira] [Commented] (YUNIKORN-813) The capacity of undefined resource should NOT be considered zero

2021-08-23 Thread Chia-Ping Tsai (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403288#comment-17403288
 ] 

Chia-Ping Tsai commented on YUNIKORN-813:
-

{quote}
If a resource is undefined I think it should be considered max value/unlimited
{quote}

After tracing the code again, it seems to me that an "undefined resource" of a 
child queue should reference the resource from its parent. As the root always 
has max resources for vcore/memory, the other queues can get a "valid" max 
resource from the parent.
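
For illustration only, a minimal Go sketch of that lookup: a resource that is not defined on a queue is resolved by walking up towards the root. The `Queue` and `Resources` types and the `effectiveMax` helper are hypothetical names made up for this sketch; they are not the YuniKorn core types or API.

```go
package main

import "fmt"

// Resources maps a resource name (for example "memory" or "vcore") to a quantity.
type Resources map[string]int64

// Queue is a hypothetical, simplified queue; it is not the YuniKorn core type.
type Queue struct {
	Name   string
	Parent *Queue
	Max    Resources // only the explicitly configured limits
}

// effectiveMax walks up from q towards the root and returns the first
// configured limit for the named resource. The boolean is false when no
// queue on the path defines a limit for that resource.
func effectiveMax(q *Queue, name string) (int64, bool) {
	for cur := q; cur != nil; cur = cur.Parent {
		if v, ok := cur.Max[name]; ok {
			return v, true
		}
	}
	return 0, false
}

func main() {
	// Assumed example values: the root defines both resources, the leaf only memory.
	root := &Queue{Name: "root", Max: Resources{"memory": 1000, "vcore": 100}}
	leaf := &Queue{Name: "root.leaf", Parent: root, Max: Resources{"memory": 1}}

	for _, name := range []string{"memory", "vcore"} {
		v, ok := effectiveMax(leaf, name)
		fmt.Println(name, v, ok)
	}
}
```

With this approach the leaf keeps its own memory limit, while vcore falls back to whatever the nearest ancestor defines instead of zero.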

> The capacity of undefined resource should NOT be considered zero
> 
>
> Key: YUNIKORN-813
> URL: https://issues.apache.org/jira/browse/YUNIKORN-813
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Major
>
> {code}
> resources:
>   max:
>     memory: 1
> {code}
> If the above configuration is added to a leaf queue, the queue can't run any 
> application because "vcore" is assumed to be zero. That prevents us from 
> limiting only a subset of the resources.






[jira] [Updated] (YUNIKORN-820) Update SI dependency in the core repo

2021-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-820:

Labels: pull-request-available  (was: )

> Update SI dependency in the core repo
> -
>
> Key: YUNIKORN-820
> URL: https://issues.apache.org/jira/browse/YUNIKORN-820
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - common
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> We need to update the core dependency to the latest SI.
> The recent gRPC and protobuf version changes introduce minor changes. A UT, 
> {{TestSIFromAlloc}}, is also failing after them; we need to fix that in a 
> follow-up issue.






[jira] [Created] (YUNIKORN-820) Update SI dependency in the core repo

2021-08-23 Thread Weiwei Yang (Jira)
Weiwei Yang created YUNIKORN-820:


 Summary: Update SI dependency in the core repo
 Key: YUNIKORN-820
 URL: https://issues.apache.org/jira/browse/YUNIKORN-820
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: core - common
Reporter: Weiwei Yang
 Fix For: 1.0.0


We need to update the core dependency to the latest SI.
The recent gRPC and protobuf version changes introduce minor changes. A UT, 
{{TestSIFromAlloc}}, is also failing after them; we need to fix that in a 
follow-up issue.






[jira] [Assigned] (YUNIKORN-820) Update SI dependency in the core repo

2021-08-23 Thread Weiwei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang reassigned YUNIKORN-820:


Assignee: Weiwei Yang

> Update SI dependency in the core repo
> -
>
> Key: YUNIKORN-820
> URL: https://issues.apache.org/jira/browse/YUNIKORN-820
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - common
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
> Fix For: 1.0.0
>
>
> We need to update the core dependency to the latest SI.
> The recent gRPC and protobuf version changes introduce minor changes. A UT, 
> {{TestSIFromAlloc}}, is also failing after them; we need to fix that in a 
> follow-up issue.






[jira] [Resolved] (YUNIKORN-810) Add a field in scheduler-interface to represent required node for the scheduler

2021-08-23 Thread Weiwei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved YUNIKORN-810.
--
Resolution: Fixed

> Add a field in scheduler-interface to represent required node for the 
> scheduler
> ---
>
> Key: YUNIKORN-810
> URL: https://issues.apache.org/jira/browse/YUNIKORN-810
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: scheduler-interface
>Reporter: Ting Yao,Huang
>Assignee: Ting Yao,Huang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>







[jira] [Created] (YUNIKORN-819) Update go version dependency to 1.15

2021-08-23 Thread Weiwei Yang (Jira)
Weiwei Yang created YUNIKORN-819:


 Summary: Update go version dependency to 1.15
 Key: YUNIKORN-819
 URL: https://issues.apache.org/jira/browse/YUNIKORN-819
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: core - cache, scheduler-interface, shim - kubernetes
Reporter: Weiwei Yang
 Fix For: 1.0.0


We need to update all Go version dependencies to 1.15.
This is defined in the top-level go.mod file.






[jira] [Resolved] (YUNIKORN-815) Fix scheduler interface makefile issues

2021-08-23 Thread Weiwei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved YUNIKORN-815.
--
Resolution: Fixed

> Fix scheduler interface makefile issues
> ---
>
> Key: YUNIKORN-815
> URL: https://issues.apache.org/jira/browse/YUNIKORN-815
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: scheduler-interface
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> After YUNIKORN-760, we are seeing issues when updating the scheduler-interface 
> dependency in other repos. This is because YUNIKORN-760 removed the gRPC 
> generated code, but we have useful code in the core repo that still needs 
> it, such as 
> https://github.com/apache/incubator-yunikorn-core/tree/master/cmd. 






[jira] [Reopened] (YUNIKORN-810) Add a field in scheduler-interface to represent required node for the scheduler

2021-08-23 Thread Weiwei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang reopened YUNIKORN-810:
--

> Add a field in scheduler-interface to represent required node for the 
> scheduler
> ---
>
> Key: YUNIKORN-810
> URL: https://issues.apache.org/jira/browse/YUNIKORN-810
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: scheduler-interface
>Reporter: Ting Yao,Huang
>Assignee: Ting Yao,Huang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>







[jira] [Resolved] (YUNIKORN-738) Update helm index for v0.11

2021-08-23 Thread Kinga Marton (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kinga Marton resolved YUNIKORN-738.
---
Fix Version/s: 0.11
   Resolution: Fixed

> Update helm index for v0.11
> ---
>
> Key: YUNIKORN-738
> URL: https://issues.apache.org/jira/browse/YUNIKORN-738
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: release
>Reporter: Tao Yang
>Assignee: Kinga Marton
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11
>
>







[jira] [Updated] (YUNIKORN-738) Update helm index for v0.11

2021-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-738:

Labels: pull-request-available  (was: )

> Update helm index for v0.11
> ---
>
> Key: YUNIKORN-738
> URL: https://issues.apache.org/jira/browse/YUNIKORN-738
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: release
>Reporter: Tao Yang
>Assignee: Kinga Marton
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Commented] (YUNIKORN-704) [Umbrella] Use the same mechanism to schedule daemon set pods as the default scheduler

2021-08-23 Thread Chaoran Yu (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403234#comment-17403234
 ] 

Chaoran Yu commented on YUNIKORN-704:
-

[~Huang Ting Yao] No worries! Thanks for working on this feature!

> [Umbrella] Use the same mechanism to schedule daemon set pods as the default 
> scheduler
> --
>
> Key: YUNIKORN-704
> URL: https://issues.apache.org/jira/browse/YUNIKORN-704
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Chaoran Yu
>Assignee: Ting Yao,Huang
>Priority: Blocker
> Fix For: 1.0.0
>
> Attachments: fluent-bit-describe.yaml, fluent-bit.yaml
>
>
> We sometimes see DaemonSet pods fail to be scheduled. Please see attached 
> files for the YAML and _kubectl describe_ output of one such pod. We 
> originally suspected [node 
> reservation|https://github.com/apache/incubator-yunikorn-core/blob/v0.10.0/pkg/scheduler/context.go#L41]
>  was to blame. But even after setting the DISABLE_RESERVATION environment 
> variable to true, we still see such scheduling failures. The issue is 
> especially severe when K8s nodes have disk pressure that causes lots of pods 
> to be evicted. Newly created pods will stay in pending forever. We have to 
> temporarily uninstall YuniKorn and let the default scheduler do the 
> scheduling for these pods. 
> This issue is critical because lots of important pods belong to a DaemonSet, 
> such as Fluent Bit, a common logging solution. This is probably the last 
> remaining roadblock for us to have the confidence to have YuniKorn entirely 
> replace the default scheduler.






[jira] [Commented] (YUNIKORN-807) Improve performance of node sorting

2021-08-23 Thread Craig Condit (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403200#comment-17403200
 ] 

Craig Condit commented on YUNIKORN-807:
---

[~sunilg], thanks for the feedback. I agree with using a followup Jira to 
address specific scheduler behavior – this one is explicitly about refactoring 
/ performance improvement.

I don't foresee hotspotting in nodeUpdated handling being a real problem. 
Updates should be relatively infrequent per node, and the locks are taken 
per-node.

> Improve performance of node sorting
> ---
>
> Key: YUNIKORN-807
> URL: https://issues.apache.org/jira/browse/YUNIKORN-807
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Attachments: Node Sorting Performance Improvement.pdf
>
>
> YuniKorn currently sorts all nodes on demand whenever scheduling of a 
> container occurs. This causes significant performance degradation as the 
> number of nodes increases.
> If we replace the on-demand sorting with a B-Tree sorted proactively, we can 
> improve performance considerably.
> This is a similar approach to YUNIKORN-21, but without the associated 
> behavioral changes.
> I've attached a design document with the details of the approach and the 
> performance improvement gained.
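
To make the idea concrete, here is a rough Go sketch of keeping nodes in sorted order as they change, instead of sorting the whole list on every scheduling attempt. It deliberately uses a sorted slice with binary-search insertion from the standard library as a stand-in for the B-Tree described above, so it is not the design from the attached document; the `node` and `sortedNodes` types and the single `score` field are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical, simplified node with a single score to sort on.
type node struct {
	id    string
	score int
}

// sortedNodes keeps its nodes ordered by ascending score at all times,
// so a scheduling pass can iterate them without sorting first.
type sortedNodes struct {
	items []node
}

// insert places n at its sorted position, found by binary search.
func (s *sortedNodes) insert(n node) {
	i := sort.Search(len(s.items), func(i int) bool { return s.items[i].score >= n.score })
	s.items = append(s.items, node{})
	copy(s.items[i+1:], s.items[i:])
	s.items[i] = n
}

// update changes the score of a node by removing it (if present) and
// re-inserting it at its new position.
func (s *sortedNodes) update(id string, score int) {
	for i, n := range s.items {
		if n.id == id {
			s.items = append(s.items[:i], s.items[i+1:]...)
			break
		}
	}
	s.insert(node{id: id, score: score})
}

func main() {
	s := &sortedNodes{}
	s.insert(node{id: "n1", score: 30})
	s.insert(node{id: "n2", score: 10})
	s.insert(node{id: "n3", score: 20})
	s.update("n2", 40) // n2 changed, so only n2 is repositioned

	for _, n := range s.items {
		fmt.Println(n.id, n.score)
	}
}
```

The slice version still shifts elements on insert and scans linearly on update; a real B-Tree avoids both, which is where the larger gains on big clusters would come from.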






[jira] [Commented] (YUNIKORN-704) [Umbrella] Use the same mechanism to schedule daemon set pods as the default scheduler

2021-08-23 Thread Ting Yao,Huang (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403143#comment-17403143
 ] 

Ting Yao,Huang commented on YUNIKORN-704:
-

Hi [~yuchaoran2011] [~chenya_zhang], since we decided to implement this in a 
different way, this issue might be delayed. Please refer to 
[YUNIKORN-810|https://issues.apache.org/jira/browse/YUNIKORN-810], 
[YUNIKORN-811|https://issues.apache.org/jira/browse/YUNIKORN-811], and 
[YUNIKORN-812|https://issues.apache.org/jira/browse/YUNIKORN-812].

Sorry for the delay.

> [Umbrella] Use the same mechanism to schedule daemon set pods as the default 
> scheduler
> --
>
> Key: YUNIKORN-704
> URL: https://issues.apache.org/jira/browse/YUNIKORN-704
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Chaoran Yu
>Assignee: Ting Yao,Huang
>Priority: Blocker
> Fix For: 1.0.0
>
> Attachments: fluent-bit-describe.yaml, fluent-bit.yaml
>
>
> We sometimes see DaemonSet pods fail to be scheduled. Please see attached 
> files for the YAML and _kubectl describe_ output of one such pod. We 
> originally suspected [node 
> reservation|https://github.com/apache/incubator-yunikorn-core/blob/v0.10.0/pkg/scheduler/context.go#L41]
>  was to blame. But even after setting the DISABLE_RESERVATION environment 
> variable to true, we still see such scheduling failures. The issue is 
> especially severe when K8s nodes have disk pressure that causes lots of pods 
> to be evicted. Newly created pods will stay in pending forever. We have to 
> temporarily uninstall YuniKorn and let the default scheduler do the 
> scheduling for these pods. 
> This issue is critical because lots of important pods belong to a DaemonSet, 
> such as Fluent Bit, a common logging solution. This is probably the last 
> remaining roadblock for us to have the confidence to have YuniKorn entirely 
> replace the default scheduler.






[jira] [Commented] (YUNIKORN-813) The capacity of undefined resource should NOT be considered zero

2021-08-23 Thread Chia-Ping Tsai (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402999#comment-17402999
 ] 

Chia-Ping Tsai commented on YUNIKORN-813:
-

[~kmarton] thanks for your feedback.

{quote}
If a resource is undefined I think it should be considered max value/unlimited
{quote}
We are on the same page :)

> The capacity of undefined resource should NOT be considered zero
> 
>
> Key: YUNIKORN-813
> URL: https://issues.apache.org/jira/browse/YUNIKORN-813
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Major
>
> {code}
> resources:
>   max:
>     memory: 1
> {code}
> If the above configuration is added to a leaf queue, the queue can't run any 
> application because "vcore" is assumed to be zero. That prevents us from 
> limiting only a subset of the resources.






[jira] [Commented] (YUNIKORN-813) The capacity of undefined resource should NOT be considered zero

2021-08-23 Thread Kinga Marton (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402981#comment-17402981
 ] 

Kinga Marton commented on YUNIKORN-813:
---

If a resource is undefined, I think it should be considered max value/unlimited. 
We are already using this approach in a couple of places in the code, such as in 
the queue quota check.
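
As a rough illustration of that convention, the Go sketch below treats a resource type that is missing from the limit as unlimited rather than zero. The `Resources` type and the `fitsIn` function are hypothetical names for this example and do not mirror the actual quota check in the core repo.

```go
package main

import "fmt"

// Resources maps a resource name (for example "memory" or "vcore") to a quantity.
type Resources map[string]int64

// fitsIn reports whether usage stays within limit.
// A resource type that is missing from limit is treated as unlimited, not zero.
func fitsIn(limit, usage Resources) bool {
	for name, used := range usage {
		allowed, defined := limit[name]
		if !defined {
			continue // no cap configured for this resource type
		}
		if used > allowed {
			return false
		}
	}
	return true
}

func main() {
	// Mirrors the configuration in the description: only memory is limited.
	limit := Resources{"memory": 1}

	fmt.Println(fitsIn(limit, Resources{"memory": 1, "vcore": 4}))
	fmt.Println(fitsIn(limit, Resources{"memory": 2}))
}
```

Under this rule a request that uses vcore is not rejected just because the queue only sets a memory limit, while exceeding the memory limit is still refused.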

> The capacity of undefined resource should NOT be considered zero
> 
>
> Key: YUNIKORN-813
> URL: https://issues.apache.org/jira/browse/YUNIKORN-813
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Major
>
> {code}
> resources:
>   max:
>     memory: 1
> {code}
> If the above configuration is added to a leaf queue, the queue can't run any 
> application because "vcore" is assumed to be zero. That prevents us from 
> limiting only a subset of the resources.


