[ 
https://issues.apache.org/jira/browse/YUNIKORN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-1121:
--------------------------------------------
    Description: 
While reviewing YUNIKORN-1105 I found a bug in the mock scheduler…

Looking through the changes, I saw files being changed that I thought would not 
require any changes. I ran the tests without those changes and they failed, and 
I wondered why. I then ran the tests in the debugger, still without the changes 
that I thought were unneeded, and saw weird things.
The problem is in this function:
{code:java}
func (fc *MockScheduler) addTask(appID string, taskID string, ask *si.Resource){code}
It is hopelessly broken. The ask that gets passed in is completely ignored. That 
means every task that was created was always interpreted as 
{{PodQOSBestEffort}} and got its memory set to 1, which used to mean 1M. Now that 
we have fixed things it gets set to 1,000,000, the real 1M.
The breakage is triggered by the function in the resource code, which does the 
right thing:
{code:java}
func GetPodResource(pod *v1.Pod) (resource *si.Resource){code}
In the old setup, as long as the memory for best effort (i.e. 1) was smaller 
than the resource set for the task, things would just pass without an issue. 
Since 1 was the smallest possible value it would always work. Accounting on 
nodes etc. was most likely way off, but none of these tests checked that anyway.
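
To make the effect of the ignored ask concrete, here is a minimal Go sketch. It is not the real shim code: {{Resource}}, {{podResource}}, {{addTaskBroken}} and {{addTaskFixed}} are hypothetical stand-ins for {{si.Resource}}, {{GetPodResource}} and the mock's {{addTask}}, and the best-effort default of 1,000,000 is simply the value mentioned in this description.

```go
package main

import "fmt"

// Resource is a simplified, hypothetical stand-in for si.Resource:
// a map of resource name to quantity.
type Resource struct {
	Resources map[string]int64
}

// bestEffortMemory is the assumed default memory for a pod without requests
// (1,000,000, the value quoted in the issue description).
const bestEffortMemory int64 = 1000000

// podResource mimics the idea behind GetPodResource: a pod with no requests
// is treated as best effort and only gets the default memory.
func podResource(requests map[string]int64) *Resource {
	if len(requests) == 0 {
		return &Resource{Resources: map[string]int64{"memory": bestEffortMemory}}
	}
	res := &Resource{Resources: make(map[string]int64, len(requests))}
	for name, qty := range requests {
		res.Resources[name] = qty
	}
	return res
}

// addTaskBroken models the reported behaviour: the ask is dropped, so the
// pod is built without requests and always ends up as best effort.
func addTaskBroken(ask *Resource) *Resource {
	_ = ask // accepted but never used
	return podResource(nil)
}

// addTaskFixed models one possible fix: the ask is copied onto the pod.
func addTaskFixed(ask *Resource) *Resource {
	if ask == nil {
		return podResource(nil)
	}
	return podResource(ask.Resources)
}

func main() {
	ask := &Resource{Resources: map[string]int64{"memory": 2000000000, "vcore": 1000}}
	fmt.Println("broken:", addTaskBroken(ask).Resources)
	fmt.Println("fixed: ", addTaskFixed(ask).Resources)
}
```

With the broken variant the test can pass in any ask it likes and the task still carries only the best-effort memory, which is why the node accounting in these tests cannot be trusted.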

As a result, *all* tests that set resources on a Task through the mock 
scheduler do not test the real thing, not even close.
It also hinders us from testing failure cases. For example, we can never create 
a task that does not fit on a node unless the node is full.
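
The failure case can be illustrated with a small fit check. This is a sketch under assumptions, not the scheduler's actual logic: {{fitsIn}} and the node and ask sizes are made up for illustration.

```go
package main

import "fmt"

// fitsIn reports whether every quantity in ask is available on the node.
// A resource that is missing from available counts as zero.
// Hypothetical helper, not part of the YuniKorn code base.
func fitsIn(available, ask map[string]int64) bool {
	for name, qty := range ask {
		if qty > available[name] {
			return false
		}
	}
	return true
}

func main() {
	// Illustrative sizes only.
	node := map[string]int64{"memory": 4000000000, "vcore": 4000}
	intended := map[string]int64{"memory": 8000000000, "vcore": 1000}
	bestEffort := map[string]int64{"memory": 1000000}

	// What the test author meant to submit: too large for the node.
	fmt.Println("intended ask fits:", fitsIn(node, intended))
	// What the mock actually submits once the ask is ignored.
	fmt.Println("best-effort ask fits:", fitsIn(node, bestEffort))
}
```

Because the mock always submits the best-effort ask, a test that intends to submit an oversized task never sees a rejection; only a node with no free memory left would refuse it.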


> MockScheduler addTask ignores resource settings
> -----------------------------------------------
>
>                 Key: YUNIKORN-1121
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1121
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: shim - kubernetes
>            Reporter: Wilfred Spiegelenburg
>            Priority: Major
>              Labels: newbie


